Job Description

This project transforms our cluster (UDM) from a production-level system used across multiple orgs into a reporting-level system used within CF only. The work comprises several tasks focused on i) reducing the surface area, ii) deprecating and migrating core pipelines off of UDM, and iii) upgrading the cluster configuration to match the new needs. In addition, the third-party vendor is expected to address company-wide mandates related to the Digital Markets Act (DMA) and security initiatives.
UDM Migration:
1. Off-board external teams
2. Off-board internal teams’ production use cases
3. Migrate key pipelines off of UDM
4. Migrate derived datasets off of UDM
5. Clean up the cluster
6. Update cluster configuration
7. Automate cluster management and maintenance
DMA and Security:
1. Move UDM to private subnet
2. DMA Tagging of Customer Behaviour Datasets
Candidates are expected to have 2+ years of experience in at least 5 of the following 6 domains.
• Knowledge of distributed systems as they pertain to data storage and computing
• Experience with data modelling, warehousing and building ETL pipelines
• Experience with big data technologies such as Hadoop, Hive, Spark, and EMR
• Proficiency in at least one programming language such as C++, C#, Java, Python, Golang, PowerShell, or Ruby
• Experience with AWS technologies such as Redshift, S3, AWS Glue, EMR, Kinesis, Kinesis Data Firehose, Lambda, and IAM roles and permissions
• Experience with non-relational databases and data stores (object storage, document or key-value stores, graph databases, column-family databases)