My client is migrating from their existing Netezza/DataStage environment to Snowflake in AWS across a number of business units including Bank, P&C, etc. They have been asked to provide 2 Principal Cloud & Data Architects to help assess and define cloud architecture, choose tooling, design the workflows, and perform other tasks required to ensure successful delivery of various cloud migration projects. These architects should possess complementary skills to fill the broad requirements of the project.
Customer Environment: Current Netezza Environment
* 10-13 years of historical data
* 1,050 Netezza tables
* Note: existing ETL jobs, existing SQL, and some BI reports (Business Objects and Tableau) use Netezza-specific functions
* Data Growth YOY: less than 0.5TB
* Tableau On-Premise
* 269 Tableau reports will need to be tested and/or updated
* Note that Business Objects is used as well, but no Business Objects reports are part of this work stream
* IBM DataStage On-Premise & Netezza SQL
* 4,000 ETL jobs will need to be migrated
* 70% of identified jobs will need to be converted from ETL to ELT
* 80% of feeds come from mainframe (host mainframe converts PII into non-sensitive information)
* Flat files and DataStage datasets are used to load to tables
* The Netezza connector stage is used for bulk loading
* No data integration tools will be retired
* Documentation is available for "most of the assets"
* Business metadata is available for "60% of the assets"
* Real Time / Streaming: N/A for Bank
* Total Number of Interfaces to be Developed for Ingestion: ~400
Logical Concepts for Candidate: Typical questions that these architects should be well versed in are as follows:
1 Approaches to quickly and efficiently move historical loads of around 80TB to Snowflake from Netezza without consuming too many resources on the existing Bank Netezza appliance during the migration. Approaches that they have employed in other large-scale migrations with big companies will be of particular interest.
2 Moving daily/weekly delta from on-prem to Snowflake as part of historical migration. Any replication tools used for previous engagements?
3 ETL/ELT tools that work best with Snowflake, like Talend, Matillion, etc., that they have used in prior engagements? What do customers that are on DataStage or Informatica on-prem go to?
4 Any data abstraction layer for data consumption?
5 Data management tools used in prior engagements for governance, lineage. Is IGC an option?
6 Securing Snowflake accounts (other than whitelisting)
7 Service accounts in Snowflake - how to connect securely? Options like OAuth, CyberArk, Okta with service accounts.
8 Sampling and scanning of data in Snowflake for sensitive data elements. What tools have been used?
9 Talk about experience setting up a cloud data lake for a highly regulated entity.
10 Depict a reference architecture for a data lake ecosystem covering from producers (on-prem, cloud) to consumers (on-prem, cloud), with S3 and Snowflake as the primary data stores.
11 Describe an implementation of this architecture for a large customer (highlighting technologies chosen).
12 Describe an approach to addressing information governance requirements when it comes to public cloud, including how you know exactly what data is where, and how to prevent unintended data from existing in the environment.
1. Significant experience with Netezza and ETL processes using Netezza as a source and target.
2. Experience using SQL and experience with cloud data warehousing (Snowflake preferred, but Redshift is acceptable).
3. Experience using Java, Python, Spark, and Kafka.
4. Some experience with data virtualization (Denodo preferred).
5. Some experience with cloud security, data warehouse security, data governance, data lineage, metadata management.
6. Excellent communication and organizational skills.
7. Ability to lead and mentor the staff on the technologies in use.
8. Ability to accurately communicate to the client team regarding the technical structure of the project and any challenges.
Roles & Responsibilities
* Architects will need to be skilled in comprehensive cloud and data architecture (foundational technologies are Snowflake and AWS moving from Netezza) plus ETL both on premise (DataStage) and cloud (options), BI, Orchestration, Cloud Data Governance, Security Specifics (AWS and Snowflake), Data Lineage, Data Governance, Hadoop Migrations to the Cloud (HDP), Metadata Lineage, Metadata Scanning and Cataloging, SAS Workload Migration, Data Virtualization, Data Quality, and Data Profiling. Any experience in a highly regulated industry will be a plus.
If interested in this position, please apply to the job and email me at firstname.lastname@example.org or give me a call at 813-437-6964.