We are looking for a Data Engineer who will help us build data pipelines and handle all data-related work on AWS in order to design, develop, test and integrate a critical security-related application.
Your primary focus will be integrating the SoRs, extracting the data feeds from the current input systems, and shipping this data to AWS. On AWS, you will build the data pipelines using tools and services such as DMS, Glue, EMR, Spark, Hive, S3, Python and RDS. Experience with complex data ingestion done in a systematic way, including data registration, format matching, SerDe, and metadata preservation and management, is required. Experience building data pipelines that provide complete metadata, handle data quality issues, manage errors in the ETL pipelines, and build and preserve data lineage and data linkages is also a plus.
Desired Skills:
7 to 12 years of solid hands-on experience in the ETL and data warehousing space.
Must have 3 to 5 years of recent hands-on experience working on AWS and other cloud environments.
Hands-on experience with Big Data technologies such as Hadoop, Spark, Sqoop, Hive and Atlas, with knowledge of developing UNIX and Python scripts.
Strong experience in Data Integration and ETL/ECTL/ELT techniques.
Must have hands-on experience building data models for Data Lakes, EDWs and Data Marts using 3NF, de-normalized data models and dimensional models (Star, Snowflake, Constellations, etc.).
Should have strong technical experience in Design (mapping specifications, HLD, LLD) and Development (coding, unit testing) using big data technologies.
Should have experience with SQL database programming, SQL performance tuning and relational model analysis.
Must have experience in Python and its ML libraries, to be able to write data feeds and ML validation and testing scripts for ML-based predictive models.
Should be able to provide oversight and technical guidance for developers on ETL and data pipelines.
Must have good communication skills and should be able to lead meetings, technical discussions and escalation calls.