AWS Data Engineer - Hybrid - £70,000
The Data Engineer forms an essential part of the Data Science Team, primarily by manipulating, cleansing and transforming data to be 'machine learning ready'.
This is a pivotal role in the team: you will partner with key technical stakeholders, such as Solutions Architects and the IT Department, to support the build of productionised solutions so that the benefits of data science can be realised.
* Leading on data engineering and ETL tasks which will exploit the AWS machine learning stack to examine novel machine learning problems within the rail industry.
* Working closely with data scientists: preparing data so they can develop statistical algorithms that solve key business challenges, and advising on the best approach to do this.
* Developing best practices for MLOps, engineering tasks, code development, code deployment, ethics, and the approach to productionising solutions.
* Providing quality assurance of engineering tasks through code review and any other practices necessary.
* Conducting feasibility and practicality testing of business challenge-led machine learning ideas, to help strengthen the data science portfolio.
* Mapping out data feeds and systems in collaboration with Solutions Architects and the IT Team, in order to build richer pools of data for proof-of-concept and production solution design and build.
* A proven track record of creating and designing data pipelines, exposing and linking data from multiple systems.
* Experience with security and monitoring best practice, preferably using AWS Cloud infrastructure.
* A good understanding of coding best practices and experience with code and data versioning (using Git/CodeCommit), code quality and optimisation, error handling, logging, monitoring, validation and alerting.
* Experience of iteratively making data 'machine learning ready', preferably within the AWS machine learning stack (primarily SageMaker).
* Fluent in writing well-tested, readable Python code that is capable of processing large volumes of data.
* An excellent command of the basic libraries for data science (e.g. NumPy, Pandas)
* Experience in writing complex queries to gather insight from relational and non-relational data sources.
* Technical experience of mapping out data feeds to integrate and separate data to produce, transform and test new machine learning ideas.
Please reach out if you'd like to learn more about this opportunity.