• Location: USA, North Carolina, Durham
• Salary: Negotiable
• Technology: AWS Jobs
• Job Type: Permanent
• Date Posted: 8th Feb, 2019
• Reference: 02072019MD_1549662253

The Cloud Operations Engineer is part of an agile, dynamic team of technical experts responsible for the Service Reliability of cloud-based services and solutions for the enterprise. This role is responsible for the build, deployment, and daily operation of software-defined infrastructure and services for business initiatives.

What You'll Do:

* Work in an Agile-based environment to build, operate, monitor and maintain cloud-based platforms and solutions for mission-critical systems
* Scale infrastructure to meet growing capacity and launch new applications in both private and public cloud
* Drive operational excellence by implementing continuous integration/continuous delivery strategies for Product Teams on the Cloud Platform and for any dependent software on the cloud
* Own the day-to-day health, uptime, monitoring, and Service Reliability for public and private PaaS and IaaS services
* Develop monitoring solutions and appropriate metrics to measure performance and efficiency of complex and high-traffic environments
* Define and implement best cloud practices while documenting support processes to ensure service availability for all managed systems
* Participate in on-call rotation, receiving and responding to daytime and after-hours alerts
* Engage in problem resolution and root cause analysis of system and application incidents

What You'll Need:

* Bachelor's degree in Computer Science, Engineering or related field
* 5+ years of systems experience in private and/or public cloud data center environments with Windows- and Linux-based server systems such as Windows Server 2012 R2 through 2016, CentOS, CoreOS, and Ubuntu
* 2+ years of experience with, or strong working knowledge of, automating the build, configuration, and maintenance of IaaS and PaaS solutions in either AWS or Azure
* Experience performing development and deployment activities on a private or public cloud solution or a comparable high-availability environment
* Exposure to monitoring, configuration management, and cloud orchestration tools, including New Relic, AWS CloudWatch, Ansible, Chef, Puppet, AWS CloudFormation and/or Terraform
* Exposure to one or more container and container orchestration frameworks: Docker, Kubernetes, Docker Swarm, Amazon ECS, Amazon EKS, AWS Fargate, etc.
* A working understanding of coding and scripting languages (Java, Go, Node.js, Python, and/or Ruby)
* Demonstrable experience with performance tuning, troubleshooting, and resolving problems quickly and effectively in a production environment
* Working knowledge of disaster recovery, high availability and other technologies and principles that support business continuity; experience with DR capabilities in cloud and/or virtualized environments
* In-depth understanding of TCP/IP and LAN/WAN networking technologies and troubleshooting techniques
* Experience with cloud data protection and backup operations for highly critical business systems