Ref: a0M1i00000JWm8mEAD_1650637115

Senior Observability and SRE Engineer

USA, New York

Job description

Summary:

Strong knowledge of observability and site reliability engineering (SRE), with experience automating and proactively monitoring DevOps platforms and a passion for developing and architecting automation solutions. The engineer should be able to handle first-point escalation for all technical and process issues and provide technical subject-matter expertise wherever required. Ensure proper communication and quick resolution as a crisis manager. Plan and schedule changes, coordinating with different stakeholders. Perform RCA for major incidents related to his or her tower. Follow the quality and security processes defined for the engagement. Perform trend analysis, identify the top few incidents, and work with the respective teams or individuals to minimize them. Handle hardware troubleshooting and vendor coordination. Prepare weekly and monthly status reports. Participate in business meetings with various stakeholders on a need basis. Take corrective actions based on customer satisfaction surveys. Work on service improvement programs. Provide effort estimation and reviews on a need basis for new projects. Train new team members. Work on knowledge acquisition and updates to related documents.

Role Description:

1) Strong scripting/programming skills
Bash, Python, Go, etc. Coding is a must: strong skills in at least one of Bash, Python, or Go are required; knowing all of them is nice to have.
2) Understanding of Jenkins, SDLC, Agile and DevOps