Context: In the first steps of the cloud transformation, the programs' IT managers focused mainly on feature delivery and software quality. Now that we have started deploying to the production environment, we need to guarantee that business services operate as defined and required.
There is therefore a significant need to make our systems and applications reliable.
This reliability is characterized by:
* Adopting a Run Culture
* Automating improvements where it makes sense
* Implementing ITSM practices (SLO, SLI, alerting, monitoring, on-call, incident, problem, change, security, availability, continuity …)
* Complying with the requirements set by regulators, auditors and internal control
The reliability practice is composed of multiple sub-practices (Incident Response, Change Management, Problem Management …) that need to be defined and deployed across the development teams. For some teams, this has already started.
To strengthen our capacity to define and deploy the reliability practice, and to support development teams in applying it, we are looking to hire a Practice Officer who will work closely with the development teams and the practice owner.
The daily job of the Practice Officer will be to define, deploy and enforce the practice across the development teams, which are organized in tribes. The Practice Officer will be responsible for:
* Development, maintenance and enforcement of all incident management documentation and detailed processes (including critical incident management)
* Management and improvement of incident health, including program backlog, timely updates, adherence to IM policy/process, and quality of data within incidents
* Ownership of the Incident Management and Major Incident Management modules within ServiceNow, including any proposed changes needed to mature or improve Incident Management
* Continual improvement of all incident-related processes/procedures, incident quality, tooling and instrumentation, dashboards/reporting and technician training
* Ownership of incident-related Service Level Agreements (SLAs), working with SLM and other functional leads to determine the root cause of, and resolve, SLA misses related to incidents
MUST-HAVE skills
* SRE knowledge
* Languages: English and French (spoken and written)
* Communication skills
* Coaching skills
* Servant leadership
* Management skills
* Microsoft Office suite