Our business digitalization demands faster delivery of new features alongside an ever-growing need for uptime, performance and stability of IT services and applications, with the objective of improving customer satisfaction and retention: "Deliver solutions that our customers, partners and employees love to use".
The Service Reliability Engineer (SRE) contributes to the overall quality of the delivered IT services with the objective of meeting agreed service levels. The SRE ensures that IT services are delivered with the right level of availability, resiliency, performance, monitoring and capacity by analyzing and understanding the current situation and context (functional and non-functional), analyzing events and correlating data, identifying…
The Service Reliability Engineer, as part of a DevOps organization, creates a culture of "service reliability" within the DevOps teams (product and feature teams).
He/she constantly balances the delivery speed of new or changed features against the reliability of the underlying platforms and systems. This brings Quality of Service, and ownership of reliable services and applications, into the product and feature teams.
The SRE's entire job is to constantly improve the reliability of services and applications, so continuous improvement is inherent to SRE operations in a DevOps-centric organization.
The SRE will be exposed to numerous services. With an understanding of both developer and IT operations responsibilities, he/she can help spread system knowledge across the broader team and improve visibility of the entire application system. He/she will be exposed to development, deployment, configuration, orchestration and everything in between.
The SRE's objective is to be involved proactively and as early as possible (from the beginning of projects) in non-functional discussions with DevOps teams, making sure that the ORA is covered and that our SLA requirements are met. He/she will work closely with the architects and the feature/product teams to understand and challenge architectural designs and setups. For existing environments (reactively), he/she will also identify system weaknesses and propose solutions before they become major incidents.
He/she improves the reliability of technical services through deeper collaboration and proactive optimization of redundancy, monitoring and alerting practices.
He/she is consulted on complex problems, drives technical task forces whenever required, and conducts post-major-outage reviews.
He/she is strongly involved in handling emergencies, troubleshooting incidents, resolving critical problems and analyzing operational data (application, platform and system information).
Being able to solve problems effectively requires the ability to work well with others. SREs should not be expected to know all the answers; instead, they should know whom to ask for help, on the team or elsewhere in the organization, and how to communicate with them.
He/she is consulted on major changes, migrations and releases, posing the right challenges to change implementers and requestors and ensuring that potential operational risks are well understood and mitigated.
He/she identifies operational risks and proposes alternatives for remediation.
He/she contributes strongly by bringing his/her operational experience to building highly available, resilient and scalable solutions, and provides input for technology roadmaps and strategy.
The Service Reliability Engineer is able to critically examine a system, middleware, application or platform, and uses that analysis to guide and challenge existing implementations.
He/she can use specialized tools to understand, interpret and correlate data and events in order to identify root causes and propose improvements (functional and non-functional). If required, he/she will bring in infrastructure, middleware, database or development experts, or external consultants, for a deep dive.
He/she also seeks to develop automated solutions for operational aspects such as on-call monitoring, performance and capacity planning, and disaster response.
He/she has an IT operations background and good overall knowledge of infrastructure, and understands development practices.
He/she collaborates closely with product and feature teams to ensure that the designed solution meets non-functional requirements such as reliability and security (including the right level of availability, performance and maintainability).
He/she continuously supports and challenges product and feature teams to improve stability, availability, latency, performance, resiliency, response times, scalability and capacity.
Key Skills