Site Reliability Engineer

Description:

Zendesk is a service-first CRM company that builds powerful, customizable software designed to improve customer relations. At Zendesk, we encourage growth and innovation, and we believe in giving back to the communities we call home.

This is a great opportunity to help Zendesk build a new Site Reliability Engineering - Operations (ZNOC) team in EMEA. The ZNOC team is responsible for detecting and mitigating customer-impacting incidents at Zendesk, and it builds solutions that support the reliability and availability of Zendesk products.

You will join a newly created team that implements this function in EMEA and partners with our strong Engineering and Product organizations. The ideal candidate will have experience in a technical operations role (ideally SRE) and programming skills. You enjoy actively participating in incident response, supporting and troubleshooting large-scale distributed systems, and partnering with teams to improve reliability.

Responsibilities

  • Discover problems within distributed cloud-native applications using logs, telemetry, and alerting.

  • Mitigate urgent problems and work with teammates to solve underlying issues.

  • Write code to automate the mitigations and improve tools.

  • Teach others how to detect and fix repeat problems.

  • Provide and institute proven practices around reliability, remediations, and troubleshooting.

  • Build vital and efficient tooling that lowers the barrier to entry for engineering teams to plug in and enjoy the benefits of Reliability.

Desired qualities

  • Experience in a Software, Infrastructure, Systems, and/or Site Reliability Engineering role.

  • A successful track record of troubleshooting distributed systems during service incidents while remaining level-headed.

  • Knowledge of Kubernetes, NGINX, and networking.

  • Experience in monitoring large-scale SaaS-type products or services.

  • Experience in a software development environment.

  • A strong curiosity about the unknown and the persistence to keep digging until you have a solid understanding.

  • An understanding of what makes up the incident lifecycle.

Organization: Zendesk
Industry: Engineering
Occupational Category: Site Reliability Engineer
Job Location: Dublin, Ireland
Shift Type: Morning
Job Type: Full Time
Gender: No Preference
Career Level: Intermediate
Experience: 2 Years
Posted at: 2022-03-17 5:24 am
Expires on: Expired