Senior Site Reliability Consultant

Reports to:  Project Lead
Experience:  5+ years
Start date:  1st August 2022

Responsibilities
  • Reduce toil, implement identified improvement opportunities, and handle minor enhancements and non-ticketed activities
  • Define and monitor service level metrics, including reliability metrics such as MTTD, MTTR, MTBF, MTTF, unavailability rate, and incident count
  • Create rules that optimize incident response through metrics, streamlined alert flows, and better collaboration and communication across squads
  • Proactively identify the issues that might disrupt the service in production
  • Route incoming service requests to the appropriate support groups or the Jira tool
  • Create and maintain alerts
  • Handle change validation and change planning requests
  • Assist business stakeholders in determining SLOs or adjusting threshold limits
  • Manage demand and capacity, and correct SLI/SLO threshold limits
  • Gather and analyze metrics from both infrastructure and applications to assist in bug fixing
  • Engage in capacity planning and performance tuning exercises
  • Partner with development teams to improve services through rigorous testing and release procedures
  • Participate in system design consulting, platform management, and capacity planning
  • Create sustainable systems and services through automation and uplifts
  • Balance feature development speed and reliability with well-defined service level objectives and indicators (SLOs, SLIs)
  • Debug production issues across services and levels of the stack

Required skills and qualifications
  • Bachelor’s degree in computer science or another highly technical or scientific discipline
  • Ability to program (structured and OO) in one or more high-level languages, such as Python, Java, Ruby, C/C++, or JavaScript
  • Experience with AEM and web services/APIs
  • Experience working with public clouds (minimum 3 years required)
  • Experience with Git or other source control systems
  • Experience using tools to create and manage CI (continuous integration) and CD (continuous delivery) pipelines
  • Working knowledge of service level definitions and of identifying KPIs
  • Working knowledge of the TCP/IP stack, internet routing, and load balancing
  • Experience with distributed storage technologies like NFS, HDFS, Ceph
  • Experience with observability strategy

Delivery Model: Onsite
Job Type: Full Time
Job Location: Auckland
