Senior Production Support Engineer

As a Senior Production Support Engineer, you will support and monitor our platform and maintain its high availability across all of our markets. You’ll work closely with engineering, CX, product and program teams globally on production incident response and post-mortem processes. You’ll work with the CX team to identify areas of improvement for our product based on their feedback and on customer communications. You’ll continuously review and improve our existing monitoring and alerting systems.
What You’ll Do

Ownership of the risk event process across some of our markets: coordinate the teams responding to an incident, communicate effectively, oversee the post-mortem and ensure that follow-up action items are completed
Continuous improvement of our monitoring dashboards and alerts
In collaboration with the CX team, identify patterns in customer and product issues and propose improvements
Identify and communicate recurring themes across risk events and propose improvements to prevent the same issues from recurring
Keep track of metrics related to production performance and identify areas of improvement
Continuous improvement of our documentation library to enable faster onboarding of new team members and more efficient incident response

Qualifications

4+ years of experience working in a technology environment, including experience with microservices architecture
4+ years of experience in incident response or a similar role
Knowledge of monitoring platforms such as AWS CloudWatch, Sumo Logic, APM tools (New Relic, Instana), mobile crash reporting (Crashlytics) and BI (Looker)
Sufficient knowledge of relational databases to be able to construct basic queries
Ability to work independently and make decisions with limited information when under time pressure
Excellent debugging skills
Excellent documentation and organizational skills
Ability to coordinate incident response and communicate effectively with stakeholders from a variety of teams across different time zones
Ability to remain calm under pressure while resolving a production incident

Apply via:

jobs.lever.co