Site Reliability Engineering Manager

Duties & Responsibilities:

Apply strong adaptive problem-solving and program management skills to planning, execution, communication, risk management, and stakeholder management.
Supervise a team of SREs, ensuring that production applications your team supports are stable, reliable, and well-documented.
Build and maintain robust, sophisticated production monitoring that identifies issues quickly, before they surface through operations or applications.
Work closely with tech support, operations, product, engineering managers, and development teams to ensure that platforms are designed with scale and operability in mind. Triage bugs, issues, and tasks related to production performance, and get them resolved through the SRE team or the engineering teams.
Resolve all issues within the committed SLAs for each issue bucket (P0, P1, P2, P3, P4, etc.).
Troubleshoot and debug complex issues in production applications
Assist in the roll-out and deployment of new product features and installations to facilitate our rapid iteration and constant growth
Develop tools to improve our ability to rapidly deploy and effectively monitor custom applications in a large-scale environment
Be available at any time for escalations affecting your products; serve as the face of your team to other teams at Sokowatch.
Function well in a fast-paced, rapidly-changing environment
Communicate effectively with people at all levels of the organization
Be a mentor on Agile/Scrum/Kanban, SRE, Product & Development processes.
Excellent written and oral communication skills

Requirements:

10+ years of prior experience in large, company-wide site reliability engineering and management.
Hands-on experience in DevOps and with Google Cloud Platform (GCP), Docker, and Kubernetes, including proper metrics instrumentation in software components to facilitate real-time and remote troubleshooting and performance monitoring.
Strong B2B, B2C/B2B2C e-commerce domain knowledge.
Bachelor's or Master's degree in a quantitative field from a premier institute.
Excellent problem-solving, prototyping, articulation, and communication skills
Sound understanding of areas in Computer Science such as Algorithms, Data Structures, Object-Oriented Design, and Databases. Proficiency in at least one modern programming language such as Java, JavaScript, or Python.

Apply via:

www.linkedin.com
