About the role
The Site Reliability Engineer is a Product Team member in the Infrastructure team. You will engineer, manage, and maintain our hosting platform and infrastructure, enabling secure and scalable hosting. You’ll also be central to the future development of our infrastructure services and support both internal teams and external partners. You will work directly with system administrators, DevOps engineers, and users of the CHT – this includes building new features, providing support, fixing bugs, testing applications, and making sure we’re working on the most impactful things. You will work with a distributed team based around the world, and you will report to an Engineering Manager.
Product Team’s Core Competencies
As a team, we have adopted a set of “core competencies” that describe how we show up at work to be great teammates for each other.
Reliable – Sets and communicates clear expectations about when something needs to/will be done and does it without prompting.
Team Player – Acts in the team’s best interest and actively looks for ways to help their colleagues. Makes time to support teammates to be successful.
Growth Mindset – Always seeking to improve. Open-minded, teachable, and coachable.
Proactive – Sees things that need doing and takes action to keep things moving and make the team successful.
Effective Communicator – Communicates regularly, openly, and effectively using the appropriate channels.
Key Responsibilities
Proactive Monitoring and Team Support
Proactively monitor performance and reliability of production Medic systems
Produce status pages consumable by non-technical users
Consult on technical needs for larger-scale deployments, including local hosting, scalability, etc.
Provide remote troubleshooting support to active deployments as needed
Prioritize urgent troubleshooting problems in live instances
Identify possible production problems by checking through or reviewing the issues that have been reported
Follow up and investigate questions asked on Slack channels and the CHT forum
Keep in contact with Core Devs and QA teams
Provide technical information, explain processes, clarify interactions when requested, and ensure proper documentation.
Manage upgrades and upgrade processes on production instances.
Automate deployments to increase testability and reliability.
Automate deployment monitoring and alerting
Support scaling – Proactively seek new technologies or implementations that solve current problems better or more efficiently
Troubleshooting – Prioritize and provide remote troubleshooting support to active deployments as needed.
Documentation – Write technical information, explain processes, clarify interactions when requested, and ensure proper documentation.
Support shifts – Work dedicated support tasks (not on-call) once every three weeks, primarily assisting other internal teams or external partners.
Skills, Knowledge, and Expertise
Required Skills and Qualifications
Good understanding of DevOps concepts and best practices
3+ years of experience with Kubernetes, with concrete results
Experience in one or more programming languages, preferably JavaScript
Fluency in English and experience using it in a remote work environment, e.g., over video and text chats
Ability to work in a remote and culturally diverse team
Detective Skills: Terrific at troubleshooting and debugging.
Problem-solving skills
Linux system administration, monitoring, security best practices, networking, and logging.
You must have valid authorization to work in the country where you are based without requiring sponsorship.
Travel Requirement: Candidates should be aware that this role may entail up to 25% travel, including both domestic and international travel to various locations. Most of these locations are in East Africa, West Africa, or Nepal.
Apply via:
medic.pinpointhq.com