We are looking for a Site Reliability Engineer with an operations and software engineering background to help us build and run large-scale, distributed, fault-tolerant systems.
Qualities we’re looking for
Thoughtful coder.
You understand the importance of abstractions and interfaces. You keep modules loosely coupled and know that algorithms + data structures = programs.
You read and understand existing systems before diving in. Then you research, stand on the shoulders of giants, and follow best practices. You know how to prototype, how to iterate, and when to step back and think it through or ask questions.
Builder.
You are committed to the projects you work on and need to see them through to completion. You understand that solving the user’s problem is the end goal.
You prefer open systems that are verifiably secure, and you publish and use open source code, like we do.
Lifelong learner.
You stay up to date with the latest trends and are excited to learn new languages, tools, and best practices.
Explorer.
You thrive in teams and projects that span time zones and cultures.
You’re ready and excited to travel in order to support projects, no matter how dusty or remote.
Requirements
Essential
Minimum 3 years maintaining production systems on Linux.
Minimum 2 years writing production web applications.
Minimum 1 year working with deployment or infrastructure tools, e.g. Ansible, Chef, Puppet.
Experience working with remote teams.
Strong attention to detail and understanding of architectural dependencies.
Strong troubleshooting and problem-solving skills.
Experience in monitoring resource usage.
Experience communicating with users, other technical teams, and project management teams to collect requirements.
Good oral and written communication skills.
Desirable
Experience managing and automating infrastructure on AWS, GCP, and Azure.
Experience writing Clojure, Java, JavaScript, and Python.
Experience using Ansible, Terraform, and HashiCorp Vault.
Experience using Docker, Kubernetes, and kOps.
Experience using InfluxDB, Graphite, and Grafana.
Experience using Monit, Nginx, and systemd.