Job Description
We are looking for an experienced DevOps engineer to work at the interface of development and operations within our company. Their involvement in each stage of a product’s life cycle should promote efficiency and, ultimately, increased revenue.
As a DevOps engineer, you will help us build functional systems that improve customer experience. The role is critical to the organization’s overall success and supports primary KPIs such as customer satisfaction and productivity. You will integrate project functions and resources across the product life cycle, from planning, building, testing, and deployment through to support.
If you’re dedicated and ambitious and have a solid background in software engineering (familiarity with Ruby or Python), we’d like to hear from you!
Responsibilities
This individual will be responsible for:
Site Reliability Engineering (SRE):
Implement and maintain best practices for ensuring the reliability and availability of web applications and services.
Monitor system performance, troubleshoot issues, and implement proactive measures to prevent downtime.
Collaborate with cross-functional teams to develop and maintain Service Level Objectives (SLOs) and Service Level Indicators (SLIs).
Participate in incident management, post-incident reviews, and root cause analysis to continuously improve system reliability.
Monitor processes across the entire lifecycle for adherence, and update or create processes to drive improvement and minimize waste.
Infrastructure:
Design, build, automate and manage the infrastructure that underlies the application stack, including cloud resources (e.g., Google Cloud, AWS), servers, networks, and storage.
Strive for continuous improvement and build continuous integration, continuous delivery, and continuous deployment (CI/CD) pipelines.
Automate infrastructure provisioning and management using tools like Terraform, Ansible, or Kubernetes.
Implement security best practices and ensure compliance with industry standards in infrastructure design.
Optimize and scale infrastructure to meet growing demand.
Identify and deploy cybersecurity measures by continuously performing vulnerability assessments and risk management.
Developer Experience:
Support and enhance the development process by providing tools and practices that improve developer productivity.
Collaborate with software development teams to set up and streamline the CI/CD (Continuous Integration/Continuous Deployment) pipeline.
Create and maintain environments, including development, staging, and production.
Assist developers with debugging, performance optimization, and troubleshooting throughout the development lifecycle.
Tech Financial Operations:
Manage and optimize technology-related financial aspects, including budgeting, cost tracking, and cost control.
Implement and monitor cost-effective solutions for infrastructure and services, optimizing cloud resources.
Work closely with finance and procurement teams to ensure efficient allocation of technology-related budgets.
Implement cost allocation models to attribute technology expenses accurately.
Manage periodic reporting on FinOps progress to management.
Skills
The ideal candidate for this position will have the following:
Deep knowledge of Linux systems
The candidate must have strong operating system skills (Linux/Ubuntu/Debian), know their way around a UNIX shell, and believe that where there is a shell, there is a way.
Good computer networking skills – the candidate understands how networks work, including the OSI model and protocols such as TCP/IP, UDP, ICMP, HTTP(S), DNS, DHCP, and SMTP.
Virtualization and Containerization technologies
A deep understanding of the Docker, LXD, or containerd runtimes.
Strong experience in running production applications on Kubernetes.
Comprehensive Programming Skills
Strong understanding of version control systems, such as Git with GitLab/GitHub/Bitbucket.
Experience using popular CI/CD pipeline tools – GitLab CI/CD, GitHub Actions, CircleCI, etc.
Strong knowledge of database management systems, mainly but not limited to PostgreSQL, is a must.
Cloud-first Mindset
Proficiency in cloud computing, specifically but not limited to Google Cloud Platform and Amazon Web Services. Most of our applications are served from the cloud, so it is important to understand how the cloud works, including products like GCE/EC2, Cloud Run/EBS, Cloud Functions/Lambda, GKE/EKS, GCS/S3, PubSub/SQS, etc.
Automation Mastery
An automation mindset is a MUST to avoid the hassle of manual tasks. Our main automation tool is Ansible, so strong knowledge of writing, modifying, and running Ansible playbooks is essential.
Must be proficient in infrastructure-as-code tools such as Terraform, Pulumi, or CloudFormation.
Proficiency in Kubernetes automation tools, e.g. Helm v3 (mostly), Kustomize, etc., is also required.
Coding Proficiency
We are hiring someone to maintain systems written with a combination of tools, libraries, and programming languages, so it is important to know at least one programming language in our stack and to have at least some knowledge of the structure of the other languages we use.
The backend stack is mainly written in Python (Django is the main framework; a background in any other Python framework is okay) and Golang.
Our APIs are mostly implemented as REST, but newer apps use GraphQL. These APIs are usually deployed behind NGINX reverse proxies, except for the Go services, which may be exposed directly.
Observability
We need someone with the ability to collect, analyze, and gain insights from data generated by software and infrastructure to ensure system reliability and performance. This skill includes data instrumentation, monitoring, diagnostics, automation, collaboration, and a commitment to continuous improvement. It’s about understanding and improving what’s happening within a system in real time to proactively address issues and enhance overall system health.
Must have experience in running and integrating applications with observability tools such as Grafana, Prometheus, the TICK stack, Google Cloud Monitoring/AWS CloudWatch, OpenTelemetry, etc.
Detective Skills
We need someone who can detect, analyze, debug, and follow up on issues end to end, and who works to improve the performance of our applications. They should be able to use existing tools and techniques, including our monitoring stack, Sentry, and other monitoring tools, to debug and resolve issues and to write up root cause analyses (RCAs) on them.
Understand the full software stack – and go beyond
It is important to understand the whole stack, in terms of how our apps are developed, deployed, and maintained, in order to reproduce and debug errors faster and take the necessary steps to resolve them. Candidates should therefore not limit themselves in what they know; knowing everything at the start is not a must, but the will to learn is important.