JOB DESCRIPTION
Reporting to the Engineering Lead – Service Availability, the position holder will be responsible for monitoring and observability and for improving the operational aspects of all in-scope systems within DIT. The role will drive automation and DevOps across the different domains and foster service monitoring through proactive initiatives such as AIOps and machine learning, among other available channels.
RESPONSIBILITIES
Proactively build and implement monitoring services, including end-to-end monitoring, scripting and automation, modern tooling, and maintenance software.
Use AI and machine learning to perform log analysis and build predictive models that help identify potential failures.
Develop and execute automation scripts and maintenance jobs.
Develop automation around monitoring.
Onboard DIT systems onto the service monitoring tools (APM and observability platforms such as ELK).
Clearly document any monitoring gaps noted and collaborate with the relevant teams to ensure timely closure.
Perform application error analysis and follow-up to ensure an optimal customer experience.
Deploy planned and operational changes on in-scope systems.
Support all Digital squads to ensure new products are monitored.
Support Zero Touch Operations initiatives.
Support the development of collectors and agents.
QUALIFICATIONS
Bachelor’s degree in Computer Science, Information Technology, Electrical and Communication Engineering, Business Information Systems, or a relevant telecommunications field.
Domain knowledge in at least two of the following areas: system administration (especially Linux), orchestration (Kubernetes), the Linux kernel, and OpenTelemetry.
Good understanding of back-end programming languages such as Python and Rust.
Technical understanding of SRE concepts and DevOps practices as they relate to providing stable services to customers: meeting availability KPIs, Service Level Objectives (SLOs) and Service Level Indicators (SLIs), and staying within the target monthly error budget.
Well versed in one or more modern monitoring tools such as ELK, Prometheus, Dynatrace, AppDynamics, New Relic or Splunk.
Good understanding of microservice architecture and an appreciation of traditional/classic SOA.
Leadership skills and the ability to manage a team; takes ownership of issues; analytical, with strong problem-solving skills.
Ability to implement a strict change management policy.
Conversant with agile ways of working.