Reporting to the Engineering Lead – Service Availability, the position holder will be responsible for monitoring and observability, and for improving the operational aspects of all in-scope systems within DIT. The role will drive automation and DevOps practices across the different domains, and will foster service monitoring through proactive initiatives such as AIOps and machine learning, among other available channels.
This is a fixed-term contract role (1 year).
Key Responsibilities:
Proactively build and implement monitoring services, including end-to-end monitoring, scripting and automation, modern tooling, and maintenance software.
Use AI and machine learning to perform log analysis and build predictive models that help identify potential failures.
Develop and execute automation scripts and maintenance jobs.
Develop automation around TIBCO middleware.
Onboard DIT systems onto the service monitoring tools (APMs).
Clearly document any monitoring gaps identified and collaborate with the relevant teams to ensure they are closed promptly.
Perform application error analysis and follow-up to ensure an optimal customer experience.
Deploy planned and operational changes on in-scope systems.
Support all Digital squads in ensuring that new products are monitored.
Support Zero Touch Operations initiatives.
Job Requirements:
Bachelor’s degree in Computer Science, Information Technology, Electrical and Communication Engineering, Business Information Systems, or a relevant telecommunications field.
At least 2 years’ experience in a busy telco or IT environment.
Knowledge of TIBCO will be an added advantage.
Domain knowledge in at least 3 of the following areas: Databases, Containerization, VAS, Integration, Virtualization, Cloud (AWS or Azure), Orchestration (Kubernetes), and App development (Android / iOS).
Good understanding of microservice architecture and an appreciation of traditional/classic SOA.
Technical experience working with DevOps practices.
Ability to manage a team, with strong leadership skills, ownership of issues, and an analytical, problem-solving approach.
Ability to implement a strict change management policy.
Good understanding of back-end programming, such as Python and Java (Spring Boot).
Knowledge of front-end programming will be an added advantage.
Ability to formulate SLAs for each incident level and to implement a service management approach for all types of services rendered.
Well versed in modern monitoring tools and systems, with experience in APMs such as Dynatrace, AppDynamics, New Relic, and Splunk.
Well versed in SRE concepts for providing stable services to customers, including adherence to availability KPIs, Service Level Objectives (SLOs) and Service Level Indicators (SLIs), and conformance to the target monthly error budget.
Apply via:
safaricom.taleo.net