201 Condition Monitoring jobs in Kenya

Remote Site Reliability Engineer

80201 Nairobi, Nairobi | KES 320,000 annually | WhatJobs

Posted today

Job Description

full-time
Our client is seeking a dedicated and skilled Remote Site Reliability Engineer to ensure the optimal performance and availability of their distributed systems. This is a critical, fully remote position. You will be responsible for designing, implementing, and maintaining robust infrastructure, automating operational tasks, and proactively identifying and resolving potential system issues. Your role will involve monitoring system health, performance metrics, and alerting, as well as developing and executing disaster recovery plans. You will collaborate closely with development teams to ensure scalability, reliability, and efficiency throughout the software development lifecycle. Key responsibilities include managing cloud infrastructure, containerization technologies (such as Docker and Kubernetes), CI/CD pipelines, and infrastructure-as-code tools.

You will need strong scripting skills (e.g., Python, Bash) and a deep understanding of networking principles and security best practices. This role requires a proactive mindset, excellent analytical and problem-solving skills, and the ability to work effectively in a collaborative, remote team. The ideal candidate is passionate about system stability, performance optimization, and automation, and will play a key role in maintaining the client's high standards of service availability and user experience. The client values individuals who can think critically, adapt quickly to new technologies, and contribute to a culture of continuous improvement. This is an opportunity to work with cutting-edge technologies and make a tangible impact on operational excellence from your home office.
This advertiser has chosen not to accept applicants from your region.

Site Reliability Engineer (SRE)

00206 Gathiruini | KES 550,000 annually | WhatJobs

Posted today

Job Description

full-time
Our client, a rapidly expanding technology firm known for its innovative software solutions, is seeking a highly motivated and skilled Site Reliability Engineer (SRE) to join their fully remote engineering team. This role is critical to ensuring the scalability, availability, and performance of our client's production systems and services. You will be responsible for building and maintaining robust, automated infrastructure, diagnosing and resolving complex system issues, and driving improvements in reliability and efficiency. If you are passionate about systems, automation, and building resilient infrastructure, this is the perfect opportunity for you to contribute to a leading tech company from anywhere in the world.

Key Responsibilities:
  • Design, build, and maintain scalable and highly available infrastructure on cloud platforms (e.g., AWS, Azure, GCP).
  • Develop and implement automation tools and scripts for deployment, monitoring, and operational tasks.
  • Monitor system performance, identify bottlenecks, and proactively resolve issues to ensure optimal uptime and reliability.
  • Participate in incident response, conducting post-mortems to identify root causes and implement preventative measures.
  • Collaborate with development teams to ensure new features and services are designed for reliability and scalability.
  • Implement and manage containerization technologies such as Docker and Kubernetes.
  • Develop and maintain infrastructure as code (IaC) using tools like Terraform or Ansible.
  • Define and track key service level objectives (SLOs) and service level indicators (SLIs).
  • Troubleshoot complex system failures and performance degradations.
  • Contribute to the continuous improvement of our CI/CD pipelines and development processes.

The ideal candidate will possess a Bachelor's degree in Computer Science, Engineering, or a related field, along with 4+ years of experience in SRE, DevOps, or a similar role. Proficiency in at least one major cloud provider (AWS, Azure, GCP) and strong scripting skills (Python, Bash, Go) are essential. Experience with container orchestration (Kubernetes) and infrastructure automation tools is required. A deep understanding of networking concepts, operating systems (Linux), and distributed systems is crucial. Excellent problem-solving, debugging, and communication skills are necessary for this remote position. We are looking for a proactive individual who can work independently, manage their time effectively, and contribute positively to a distributed team culture. Join our client's cutting-edge team and play a vital role in shaping the future of their technology.

Senior Site Reliability Engineer

60200 Meru, Eastern | KES 120,000 annually | WhatJobs

Posted 1 day ago

Job Description

full-time
Our client is seeking a highly skilled and experienced Senior Site Reliability Engineer (SRE) to join their fully remote team. This role is pivotal in ensuring the scalability, availability, and performance of the client's mission-critical systems. As an SRE, you will design, build, and maintain robust infrastructure, automate operational tasks, and proactively identify and mitigate potential issues. You will work closely with development teams to apply software engineering best practices to systems operations, with a focus on reliability, observability, and efficiency.

Your core duties will include developing and maintaining infrastructure as code (IaC) with tools such as Terraform or CloudFormation, managing cloud environments (AWS, Azure, GCP), implementing CI/CD pipelines, and monitoring system health with tools such as Prometheus, Grafana, and Datadog. You will also take part in incident response, root cause analysis, and the implementation of preventative measures. This position requires a deep understanding of distributed systems, networking concepts, and containerization technologies such as Docker and Kubernetes.

The ideal candidate will have a strong background in software development and a passion for systems engineering. You should be adept at troubleshooting complex problems in high-availability environments and have excellent scripting skills (Python, Bash, Go). This role offers the opportunity to shape the future of the client's infrastructure and contribute to a culture of engineering excellence. You can work from anywhere in Kenya: the position supports the client's operations in the Meru region, but the work itself is entirely remote.
Responsibilities:
  • Design, build, and maintain scalable and reliable infrastructure.
  • Automate operational tasks and deployments using CI/CD pipelines.
  • Implement and manage monitoring, logging, and alerting systems.
  • Troubleshoot and resolve system outages and performance issues.
  • Collaborate with software engineers to improve system design and operability.
  • Develop and maintain infrastructure-as-code for cloud environments.
  • Participate in on-call rotations for incident management.
  • Conduct performance analysis and capacity planning.
  • Contribute to architectural decisions to ensure system resilience.
  • Document system configurations and operational procedures.
Qualifications:
  • Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
  • Minimum of 5 years of experience in Site Reliability Engineering, DevOps, or System Administration.
  • Proficiency in cloud platforms (AWS, Azure, or GCP).
  • Strong experience with containerization technologies (Docker, Kubernetes).
  • Expertise in infrastructure-as-code tools (Terraform, Ansible, Chef, Puppet).
  • Solid understanding of networking protocols and concepts.
  • Proficiency in scripting languages (Python, Bash, Go).
  • Experience with monitoring and logging tools (Prometheus, Grafana, ELK Stack).
  • Excellent problem-solving and debugging skills.
  • Strong communication and collaboration abilities in a remote team environment.

Site Reliability Engineer (SRE)

20100 Mwembe | KES 240,000 annually | WhatJobs

Posted 4 days ago

Job Description

full-time
Our client, a fast-growing SaaS company, is seeking a highly skilled and motivated Site Reliability Engineer (SRE) to join their technical operations team. This is a fully remote position, offering the flexibility to contribute to maintaining and improving the reliability, scalability, and performance of our cloud-based platforms from anywhere. You will be instrumental in ensuring the availability and efficiency of our services by implementing robust automation, proactive monitoring, and effective incident response. Your responsibilities will include automating operational tasks, designing resilient systems, and collaborating with development teams to build infrastructure as code. You will play a key role in optimizing our infrastructure and ensuring a seamless user experience.

Key responsibilities:
  • Ensuring the reliability, availability, scalability, and performance of production systems.
  • Developing and implementing automated solutions for infrastructure provisioning, deployment, and management.
  • Designing and implementing monitoring, alerting, and logging systems.
  • Participating in on-call rotations and responding to incidents to resolve issues quickly and efficiently.
  • Collaborating with software development teams to improve system design and reliability.
  • Performing root cause analysis for production incidents and implementing preventative measures.
  • Managing cloud infrastructure (e.g., AWS, Azure, GCP) and ensuring its stability.
  • Writing and maintaining infrastructure as code (IaC) using tools like Terraform or Ansible.
  • Optimizing system performance and reducing operational costs.
  • Developing and documenting operational procedures and best practices.
  • Conducting capacity planning and performance testing.
  • Participating in security reviews and implementing security best practices.

The ideal candidate will have a Bachelor's degree in Computer Science, Engineering, or a related field, with a strong understanding of distributed systems and cloud computing. A minimum of 5 years of experience in a Site Reliability Engineering, DevOps, or Systems Engineering role is required. Proficiency in scripting languages (e.g., Python, Bash), containerization technologies (e.g., Docker, Kubernetes), and cloud platforms (e.g., AWS, Azure, GCP) is essential. Experience with CI/CD pipelines and monitoring tools (e.g., Prometheus, Grafana, ELK stack) is highly desirable. Excellent problem-solving, analytical, and communication skills are mandatory. If you are a proactive engineer dedicated to building and maintaining highly reliable systems in a remote setting, we encourage you to apply.

Remote Site Reliability Engineer

01000 Makongeni | KES 120,000 annually | WhatJobs

Posted 4 days ago

Job Description

full-time
Our client is actively seeking a skilled Remote Site Reliability Engineer (SRE) to bolster their infrastructure and operations team. This is a fully remote role, offering flexibility and the opportunity to contribute to a cutting-edge technology environment from any location. As an SRE, you will focus on ensuring the reliability, scalability, and performance of our client's production systems and services. You will work on automating operational tasks, improving system observability, and driving incident response.

Key Responsibilities:
  • Design, implement, and manage scalable and highly available distributed systems.
  • Develop and maintain automation tools for deployment, monitoring, and operational tasks.
  • Monitor system performance, identify and resolve production issues, and minimize downtime.
  • Implement and manage logging, monitoring, and alerting solutions.
  • Participate in on-call rotations for incident response and post-mortem analysis.
  • Collaborate with development teams to ensure reliability and performance are considered throughout the software development lifecycle.
  • Develop and advocate for practices that improve system stability, reliability, and security.
  • Conduct root cause analysis for production incidents and implement preventative measures.
  • Manage and optimize cloud infrastructure (AWS, Azure, GCP).
  • Contribute to the design and architecture of new systems and services.
Qualifications:
  • Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
  • Proven experience (4+ years) in Site Reliability Engineering, DevOps, or a similar role.
  • Strong experience with at least one major cloud provider (AWS, Azure, GCP).
  • Proficiency in programming/scripting languages such as Python, Go, or Java.
  • Experience with containerization technologies like Docker and Kubernetes.
  • Solid understanding of operating systems (Linux/Unix), networking concepts, and distributed systems.
  • Experience with infrastructure-as-code tools (e.g., Terraform, Ansible).
  • Knowledge of monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack).
  • Excellent problem-solving and debugging skills.
  • Strong communication and collaboration skills, especially in a remote team setting.
This position offers a dynamic and challenging remote work environment. If you are passionate about building and maintaining resilient systems and thrive in a collaborative, remote-first culture, this is the role for you.

Remote Site Reliability Engineer (SRE)

12200 Njiru Village | KES 150,000 annually | WhatJobs

Posted today

Job Description

full-time
Our client is seeking a highly skilled and motivated Remote Site Reliability Engineer (SRE) to ensure the stability, performance, and scalability of their complex infrastructure. This is a fully remote position, offering the flexibility to work from any location. As an SRE, you will play a critical role in designing, building, and operating our client's systems, focusing on automation, reliability, and efficiency. You will work closely with development and operations teams to identify and mitigate potential issues, implement robust monitoring solutions, and respond to incidents with a focus on minimizing downtime.

Key responsibilities include:
  • Designing, implementing, and maintaining highly available and scalable distributed systems.
  • Developing automation tools and scripts to streamline operational tasks, deployments, and monitoring.
  • Proactively identifying and resolving performance bottlenecks and system issues.
  • Implementing and managing comprehensive monitoring, alerting, and logging solutions.
  • Participating in on-call rotations to respond to system incidents and emergencies.
  • Conducting root cause analysis for system outages and implementing preventative measures.
  • Collaborating with software engineers to improve system design for reliability and operability.
  • Managing cloud infrastructure (AWS, Azure, GCP) and containerization technologies (Docker, Kubernetes).
  • Developing and maintaining CI/CD pipelines for seamless software delivery.
  • Ensuring security best practices are implemented across all systems.
  • Documenting system architecture, operational procedures, and incident response plans.
  • Evaluating and recommending new technologies and tools to enhance system reliability.
The ideal candidate possesses a strong background in system administration, software development, or DevOps, with a proven track record in SRE roles. Proficiency in cloud platforms, container orchestration, scripting languages (Python, Bash), and monitoring tools is essential. You should have a deep understanding of networking, operating systems, and distributed systems. Excellent problem-solving, debugging, and communication skills are required. If you are passionate about building resilient systems and thrive in a remote, collaborative environment, we encourage you to apply.

Senior Site Reliability Engineer (SRE)

00100 Moiben | KES 700,000 per month | WhatJobs

Posted 3 days ago

Job Description

full-time
Our client, a leading technology firm known for its robust and scalable cloud infrastructure, is seeking a highly skilled Senior Site Reliability Engineer (SRE) to join their team based in Garissa, Kenya. This role is crucial for maintaining the high availability, performance, and scalability of the client's critical production systems. You will be instrumental in designing, building, and operating infrastructure that ensures system reliability and efficiency, applying automation and best practices throughout. Key responsibilities include developing and maintaining robust monitoring and alerting systems, automating deployment and operational tasks, and responding to incidents with rapid diagnosis and resolution. You will work closely with development teams to ensure that new features are designed with reliability and scalability in mind, and you will participate in on-call rotations to support production systems.

The ideal candidate will have a strong background in systems engineering, cloud computing (AWS, Azure, or GCP), and containerization technologies (Docker, Kubernetes). Proven experience with scripting languages (Python, Bash) and infrastructure as code (Terraform, Ansible), along with a deep understanding of distributed systems, is essential. You should be adept at troubleshooting complex issues in production environments and take a proactive approach to identifying and mitigating potential risks. This position offers a competitive salary, comprehensive benefits, and the opportunity to work with cutting-edge technologies in a challenging and rewarding environment. If you are a passionate SRE with a commitment to operational excellence and a desire to build highly reliable systems, we encourage you to apply.

Remote Site Reliability Engineer - Cloud Infrastructure

01100 Abothuguchi West | KES 180,000 annually | WhatJobs

Posted 2 days ago

Job Description

full-time
Our client is seeking a highly skilled and motivated Remote Site Reliability Engineer (SRE) to join their infrastructure team. This is a fully remote role, allowing you to contribute to the stability, performance, and scalability of the client's cloud-based systems from anywhere. You will be responsible for designing, implementing, and maintaining reliable, robust infrastructure, automating operational tasks, and ensuring the availability and resilience of the client's services. Your responsibilities will include monitoring system performance, identifying and resolving incidents, capacity planning, and implementing disaster recovery and business continuity strategies. You will work closely with development teams to ensure that applications are designed for reliability and operability. The role emphasizes a proactive, 'you build it, you run it' philosophy for infrastructure components.

The ideal candidate will have a deep understanding of cloud computing platforms (e.g., AWS, Azure, GCP), containerization technologies (e.g., Docker, Kubernetes), and infrastructure-as-code tools (e.g., Terraform, Ansible). Experience with CI/CD pipelines and tools, monitoring tools (e.g., Prometheus, Grafana), and distributed systems is essential. This role demands excellent problem-solving skills, a proactive approach to system management, and a strong command of scripting and automation. You should be comfortable working in a fast-paced, agile environment and have exceptional communication skills for effective remote collaboration.

You will play a critical role in ensuring the platform's uptime, security, and performance, which directly affects user experience, business operations, and the client's global user base. This is an exciting opportunity to shape the future of the client's infrastructure and contribute to the success of a growing technology company.

Key Responsibilities:
  • Design, implement, and manage scalable and highly available cloud infrastructure.
  • Automate infrastructure provisioning, configuration, and deployment using IaC tools.
  • Monitor system performance, identify bottlenecks, and implement optimizations.
  • Develop and maintain CI/CD pipelines for efficient software delivery.
  • Respond to and resolve production incidents, performing root cause analysis.
  • Implement and manage container orchestration platforms (e.g., Kubernetes).
  • Develop and execute disaster recovery and business continuity plans.
  • Collaborate with software engineering teams to ensure system reliability and operability.
  • Manage security configurations and ensure compliance with best practices.
  • Develop and maintain comprehensive documentation for infrastructure and processes.
  • Conduct capacity planning and performance testing.
  • Contribute to the continuous improvement of SRE practices and tooling.

Qualifications:
  • Bachelor's degree in Computer Science, Engineering, or a related field.
  • Minimum of 4 years of experience in Site Reliability Engineering, DevOps, or Systems Engineering.
  • Strong experience with cloud platforms such as AWS, Azure, or GCP.
  • Proficiency in containerization technologies (Docker) and orchestration (Kubernetes).
  • Hands-on experience with infrastructure-as-code tools (Terraform, Ansible).
  • Expertise in scripting languages (e.g., Python, Bash, Go).
  • Experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack).
  • Solid understanding of networking concepts and protocols.
  • Excellent problem-solving and troubleshooting skills.
  • Strong communication and collaboration skills for remote work.
  • Experience with CI/CD tools (e.g., Jenkins, GitLab CI).
  • Familiarity with database management and scaling.

Remote Senior Site Reliability Engineer (SRE)

30300 Bungoma, Western | KES 320,000 annually | WhatJobs

Posted 3 days ago

Job Description

full-time
Our client is a rapidly growing technology company seeking a highly skilled and experienced Remote Senior Site Reliability Engineer (SRE). This critical role is responsible for ensuring the availability, performance, scalability, and security of our production systems and services. As a fully remote position, you will play a pivotal part in maintaining and improving our infrastructure, applying software engineering principles to operations challenges. The ideal candidate will have a strong background in system administration, distributed systems, and cloud computing, with a deep understanding of automation and monitoring. You will collaborate with development teams to build reliable and scalable software, troubleshoot complex issues, and implement proactive solutions. Key responsibilities include:
  • Designing, building, and maintaining scalable and highly available production systems and infrastructure, primarily on cloud platforms (e.g., AWS, Azure, GCP).
  • Developing and implementing automation tools and scripts for deployment, monitoring, and incident response.
  • Monitoring system performance, identifying bottlenecks, and implementing optimizations to ensure reliability and efficiency.
  • Troubleshooting and resolving complex production issues, performing root cause analysis, and implementing preventative measures.
  • Collaborating with software development teams to integrate SRE principles into the software development lifecycle (SDLC).
  • Participating in on-call rotations to respond to critical system incidents.
  • Developing and maintaining comprehensive documentation for systems, processes, and runbooks.
  • Implementing and managing CI/CD pipelines to streamline software delivery.
  • Ensuring the security of systems and infrastructure through best practices and regular audits.
  • Mentoring junior engineers and sharing knowledge across the engineering team.

The successful candidate will have a Bachelor's degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience. Five to seven years of experience in Site Reliability Engineering, DevOps, or a similar role is required. Proven experience with cloud platforms (AWS, Azure, GCP), containerization technologies (Docker, Kubernetes), and infrastructure as code (Terraform, Ansible) is essential. Strong programming skills in languages such as Python, Go, or Java are highly desirable. An excellent understanding of distributed systems, networking, and operating systems (Linux) is a must. Strong problem-solving, debugging, and analytical skills are required, along with effective communication and collaboration abilities. This fully remote position demands self-motivation, a proactive approach, and the ability to work effectively within a distributed team. If you are a passionate SRE looking to make a significant impact on the reliability and performance of cutting-edge technology systems, we encourage you to apply.
 
