201 Condition Monitoring jobs in Kenya
Remote Site Reliability Engineer
Posted today
Job Description
Site Reliability Engineer (SRE)
Posted today
Job Description
Key Responsibilities:
- Design, build, and maintain scalable and highly available infrastructure on cloud platforms (e.g., AWS, Azure, GCP).
- Develop and implement automation tools and scripts for deployment, monitoring, and operational tasks.
- Monitor system performance, identify bottlenecks, and proactively resolve issues to ensure optimal uptime and reliability.
- Participate in incident response, conducting post-mortems to identify root causes and implement preventative measures.
- Collaborate with development teams to ensure new features and services are designed for reliability and scalability.
- Implement and manage containerization technologies such as Docker and Kubernetes.
- Develop and maintain infrastructure as code (IaC) using tools like Terraform or Ansible.
- Define and track key service level objectives (SLOs) and service level indicators (SLIs).
- Troubleshoot complex system failures and performance degradations.
- Contribute to the continuous improvement of our CI/CD pipelines and development processes.
The ideal candidate will possess a Bachelor's degree in Computer Science, Engineering, or a related field, along with 4+ years of experience in SRE, DevOps, or a similar role. Proficiency with at least one major cloud platform (AWS, Azure, GCP) and strong scripting skills (Python, Bash, Go) are essential. Experience with container orchestration (Kubernetes) and infrastructure automation tools is required. A deep understanding of networking concepts, operating systems (Linux), and distributed systems is crucial. Excellent problem-solving, debugging, and communication skills are necessary for this remote position. We are looking for a proactive individual who can work independently, manage their time effectively, and contribute positively to a distributed team culture. Join our client's cutting-edge team and play a vital role in shaping the future of their technology.
Senior Site Reliability Engineer
Posted 1 day ago
Job Description
Responsibilities:
- Design, build, and maintain scalable and reliable infrastructure.
- Automate operational tasks and deployments using CI/CD pipelines.
- Implement and manage monitoring, logging, and alerting systems.
- Troubleshoot and resolve system outages and performance issues.
- Collaborate with software engineers to improve system design and operability.
- Develop and maintain infrastructure-as-code for cloud environments.
- Participate in on-call rotations for incident management.
- Conduct performance analysis and capacity planning.
- Contribute to architectural decisions to ensure system resilience.
- Document system configurations and operational procedures.
Qualifications:
- Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
- Minimum of 5 years of experience in Site Reliability Engineering, DevOps, or System Administration.
- Proficiency in cloud platforms (AWS, Azure, or GCP).
- Strong experience with containerization technologies (Docker, Kubernetes).
- Expertise in infrastructure-as-code tools (Terraform, Ansible, Chef, Puppet).
- Solid understanding of networking protocols and concepts.
- Proficiency in scripting languages (Python, Bash, Go).
- Experience with monitoring and logging tools (Prometheus, Grafana, ELK Stack).
- Excellent problem-solving and debugging skills.
- Strong communication and collaboration abilities in a remote team environment.
Site Reliability Engineer (SRE)
Posted 4 days ago
Job Description
Key responsibilities:
- Ensuring the reliability, availability, scalability, and performance of production systems.
- Developing and implementing automated solutions for infrastructure provisioning, deployment, and management.
- Designing and implementing monitoring, alerting, and logging systems.
- Participating in on-call rotations and responding to incidents to resolve issues quickly and efficiently.
- Collaborating with software development teams to improve system design and reliability.
- Performing root cause analysis for production incidents and implementing preventative measures.
- Managing cloud infrastructure (e.g., AWS, Azure, GCP) and ensuring its stability.
- Writing and maintaining infrastructure as code (IaC) using tools like Terraform or Ansible.
- Optimizing system performance and reducing operational costs.
- Developing and documenting operational procedures and best practices.
- Conducting capacity planning and performance testing.
- Participating in security reviews and implementing security best practices.
The ideal candidate will have a Bachelor's degree in Computer Science, Engineering, or a related field, with a strong understanding of distributed systems and cloud computing. A minimum of 5 years of experience in a Site Reliability Engineering, DevOps, or Systems Engineering role is required. Proficiency in scripting languages (e.g., Python, Bash), containerization technologies (e.g., Docker, Kubernetes), and cloud platforms (e.g., AWS, Azure, GCP) is essential. Experience with CI/CD pipelines and monitoring tools (e.g., Prometheus, Grafana, ELK stack) is highly desirable. Excellent problem-solving, analytical, and communication skills are mandatory. If you are a proactive engineer dedicated to building and maintaining highly reliable systems in a remote setting, we encourage you to apply.
Remote Site Reliability Engineer
Posted 4 days ago
Job Description
Key Responsibilities:
- Design, implement, and manage scalable and highly available distributed systems.
- Develop and maintain automation tools for deployment, monitoring, and operational tasks.
- Monitor system performance, identify and resolve production issues, and minimize downtime.
- Implement and manage logging, monitoring, and alerting solutions.
- Participate in on-call rotations for incident response and post-mortem analysis.
- Collaborate with development teams to ensure reliability and performance are considered throughout the software development lifecycle.
- Develop and advocate for practices that improve system stability, reliability, and security.
- Conduct root cause analysis for production incidents and implement preventative measures.
- Manage and optimize cloud infrastructure (AWS, Azure, GCP).
- Contribute to the design and architecture of new systems and services.
Qualifications:
- Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
- Proven experience (4+ years) in Site Reliability Engineering, DevOps, or a similar role.
- Strong experience with at least one major cloud provider (AWS, Azure, GCP).
- Proficiency in programming/scripting languages such as Python, Go, or Java.
- Experience with containerization technologies like Docker and Kubernetes.
- Solid understanding of operating systems (Linux/Unix), networking concepts, and distributed systems.
- Experience with infrastructure-as-code tools (e.g., Terraform, Ansible).
- Knowledge of monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack).
- Excellent problem-solving and debugging skills.
- Strong communication and collaboration skills, especially in a remote team setting.
Remote Site Reliability Engineer (SRE)
Posted today
Job Description
Key responsibilities include:
- Designing, implementing, and maintaining highly available and scalable distributed systems.
- Developing automation tools and scripts to streamline operational tasks, deployments, and monitoring.
- Proactively identifying and resolving performance bottlenecks and system issues.
- Implementing and managing comprehensive monitoring, alerting, and logging solutions.
- Participating in on-call rotations to respond to system incidents and emergencies.
- Conducting root cause analysis for system outages and implementing preventative measures.
- Collaborating with software engineers to improve system design for reliability and operability.
- Managing cloud infrastructure (AWS, Azure, GCP) and containerization technologies (Docker, Kubernetes).
- Developing and maintaining CI/CD pipelines for seamless software delivery.
- Ensuring security best practices are implemented across all systems.
- Documenting system architecture, operational procedures, and incident response plans.
- Evaluating and recommending new technologies and tools to enhance system reliability.
Senior Site Reliability Engineer (SRE)
Posted 3 days ago
Job Description
Remote Site Reliability Engineer - Cloud Infrastructure
Posted 2 days ago
Job Description
Key Responsibilities:
- Design, implement, and manage scalable and highly available cloud infrastructure.
- Automate infrastructure provisioning, configuration, and deployment using IaC tools.
- Monitor system performance, identify bottlenecks, and implement optimizations.
- Develop and maintain CI/CD pipelines for efficient software delivery.
- Respond to and resolve production incidents, performing root cause analysis.
- Implement and manage container orchestration platforms (e.g., Kubernetes).
- Develop and execute disaster recovery and business continuity plans.
- Collaborate with software engineering teams to ensure system reliability and operability.
- Manage security configurations and ensure compliance with best practices.
- Develop and maintain comprehensive documentation for infrastructure and processes.
- Conduct capacity planning and performance testing.
- Contribute to the continuous improvement of SRE practices and tooling.
Qualifications:
- Bachelor's degree in Computer Science, Engineering, or a related field.
- Minimum of 4 years of experience in Site Reliability Engineering, DevOps, or Systems Engineering.
- Strong experience with cloud platforms such as AWS, Azure, or GCP.
- Proficiency in containerization technologies (Docker) and orchestration (Kubernetes).
- Hands-on experience with infrastructure-as-code tools (Terraform, Ansible).
- Expertise in scripting languages (e.g., Python, Bash, Go).
- Experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack).
- Solid understanding of networking concepts and protocols.
- Excellent problem-solving and troubleshooting skills.
- Strong communication and collaboration skills for remote work.
- Experience with CI/CD tools (e.g., Jenkins, GitLab CI).
- Familiarity with database management and scaling.
Remote Senior Site Reliability Engineer (SRE)
Posted 3 days ago
Job Description
- Designing, building, and maintaining scalable and highly available production systems and infrastructure, primarily on cloud platforms (e.g., AWS, Azure, GCP).
- Developing and implementing automation tools and scripts for deployment, monitoring, and incident response.
- Monitoring system performance, identifying bottlenecks, and implementing optimizations to ensure reliability and efficiency.
- Troubleshooting and resolving complex production issues, performing root cause analysis, and implementing preventative measures.
- Collaborating with software development teams to integrate SRE principles into the software development lifecycle (SDLC).
- Participating in on-call rotations to respond to critical system incidents.
- Developing and maintaining comprehensive documentation for systems, processes, and runbooks.
- Implementing and managing CI/CD pipelines to streamline software delivery.
- Ensuring the security of systems and infrastructure through best practices and regular audits.
- Mentoring junior engineers and sharing knowledge across the engineering team.
The successful candidate will possess a Bachelor's degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience. Five to seven years of experience in Site Reliability Engineering, DevOps, or a similar role is required. Proven experience with cloud platforms (AWS, Azure, GCP), containerization technologies (Docker, Kubernetes), and infrastructure as code (Terraform, Ansible) is essential. Strong programming skills in languages such as Python, Go, or Java are highly desirable. A thorough understanding of distributed systems, networking, and operating systems (Linux) is a must. Strong problem-solving, debugging, and analytical skills are required, along with effective communication and collaboration abilities. This is a fully remote position that demands self-motivation, a proactive approach, and the ability to work effectively within a distributed team. If you are a passionate SRE looking to make a significant impact on the reliability and performance of cutting-edge technology systems, we encourage you to apply.