Site Reliability Engineer

  • Full-Time
  • Anywhere

Quantum World Technologies Inc.

SRE Ops (Site Reliability Engineer – Operations)

Mandatory Skills: SRE, Python, Node.js, DevOps, Unix/Linux servers

Job Summary: We are looking for a talented and dedicated Site Reliability Engineer with a strong focus on operations to join our team. The SRE Ops role requires a deep understanding of application architecture, automation, and incident management to maintain and improve the reliability of our systems. The ideal candidate will be responsible for improving system performance, scalability, and availability while proactively addressing potential issues.

Key Responsibilities:

Incident Management:

Participate in on-call rotations to respond to incidents and outages.

Lead and coordinate incident response, including troubleshooting and resolution.

Monitoring and Alerting:

Implement and manage monitoring and alerting systems to detect performance issues and failures.

Analyze monitoring data and trends to identify and address potential problems before they cause outages.

Release/Change Management:

Implement changes, updates, and improvements to systems while minimizing disruptions.

Follow best practices for change control and documentation.

Documentation and Knowledge Sharing:

Create and maintain documentation for operational procedures, configurations, and best practices.

Share knowledge and collaborate with team members to enhance collective expertise.

Automation and Tooling:

Develop and maintain automation scripts and tools.

Continuously improve and optimize operational processes.

Qualifications:

1+ years of experience as a Site Reliability Engineer, or 3+ years in a DevOps environment.

3+ years of development experience with Node.js and/or Python.

3+ years of experience in each of the following categories (at least one tool per category):

• Cloud Infrastructure (Amazon Web Services (AWS) / Google Cloud Platform (GCP) / Azure / etc.)

• Bug tracking (JIRA / Bugzilla / YouTrack / etc.)

• Continuous Deployment (Concourse / GitHub Actions / Spinnaker / etc.)

• AWS CDK or AWS CloudFormation

• Metrics & Tracing (Grafana / Prometheus / Dynatrace / Jaeger / OpenTelemetry / OpenTracing / etc.)

• Monitoring platforms or tools

Source: WhatJobs