As a Senior Site Reliability Engineer, you'll play a key role in shaping our new Production Reliability domain. You'll drive reliability initiatives, lead cross-team projects, and make sure our SaaS platform stays robust, scalable, and efficient. This is a high-impact, hands-on role that demands technical expertise and a proactive approach.
As a Senior SRE, you will:
Design, build, and maintain scalable, fault-tolerant systems.
Define and enforce SLOs, SLIs, and SLAs, and drive improvements based on real production data.
Build automation and tooling to enhance observability, testing, and deployments.
Lead the response to complex incidents, including taking part in on-call rotations and running postmortems.
Collaborate closely with engineering, product, and support teams to embed reliability into everything we do.
Mentor engineers and promote operational excellence across the organization.
Requirements. You:
Have 7+ years of experience in SRE, DevOps, or Production Engineering roles, ideally in SaaS environments.
Bring deep expertise in resilience engineering, monitoring, and building fault-tolerant systems.
Are hands-on with monitoring tools like Datadog, Dynatrace, OpenSearch, Coralogix, or Sentry.
Are experienced with CI/CD tools like Jenkins or ArgoCD.
Are proficient with infrastructure-as-code tools like Terraform or Crossplane.
Have strong knowledge of Linux systems and networking fundamentals.
Have solid experience with cloud platforms (AWS preferred).
Are an advanced Java developer (experience with Python or Go is a plus).
Know Kubernetes and the broader CNCF ecosystem inside out.
Excel at debugging and root cause analysis.
Are fluent in Hebrew and English.
Bring a strong sense of ownership and accountability to everything you do.
This position is open to all candidates.