SPi Labs
ENGINEERING · Remote · Full-time

Site Reliability Engineer

Keep our cloud and edge infrastructure running — from data ingest pipelines to model training clusters to field-deployed devices.

What you'll do

  • Manage Kubernetes clusters and CI/CD pipelines
  • Build observability and alerting for hybrid cloud/edge infrastructure
  • Automate provisioning and scaling of GPU training environments
  • Develop disaster recovery and incident response procedures

What we look for

  • 3+ years in SRE, DevOps, or infrastructure engineering
  • Strong Kubernetes, Terraform, and cloud (GCP/AWS) experience
  • Scripting proficiency (Python, Bash, Go)
  • Experience with monitoring tools (Prometheus, Grafana, Datadog)

Interested in this role?

Leave your details and we'll get back to you.