SPi Labs

Engineering · Remote · Full-time

Site Reliability Engineer

Keep our cloud and edge infrastructure running — from data ingest pipelines to model training clusters to field-deployed devices.

What you'll do

Manage Kubernetes clusters and CI/CD pipelines
Build observability and alerting for hybrid cloud/edge infrastructure
Automate provisioning and scaling of GPU training environments
Develop disaster recovery and incident response procedures

What we look for

3+ years in SRE, DevOps, or infrastructure engineering
Strong Kubernetes, Terraform, and cloud (GCP/AWS) experience
Scripting proficiency (Python, Bash, Go)
Experience with monitoring tools (Prometheus, Grafana, Datadog)
