ENGINEERING · Remote · Full-time
Site Reliability Engineer
Keep our cloud and edge infrastructure running — from data ingest pipelines to model training clusters to field-deployed devices.
What you'll do
- Manage Kubernetes clusters and CI/CD pipelines
- Build observability and alerting for hybrid cloud/edge infrastructure
- Automate provisioning and scaling of GPU training environments
- Develop disaster recovery and incident response procedures
What we look for
- 3+ years in SRE, DevOps, or infrastructure engineering
- Strong Kubernetes, Terraform, and cloud (GCP/AWS) experience
- Scripting proficiency (Python, Bash, Go)
- Experience with monitoring tools (Prometheus, Grafana, Datadog)