Engineering · Remote · Full-time
ML Engineer, On-device Inference
Shrink frontier models to fit on edge hardware without sacrificing the accuracy our customers depend on.
What you'll do
- Quantize, prune, and distill large perception models for edge deployment
- Benchmark inference across TensorRT, ONNX Runtime, and custom runtimes
- Build automated model optimization and profiling pipelines
- Collaborate with the research team to design efficient, hardware-aware model architectures
What we look for
- 3+ years of experience in ML model optimization or deployment
- Hands-on experience with TensorRT, ONNX, or TFLite
- Strong Python and C++ skills
- Understanding of neural network quantization and hardware accelerators
Interested in this role?
Leave your details and we'll get back to you.