Senior Software Engineer, Compute Platform

Palo Alto, California

US$300,000 - US$400,000 per year

Full time

Ref: 004444_1777562615

About the Company

A fast-growing, venture-backed startup is building a next-generation AI compute platform focused on decentralized, high-performance infrastructure. The company is rethinking how organizations access and scale compute by integrating global data centers into a unified, serverless platform.

Their mission is to democratize access to AI compute and provide an end-to-end lifecycle solution, from raw data to deployed models, through a combination of platform infrastructure and forward-deployed engineering.

With a global footprint and early traction, the team is tackling challenges across multi-cloud orchestration, GPU scheduling, and enterprise-grade infrastructure, with a strong focus on security and compliance.


The Role

This is a high-impact infrastructure role focused on designing and scaling the distributed systems that power AI/ML workloads.

You'll work across:

  • Core platform architecture
  • Multi-cloud compute orchestration
  • Managed services development
  • Customer-facing deployments

This role requires a strong mix of systems engineering and product thinking, with exposure to both backend infrastructure and end-user experience.


What You'll Work On

Compute Platform & Multi-Cloud Architecture

  • Design abstraction layers across cloud providers (AWS, GCP, Azure, bare-metal)
  • Build systems that unify compute, storage, and networking across environments
  • Expand global compute capacity by integrating with cloud and data center providers
  • Architect reusable, composable infrastructure components

Managed Services & Platform Development

  • Own services end-to-end (design → deployment → monitoring)
  • Build orchestration systems for GPU workloads and container scheduling
  • Develop APIs and control planes for provisioning, scaling, and lifecycle management
  • Drive improvements in performance, reliability, and cost efficiency

Infrastructure & Platform Services

  • Build systems for billing, usage tracking, and cost attribution
  • Develop observability tooling (metrics, logging, tracing)
  • Establish engineering standards and best practices
  • Mentor engineers and contribute to system design decisions

What They're Looking For

Core Requirements

  • 4+ years building distributed systems, backend infrastructure, or cloud platforms
  • Strong experience with AWS, GCP, or Azure
  • Deep understanding of:
    • Compute (VMs, instances)
    • Storage (object, block, file systems)
    • Networking (VPCs, load balancers, security groups)
  • Experience with Kubernetes and container orchestration
  • Strong programming skills (Golang preferred; Python/Rust a plus)
  • Experience building APIs, control planes, or platform services
  • Familiarity with databases (Postgres, Redis, etc.) and messaging systems (Kafka, RabbitMQ)

Nice to Have

  • GPU orchestration or AI/ML infrastructure experience
  • HPC or cluster management (Kubernetes, Slurm)
  • Data engineering or large-scale ETL systems
  • Systems-level programming (low-level infra, operators, daemons)
  • ML platform engineering (training/inference pipelines)
  • Experience deploying into enterprise or on-prem environments

Oscar Associates Limited (US) is acting as an Employment Agency in relation to this vacancy.

Apply today.
