Industry: Autonomous Driving
Use Case: AI

Wayve Accelerates Autonomous Driving Innovation with Flyte’s Scalable Orchestration

Wayve is a leading developer of end-to-end Embodied AI for assisted and autonomous driving. Wayve has chosen Flyte as its orchestration engine for running large-scale experiments and production pipelines on cloud compute. Flyte powers workflows for data processing, offline labeling, embedding generation, dataset materialization, model training, and inference with neural simulators (Ghost Gym). It orchestrates thousands of NVIDIA GPUs across large datasets, delivering experimental results much faster.

Flyte delivers value to Wayve’s researchers by abstracting away the underlying infrastructure and Kubernetes, allowing them to focus on research. It shortens iteration cycles by reducing their reliance on engineering teams. Capabilities such as caching, versioning, map tasks, workflow observability, modular workflows built from heterogeneous tasks, and scalable workflow executions give Wayve increased productivity, improved cost efficiency, faster time to market, and a competitive advantage.

About Wayve 

Wayve is an innovative autonomous driving company focused on revolutionizing the future of mobility through end-to-end AI. Founded in 2017 and based in London, Wayve is pioneering a new approach to autonomous vehicles by leveraging cutting-edge AI and deep learning techniques to enable self-driving cars to learn and adapt to the complexities of real-world driving. Unlike traditional systems that rely heavily on hand-coded rules and extensive sensor arrays, Wayve’s technology emphasizes the use of vision-based learning and simple sensor setups to create a more scalable and adaptable autonomous driving solution. 

With a mission to make self-driving technology accessible and efficient for all, Wayve is positioned at the forefront of the assisted and autonomous driving revolution, pushing the boundaries of what’s possible in urban mobility.

Scaling Wayve’s Workflows through Rapid Iteration, Efficient Resource Management, and Massive Parallelism

Wayve began evaluating orchestration solutions that could launch thousands of data and ML workflows intelligently and at scale. After considering more than a dozen options, the team settled on Flyte because of its broad industry adoption, particularly among enterprises. Flyte’s Kubernetes foundation fit Wayve’s technology stack and enabled the team to support multi-cluster deployments across regions. Despite this Kubernetes foundation, the barrier for researchers to start realizing value was low. Meaningful error messages provide visibility into Kubernetes-layer issues, such as out-of-memory errors.

Flyte’s flexibility and extensibility make it well suited to a wide range of use cases. Its API-based registration system and versioned entities let users iterate on the same workflow independently, which supports faster iteration, team collaboration, and higher model development velocity. Flyte’s caching capabilities help Wayve reduce workflow run time and compute costs by avoiding unnecessary recomputation. A key technique Wayve uses to improve its models is resimulation: by resimulating past data, the team can identify areas for improvement when training a new model. This results in rapid, continuous model improvement and more robust models.
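Flyte’s caching reuses a task’s earlier output when the task is called again with the same inputs and the same cache version. As a rough, plain-Python illustration of that idea (not Flyte’s implementation), the sketch below memoizes a function under a hash of its name, a user-chosen cache version, and its keyword inputs; bumping the version would produce new keys and force recomputation. In flytekit itself this behavior is switched on declaratively with `@task(cache=True, cache_version="1.0")`. The `cached_task` decorator, the in-memory `_CACHE`, and the `label_frames` task are hypothetical.

```python
import functools
import hashlib
import json
from typing import Any, Callable, Dict, List

# Hypothetical in-memory cache store; Flyte keeps its cache in a backend service instead.
_CACHE: Dict[str, Any] = {}


def cached_task(cache_version: str) -> Callable:
    """Memoize a function on (function name, cache_version, keyword inputs)."""
    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(**kwargs: Any) -> Any:
            # Build a deterministic key from the task identity, cache version, and inputs.
            payload = json.dumps(
                {"task": fn.__qualname__, "version": cache_version, "inputs": kwargs},
                sort_keys=True,
            )
            key = hashlib.sha256(payload.encode()).hexdigest()
            if key not in _CACHE:
                _CACHE[key] = fn(**kwargs)  # cache miss: run the task and store its output
            return _CACHE[key]
        return wrapper
    return decorator


calls: List[str] = []  # records every real (non-cached) execution


@cached_task(cache_version="1.0")
def label_frames(clip_id: str, threshold: float) -> int:
    """Hypothetical offline-labeling step standing in for an expensive computation."""
    calls.append(clip_id)
    return len(clip_id) * int(threshold * 10)
```

Calling `label_frames(clip_id="clip-001", threshold=0.5)` twice executes the body only once; the second call is served from the cache. Inputs are passed as keyword arguments, mirroring how Flyte tasks are invoked inside workflows.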

The Wayve team also leverages Flyte to summarize driving scenes with embeddings generated from running Flyte workflows across large-scale datasets. This enables efficient similarity search, offline scenario classification, and dataset curation.
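Once each driving scene is summarized as an embedding vector, similarity search reduces to ranking stored vectors by how close they are to a query vector. The case study does not describe Wayve’s embedding model or search infrastructure, so the sketch below is only a generic illustration: brute-force cosine similarity over toy 3-dimensional vectors with hypothetical scene names. At real scale, an approximate nearest-neighbor index would typically replace the linear scan.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_similar(
    query: List[float], index: Dict[str, List[float]], k: int = 2
) -> List[Tuple[str, float]]:
    """Return the k scene ids whose embeddings are most similar to the query."""
    scored = [(scene_id, cosine(query, emb)) for scene_id, emb in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings; real scene embeddings are much higher-dimensional.
scene_index: Dict[str, List[float]] = {
    "roundabout_rain": [0.9, 0.1, 0.0],
    "roundabout_dry": [0.8, 0.2, 0.1],
    "highway_night": [0.0, 0.1, 0.9],
}
```

For a query vector `[1.0, 0.0, 0.0]`, the two roundabout scenes rank ahead of `highway_night`, which is the kind of grouping that supports offline scenario classification and dataset curation.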

“The ability to massively parallelise computations using map tasks, and without having to depend on other frameworks accelerates our team’s development lifecycle. The researchers focus on the science rather than learning compute frameworks.” —Tom Newton, Software Engineer, Wayve

By taking a plain Python function and running massively parallel copies of it, Wayve can run tens of thousands of map tasks simultaneously today, with capacity to scale further. This capability delivers accelerated processing (from scalable workflow executions), user productivity gains (from simplified workflow management), and improved reliability (from built-in fault tolerance through automated retries and reruns of specific map tasks).

Resiliency and high availability of the Flyte backend are critical, since it powers Wayve’s mission-critical use cases. The Wayve team has used its experience with operating Flyte at enterprise production-scale to make notable contributions to the Flyte project to improve its resilience, reliability, and performance.

Flyte is very flexible in defining dependencies, allowing for the isolation of dependencies and resources per task. The Wayve team can define a single heterogeneous workflow that includes different task types—such as a combination of Spark, Python, and raw container tasks—while isolating and managing the dependencies and compute requirements for each task. This modularity and reusability of tasks drives increased productivity. There are cost efficiency benefits, too, as each task only uses the resources specified. Improved dependency management at the task level reduces the risks of workflow failures from conflicts or resource exhaustion.
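The cost argument for per-task resources comes from simple arithmetic: a single monolithic container must be sized for its most demanding step and holds those resources for the whole run. The sketch below works through that comparison with entirely hypothetical numbers (three made-up steps, image names, GPU counts, and runtimes; not Wayve data). Requesting resources per task comes to 32 GPU-hours, versus 52 GPU-hours if every step ran on the 8-GPU configuration that only the training step needs. In flytekit, per-task requests and images are declared through the `requests`/`limits` and `container_image` arguments of `@task`.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TaskSpec:
    """Hypothetical per-task declaration: container image, GPU count, and runtime in hours."""
    name: str
    image: str
    gpus: int
    hours: float


def gpu_hours_per_task(tasks: List[TaskSpec]) -> float:
    """GPU-hours when each task requests only its own resources."""
    return sum(t.gpus * t.hours for t in tasks)


def gpu_hours_monolithic(tasks: List[TaskSpec]) -> float:
    """GPU-hours when every step runs in one container sized for the largest step."""
    peak = max(t.gpus for t in tasks)
    return sum(peak * t.hours for t in tasks)


# Hypothetical heterogeneous pipeline: a Spark step, a GPU training step, a light Python step.
pipeline = [
    TaskSpec("extract_frames", "spark-image", gpus=0, hours=2.0),
    TaskSpec("train_model", "pytorch-image", gpus=8, hours=4.0),
    TaskSpec("export_report", "python-slim", gpus=0, hours=0.5),
]
```

Each task also carries its own image, so the Spark and PyTorch dependencies never need to coexist in one environment, which is the isolation the paragraph above describes.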

By using Flyte, Wayve has seen significant value and impact across its business. In one case, the team built, deployed, and ran a labeling pipeline in a few days, a project that previously took weeks without Flyte. The team is also better able to track and respond to task failures. The biggest impact came with neural simulators and Neural Radiance Fields (NeRFs): that pipeline now runs significantly faster than before Flyte, with much greater reliability, and launches with a single click instead of an involved, multi-stage kickoff process. Scaling to the levels Wayve has reached would not be possible without Flyte’s massive parallelism.