Deploy, Scale, Run Any AI Workload
Write locally, execute remotely with full data lineage, caching, observability & reproducibility.
Powerful AI orchestration on Kubernetes
For developers tasked with managing AI, ML, and data workflows in production, the challenges extend well beyond orchestrating DAGs. Union addresses these complexities by providing a scalable MLOps platform, designed to reduce costs and foster unmatched collaboration among team members, all backed by a Kubernetes-powered infrastructure.
Union optimizes resources across teams and implements cost-effective strategies that can reduce expenses by up to 66%. Moreover, it’s engineered to fit within your own cloud ecosystem, ensuring a robust and tailored infrastructure that scales with your technical demands.
Flexibility, security, observability & cost-efficient engineering
Union is a fully-managed platform deployed in your VPC. Get built-in dashboards, live logging, and task-level resource monitoring, enabling users to identify resource bottlenecks and simplifying the debugging process, resulting in optimized infrastructure and faster experimentation.
Maintain data locality and existing cloud vendor pricing. Union is multi-account and multi-cloud ready, and is available using AWS and GCP credits.
Use fractional GPUs & target specific accelerators
GPU needs vary by workload. Targeting a specific accelerator with Union is as simple as adding an annotation to a function. Moreover, you can increase utilization and reduce cost by scheduling multiple tasks on a single GPU, with strict memory isolation between them.
Users of Union often need multiple GPU Pools for different use cases, ranging from training and fine-tuning to batch inference and more. You can leverage Nvidia GPUs, Google TPUs, AWS Silicon, and other accelerators to optimize performance, cost, and availability.
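A minimal sketch of what that function-level annotation looks like with Flyte's `@task` decorator (the function names are illustrative, and the accelerator constants assume a recent flytekit release):

```python
from flytekit import Resources, task
from flytekit.extras.accelerators import A100, T4

# Pin a task to a specific accelerator type using Flyte's
# built-in accelerator constants.
@task(requests=Resources(gpu="1"), accelerator=T4)
def batch_inference(prompts: list[str]) -> list[str]:
    ...

# Request a fractional (MIG-partitioned) slice of an A100, so several
# tasks can share one physical GPU with memory isolation between them.
@task(requests=Resources(gpu="1"), accelerator=A100.partition_1g_5gb)
def fine_tune(dataset_uri: str) -> str:
    ...
```

Changing pools or accelerators is then a one-line edit on the task rather than a change to cluster configuration.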
Declarative infrastructure
Use Union’s declarative infrastructure to express your requirements and leave the provisioning, configuration, and scaling to us. Run Ray, Spark, Dask, and distributed training, all through a single platform!
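As a sketch of what "declarative" means here, resource requirements and distributed-framework configs are declared on the task itself (function names are illustrative; the Ray example assumes the `flytekitplugins-ray` plugin is installed):

```python
from flytekit import Resources, task
from flytekitplugins.ray import RayJobConfig, WorkerNodeConfig

# Declare what the task needs; the platform handles provisioning,
# scheduling, and scaling to match.
@task(
    requests=Resources(cpu="4", mem="16Gi"),
    limits=Resources(cpu="8", mem="32Gi"),
)
def preprocess(path: str) -> str:
    ...

# The same declarative style extends to distributed frameworks: this
# task runs on an ephemeral Ray cluster provisioned per execution.
@task(
    task_config=RayJobConfig(
        worker_node_config=[WorkerNodeConfig(group_name="workers", replicas=4)]
    )
)
def distributed_train(data: str) -> str:
    ...
```

The task body stays plain Python; the cluster shape lives next to the code it serves rather than in separate infrastructure manifests.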
Track lineage & build event driven workflows
Automatically track end-to-end lineage between workflows and teams with Artifacts, a data catalog and model registry built on top of workflow inputs and outputs.
Seamlessly automate downstream workflows (such as model training) in response to the completion of upstream workflows (such as data processing) with Triggers.
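A sketch of how Artifacts and Triggers fit together in code, assuming flytekit's Artifact API and Union's `OnArtifact` trigger (names like `ProcessedData` are illustrative, and import paths may vary by SDK version; binding the artifact payload to the triggered workflow's inputs is omitted for brevity):

```python
from typing import Annotated

from flytekit import LaunchPlan, task, workflow
from flytekit.core.artifact import Artifact
from union.artifacts import OnArtifact

# A named artifact: outputs annotated with it are captured in the
# catalog with lineage back to the producing execution.
ProcessedData = Artifact(name="processed_data")

@task
def process(raw: str) -> Annotated[str, ProcessedData]:
    return raw.strip().lower()

@workflow
def etl_wf(raw: str) -> Annotated[str, ProcessedData]:
    return process(raw=raw)

@task
def train(data: str) -> str:
    ...

@workflow
def retrain_wf(data: str = "") -> str:
    return train(data=data)

# Trigger: launch the training workflow whenever a new
# ProcessedData artifact lands.
retrain_plan = LaunchPlan.create(
    "retrain_on_new_data",
    retrain_wf,
    trigger=OnArtifact(trigger_on=ProcessedData),
)
```

The upstream and downstream workflows stay decoupled; the artifact declaration is the only contract between them.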
Accelerated datasets, faster executions
Dramatically boost the performance of big file reads with Accelerated Datasets, which can reduce time to completion for some workflows by more than 90%.
The core engine at Union has been fine-tuned and optimized for faster executions; users see certain workflows complete up to 95% faster.
The Union orchestration partner network
Available now on the AWS Marketplace
Available now on the GCP Marketplace
Member of the Nvidia Inception Program
Purpose-built for lineage-aware pipeline orchestration
Bring your own Airflow code (BYOAC) and take advantage of modern AI orchestration features out of the box! Get full reproducibility, auditability, experiment tracking, cross-team task sharing, compile-time error checking, and automatic artifact capture.
The best teams choose Union & Flyte
Across Data, ML, and AI, Flyte has established a stellar reputation as the most scalable AI orchestrator, managing and executing workflows spanning over 10,000 CPUs and tens of thousands of pipelines, all powered by Python code. Union brings the powerful Flyte platform to your team as a managed environment, so you don’t have to set it up yourself. Discover why the Flyte-powered Union is a game-changer.
Faster time-to-market
In today’s fast-paced business environment, the ability to quickly develop and deploy machine learning models can be the difference between success and failure.
Union helps businesses accelerate their ML projects by automating many of the processes involved in model development and deployment, reducing the time and effort required to get models into production.
Scalable ML workflows
Scaling machine learning efforts can be challenging due to the need for specialized infrastructure, in-house expertise in distributed systems management, and tools to handle large-scale data processing and model training.
Union enables reproducibility and observability at the workflow, task, and data levels, and provides plugins for model deployment and for distributed model training tools and frameworks.
Reduce ML technical debt
Without standardized operations and processes in place, many teams struggle to promote models to production resulting in sunk costs and wasted compute resources.
Union enables more efficient and accurate workflows through automated validation and optimization throughout the development and deployment process.
Integrate with existing tooling
Whether you are working with ML frameworks like TensorFlow and PyTorch, or using tools like Jupyter notebooks and Apache Spark, Union is designed with an extensible plugin system that spans both data science and infrastructure stacks.
This allows users to leverage the power of a managed platform without disrupting existing processes.
Globally trusted & tested
Join our developer community
“We're mainly using Flyte™ because of its cloud native capabilities. We do everything in the cloud, and we also don't want to be limited to a single cloud provider. So having the ability to run everything through Kubernetes is amazing for us.”
“The multi-tenancy that Flyte™ provides is obviously important in regulated spaces where you need to separate users and resources and things like amongst each other within the same organization.”
“During our evaluation stage, we did some stress tests to understand whether Flyte™ can satisfy our requirements, and it provided us with a good result.”
“With Flyte™, we want to give the power back to biologists. We want to stand up something that they can play around with different parameters for their models because not every … parameter is fixed. We want to make sure we are giving them the power to run the analyses.”
“You can say, ‘Give me imputation’ and [Flyte™ will] launch 40 spot instances that are cheaper than your on-demand instance that you're using for your notebook and return the results back in memory.”
“Our provisions time is at least two times faster on average. The model execution time, we have three times faster, and the cost, which we can actually probably further optimize with minimal configuration and optimization, is at least three times cheaper. And so over time, we’re expecting to save even more.”
“Given the scale at which some of these tasks run, compute can get really expensive. So being able to add an interruptible argument to the task decorator for certain tasks has been really useful to cut costs.”
“We got over 66% reduction in orchestration code when we moved to Flyte™ — a huge win!”
“We’re going to have 10,000-plus CPUs that we plan to use every day to process the raw data. There’ll be 30 different targets approximately that we’re collecting data on every day. That’s about 200 GB of raw data and probably 2 TB or so on the output — a lot of data process. We’re leaning heavily on Flyte to make that happen.”