Model training & fine-tuning

Confidently run large-scale training or fine-tuning on GPU clusters across clouds and on-premises

Union’s scalable architecture, flexible framework support, and robust workflow orchestration ensure reproducibility, efficiency, and collaboration in machine learning development. Train anything from small models to complex ones, with support for GPU-accelerated deep learning, hyperparameter optimization, and more.

Balance performance and cost efficiency for training and fine-tuning

Train on a single GPU, multiple GPUs on a single node, or scale across multiple nodes. Leverage heterogeneous clusters with various accelerators, including NVIDIA GPUs, Google TPUs, and AWS silicon. Utilize fractional GPUs for less intensive tasks and take advantage of GPUs on spot nodes. All training infrastructure is shared and ephemeral, scaling to zero upon completion.
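As a rough sketch, here is how a task might request accelerator resources with flytekit; the T4 accelerator, resource figures, and task body below are illustrative assumptions, not a prescribed configuration:

```python
# A minimal sketch of requesting accelerator resources for a Union/Flyte task.
from flytekit import task, Resources
from flytekit.extras.accelerators import T4

@task(
    requests=Resources(gpu="1", cpu="4", mem="16Gi"),
    accelerator=T4,       # pin the task to NVIDIA T4 nodes
    interruptible=True,   # allow placement on cheaper spot nodes
)
def train_step(learning_rate: float) -> float:
    # ... run one training step on the provisioned GPU ...
    return learning_rate
```

Fractional GPUs follow the same pattern: the accelerators module also exposes partitioned devices (for example, MIG slices of an A100) that can be passed to the same `accelerator` parameter.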

Orchestrate the entire training lifecycle on a unified platform

Training a model typically involves multiple steps, from data processing to evaluation and validation. Orchestrate this entire lifecycle on a unified platform: fine-tune models reactively when new data arrives, or trigger downstream predictions automatically after a successful training run. Catalog every model and access its versions through a unified model registry.
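For illustration, a minimal flytekit sketch of such a multi-step lifecycle, with placeholder bodies standing in for real data processing, training, and validation logic (all task names here are hypothetical):

```python
# A hedged sketch of a multi-step training pipeline as a Flyte workflow.
from flytekit import task, workflow

@task
def process_data(n_rows: int) -> list[float]:
    # Placeholder preprocessing step
    return [float(i) for i in range(n_rows)]

@task
def train(data: list[float]) -> float:
    # Placeholder "model": the mean stands in for a fitted parameter
    return sum(data) / len(data)

@task
def evaluate(model: float, data: list[float]) -> float:
    # Placeholder validation metric (mean squared error)
    return sum((x - model) ** 2 for x in data) / len(data)

@workflow
def training_pipeline(n_rows: int = 100) -> float:
    data = process_data(n_rows=n_rows)
    model = train(data=data)
    return evaluate(model=model, data=data)
```

Because each step is a versioned task, the same pipeline can be launched on a schedule, rerun from a past execution, or triggered reactively when upstream data lands.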

Work seamlessly with your preferred languages, ML frameworks, and libraries

Leverage Union’s expansive support for ML frameworks, libraries, and agents, with pre-built components suited to your needs. Run distributed training with PyTorch or TensorFlow by defining training tasks as containerized functions.
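A hedged sketch of what a distributed PyTorch task can look like with the flytekitplugins-kfpytorch plugin; the node topology, resource requests, and task body are illustrative:

```python
# Assumes `pip install flytekitplugins-kfpytorch` and torch in the image.
import torch.distributed as dist
from flytekit import task, Resources
from flytekitplugins.kfpytorch import Elastic

@task(
    task_config=Elastic(nnodes=2, nproc_per_node=4),  # 2 nodes x 4 workers each
    requests=Resources(gpu="4", mem="32Gi"),
)
def train_distributed() -> int:
    dist.init_process_group(backend="nccl")  # one process per GPU worker
    # ... wrap the model in DistributedDataParallel and train ...
    return dist.get_rank()
```

The task remains an ordinary containerized Python function; the plugin configuration is what fans it out across workers at execution time.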

Build reliable, interpretable, and trustworthy models collaboratively across teams

Track end-to-end data and model lineage across workflows and teams using Artifacts, a registry for models and data. Trace any model’s predictions to the specific dataset used to train it. Immutable executions let you confidently reproduce and verify results, providing transparency and interpretability for your models.
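As one illustration, flytekit’s Artifact construct can annotate a task output so the model it produces is cataloged with lineage back to its inputs; the artifact name and task below are hypothetical:

```python
# A hedged sketch of cataloging a trained model as an Artifact.
from typing import Annotated
from flytekit import task
from flytekit.core.artifact import Artifact

TrainedModel = Artifact(name="trained-model")

@task
def train(data: list[float]) -> Annotated[float, TrainedModel]:
    # The output is versioned and cataloged under the TrainedModel artifact;
    # the immutable execution record ties it back to the exact input dataset.
    return sum(data) / len(data)
```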

Testimonials

“Cradle addressed its data provenance requirements by leveraging key Flyte functionality. Since everything is versioned in Flyte, it is possible to trace what code and image produced which outputs from which inputs.”

Eli Bixby
Co-Founder & ML Lead at Cradle

Resources

Flyte and Weights & Biases Integration
Thomas Fan · August 19, 2024 · Machine Learning, AI Orchestration

With Flyte’s latest plugin for Weights & Biases, you can now effectively run machine learning or AI workflows on Union and integrate with Weights & Biases capabilities.
Read the story
Fine-tune Llama 2 with Limited Resources
Niels Bantilan · August 30, 2023 · LLMs, Model Training

Do more with less: Refine the 70-billion-parameter Llama 2 model on your dataset with a bunch of T4s.
Read the story
Fine-Tuning Insights: Lessons from Experimenting with RedPajama Large Language Model on Flyte Slack Data
Samhita Alla · July 4, 2023 · LLMs, Model Training

Large language models (LLMs) have taken the world by storm, revolutionizing our understanding and generation of human-like text.
Read the story
The ML Doctor Says: Don’t Build Fancy Models Before You Set a Simple Baseline
Niels Bantilan · February 1, 2023 · Machine Learning

So you’ve taken a few online courses in machine learning and landed a data scientist role in your first industry job.
Read the story