/ model training & fine-tuning

Confidently run large-scale training or fine-tuning on GPU clusters across clouds and on premises

Union’s scalable architecture, flexible framework support, and robust workflow orchestration ensure reproducibility, efficiency, and collaboration in machine learning development. Train anything from small models to complex ones, including GPU-accelerated deep learning and hyperparameter optimization workloads.

Balance performance vs. cost efficiency for training and fine-tuning

Train on a single GPU, on multiple GPUs within a single node, or scale across multiple nodes. Leverage heterogeneous clusters with various accelerators, including NVIDIA GPUs, Google TPUs, and AWS silicon. Use fractional GPUs for non-intensive tasks and take advantage of GPUs on spot nodes. All training infrastructure is shared and ephemeral, scaling to zero upon completion.
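
The three training layouts named above can be told apart with simple arithmetic. The following is a minimal sketch, not Union's scheduler: `TrainingLayout` and `plan_layout` are hypothetical names introduced for illustration, and the rule (fill nodes up to a per-node GPU limit) is an assumption.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingLayout:
    # Hypothetical record describing where a training job would run.
    mode: str  # "single-gpu", "single-node-multi-gpu", or "multi-node"
    nodes: int
    gpus_per_node: int


def plan_layout(total_gpus: int, max_gpus_per_node: int) -> TrainingLayout:
    """Pick the smallest layout that provides at least total_gpus GPUs."""
    if total_gpus < 1 or max_gpus_per_node < 1:
        raise ValueError("GPU counts must be positive")
    if total_gpus == 1:
        return TrainingLayout("single-gpu", nodes=1, gpus_per_node=1)
    if total_gpus <= max_gpus_per_node:
        # Everything fits on one node.
        return TrainingLayout("single-node-multi-gpu", nodes=1, gpus_per_node=total_gpus)
    # More GPUs than one node holds: spread the job across full nodes.
    nodes = math.ceil(total_gpus / max_gpus_per_node)
    return TrainingLayout("multi-node", nodes=nodes, gpus_per_node=max_gpus_per_node)
```

For example, `plan_layout(20, 8)` yields a multi-node layout of three nodes with eight GPUs each, which rounds the request up to whole nodes.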

Orchestrate the entire training lifecycle on a unified platform

Training a model often involves multiple steps, from data processing to evaluation and validation. Fine-tune models reactively based on data arrival, or automate downstream predictions once a training run succeeds. Catalog all models and access model versions through a unified model registry.
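
As a rough illustration of a multi-step training lifecycle, the sketch below chains processing, training, evaluation, and validation as plain Python functions. It does not use Union or Flyte APIs; every function name is hypothetical, and the "model" is deliberately trivial (the mean of the training data) so the flow between steps stays visible.

```python
from statistics import mean
from typing import Optional


def process(raw: list[Optional[float]]) -> list[float]:
    """Data processing: drop missing values (represented here as None)."""
    return [x for x in raw if x is not None]


def train(data: list[float]) -> float:
    """Training: the toy 'model' is just the mean of the data."""
    return mean(data)


def evaluate(model: float, holdout: list[float]) -> float:
    """Evaluation: mean absolute error of the model on held-out data."""
    return mean(abs(x - model) for x in holdout)


def validate(error: float, threshold: float) -> bool:
    """Validation: accept the model only if its error is within the threshold."""
    return error <= threshold


def training_pipeline(raw, holdout, threshold):
    """Run the steps in order, passing each output to the next step."""
    data = process(raw)
    model = train(data)
    error = evaluate(model, holdout)
    return {"model": model, "error": error, "accepted": validate(error, threshold)}
```

In an orchestrated setting each function would typically become its own step, so that a failed evaluation can be retried or inspected without repeating the data processing.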

Work seamlessly with your preferred languages, ML frameworks, and libraries

Leverage Union’s broad support for ML frameworks and libraries, along with its Agent framework and pre-built components, to pick the tools best suited to your needs. Run distributed training with PyTorch or TensorFlow by defining training tasks as containerized functions.

Build reliable, interpretable, and trustworthy models collaboratively across teams

Track end-to-end data and model lineage between workflows and teams using Artifacts, a registry for models and data. Trace any model prediction back to the specific dataset used to train the model. Immutable executions help you confidently reproduce and verify results, providing transparency and interpretability for your models.
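
To make the idea of tracing a prediction back to its training dataset concrete, here is a minimal sketch based on content fingerprints. It is not the Artifacts API: `fingerprint` and `LineageLog` are hypothetical names, and the in-memory dictionaries stand in for whatever registry actually stores the lineage.

```python
import hashlib
import json


def fingerprint(obj) -> str:
    """Stable SHA-256 fingerprint of any JSON-serializable object."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class LineageLog:
    """Hypothetical in-memory lineage record (illustration only)."""

    def __init__(self):
        self._model_to_dataset = {}
        self._prediction_to_model = {}

    def record_training(self, dataset, model_params) -> str:
        """Register a model and remember which dataset produced it."""
        dataset_id = fingerprint(dataset)
        # The model ID covers both dataset and parameters, so identical
        # parameters trained on different data remain distinct.
        model_id = fingerprint({"dataset": dataset_id, "params": model_params})
        self._model_to_dataset[model_id] = dataset_id
        return model_id

    def record_prediction(self, model_id: str, inputs, output) -> str:
        """Register a prediction and remember which model made it."""
        prediction_id = fingerprint(
            {"model": model_id, "inputs": inputs, "output": output}
        )
        self._prediction_to_model[prediction_id] = model_id
        return prediction_id

    def trace_dataset(self, prediction_id: str) -> str:
        """Follow prediction -> model -> dataset and return the dataset ID."""
        model_id = self._prediction_to_model[prediction_id]
        return self._model_to_dataset[model_id]
```

Because the identifiers are derived from content rather than assigned by hand, the same dataset always maps to the same ID, which is what makes a trace reproducible.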

Testimonials

“Cradle addressed its data provenance requirements by leveraging key Flyte functionality. Since everything is versioned in Flyte, it is possible to trace what code and image produced which outputs from which inputs.”

Eli Bixby

Co-Founder & ML Lead at Cradle

Resources

Thomas Fan • August 19, 2024

Flyte and Weights & Biases Integration

With Flyte’s latest plugin for Weights & Biases, you can now run machine learning and AI workflows on Union while integrating with Weights & Biases capabilities.

Read the story→

Niels Bantilan • August 30, 2023

Fine-tune Llama 2 with Limited Resources

Do more with less: Refine the 70 billion parameter Llama 2 model on your dataset with a bunch of T4s

Read the story→

Samhita Alla • July 4, 2023

Fine-Tuning Insights: Lessons from Experimenting with RedPajama Large Language Model on Flyte Slack Data

Large language models (LLMs) have taken the world by storm, revolutionizing our understanding and generation of human-like text.

Read the story→

Niels Bantilan • February 1, 2023

The ML Doctor Says: Don’t Build Fancy Models Before You Set a Simple Baseline

So you’ve taken a few online courses in machine learning and landed a data scientist role in your first industry job.

Read the story→
