GenAI & LLMs

Take generative AI applications to production faster

Union provides a full-stack ecosystem equipped with everything you need to deploy your generative AI solutions at scale.

Build production-grade RAG data pipelines

RAG data pipelines typically involve document preprocessing, ingestion, and embedding generation. You may want to run several such pipelines and iterate on them quickly. Union enables you to build reproducible RAG pipelines with versioning, caching, and lineage.

Parallelize LLM batch inference using map tasks

Generating predictions from an LLM sequentially is both resource-intensive and time-consuming. Distributing inference workloads with map tasks significantly reduces latency and keeps resource utilization efficient. Declaratively specify the resources each inference call needs to make the most of your compute.

Optimize RAG workloads with long-running actors*

Union’s long-running actors provide reusable containers that pre-load models and data into readily available execution environments. Warm containers cut down on startup time, enabling near-real-time responsiveness for RAG workloads.

*Coming soon

Build repeatable and observable AI pipelines

Reproducibility and observability are core strengths of Union. Leverage Union to implement advanced AI pipelines, including agentic workflows. Use raw containers for agent sandboxing, OpenAI agents for controlled communication with LLMs, and dynamic and eager constructs to refine your workflows. Customize hardware, memory, and image requirements for each pipeline component to ensure optimal performance and resource utilization.

Resources

Performance Tuning AI Models with NVIDIA DGX Cloud
Thomas Fan
April 29, 2024
Article

As generative AI models become more capable and deployed in various contexts, optimizing the model in terms of throughput and memory consumption...
Read the story
How LLMs Are Transforming Computer Vision
Sage Elliott
January 5, 2024
LLMs
Computer Vision
Podcast

Voxel51 is the data-centric machine learning software company behind FiftyOne, an open-source toolkit for building computer vision workflows.
Read the story
Fine-Tuning Insights: Using LLMs as Preprocessors to Improve Dataset Quality
Samhita Alla
September 13, 2023
LLMs
Data Processing
Data Quality

LLMs for data cleaning: Yay or nay?
Read the story
Fine-tune Llama 2 with Limited Resources
Niels Bantilan
August 30, 2023
LLMs
Model Training

Do more with less: Refine the 70 billion parameter Llama 2 model on your dataset with a bunch of T4s
Read the story
Fine Tuning vs. Prompt Engineering Large Language Models
Niels Bantilan
May 15, 2023
LLMs
Model Training
Prompt Engineering

When to manipulate the input prompt and when to roll up your sleeves and update parameter weights.
Read the story