Take generative AI applications to production faster
Union provides a full-stack ecosystem with everything you need to deploy your generative AI solutions at scale.
Build production-grade RAG data pipelines
RAG data pipelines typically involve document pre-processing, ingestion, and embedding generation. You may want to run multiple such pipelines and iterate on them quickly. Union enables you to build reproducible RAG pipelines with built-in versioning, caching, and lineage tracking.
Parallelize LLM batch inference using map tasks
Generating predictions from an LLM sequentially is both resource-intensive and time-consuming. By using map tasks to distribute inference workloads, you can significantly cut end-to-end runtime and ensure efficient resource utilization. Declaratively specify the resources each inference call needs to make the most of your compute.
Optimize RAG workloads with long-running actors*
Union’s long-running actors provide reusable containers that pre-load models and data into readily available execution environments. Warm containers cut down on container startup time, enabling near real-time responsiveness for RAG workloads.
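The idea behind a long-running actor can be sketched as a worker that pays its model-loading cost once and serves every later request from the warm state (illustrative only; the class and method names are assumptions, and actors actually run as reusable containers rather than in-process objects):

```python
class WarmWorker:
    """Loads an expensive model once, then answers many requests."""

    def __init__(self):
        self.model = None
        self.load_count = 0

    def _ensure_loaded(self):
        if self.model is None:
            # Stand-in for the expensive model/data load that a cold
            # container would otherwise pay on every request.
            self.load_count += 1
            self.model = lambda q: f"answer({q})"

    def handle(self, query: str) -> str:
        self._ensure_loaded()  # warm after the first call
        return self.model(query)
```

After the first request, every subsequent call skips the load entirely, which is what makes near real-time RAG responses feasible.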
*Coming soon
Build repeatable and observable AI pipelines
Reproducibility and observability are core strengths of Union. Leverage Union to implement advanced AI pipelines, including agentic workflows. Use raw containers for agent sandboxing, OpenAI agents for controlled communication with LLMs, and dynamic and eager constructs to refine your workflows. Customize hardware, memory, and image requirements for each pipeline component to ensure optimal performance and resource utilization.
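Declaring hardware, memory, and image requirements per pipeline component can be sketched as data attached to each step (a conceptual illustration; the field names and values are hypothetical, not Union's actual API):

```python
from dataclasses import dataclass

@dataclass
class StepSpec:
    """Hypothetical per-component resource declaration."""
    name: str
    cpu: str
    memory: str
    gpu: int = 0
    image: str = "default"

# Each step declares only what it needs: lightweight ingestion stays on
# cheap CPU nodes, while embedding and serving request GPUs.
pipeline = [
    StepSpec("ingest", cpu="1", memory="2Gi"),
    StepSpec("embed", cpu="4", memory="16Gi", gpu=1, image="embedder"),
    StepSpec("respond", cpu="2", memory="8Gi", gpu=1, image="llm-serving"),
]

def needs_gpu(steps):
    # Only GPU-requesting steps get scheduled onto accelerator nodes.
    return [s.name for s in steps if s.gpu > 0]
```

Keeping resource requirements declarative per component is what lets the platform bin-pack work efficiently instead of sizing the whole pipeline for its hungriest step.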