Sage Elliott

Build Faster AI Pipelines with Union.ai Actors: Reuse Containers, Skip Cold Starts

When building real-world AI systems, performance isn’t just about how fast your models run when deployed; it’s also about how efficiently your infrastructure supports them. If you’re running complex, multi-step AI workflows (like fine-tuning, document OCR, batch inference, RAG pipelines, or multi-model ensembles), chances are you’ve hit the limitations of container spin-up overhead, cold starts, and duplicated resource initialization.

A common workaround is to turn parts of your pipeline into long-running services. But that trades one set of issues (cold starts, duplication) for another: persistent resource usage, complex autoscaling, and lost flexibility.

Union.ai Actors offer a smarter way.

Using Actors (reusable stateful containers) vs regular containers in workflows

What are Actors in Union.ai?

In simple terms, Actors let you reuse a container and its environment across multiple tasks, skipping the costly overhead of starting a fresh container every time. This is a Union-only feature that isn’t available in open-source Flyte — and for teams pushing the boundaries of AI and data pipelines, it's a game changer.

Union.ai Actors are ideal for:

  • Tasks with long setup times (e.g. loading large models or dependencies)
  • Repetitive operations that can share a warm container
  • Stateful resources that are expensive to re-initialize

Why Actors matter for complex data and AI pipelines

Let’s say you’re building a pipeline for genome sequencing analysis, where each sample might go through several computational biology steps:

  • Download and index a large reference genome
  • Preprocess sample files (e.g., quality filtering, alignment)
  • Run multiple tools like BWA, SAMtools, or GATK for variant calling
  • Aggregate and filter results
  • Format outputs for downstream interpretation

Without Actors, every one of these steps may trigger a new container, even if they’re all using the same model or shared data resources. That’s a ton of duplicated overhead, especially when loading models or initializing libraries takes seconds (or even minutes).

With Actors, you can keep a container warm, cache shared resources, and execute sequential tasks within the same process — giving you massive performance wins. 

Union Actor Example: Say Hello (Fast): try running in a notebook here

# hello_actors.py
import union

actor = union.ActorEnvironment(
    name="my-actor-container",  # unique name for the actor environment
    replica_count=1,  # number of actor replicas to provision
    ttl_seconds=120,  # keep the actor alive for 120s while idle
    requests=union.Resources(cpu="2", mem="300Mi"),  # compute resources the actor requires
)

@actor.task
def add_ints(num1: int, num2: int) -> int:
    return num1 + num2

@union.workflow
def wf_add() -> int:
    num = add_ints(4, 2)
    num = add_ints(num, 5)
    num = add_ints(num, 2)
    num = add_ints(num, 4)
    return num

# union run --remote hello_actors.py wf_add

This simple example avoids spinning up a new container for each add_ints task. Instead, every call runs on the same Actor, with the startup cost paid only once on the first task. The image below shows an initial startup time of 19 seconds, while each subsequent task in the actor container ran in under 600 milliseconds!

Creating faster AI workflows with Union Actors

If we relaunch the entire workflow (you can click Relaunch in the UI if you’re following along in the notebook) before the Time To Live (TTL) expires, the whole workflow runs without the initial startup time. This can be useful for near-real-time batch inference, where you need to run a whole workflow and not just a single model prediction.
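If you’re working from the CLI instead, the same idea looks like this: rerun the identical command before the 120-second TTL expires, and the tasks should land on the still-warm Actor.

# First run pays the one-time container startup cost
union run --remote hello_actors.py wf_add

# Rerun within ttl_seconds (120s here); the tasks reuse the warm Actor
union run --remote hello_actors.py wf_add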

Rerunning workflows on an active container with Union Actors
“Our products are not powered by a single model but instead are a composite of many models in a larger inference pipeline. What we serve are AI pipelines, which are made of functions, some of which are AI models. Union is ideal for such inference pipelines.”  —ML Developer at Artera AI

In this case we used a single Actor container across all the tasks, but if different tasks have different hardware or container requirements, you can define multiple actor environments and assign them to tasks accordingly. For example, you might need a GPU for training and evaluation but only CPUs for downloading and preprocessing data, as sketched below.
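Here’s a minimal sketch of that split. The environment names, resource sizes, and task bodies are illustrative placeholders, not part of the notebook examples:

import union

# CPU-only environment for lightweight data work
cpu_actor = union.ActorEnvironment(
    name="cpu-prep-actor",
    replica_count=1,
    ttl_seconds=120,
    requests=union.Resources(cpu="2", mem="1Gi"),
)

# GPU environment reserved for the expensive steps
gpu_actor = union.ActorEnvironment(
    name="gpu-train-actor",
    replica_count=1,
    ttl_seconds=120,
    requests=union.Resources(cpu="4", mem="8Gi", gpu="1"),
)

@cpu_actor.task
def preprocess(raw: str) -> str:
    # runs in the warm CPU container
    return raw.lower()

@gpu_actor.task
def train(data: str) -> str:
    # runs in the warm GPU container
    return f"model trained on {data}"

@union.workflow
def wf_train(raw: str = "SAMPLE DATA") -> str:
    return train(data=preprocess(raw=raw))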

Going deeper: Caching with @actor_cache

Let’s try a more complex use case for Union.ai Actors: loading a large language model into the container on startup and reusing it across all tasks in the workflow in near real time. The @actor_cache decorator lets you persist Python objects inside an Actor’s memory across multiple task executions. It’s perfect for things like:

  • AI model loading
  • Tokenizer initialization
  • Shared preprocessing pipelines

Union Actor Example: Cache and image builder: try running in a notebook here

# actors_cache.py
import union
from transformers import Pipeline, pipeline, AutoModelForCausalLM, AutoTokenizer
from flytekit import ImageSpec, Resources
from union.actor import ActorEnvironment

# Container image with everything the actor needs, defined directly in Python
image = ImageSpec(
    packages=[
        "union",
        "transformers",
        "torch",
        "accelerate",
    ],
)

llm_actor = ActorEnvironment(
    name="gpu-llm-actor",
    container_image=image,
    replica_count=1,
    ttl_seconds=120,
    requests=Resources(cpu="1", mem="2000Mi", gpu="1"),
)


@union.actor_cache
def load_model(model_name: str = "microsoft/Phi-4-mini-instruct") -> Pipeline:
    # Loaded once per Actor, then cached in memory for every later task
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        device_map="cuda",
        torch_dtype="auto",
    )
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    return pipeline("text-generation", model=model, tokenizer=tokenizer)


@llm_actor.task
def actor_model_predict(query: str, prev_summary: str = "") -> str:
    nlp_pipeline = load_model()

    # Chain the query using the previous output
    full_query = (
        f"{query} Previously you told me about {prev_summary}. "
        "Keep answers short and 1 sentence long. "
        "Include a list of all insects we discussed after the answer, only if any were provided after the question."
    )

    predictions = nlp_pipeline(full_query, batch_size=1, return_full_text=False)
    return predictions[0]["generated_text"]


@union.workflow
def wf_text_gen() -> str:
    result_ant = actor_model_predict(query="What is an ant?")
    result_bee = actor_model_predict(query="What is a bee?", prev_summary=result_ant)
    result_wasp = actor_model_predict(query="What is a wasp?", prev_summary=result_bee)
    return result_wasp


# union run --remote actors_cache.py wf_text_gen

The first task run takes a couple of minutes while the model is downloaded from Hugging Face. Every task after that takes just a couple of seconds!

No extra containers. 
No reinitialization. 
Just cached, stateful performance across the tasks.

If you’re new to Flyte and Union, you may not be familiar with ImageSpec. We use it to define the container requirements directly in Python. If you’re running the notebook example, the first time you run this code cell it will build the image, since it doesn’t yet exist in your Union account.

Note: for this example we’re downloading a large language model directly from the Hugging Face Hub each time the actor environment is started, but you might want to separate this into its own workflow and save the model as a Union Artifact instead.
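If you go that route, here’s a rough sketch, assuming Union’s Artifact API and the huggingface_hub package; the artifact name, task body, and workflow below are hypothetical, not part of the notebook examples:

from typing import Annotated

import union
from flytekit.types.directory import FlyteDirectory

# Hypothetical Artifact that versions the downloaded model weights
Phi4Weights = union.Artifact(name="phi-4-mini-instruct-weights")

@union.task
def download_model(
    model_name: str = "microsoft/Phi-4-mini-instruct",
) -> Annotated[FlyteDirectory, Phi4Weights]:
    # Assumes huggingface_hub is available in the task image
    from huggingface_hub import snapshot_download

    # Download the weights once; Union stores and versions the directory
    local_dir = snapshot_download(repo_id=model_name)
    return FlyteDirectory(path=local_dir)

@union.workflow
def wf_cache_model() -> FlyteDirectory:
    return download_model()

Downstream, load_model could then read the weights from the Artifact (for example, via a Phi4Weights.query() workflow input) instead of hitting the Hub on every Actor start.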

When should you use Actors with Union.ai?

Use Actors when:

  • You have compound workflows with interdependent steps
  • You’re running expensive initialization logic that can be reused
  • You want to minimize infrastructure churn (costs, logs, startup time)
  • You're caching shared resources like models, vector stores, or tokenizers
  • You’re serving models for efficient, high-throughput reuse, where slight queueing is acceptable and resource compaction matters

You may not greatly benefit from Actors if:

  • You need strict isolation between tasks
  • You’re building long-running or distributed training jobs — these are better served with Union Tasks
  • You need real-time, token-streaming inference where latency to the first token is critical (like a chatbot); use Union Serving instead

TL;DR: Actors unlock efficient, stateful AI pipelines

Union.ai’s Actors give you a performance and cost advantage by letting your tasks:

  • Share a warm/hot container
  • Cache heavy resources (models, data, etc)
  • Reduce latency and spin-up time
  • Avoid duplicated logic in compound pipelines

This is next-level infrastructure for serious AI workloads, and something teams are already adopting to accelerate development and reduce infrastructure costs.

If you’re already pushing Flyte to its limits or looking to modernize your pipeline runtime, Actors are a feature worth trying Union for!

Try out Actors in our serverless playground. You can find the examples above on GitHub.

Interested in learning more about Union.ai? Let’s talk!

AI Orchestration
AI Workflows
GPU Costs
Reusable Containers