Data processing

Seamlessly connect to your data stack. Focus on data, not infrastructure.

Union is the backbone for serious data projects. Build data pipelines that are reliable, easy to maintain, and scalable out of the box, creating business value from day one.

Build complex data processing pipelines with ease

Leverage Flyte’s intuitive, Python-based workflow definition language to express data processing pipelines as directed acyclic graphs (DAGs): specify dependencies between tasks and orchestrate the flow of data through the pipeline. Use built-in operators and functions for a wide range of data transformations, while data flow between nodes in the execution graph is abstracted away for you.
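
For illustration, here is a minimal sketch of how such a pipeline might look in flytekit, Flyte’s Python SDK. The task names, the URL parameter, and the pandas-based transformations are hypothetical examples, not taken from the Union docs.

```python
# A minimal ETL sketch, assuming flytekit and pandas are installed in the task image.
import pandas as pd
from flytekit import task, workflow


@task
def extract(url: str) -> pd.DataFrame:
    # Load raw data; Flyte handles passing the DataFrame between tasks.
    return pd.read_csv(url)


@task
def transform(df: pd.DataFrame) -> pd.DataFrame:
    # Example transformation: drop incomplete rows.
    return df.dropna()


@task
def load(df: pd.DataFrame) -> int:
    # Stand-in for writing to a warehouse; returns the row count.
    return len(df)


@workflow
def etl_pipeline(url: str) -> int:
    # The call graph below defines the DAG; Flyte infers task dependencies
    # from how outputs feed into inputs.
    raw = extract(url=url)
    clean = transform(df=raw)
    return load(df=clean)
```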

Process massive datasets without sacrificing performance or wasting resources

Harness Union’s native parallelization and optimization capabilities to maximize throughput and reduce processing time for large-scale data processing tasks. Union’s dynamic resource allocation ensures computational resources are assigned efficiently based on workload demands. For even greater scalability, fault tolerance, and performance when working with big data, plug in engines such as Spark or Dask.
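
A minimal sketch of data-parallel processing with flytekit’s map_task is shown below; the chunk parameter, resource figures, and record-counting logic are illustrative assumptions.

```python
# Fan a task out over shards of a dataset, with per-task resource requests.
from typing import List

from flytekit import Resources, map_task, task, workflow


@task(requests=Resources(cpu="2", mem="4Gi"))
def process_chunk(chunk_uri: str) -> int:
    # Process a single shard of the dataset; here we just pretend
    # to count records in the shard.
    return len(chunk_uri)


@workflow
def parallel_pipeline(chunk_uris: List[str]) -> List[int]:
    # map_task fans process_chunk out across the list, running shards in
    # parallel while the platform schedules the requested resources per task.
    return map_task(process_chunk)(chunk_uri=chunk_uris)
```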

Build indestructible data pipelines, enabling rapid iteration, recoverability, and memoization

Union’s built-in type safety, caching, and error-handling mechanisms ensure the reliability and resilience of workflows. Construct durable data pipelines by configuring retries, rerouting tasks to healthy nodes, and surfacing verbose logs and stack traces. Build with the confidence that comes from a fault-tolerant, highly available platform architecture: everything is versioned for rollback to known-good states, and alerting and monitoring are built in. Debug remote-run failures with Interactive Tasks by attaching a browser-based VS Code IDE.
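
Here is a minimal sketch of how retries and memoization are configured on a flytekit task; the timeout, cache_version, and task body are illustrative values.

```python
# Retries, caching, and a timeout configured directly on the task decorator.
from datetime import timedelta

from flytekit import task


@task(
    retries=3,                      # rerun automatically on transient failures
    cache=True,                     # memoize results for identical inputs
    cache_version="1.0",            # bump to invalidate the cache after logic changes
    timeout=timedelta(minutes=30),  # fail fast instead of hanging indefinitely
)
def score_records(batch_id: str) -> float:
    # Placeholder for an expensive, occasionally flaky computation.
    return float(len(batch_id))
```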

Gain insights into how data is transformed and processed at each workflow stage

Use advanced data lineage and metadata management features to track the provenance of data and metadata throughout the data processing lifecycle. Capture and leverage rich metadata for lineage tracking and auditing, ensuring transparency, compliance, and accountability in data processing operations. Foster collaboration across teams with automated, reactive pipelines that run when the data they depend on becomes available.
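
As one example of an automated pipeline, the sketch below registers a scheduled launch plan with flytekit; the workflow, its name, and the cron expression are hypothetical, and Union’s data-availability triggers are configured separately and not shown here.

```python
# A scheduled launch plan that runs a reporting workflow every hour.
from flytekit import CronSchedule, LaunchPlan, task, workflow


@task
def refresh_report(day: str) -> str:
    return f"report for {day}"


@workflow
def reporting_pipeline(day: str = "today") -> str:
    return refresh_report(day=day)


# Downstream teams can depend on the outputs of this hourly run.
hourly_report = LaunchPlan.get_or_create(
    workflow=reporting_pipeline,
    name="hourly_reporting_pipeline",
    schedule=CronSchedule(schedule="0 * * * *"),
)
```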

Testimonials

“Union’s handling of ETL and EDA has simplified working with data. Its learning curve’s not terribly steep, but the yield curve is incredible. Once people get the underlying concept, it’s incredibly easy and rewarding.”

Brian O’Donovan
Sr Director of Bioinformatics & Computational Biology at Delve Bio

“We want to simplify and not have to think about and manage different technology stacks. We want to write everything in a Union workflow and have one platform for orchestrating these jobs; that’s awesome and less stuff for us to worry about.”

Thomas Busath
ML Engineer at Porch

Resources

Faster Airflow to Flyte migration powered by Flyte Airflow Agents
Kevin Su
May 9, 2024
Article

We have had the privilege of seeing data teams experience the value of a unified platform for both machine learning and data pipelines.
Read the story

How Porch used Union to migrate off Airflow & consolidate its data & ML operations

Read case study

How Warner Bros. Discovery keeps its media streams flowing

Read case study