/ bioinformatics & synthetic biology

Effortlessly carry out scientific computing workflows with must-have features out of the box

Take advantage of Union’s unique features to perform the highest-quality scientific research. With dataflow as a first-class construct, declarative dependencies, and strict versioning across the board, Union provides the features you need to run your bioinformatics workflows at scale.

Capture heterogeneous dependencies alongside the code that depends on them

ImageSpec lets you quickly declare an OCI-compliant image right alongside your workflow code. You choose the base image and add any apt, PyPI, or conda packages, so dependencies are defined with ease and flexibility. Builds are triggered at registration or run time and use their own build cache for additional performance.

Generate novel DNA sequences and predict their protein product in one workflow

Generating novel biomolecules from scratch and predicting downstream features like protein structure requires many moving parts. Union gives you the flexibility to pull in disparate dependencies as well as enterprise features to quickly compute over massive datasets. Defining notebooks or scripts as discrete tasks ensures extensibility in the future, while features like Decks enable you to visualize your 3D proteins as you iterate today.

Make the flow of data an integral part of your analysis

Dataflow is at the heart of any analysis run on Union. Workflows depend on strongly typed inputs and outputs to assemble the DAG. With an object store at the heart of every execution, and sensible types representing files and directories, you no longer need to manually keep track of files on a local filesystem or scan directories for brittle file suffixes.
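To make the idea of typed inputs and outputs concrete, here is a minimal sketch in plain Python. It is not Union's API and not how Union assembles its DAG internally; the type names (`FastqFile`, `TrimmedFastq`, `BamFile`, `VcfFile`), the step functions, and the `infer_edges` helper are all hypothetical. It only illustrates that once every step declares what it consumes and produces, the dependencies between steps can be derived from those declarations instead of from file names or suffixes.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, List, Tuple, get_type_hints


# Hypothetical file types: each wraps a path so the *kind* of file is
# carried by the type, not by a suffix such as ".bam" or ".vcf".
@dataclass(frozen=True)
class FastqFile:
    path: Path


@dataclass(frozen=True)
class TrimmedFastq:
    path: Path


@dataclass(frozen=True)
class BamFile:
    path: Path


@dataclass(frozen=True)
class VcfFile:
    path: Path


# Hypothetical pipeline steps. The bodies are placeholders; only the
# annotated signatures matter for this illustration.
def trim(reads: FastqFile) -> TrimmedFastq:
    return TrimmedFastq(reads.path.with_suffix(".trimmed.fastq"))


def align(reads: TrimmedFastq) -> BamFile:
    return BamFile(reads.path.with_suffix(".bam"))


def call_variants(alignment: BamFile) -> VcfFile:
    return VcfFile(alignment.path.with_suffix(".vcf"))


def infer_edges(steps: List[Callable]) -> List[Tuple[str, str]]:
    """Connect a producer to a consumer when the producer's return type
    matches the type of one of the consumer's parameters."""
    edges = []
    for producer in steps:
        out_type = get_type_hints(producer).get("return")
        if out_type is None:
            continue
        for consumer in steps:
            if consumer is producer:
                continue
            hints = get_type_hints(consumer)
            for name, param_type in hints.items():
                if name != "return" and param_type is out_type:
                    edges.append((producer.__name__, consumer.__name__))
    return edges


edges = infer_edges([trim, align, call_variants])
```

In this toy version the edges come out as trim to align and align to call_variants, with no step ever inspecting a directory or a file extension.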

Speed up reads, reduce costs, and maintain performance when working on large static assets

Many bioinformatics analyses rely on large, rarely changed assets such as reference genomes or protein databases. Accelerated Datasets let you cache these often terabyte-scale files on local, high-speed disks so you don’t have to waste time re-downloading them for every task. Union lets you save costs through ephemeral compute without sacrificing performance on IO-heavy tasks.
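The general pattern behind this, fetching a large asset once and reusing the local copy afterwards, can be sketched in a few lines of standard-library Python. This is only an illustration of the caching idea, not Accelerated Datasets itself, which is a managed Union feature; the `cached_asset` helper, its parameters, and the `.partial` naming are assumptions made for the example.

```python
from pathlib import Path
from typing import Callable


def cached_asset(name: str, fetch: Callable[[Path], None], cache_dir: Path) -> Path:
    """Return the local path of a large asset, fetching it only if it is
    not already present in cache_dir.

    `fetch` is a caller-supplied function (hypothetical here) that writes
    the asset to the path it is given, e.g. by downloading a reference
    genome from object storage.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / name
    if not target.exists():
        # Write to a temporary name first so an interrupted fetch never
        # leaves a half-written file that looks like a valid cache entry.
        partial = target.with_name(target.name + ".partial")
        fetch(partial)
        partial.replace(target)
    return target
```

A task would call `cached_asset("reference.fa", download_reference, Path("/mnt/fast-disk"))`, where `download_reference` and the mount path are placeholders; every later call on the same disk returns immediately without touching the network.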

Testimonials

“Before leveraging the accelerated datasets solution, we were opting to build the index on the fly (downloading the source data and building the minhash table from it) to minimize the data transfer at the expense of needing tons of RAM for each pod. With the persistent storage option, we are able to store the pre-built indices and also reduce the RAM requirement for each worker. The gains reduced the task execution time by roughly 50% and the RAM requirements by 75%, effectively quadrupling our throughput on the same node pool.”

Brian O’Donovan
Sr Director of Bioinformatics & Computational Biology at Delve Bio

“Without Flyte, we couldn’t have done what we’ve done so far with the people that we have. You need a workflow orchestration engine if you’re going to do ML at our level, and Flyte is the best one.”

Eli Bixby
Co-Founder & ML Lead at Cradle

Resources

NVIDIA Parabricks on Flyte: Orchestrating Accelerated Bioinformatics
Pryce Turner
July 31, 2024
Bioinformatics

NVIDIA Parabricks is a software suite that accelerates genomic sequence analysis by reimplementing industry standard tools to use the parallel...
Read the story
Human-in-the-Loop Pipelines
Pryce Turner
December 6, 2023
Bioinformatics
MLOps

Walk through genomic alignment code and a Streamlit app to explore how Union makes it easier to connect external inputs to pipelines.
Read the story
Sequences and Systems: The Convergence of Machine Learning and Biotech
Sara Gawlinski
October 5, 2023
Machine Learning
Bioinformatics
Events

When NGS applied the power of massively parallel processing to the analysis of DNA, it transformed the biotech playing field.
Read the story

Cradle accelerates ML development for its protein design models by using Flyte™

Read case study