Effortlessly carry out scientific computing workflows with must-have features out of the box
Take advantage of Union’s unique features to perform the highest-quality scientific research. With dataflow as a first-class construct, declarative dependencies, and strict versioning across the board, Union provides the features you need to run your bioinformatics workflows at scale.
Capture heterogeneous dependencies alongside the code that depends on them
ImageSpec lets you quickly declare an OCI-compliant image right alongside your workflow code. With your choice of base image and any apt, PyPI, or conda packages, you can define dependencies with ease and flexibility. Builds are triggered at registration or run time and use their own build cache for additional performance.
Generate novel DNA sequences and predict their protein product in one workflow
Generating novel biomolecules from scratch and predicting downstream features like protein structure requires many moving parts. Union gives you the flexibility to pull in disparate dependencies, along with enterprise features for computing quickly over massive datasets. Defining notebooks or scripts as discrete tasks keeps your workflow extensible in the future, while features like Decks let you visualize 3D protein structures as you iterate today.
Make the flow of data an integral part of your analysis
Dataflow is at the heart of any analysis run on Union. Workflows depend on strongly typed inputs and outputs to assemble the DAG. With an object store backing every execution, and sensible types representing files and directories, you no longer need to manually keep track of files on a local filesystem or scan directories for brittle file suffixes.
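The sketch below illustrates, in plain Python, how typed inputs and outputs can be enough to validate and order a DAG. It is an assumption-laden toy, not Union's implementation: the file types, the task functions, and the `connect` and `execution_order` helpers are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter
from typing import get_type_hints


# Hypothetical typed handles to objects in a store, identified by URI
# rather than by a file suffix on a local disk.
@dataclass(frozen=True)
class FastqFile:
    uri: str


@dataclass(frozen=True)
class BamFile:
    uri: str


@dataclass(frozen=True)
class VcfFile:
    uri: str


def trim(reads: FastqFile) -> FastqFile:
    return FastqFile(uri=f"{reads.uri}/trimmed")


def align(reads: FastqFile) -> BamFile:
    return BamFile(uri=f"{reads.uri}/aligned")


def call_variants(alignment: BamFile) -> VcfFile:
    return VcfFile(uri=f"{alignment.uri}/variants")


def connect(upstream, downstream, param: str):
    """Create an edge only if the output type matches the input type."""
    produced = get_type_hints(upstream)["return"]
    expected = get_type_hints(downstream)[param]
    if produced is not expected:
        raise TypeError(
            f"{upstream.__name__} produces {produced.__name__}, but "
            f"{downstream.__name__}.{param} expects {expected.__name__}"
        )
    return (upstream.__name__, downstream.__name__)


def execution_order(edges):
    """Topologically sort task names from (upstream, downstream) edges."""
    graph = {}
    for up, down in edges:
        graph.setdefault(up, set())
        graph.setdefault(down, set()).add(up)
    return list(TopologicalSorter(graph).static_order())


edges = [
    connect(trim, align, "reads"),
    connect(align, call_variants, "alignment"),
]
order = execution_order(edges)  # ['trim', 'align', 'call_variants']
```

Wiring `trim` directly into `call_variants` raises a `TypeError` before anything runs, which is the kind of early failure that typed dataflow is meant to provide.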
Speed up reads, reduce costs, and maintain performance when working on large static assets
Many bioinformatics analyses rely on large, rarely changing assets such as reference genomes or protein databases. Accelerated Datasets let you cache these often terabyte-scale files on local, high-speed disks so you don’t have to waste time re-downloading them for every task. Union lets you save costs through ephemeral compute without sacrificing performance on I/O-heavy tasks.
Testimonials
“Before leveraging the accelerated datasets solution, we were opting to build the index on the fly (downloading the source data and building the minhash table from it) to minimize the data transfer at the expense of needing tons of RAM for each pod. With the persistent storage option, we are able to store the pre-built indices and also reduce the RAM requirement for each worker. The gains reduced the task execution time by roughly 50% and the RAM requirements by 75%, effectively quadrupling our throughput on the same node pool.”
“Without Flyte, we couldn’t have done what we’ve done so far with the people that we have. You need a workflow orchestration engine if you’re going to do ML at our level, and Flyte is the best one.”