Cradle accelerates ML development for its protein design models by using Flyte™
Cradle is a leader in the relatively nascent but quickly growing field of synthetic biology. Traditional protein engineering projects go through dozens of iterations across thousands of variants, costing millions of dollars. Rather than relying on these expensive, painstaking iterations, which often take a best-guess approach at each cycle, Cradle uses Flyte to develop and deploy bespoke ML models that optimize multiple protein properties simultaneously. Cradle partners with labs throughout the entire R&D cycle to accelerate discovery, making them much more likely to arrive at viable protein candidates. Cradle was recently named to the Forbes AI 50 list of the most innovative AI companies.
The Cradle team believes Flyte was essential to getting its protein design platform to market quickly, and that it could not have made such progress without it. The team is convinced that sophisticated ML development requires a workflow orchestrator, and that Flyte is the best one available.
About Cradle
Cradle believes biology can be used to produce the vast majority of the physical inputs that power the global economy, including therapeutics, chemicals, materials, and food. Designing proteins is not easy, but Cradle’s tools and machine learning models are changing that. Cradle democratizes access to these tools and helps companies build protein-based products in record time. Its mission is to help replace traditional farms and factories for a more sustainable world.
In search of a highly performant orchestration engine
Like most organizations, Cradle started with scripts, manual processes, and glue code tying together various tools for building and running its pipelines. While this approach worked initially, it did not scale well and limited the team’s ability to follow AI development best practices such as reproducibility, per-task isolation of dependencies and resources, caching, and data provenance.
Reproducibility and data provenance were essential because Cradle’s iteration cycles can be months apart; being able to look back and see exactly which workflow produced which output was a requirement. Caching was important because Cradle’s experiments are long-running and expensive to compute, so re-running an entire execution after a failure at any point in the workflow was unacceptable. Lastly, the pipeline relied on legacy tools specific to protein engineering. Rather than making extensive changes to those tools, the team needed to keep using them as-is for benchmarking and focus on the workflow development required to get the product to market.
To resolve these challenges, Cradle ran a comprehensive evaluation to find an ML orchestrator that would suit its current and future needs. Managed platforms such as Google Cloud Vertex AI and AWS SageMaker were not viable because the team needed a Kubernetes-native solution that would interoperate easily with customers running on public cloud, private cloud, and on-premises infrastructure. Other open source solutions such as Argo, Kubeflow, and Airflow did not sufficiently address Cradle’s needs. Flyte’s caching, data provenance, and reproducibility met its must-have requirements. During the evaluation, the team also discovered other Flyte capabilities, such as local execution, and realized its eventual solution needed these as well.
Flyte @ Cradle: Opinionated enough to go fast, flexible enough to go anywhere
Cradle deployed Flyte on GCP in a Google Kubernetes Engine (GKE) cluster and quickly started to see value from the platform.
Cradle addressed its data provenance requirements by leveraging key Flyte functionality. Because everything in Flyte is versioned, it is possible to trace which code and image produced which outputs from which inputs. This enables reproducibility, a best practice for ML development in general and a must-have for a biotech AI company like Cradle. Development cycles can be long because protein designs take anywhere from six weeks to six months to come back from the lab. Now, when lab results come back, the Cradle team can use Flyte to compare them against what the trained model predicted and identify its weak points.
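To make the lineage idea concrete, here is a toy, stdlib-only model of the kind of record an orchestrator keeps for each task execution. It is not Flyte’s actual data model: the `ExecutionRecord` and `LineageStore` types, their fields, the image name, and the storage URIs are hypothetical, chosen only to show how versioned code, images, and inputs let you walk back from an output to whatever produced it.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


def fingerprint(inputs: Dict[str, Any]) -> str:
    """Stable hash of JSON-serializable inputs (key order does not matter)."""
    return hashlib.sha256(json.dumps(inputs, sort_keys=True).encode()).hexdigest()[:12]


@dataclass(frozen=True)
class ExecutionRecord:
    # Hypothetical fields: what ran, at which version, in which image, on what.
    task_name: str
    task_version: str
    image: str
    input_hash: str
    output_uri: str


class LineageStore:
    def __init__(self) -> None:
        self._records: List[ExecutionRecord] = []

    def record(self, task_name: str, task_version: str, image: str,
               inputs: Dict[str, Any], output_uri: str) -> ExecutionRecord:
        rec = ExecutionRecord(task_name, task_version, image,
                              fingerprint(inputs), output_uri)
        self._records.append(rec)
        return rec

    def producer_of(self, output_uri: str) -> Optional[ExecutionRecord]:
        """Answer: which code version, image, and inputs produced this output?"""
        for rec in self._records:
            if rec.output_uri == output_uri:
                return rec
        return None


store = LineageStore()
store.record("train_model", "v42", "registry.example.com/trainer:abc123",
             {"dataset": "round-3", "epochs": 20}, "gs://example-bucket/models/m42")
rec = store.producer_of("gs://example-bucket/models/m42")
print(rec.task_version, rec.image)  # v42 registry.example.com/trainer:abc123
```

Months later, when lab results arrive for designs made with `m42`, a lookup like this recovers the exact version, image, and input fingerprint to compare against.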
“Flyte has accelerated ML development and deployment. But more importantly, it has given us capabilities that were not possible before.” — Daniel Danciu, CTO
Flyte’s native and powerful caching capability lets Cradle save intermediate progress across its workflows. It relies on strongly typed, automatically offloaded inputs and outputs, as well as code hashing, to offer strong cache-accuracy guarantees behind a simple user interface. This is in stark contrast to manually implemented caching. As Eli Bixby, Cradle’s co-founder, shared, “If you’re manually doing the caching as part of every step, the way you write code changes, and it’s not clean. And that magnifies the number of outputs in your interface quite a lot.”
Caching in Flyte is as simple as setting `cache=True` (along with a `cache_version` string) in the task decorator and letting the backend handle the rest. Cradle can now run very large workflows and be confident that, after any failure, caching lets them pick up where the computation left off. The team also composes reference tasks, which enable collaboration and reuse of existing work across the organization. This lets groups develop independently yet integrate with confidence, backed by Flyte’s static type system, while caching deduplicates executions across the organization. Together, these features dramatically simplify authoring workflows, reduce execution wall time, and make compute resources more cost-effective.
Flyte is also flexible in how dependencies are defined: both dependencies and resources can be isolated per task. Being able to capture bioinformatics requirements this way, especially for non-Python tasks, is a huge plus.
Flyte also makes it easy for Cradle to meet its specialized compute requirements. By attaching a PodTemplate to a task, users can map specific workloads to specific node types. For example, the Cradle team defined a pod template for a task that runs a C++ tool performing a terabyte-scale in-memory scan. Whenever this task is called from any workflow, it always runs on a specific high-memory node type, which scales down to zero when there is no more work. With Flyte’s granular, task-level declarative infrastructure, the Cradle team lets the scheduler handle resource assignment behind the scenes instead of babysitting expensive nodes.
The team also values that Flyte enforces strong development practices. A key one is compile-time type checking: workflows are compiled against each task’s typed interface, which ensures data will flow between tasks without type mismatches. Some engineers at Cradle were not used to adding type hints to their Python code. Since adopting Flyte, they find it strange not to use them, and execution has become much more predictable.
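The guarantee described here, that a cached result is reused only when the task identity, its version, and its inputs all match, can be illustrated with a small stdlib-only memoizer. This is a simplified analogy, not Flyte’s implementation: the `cache_key` helper and `TaskCache` class are hypothetical, and an explicit version string stands in for code hashing.

```python
import hashlib
import json
from typing import Any, Callable, Dict


def cache_key(task_name: str, version: str, inputs: Dict[str, Any]) -> str:
    # The key depends on task identity, its version, and the serialized inputs.
    payload = json.dumps(
        {"task": task_name, "version": version, "inputs": inputs}, sort_keys=True
    )
    return hashlib.sha256(payload.encode()).hexdigest()


class TaskCache:
    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}
        self.hits = 0
        self.misses = 0

    def run(self, task_name: str, version: str,
            fn: Callable[..., Any], **inputs: Any) -> Any:
        key = cache_key(task_name, version, inputs)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = fn(**inputs)  # the task body only executes on a cache miss
        self._store[key] = result
        return result


def gc_content(sequence: str) -> float:
    # The step itself contains no caching logic at all.
    return sum(base in "GC" for base in sequence) / len(sequence)


cache = TaskCache()
cache.run("gc_content", "1", gc_content, sequence="GGCA")  # miss: computed
cache.run("gc_content", "1", gc_content, sequence="GGCA")  # hit: reused
cache.run("gc_content", "2", gc_content, sequence="GGCA")  # miss: version bumped
print(cache.hits, cache.misses)  # 1 2
```

Because the cache wraps the call rather than living inside `gc_content`, the step’s code stays clean, which is the contrast Eli Bixby draws with hand-rolled caching.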
“Flyte has helped us impose some discipline in how we do ML. It's really a huge win. What took us three days now takes our data scientists three hours.” — Daniel Danciu, CTO
Cradle also takes advantage of Flyte’s modularity to implement additional best practices of its own. For example, instead of importing tasks directly from other modules, the team wrote a wrapper that checks the import path and replaces the import with a reference task when possible. This keeps Docker images smaller, which makes builds much faster. It also reduces dependency and import conflicts and lets the team organize its repository flexibly.
Computational representations of biological samples have traditionally relied heavily on the local file system; a common pattern is to point tools at directories and let them infer file types from extensions. With legacy orchestrators this becomes cumbersome, especially in biotech. Flyte abstracts away this overhead, which is a key differentiator.
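Cradle’s wrapper is not public, so the following is only a hypothetical, stdlib-only sketch of the decision it might make: a task whose top-level package ships in the current image is imported normally, and anything else is bound as a reference task so its code and dependencies never need to be installed there. `TaskSource`, `resolve_task`, and the package names are illustrative; in reference mode a real wrapper would declare a flytekit `reference_task` with the registered project, domain, name, and version.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class TaskSource:
    module: str
    name: str
    use_reference: bool  # True: bind to the registered task instead of importing


def resolve_task(module: str, name: str, local_packages: FrozenSet[str]) -> TaskSource:
    """Import tasks whose package is in this image; reference everything else."""
    top_level = module.split(".", 1)[0]
    return TaskSource(module, name, use_reference=top_level not in local_packages)


local = frozenset({"featurize"})
print(resolve_task("featurize.tasks", "embed", local).use_reference)  # False
print(resolve_task("structure.tasks", "fold", local).use_reference)   # True
```

Keeping the rule to a package-prefix check makes it cheap to evaluate at import time and keeps each image limited to the packages it actually ships.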
“Without Flyte, we couldn't have done what we've done so far with the people that we have. You need a workflow orchestration engine if you're going to do ML at our level, and Flyte is the best one.” — Eli Bixby, Co-Founder
Cradle has been using Flyte for nearly 18 months, with no signs of slowing down. The team has invested heavily in both out-of-the-box use and customization, from the developer experience to the cluster deployment story. The Cradle team believes that Flyte’s functionality and very responsive community have been instrumental in meeting its business and technical needs, and that it simply couldn’t have met them without Flyte. Flyte is the orchestration engine for companies like Cradle that are pushing the boundaries of science and AI.