The modern framework for orchestrating complex, data-intensive workflows
By combining a powerful compute backend with an elegant Pythonic interface, Flyte brings software engineering best practices to every step of the AI lifecycle, enabling teams to build resilient, reproducible pipelines.
Built with AI in mind
Flyte was built from first principles to solve the unique challenges presented by AI development. Containerized workloads, automatically versioned entities, and data management abstractions ensure that work can be reproduced and seamlessly promoted from development to production. Native compute and first-class integrations with external systems allow for fast, efficient distribution of workloads across a shared backend. An API-driven architecture promotes interoperability and reuse of data and model artifacts across the organization.
Containerized tasks
Easily manage complex, heterogeneous dependencies within the same workflow
Automatic versioning
Seamlessly integrate with GitOps and never lose historical workflow executions
Abstracted data flow
Avoid data management in code and easily recover from the point of failure
Native compute
Instantly run workloads on GPUs, TPUs, and spot instances (and scale to zero afterwards)
Agents
Easily manage authentication and control flow through external services
Notebook support
Execute workloads and pull down results directly using a Python-based SDK
Built for rapid iteration
Flyte was designed to help AI developers rapidly prototype, test, and ship complex, data- and compute-intensive workflows. Single-task executions, image management in code, and declarative infrastructure allow workflow authors to develop pipelines incrementally, one step at a time. Local-to-remote parity enables teams to test workflows in CI while running the same logic at scale in remote environments. Composable tasks and workflows, together with dynamic workflows, support complex AI-specific use cases such as hyperparameter tuning.
Single-task executions
Incrementally develop workflows one step at a time using ad-hoc task executions
Dynamic workflows
Dynamically alter the shape of DAGs using data
Local-to-remote parity
Locally test workflows in CI and seamlessly ship to remote
Dynamic image management
Automatically build container images without writing Dockerfiles
Task & workflow reusability
Easily build on pre-existing work simply by running an import
Declarative infrastructure
Adjust resources on the fly to right-size infrastructure for the job at hand
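To make the idea of a data-driven DAG concrete, here is a minimal sketch of a hyperparameter search whose number of training steps is only known once the search grid arrives. It is written in plain Python and deliberately does not use Flyte's SDK; `train`, `tune`, and the scoring formula are hypothetical stand-ins chosen for illustration.

```python
from itertools import product
from typing import Any, Dict, List, Tuple


def train(learning_rate: float, depth: int) -> float:
    """Hypothetical stand-in for a training task; returns a made-up score."""
    return -((learning_rate - 0.1) ** 2) - (depth - 3) ** 2


def tune(grid: Dict[str, List[Any]]) -> Tuple[Dict[str, Any], float]:
    """Fan out one train() call per grid point and return the best one.

    The number of train() calls depends on the grid passed in at run time,
    which is the sense in which the data determines the shape of the DAG.
    """
    names = sorted(grid)
    candidates = [
        dict(zip(names, values))
        for values in product(*(grid[name] for name in names))
    ]
    scores = [train(**params) for params in candidates]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores[best]


# A single step can be exercised on its own before the full search is run.
single_score = train(learning_rate=0.1, depth=3)

best_params, best_score = tune(
    {"learning_rate": [0.01, 0.1, 1.0], "depth": [2, 3, 4]}
)
```

Because `train` is an ordinary function, the same logic can be checked locally in CI with a small grid and then run with a larger one elsewhere.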
Production-ready resiliency
Flyte was conceived by a team of distributed systems experts to provide extreme failure resiliency and ease of debugging. Caching and automatic recovery facilitate self-healing workflows. Type safety and error-driven branching increase the probability that a given workflow succeeds. Native multi-tenancy and deep integration with IAM ensure secure yet efficient sharing of resources.
Fault tolerance
Automatically retry failed workloads according to user-defined policies
Caching
Cache the results of intermediate executions to recover from the point of failure
Native multi-tenancy
Efficiently share resources while protecting important workloads from resource starvation
Type safety
Catch type errors at compile time, before kicking off long-running workflows
Robust error handling
Dynamically handle different types of errors during execution
Isolation & security
Define independent IAM permissions at the workflow level
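The interaction between retries and caching can be sketched in a few lines. The example below is plain Python, not Flyte's API: `run_step`, `cache_key`, the in-memory `_CACHE`, and the `flaky_double` step are all hypothetical names used only to illustrate how a user-defined retry count and an input-keyed cache let a rerun pick up from the point of failure.

```python
import hashlib
import json
from typing import Any, Callable, Dict

# Hypothetical in-memory cache; a real system would use durable storage.
_CACHE: Dict[str, Any] = {}


def cache_key(name: str, version: str, inputs: Dict[str, Any]) -> str:
    """Derive a deterministic key from the step name, its version, and its inputs."""
    payload = json.dumps(
        {"name": name, "version": version, "inputs": inputs}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_step(
    fn: Callable[..., Any],
    inputs: Dict[str, Any],
    *,
    version: str = "1",
    retries: int = 0,
) -> Any:
    """Return a cached result if one exists; otherwise run fn with retries."""
    key = cache_key(fn.__name__, version, inputs)
    if key in _CACHE:
        return _CACHE[key]
    last_error = None
    for _attempt in range(retries + 1):
        try:
            result = fn(**inputs)
        except Exception as exc:  # assumption: every exception is retryable
            last_error = exc
            continue
        _CACHE[key] = result
        return result
    raise RuntimeError(
        f"{fn.__name__} failed after {retries + 1} attempts"
    ) from last_error


calls = {"count": 0}


def flaky_double(x: int) -> int:
    """Hypothetical step that fails twice before succeeding."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient failure")
    return 2 * x


first = run_step(flaky_double, {"x": 21}, retries=2)   # succeeds on the third attempt
second = run_step(flaky_double, {"x": 21}, retries=2)  # served from the cache
```

Including the version in the key means that changing a step's logic invalidates its old results, while unchanged upstream steps are skipped on a rerun.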
Get started
Try Union, the only Flyte-native AI platform.