Running your code
Set up your development environment
If you have not already done so, follow the Getting started section to sign in to Flyte, and set up your local environment.
CLI commands for running your code
The Pyflyte CLI and Flytectl CLI provide commands that allow you to deploy and run your code at different stages of the development cycle:
- pyflyte run: For deploying and running a single script immediately in your local Python environment.
- pyflyte run --remote: For deploying and running a single script immediately in the cloud on Flyte.
- pyflyte register: For deploying multiple scripts to Flyte and running them from the Web interface.
- pyflyte package and flytectl register: For deploying workflows to production and for scripting within a CI/CD pipeline.
In some cases, you may want to test your code in a local cluster before deploying it to Flyte. This step corresponds to using pyflyte run --remote, pyflyte register, or pyflyte package with flytectl register (commands 2, 3, or 4 above), but targeting your local cluster instead of Flyte. For more details, see Running in a local cluster.
Registration pattern summary
The following diagram provides a summarized view of the different registration patterns:
Running a script in local Python with pyflyte run
During the development cycle you will want to run a specific workflow or task in your local Python environment to test it.
To quickly try out the code locally, use pyflyte run:
$ pyflyte run workflows/example.py wf --name 'Albert'
Here you are invoking pyflyte run and passing the name of the Python file and the name of the workflow within that file that you want to run.
In addition, you are passing the named parameter name and its value.
This command is useful for quickly testing a workflow locally to check for basic errors. For more details see pyflyte run details.
Running a script on Flyte with pyflyte run --remote
To quickly run a workflow on Flyte, use pyflyte run --remote:
$ pyflyte run --remote --project basic-example --domain development workflows/example.py wf --name 'Albert'
Here we are invoking pyflyte run --remote and passing:
- The project, basic-example
- The domain, development
- The Python file, workflows/example.py
- The workflow within that file that you want to run, wf
- The named parameter name, and its value
This command will:
- Build the container image defined in your ImageSpec.
- Push the image to the container registry specified in that ImageSpec. Don’t forget to make the image accessible to Flyte. For example, if you are using GitHub Container Registry, you will need to make the image public.
- Package up your code and deploy it to the specified project and domain in Flyte.
- Run the workflow on Flyte.
This command is useful for quickly deploying and running a specific workflow on Flyte. For more details see pyflyte run details.
Running tasks through flytectl
This is a multi-step process where we create an execution spec file, update the spec file, and then create the execution.
Generate execution spec file
$ flytectl get task --project flytesnacks --domain development workflows.example.generate_normal_df --version v1 --execFile exec_spec.yaml
Update the input spec file for arguments to the task
iamRoleARN: 'arn:aws:iam::12345678:role/defaultrole'
inputs:
  n: 200
  mean: 0.0
  sigma: 1.0
kubeServiceAcct: ""
targetDomain: ""
targetProject: ""
task: workflows.example.generate_normal_df
version: "v1"
Create execution using the exec spec file
$ flytectl create execution -p flytesnacks -d development --execFile exec_spec.yaml
Monitor the execution by providing the execution ID returned by the create command
$ flytectl get execution -p flytesnacks -d development <execid>
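The update step above can also be scripted. The following is a minimal sketch, not part of flytectl, that rewrites scalar values under the inputs key of a spec file's text. The function name set_inputs is hypothetical, and the sketch assumes that the generated file lists each input on its own indented line under inputs:, as in the example above. It only handles scalar inputs (strings, numbers, booleans).

```python
import json


def set_inputs(spec_text: str, values: dict) -> str:
    """Return spec_text with scalar entries under 'inputs:' replaced from values.

    Hypothetical helper: assumes each input is one indented 'key: value' line.
    """
    out = []
    in_inputs = False
    for line in spec_text.splitlines():
        if line.startswith("inputs:"):
            in_inputs = True
        elif in_inputs and line.startswith((" ", "\t")):
            key = line.strip().split(":", 1)[0]
            if key in values:
                indent = line[: len(line) - len(line.lstrip())]
                # json.dumps yields scalars that are also valid YAML scalars.
                line = f"{indent}{key}: {json.dumps(values[key])}"
        else:
            # Any other top-level key ends the inputs block.
            in_inputs = False
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    spec = 'inputs:\n  n: 0\n  mean: 0.0\n  sigma: 0.0\ntask: workflows.example.generate_normal_df\nversion: "v1"\n'
    print(set_inputs(spec, {"n": 200, "sigma": 1.0}), end="")
```

Editing the text line by line, rather than regenerating the file, leaves every field that flytectl wrote (such as task and version) untouched.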
Running workflows through flytectl
Workflows are not directly runnable on their own. However, a launch plan is always bound to a workflow (at least the auto-created default launch plan), and you can use launch plans to launch a workflow. The default launch plan for a workflow has the same name as its workflow, and all argument defaults are also identical.
Tasks can also be executed using the launch command. One difference between running a task and running a workflow via launch plans is that launch plans cannot be associated with a task. This is to avoid triggers and scheduling.
Generate an execution spec file
$ flytectl get launchplan -p flytesnacks -d development myapp.workflows.example.my_wf --execFile exec_spec.yaml
Update the input spec file for arguments to the workflow
inputs:
  name: "adam"
Create execution using the exec spec file
$ flytectl create execution -p flytesnacks -d development --execFile exec_spec.yaml
Monitor the execution by providing the execution ID returned by the create command
$ flytectl get execution -p flytesnacks -d development <execid>
Deploying your code to Flyte with pyflyte register
$ pyflyte register workflows --project basic-example --domain development
Here we are registering all the code in the workflows directory to the project basic-example in the domain development.
This command will:
- Build the container image defined in your ImageSpec.
- Package up your code and deploy it to the specified project and domain in Flyte. The package will contain the code in the Python package located in the workflows directory. Note that the presence of the __init__.py file in this directory is necessary in order to make it a Python package.
The command will not run the workflow. You can run it from the Web interface.
This command is useful for deploying your full set of workflows to Flyte for testing.
Fast registration
pyflyte register packages up your code through a mechanism called fast registration.
Fast registration is useful when you already have a container image that’s hosted in your container registry of choice, and you change your workflow/task code without any changes in your system-level/Python dependencies. At a high level, fast registration:
- Packages and zips up the directory/file that you specify as the argument to pyflyte register, along with any files in the root directory of your project. The result of this is a tarball that is packaged into a .tar.gz file, which also includes the serialized task (in protobuf format) and workflow specifications defined in your workflow code.
- Registers the package to the specified cluster and uploads the tarball containing the user-defined code into the configured blob store (e.g. S3, GCS).
At workflow execution time, Flyte knows to automatically inject the zipped up task/workflow code into the running container, thereby overriding the user-defined tasks/workflows that were originally baked into the image.
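To make the packaging step concrete, here is a minimal sketch that uses Python's standard tarfile module to pack a source directory into a .tar.gz archive. It illustrates the idea only and is not flytekit's implementation: the function names, the archive name fast-package.tar.gz, and the demo file contents are all hypothetical, and the serialized task and workflow specifications are left out.

```python
import tarfile
import tempfile
from pathlib import Path


def package_source(source_dir: Path, archive_path: Path) -> list[str]:
    """Pack source_dir into a gzip-compressed tarball and return its member names."""
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname stores paths relative to the package directory, so the archive
        # can later be extracted under whatever working directory the container uses.
        tar.add(source_dir, arcname=source_dir.name)
    with tarfile.open(archive_path, "r:gz") as tar:
        return sorted(tar.getnames())


def demo() -> list[str]:
    """Build a throwaway 'workflows' package and archive it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        pkg = root / "workflows"
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")  # marks the directory as a Python package
        (pkg / "example.py").write_text("def wf(name):\n    return f'Hello, {name}!'\n")
        return package_source(pkg, root / "fast-package.tar.gz")


if __name__ == "__main__":
    print(demo())
```

Because the archive stores relative paths, the same tarball can be unpacked into any container whose image already provides the dependencies, which is what lets code change without an image rebuild.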
WORKDIR, PYTHONPATH, and PATH
When executing any of the above commands, the archive that gets created is extracted wherever the WORKDIR is set.
This can be handled directly via the WORKDIR directive in a Dockerfile, or specified via source_root if using ImageSpec.
This is important for discovering code and executables via PATH or PYTHONPATH.
A common pattern for making your Python packages fully discoverable is to have a top-level src folder, adding that to your PYTHONPATH, and making all your imports absolute.
This avoids having to “install” your Python project in the image at any point, e.g. via pip install -e.
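The src pattern described above can be tried out locally. The sketch below builds a temporary project with a src folder, puts that folder on sys.path (which is what setting PYTHONPATH does at interpreter startup), and imports a module by its absolute name without installing anything. The package name demo_project_pkg and the helper names are hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_from_src(project_root: Path, module: str):
    """Add <project_root>/src to sys.path and import a module by absolute name."""
    src = str(project_root / "src")
    if src not in sys.path:
        sys.path.insert(0, src)
    # Make sure the import system notices files created after startup.
    importlib.invalidate_caches()
    return importlib.import_module(module)


def demo() -> str:
    """Create src/demo_project_pkg/greetings.py and call a function from it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        pkg = root / "src" / "demo_project_pkg"
        pkg.mkdir(parents=True)
        (pkg / "__init__.py").write_text("")
        (pkg / "greetings.py").write_text(
            "def greet(name):\n    return f'Hello, {name}!'\n"
        )
        mod = import_from_src(root, "demo_project_pkg.greetings")
        return mod.greet("Albert")


if __name__ == "__main__":
    print(demo())
```

In an image, the equivalent would be extracting the archive so that src ends up under the working directory and exporting PYTHONPATH to point at it.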
Inspecting executions
Flytectl supports inspecting an execution by retrieving its details. For a deeper dive, refer to the API reference guide.
Monitor the execution by providing the execution ID returned by the create command. This can be a task execution or a workflow execution.
$ flytectl get execution -p flytesnacks -d development <execid>
For more details, use the --details flag, which shows node executions along with the task executions on them.
$ flytectl get execution -p flytesnacks -d development <execid> --details
If you prefer a YAML or JSON view of the details, change the output format using the -o flag.
$ flytectl get execution -p flytesnacks -d development <execid> --details -o yaml
To see the results of the execution, you can inspect the node closure outputUri in the detailed YAML output.
"outputUri": "s3://my-s3-bucket/metadata/propeller/flytesnacks-development-<execid>/n0/data/0/outputs.pb"
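Once you have copied an outputUri value from the detailed output, you will usually need its bucket and object key to fetch the file with a blob store client. The following minimal sketch splits such a URI with the standard library. The function name is hypothetical, and abc123 stands in for a real execution ID.

```python
from urllib.parse import urlparse


def split_output_uri(uri: str) -> tuple[str, str]:
    """Split a blob store URI such as s3://bucket/key into (bucket, key)."""
    parsed = urlparse(uri)
    return parsed.netloc, parsed.path.lstrip("/")


if __name__ == "__main__":
    # abc123 is a placeholder execution ID used only for illustration.
    uri = "s3://my-s3-bucket/metadata/propeller/flytesnacks-development-abc123/n0/data/0/outputs.pb"
    bucket, key = split_output_uri(uri)
    print(bucket)
    print(key)
```

The outputs.pb file itself is a serialized protobuf, so downloading it is only the first step; decoding it requires Flyte's own tooling.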
Deploying your code to production
Package your code with pyflyte package
The combination of pyflyte package and flytectl register is the standard way of deploying your code to production.
This method is often used in scripts to build and deploy workflows in a CI/CD pipeline.
First, package your workflows:
$ pyflyte --pkgs workflows package
This will create a tar file called flyte-package.tgz of the Python package located in the workflows directory.
Note that the presence of the __init__.py file in this directory is necessary in order to make it a Python package.
You can specify multiple workflow directories using the following command:
$ pyflyte --pkgs DIR1 --pkgs DIR2 package ...
This is useful in cases where you want to register two different projects that you maintain in a single place.
If you encounter a ModuleNotFoundError when packaging, use the --source option to include the correct source paths. For instance:
$ pyflyte --pkgs <dir1> package --source ./src -f
Register the package with flytectl register
Once the code is packaged, you register it using the flytectl CLI:
$ flytectl register files \
    --project basic-example \
    --domain development \
    --archive flyte-package.tgz \
    --version "$(git rev-parse HEAD)"
Let’s break down what each flag is doing here:
- --project: The target Flyte project.
- --domain: The target domain. Usually one of development, staging, or production.
- --archive: This argument allows you to pass in a package file, which in this case is the flyte-package.tgz produced earlier.
- --version: This is a version string that can be any string, but we recommend using the Git SHA in general, especially in production use cases.
See Flytectl CLI for more details.
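For a CI/CD script, the package and register steps can be chained from Python. The sketch below is one possible way to do this, not an official Flyte utility: the function names are hypothetical, and the only CLI flags used are the ones shown in the commands above. Building the argument lists is kept separate from running them so that the commands can be inspected or logged first.

```python
import shlex
import subprocess


def git_sha() -> str:
    """Return the current commit SHA, mirroring $(git rev-parse HEAD)."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"], check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def deploy_commands(pkgs, project, domain, version, archive="flyte-package.tgz"):
    """Build the pyflyte package and flytectl register argument lists."""
    package = ["pyflyte"]
    for pkg in pkgs:
        package += ["--pkgs", pkg]
    package.append("package")
    register = [
        "flytectl", "register", "files",
        "--project", project,
        "--domain", domain,
        "--archive", archive,
        "--version", version,
    ]
    return [package, register]


def deploy(pkgs, project, domain, version=None):
    """Run both steps in order, stopping at the first failure."""
    for cmd in deploy_commands(pkgs, project, domain, version or git_sha()):
        print("+", shlex.join(cmd))
        subprocess.run(cmd, check=True)
```

Calling deploy(["workflows"], "basic-example", "development") from the repository root would run the same two commands as above, with check=True causing the script to fail if either step fails.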
Using pyflyte register versus pyflyte package + flytectl register
As a rule of thumb, pyflyte register works well when you are working on a single cluster and iterating quickly on your task/workflow code.
On the other hand, the combination of pyflyte package and flytectl register is appropriate if you are:
- Working with multiple clusters, since it uses a portable package
- Deploying workflows to a production context
- Testing your workflows in your CI/CD infrastructure
You can also perform the equivalent of the three methods of registration using a FlyteRemote object.
Image management and registration method
The ImageSpec construct available in flytekit also has a mechanism to copy files into the image being built.
Its behavior depends on the type of registration used:
- If fast register is used, then it’s assumed that you don’t also want to copy source files into the built image.
- If fast register is not used (which is the default for pyflyte package, or if pyflyte register --copy none is specified), then it’s assumed that you do want source files copied into the built image.
- If your ImageSpec constructor specifies a source_root and the copy argument is set to something other than CopyFileDetection.NO_COPY, then files will be copied regardless of fast registration status.
Building your own images
Instead of using ImageSpec and the envd image builder on registration, you can, if you wish, build and deploy your own images separately.
You can start with pyflyte init --template basic-template-dockerfile. The resulting template project includes a docker_build.sh script that you can use to build and tag a container according to the recommended practice:
$ ./docker_build.sh
By default, the docker_build.sh script:
- Uses the PROJECT_NAME specified in the pyflyte command, which in this case is my_project.
- Will not use any remote registry.
- Uses the Git SHA to version your tasks and workflows.
You can override the default values with the following flags:
$ ./docker_build.sh -p PROJECT_NAME -r REGISTRY -v VERSION
For example, if you want to push your Docker image to GitHub’s container registry, you can specify the -r ghcr.io flag.
The docker_build.sh script is purely for convenience; you can always roll your own way of building Docker containers.
Once you’ve built the image, you can push it to the specified registry. For example, if you’re using GitHub Container Registry, do the following:
$ docker login ghcr.io
$ docker push TAG
CI/CD with Flyte and GitHub Actions
You can use any of the commands we learned in this guide to register, execute, or test Flyte workflows in your CI/CD process. Flyte provides two GitHub actions that facilitate this:
- flyte-setup-action: This action handles the installation of flytectl in your action runner.
- flyte-register-action: This action uses flytectl register under the hood to handle registration of packages, for example, the .tgz archives that are created by pyflyte package.
Some CI/CD best practices
In the case where workflows are registered on each commit in your build pipelines, you can consider the following recommendations and approach:
- Versioning Strategy: Determining the version of the build for different types of commits makes them consistent and identifiable. For commits on feature branches, use {branch-name}-{short-commit-hash} and for the ones on main branches, use main-{short-commit-hash}. Use version numbers for the released (tagged) versions.
- Workflow Serialization and Registration: Workflows should be serialized and registered based on the versioning of the build and the container image. Depending on whether the build is for a feature branch or main, the registration domain should be adjusted accordingly.
- Container Image Specification: When managing multiple images across tasks within a workflow, use the --image flag during registration to specify which image to use. This avoids hardcoding the image within the task definition, promoting reusability and flexibility in workflows.
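The versioning strategy above can be expressed as a small helper for a build script. This is a minimal sketch under stated assumptions: the function name build_version is hypothetical, the short hash is taken as the first 7 characters of the commit hash, and characters such as "/" in branch names are replaced with "-" so that the result is a single token. None of these choices is prescribed by Flyte.

```python
import re
from typing import Optional


def build_version(
    branch: str,
    commit_hash: str,
    tag: Optional[str] = None,
    short_len: int = 7,
) -> str:
    """Return a version string following the strategy described above.

    - Tagged releases use the tag itself as the version number.
    - Every other build uses {branch-name}-{short-commit-hash}; on the main
      branch this naturally yields main-{short-commit-hash}.
    """
    if tag:
        return tag
    # Assumption: keep only characters that are safe in a version token.
    safe_branch = re.sub(r"[^A-Za-z0-9._-]+", "-", branch).strip("-")
    return f"{safe_branch}-{commit_hash[:short_len]}"


if __name__ == "__main__":
    sha = "0123456789abcdef0123456789abcdef01234567"
    print(build_version("main", sha))
    print(build_version("feature/add-login", sha))
    print(build_version("main", sha, tag="v1.2.0"))
```

The resulting string can be passed as the --version value at registration time, so that each registered workflow version can be traced back to the commit that produced it.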