Running your code

Set up your development environment

If you have not already done so, follow the Getting started section to sign in to Union.ai and set up your local environment.

CLI commands for running your code

The Union CLI and Uctl CLI provide commands that allow you to deploy and run your code at different stages of the development cycle:

  1. union run: For deploying and running a single script immediately in your local Python environment.

  2. union run --remote: For deploying and running a single script immediately in the cloud on Union.ai.

  3. union register: For deploying multiple scripts to Union.ai and running them from the Web interface.

  4. union package and uctl register: For deploying workflows to production and for scripting within a CI/CD pipeline.

In some cases, you may want to test your code in a local cluster before deploying it to Union.ai. To do this, use commands 2, 3, or 4 above, but target your local cluster instead of Union.ai. For more details, see Running in a local cluster.

Running a script in local Python with union run

During the development cycle, you will want to run a specific workflow or task in your local Python environment to test it. To quickly try out the code locally, use union run:

$ union run workflows/example.py wf --name 'Albert'

Here you are invoking union run and passing the name of the Python file and the name of the workflow within that file that you want to run. In addition, you are passing the named parameter name and its value.

This command is useful for quickly testing a workflow locally to check for basic errors. For more details see union run details.

Running a script on Union.ai with union run --remote

To quickly run a workflow on Union.ai, use union run --remote:

$ union run --remote --project basic-example --domain development workflows/example.py wf --name 'Albert'

Here you are invoking union run --remote and passing:

  • The project, basic-example
  • The domain, development
  • The Python file, workflows/example.py
  • The workflow within that file that you want to run, wf
  • The named parameter name, and its value

This command will:

  • Build the container image defined in your ImageSpec.
  • Package up your code and deploy it to the specified project and domain in Union.ai.
  • Run the workflow on Union.ai.

This command is useful for quickly deploying and running a specific workflow on Union.ai. For more details see union run details.
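If you launch this command from a script rather than typing it by hand, it can help to assemble the argument list in code. The following is a minimal sketch: the helper name build_run_command is hypothetical and not part of the Union SDK, and the code only builds the argument list in the order used above without invoking the CLI.

```python
import shlex


def build_run_command(file, workflow, inputs, project=None, domain=None, remote=False):
    """Assemble a `union run` argument list (hypothetical helper, not a Union API)."""
    cmd = ["union", "run"]
    if remote:
        cmd.append("--remote")
    if project:
        cmd += ["--project", project]
    if domain:
        cmd += ["--domain", domain]
    # The file and the workflow name come after the options, followed by
    # the workflow inputs passed as named parameters.
    cmd += [file, workflow]
    for key, value in inputs.items():
        cmd += [f"--{key}", str(value)]
    return cmd


cmd = build_run_command(
    "workflows/example.py",
    "wf",
    {"name": "Albert"},
    project="basic-example",
    domain="development",
    remote=True,
)
# shlex.join renders the list as a single shell-safe command line.
print(shlex.join(cmd))
```

To actually run the command, you could pass the list to subprocess.run(cmd, check=True) on a machine where the union CLI is installed and configured.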

Running tasks through uctl

This is a multi-step process where we create an execution spec file, update the spec file, and then create the execution.

Generate execution spec file

$ uctl launch task --project flytesnacks --domain development --name workflows.example.generate_normal_df --version v1

Update the execution spec file with the arguments to the task

iamRoleARN: 'arn:aws:iam::12345678:role/defaultrole'
inputs:
  n: 200
  mean: 0.0
  sigma: 1.0
kubeServiceAcct: ""
targetDomain: ""
targetProject: ""
task: workflows.example.generate_normal_df
version: "v1"

Create execution using the exec spec file

$ uctl create execution -p flytesnacks -d development --execFile exec_spec.yaml

Monitor the execution by providing the execution ID returned by the create command

$ uctl get execution -p flytesnacks -d development <execid>

Running workflows through uctl

Workflows on their own are not runnable directly. However, a launch plan is always bound to a workflow (at least the auto-created default launch plan) and you can use launch plans to launch a workflow. The default launch plan for a workflow has the same name as its workflow, and all argument defaults are also identical.

Tasks can also be executed using the launch command. One difference between running a task and running a workflow is that launch plans cannot be associated with a task. This is to avoid triggers and scheduling for tasks.

Generate an execution spec file

$ uctl get launchplan -p flytesnacks -d development myapp.workflows.example.my_wf --execFile exec_spec.yaml

Update the input spec file for arguments to the workflow

inputs:
    name: "adam"

Create execution using the exec spec file

$ uctl create execution -p flytesnacks -d development --execFile exec_spec.yaml

Monitor the execution by providing the execution ID returned by the create command

$ uctl get execution -p flytesnacks -d development <execid>

Deploying your code to Union.ai with union register

$ union register workflows --project basic-example --domain development

Here we are registering all the code in the workflows directory to the project basic-example in the domain development.

This command will:

  • Build the container image defined in your ImageSpec.
  • Package up your code and deploy it to the specified project and domain in Union.ai. The package will contain the code in the Python package located in the workflows directory. Note that the presence of the __init__.py file in this directory is necessary in order to make it a Python package.

The command will not run the workflow. You can run it from the Web interface.

This command is useful for deploying your full set of workflows to Union.ai for testing.
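Because registration relies on the workflows directory being a Python package, a quick check for __init__.py before registering can catch a common mistake early. The following is a minimal sketch using only the standard library. The function name is_python_package is hypothetical, and the temporary directory exists only to make the example runnable.

```python
import tempfile
from pathlib import Path


def is_python_package(directory):
    """Return True if `directory` exists and contains an __init__.py file."""
    path = Path(directory)
    return path.is_dir() and (path / "__init__.py").is_file()


# Demonstration in a throwaway directory (the layout is illustrative only).
with tempfile.TemporaryDirectory() as tmp:
    workflows = Path(tmp) / "workflows"
    workflows.mkdir()
    before = is_python_package(workflows)  # no __init__.py yet
    (workflows / "__init__.py").touch()
    after = is_python_package(workflows)   # __init__.py now present

print(before, after)
```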

Fast registration

union register packages up your code through a mechanism called fast registration. Fast registration is useful when you already have a container image that’s hosted in your container registry of choice, and you change your workflow/task code without any changes in your system-level/Python dependencies. At a high level, fast registration:

  • Packages the directory or file that you specify as the argument to union register, along with any files in the root directory of your project, into a .tar.gz archive. The archive also includes the serialized task (in protobuf format) and workflow specifications defined in your workflow code.

  • Registers the package to the specified cluster and uploads the tarball containing the user-defined code into the configured blob store (e.g. S3, GCS).

At workflow execution time, Union.ai knows to automatically inject the zipped up task/workflow code into the running container, thereby overriding the user-defined tasks/workflows that were originally baked into the image.
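To make the packaging step more concrete, here is a rough sketch of bundling a source directory into a .tar.gz archive with Python's standard tarfile module. This is not Union.ai's implementation: it omits the serialized task and workflow specifications and the upload to the blob store, and the function names and file contents are made up for illustration.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def make_code_archive(source_dir):
    """Pack every file under `source_dir` into an in-memory .tar.gz archive."""
    root = Path(source_dir)
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                # Store paths relative to the project root so the archive
                # can be extracted into any working directory.
                archive.add(path, arcname=path.relative_to(root).as_posix())
    return buffer.getvalue()


def list_archive(data):
    """Return the sorted member names of a .tar.gz archive held in memory."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as archive:
        return sorted(archive.getnames())


# Build a tiny illustrative project and archive it.
with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "workflows"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "example.py").write_text("def wf(name):\n    return f'Hello, {name}!'\n")
    archive_bytes = make_code_archive(tmp)

names = list_archive(archive_bytes)
print(names)
```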

WORKDIR, PYTHONPATH, and PATH

When executing any of the above commands, the archive that is created is extracted wherever the WORKDIR is set. This can be handled directly via the WORKDIR directive in a Dockerfile, or specified via source_root if using ImageSpec. This is important for discovering code and executables via PATH or PYTHONPATH. A common pattern for making your Python packages fully discoverable is to have a top-level src folder, add it to your PYTHONPATH, and make all your imports absolute. This avoids having to “install” your Python project in the image at any point, for example via pip install -e.
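The src folder pattern can be sketched as follows. The example builds a throwaway project layout, adds its src folder to the module search path (the in-process equivalent of setting PYTHONPATH), and then imports a module by its absolute name. The package name my_project and the module greetings are hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as workdir:
    # Hypothetical layout: <workdir>/src/my_project/greetings.py
    src_dir = Path(workdir) / "src"
    package_dir = src_dir / "my_project"
    package_dir.mkdir(parents=True)
    (package_dir / "__init__.py").write_text("")
    (package_dir / "greetings.py").write_text(
        "def greet(name):\n    return f'Hello, {name}!'\n"
    )

    # Equivalent to adding <workdir>/src to PYTHONPATH before Python starts.
    sys.path.insert(0, str(src_dir))
    try:
        # Absolute import: resolved from the src folder, no installation needed.
        greetings = importlib.import_module("my_project.greetings")
        message = greetings.greet("Albert")
    finally:
        sys.path.remove(str(src_dir))

print(message)
```

In a container image, the same effect is usually achieved by setting the PYTHONPATH environment variable to the src folder under the WORKDIR rather than modifying sys.path in code.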

Inspecting executions

Uctl supports inspecting an execution by retrieving its details. For a deeper dive, refer to the API reference guide.

Monitor the execution by providing the execution ID returned by the create command. This can be a task or a workflow execution.

$ uctl get execution -p flytesnacks -d development <execid>

For more details, use the --details flag, which shows node executions along with the task executions on them.

$ uctl get execution -p flytesnacks -d development <execid> --details

If you prefer a YAML or JSON view of the details, change the output format using the -o flag.

$ uctl get execution -p flytesnacks -d development <execid> --details -o yaml

To see the results of the execution, inspect the outputUri field of the node closure in the detailed YAML output.

"outputUri": "s3://my-s3-bucket/metadata/propeller/flytesnacks-development-<execid>/n0/data/0/outputs.pb"
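If you want to fetch the outputs with an S3 client, you first need the bucket and key from the outputUri value. The following is a minimal sketch using the standard library. The function name split_s3_uri is hypothetical, and "abc123" stands in for a real execution ID.

```python
from urllib.parse import urlsplit


def split_s3_uri(uri):
    """Split an s3:// URI into a (bucket, key) pair."""
    parts = urlsplit(uri)
    if parts.scheme != "s3" or not parts.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parts.netloc, parts.path.lstrip("/")


# "abc123" is a made-up execution ID used only for illustration.
uri = (
    "s3://my-s3-bucket/metadata/propeller/"
    "flytesnacks-development-abc123/n0/data/0/outputs.pb"
)
bucket, key = split_s3_uri(uri)
print(bucket)
print(key)
```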

Deploying your code to production

Package your code with union package

The combination of union package and uctl register is the standard way of deploying your code to production. This method is often used in scripts to build and deploy workflows in a CI/CD pipeline.

First, package your workflows:

$ union --pkgs workflows package

This will create a tar file called flyte-package.tgz of the Python package located in the workflows directory. Note that the presence of the __init__.py file in this directory is necessary in order to make it a Python package.

You can specify multiple workflow directories using the following command:

$ union --pkgs DIR1 --pkgs DIR2 package ...

This is useful in cases where you want to register two different projects that you maintain in a single place.

If you encounter a ModuleNotFoundError when packaging, use the --source option to include the correct source paths. For instance:

$ union --pkgs <dir1> package --source ./src -f

Register the package with uctl register

Once the code is packaged you register it using the uctl CLI:

$ uctl register files \
      --project basic-example \
      --domain development \
      --archive flyte-package.tgz \
      --version "$(git rev-parse HEAD)"

Let’s break down what each flag is doing here:

  • --project: The target Union.ai project.

  • --domain: The target domain. Usually one of development, staging, or production.

  • --archive: This argument allows you to pass in a package file, which in this case is the flyte-package.tgz produced earlier.

  • --version: This is a version string that can be any string, but we recommend using the Git SHA in general, especially in production use cases.
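In a CI/CD script written in Python, the same command can be assembled from these four values. The following is a sketch under stated assumptions: the helper names current_git_sha and build_register_command are hypothetical, and the example uses a placeholder version so that it runs outside a Git repository.

```python
import subprocess


def current_git_sha():
    """Return the SHA of HEAD; must be called inside a Git repository."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"], check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def build_register_command(project, domain, archive, version):
    """Assemble the `uctl register files` argument list used above."""
    return [
        "uctl", "register", "files",
        "--project", project,
        "--domain", domain,
        "--archive", archive,
        "--version", version,
    ]


# A placeholder version keeps this example runnable anywhere; in a
# pipeline you would pass current_git_sha() instead.
cmd = build_register_command(
    "basic-example", "development", "flyte-package.tgz", version="0123abc"
)
print(" ".join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) would then perform the registration, assuming uctl is installed and configured on the runner.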

See Uctl CLI for more details.

Using union register versus union package + uctl register

As a rule of thumb, union register works well when you are working on a single cluster and iterating quickly on your task/workflow code.

On the other hand, the combination of union package and uctl register is appropriate if you are:

  • Working with multiple clusters, since it uses a portable package

  • Deploying workflows to a production context

  • Testing your workflows in your CI/CD infrastructure

Programmatic Python API

You can also perform the equivalent of the three methods of registration using a UnionRemote object.

Image management and registration method

The ImageSpec construct available in union also has a mechanism to copy files into the image being built. Its behavior depends on the type of registration used:

  • If fast register is used, then it’s assumed that you don’t also want to copy source files into the built image.

  • If fast register is not used (which is the default for union package, or if union register --copy none is specified), then it’s assumed that you do want source files copied into the built image.

  • If your ImageSpec constructor specifies a source_root and the copy argument is set to something other than CopyFileDetection.NO_COPY, then files will be copied regardless of fast registration status.

Building your own images

While we recommend that you use ImageSpec and the union cloud image builder, you can, if you wish, build and deploy your own images.

You can start with union init --template basic-template-dockerfile. The resulting template project includes a docker_build.sh script that you can use to build and tag a container according to the recommended practice:

$ ./docker_build.sh

By default, the docker_build.sh script:

  • Uses the PROJECT_NAME specified in the union command, which in this case is my_project.

  • Will not use any remote registry.

  • Uses the Git SHA to version your tasks and workflows.

You can override the default values with the following flags:

$ ./docker_build.sh -p PROJECT_NAME -r REGISTRY -v VERSION

For example, if you want to push your Docker image to GitHub’s container registry, you can specify the -r ghcr.io flag.

The docker_build.sh script is purely for convenience; you can always roll your own way of building Docker containers.

Once you’ve built the image, you can push it to the specified registry. For example, if you’re using the GitHub container registry, do the following:

$ docker login ghcr.io
$ docker push TAG

CI/CD with Flyte and GitHub Actions

You can use any of the commands covered in this guide to register, execute, or test Union.ai workflows in your CI/CD process. Union.ai provides two GitHub actions that facilitate this:

  • flyte-setup-action: This action handles the installation of uctl in your action runner.

  • flyte-register-action: This action uses uctl register under the hood to handle registration of packages, for example, the .tgz archives that are created by union package.

Some CI/CD best practices

In the case where workflows are registered on each commit in your build pipelines, you can consider the following recommendations and approach:

  • Versioning Strategy: Determining the version of the build for different types of commits makes them consistent and identifiable. For commits on feature branches, use {branch-name}-{short-commit-hash} and for the ones on main branches, use main-{short-commit-hash}. Use version numbers for the released (tagged) versions.

  • Workflow Serialization and Registration: Workflows should be serialized and registered based on the versioning of the build and the container image. Depending on whether the build is for a feature branch or main, the registration domain should be adjusted accordingly.

  • Container Image Specification: When managing multiple images across tasks within a workflow, use the --image flag during registration to specify which image to use. This avoids hardcoding the image within the task definition, promoting reusability and flexibility in workflows.
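The versioning convention above can be expressed as a small helper for use in a build script. This is a sketch only: the function name build_version is hypothetical, and replacing "/" in branch names is an assumption that the convention itself does not cover.

```python
def build_version(branch, short_sha, tag=None):
    """Return a registration version following the convention described above."""
    if tag:
        # Released (tagged) builds use the version number itself.
        return tag
    # Branch names may contain "/", so replace it to keep the version a
    # single token (an assumption, not part of the stated convention).
    safe_branch = branch.replace("/", "-")
    # Yields main-<sha> on main and <branch-name>-<sha> on feature branches.
    return f"{safe_branch}-{short_sha}"


print(build_version("main", "a1b2c3d"))
print(build_version("feature/login", "a1b2c3d"))
print(build_version("main", "a1b2c3d", tag="v1.2.0"))
```

The resulting string can be passed as the --version value when registering, so that each registered version can be traced back to the branch and commit that produced it.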