Jupyter Notebook Tasks
Once you have a Union account, install union:
pip install union
Export the following environment variable to build and push images to your own container registry:
# replace with your registry name
export IMAGE_SPEC_REGISTRY="<your-container-registry>"
Then run the following commands to run the workflow:
git clone https://github.com/unionai/unionai-examples
cd unionai-examples
union run --remote tutorials/sentiment_classifier/sentiment_classifier.py main --model distilbert-base-uncased
The source code for this tutorial can be found on GitHub.
import math
import pathlib
from flytekit import kwtypes, task, workflow
from flytekitplugins.papermill import NotebookTask
How to specify inputs and outputs
- Once you are satisfied with the notebook, make sure the first cell contains only the notebook's input variables, then add the tag parameters to that cell.
- Add the tag outputs to the cell that produces the outputs. This is typically the last cell of the notebook, but it does not have to be.
- In a Python file, create a new task at the module level. An example task is shown below:
nb = NotebookTask(
    name="simple-nb",
    notebook_path=str(pathlib.Path(__file__).parent.absolute() / "nb_simple.ipynb"),
    render_deck=True,
    enable_deck=True,
    inputs=kwtypes(v=float),
    outputs=kwtypes(square=float),
)
- Note the notebook_path. This is the absolute path to the actual notebook.
- Note the inputs and outputs. The variable names match the variable names in the Jupyter notebook.
- You can see the notebook in the Flyte Deck if render_deck is set to True.
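The tagging steps above are usually done in the Jupyter UI, but the result is only cell metadata, so it can be sketched with the standard library. This is a minimal illustration, not the tutorial's actual notebook: the contents of nb_simple.ipynb are not shown here, so the cell bodies (including square = v * v), the file name nb_sketch.ipynb, and the helper code_cell are all assumptions. The outputs cell is assumed to call the plugin's record_outputs helper; that call is stored as text only and is not executed by this sketch.

```python
import json
import pathlib
import tempfile


def code_cell(source, tags):
    """Build one nbformat-4 code cell carrying the given tags (hypothetical helper)."""
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {"tags": tags},
        "outputs": [],
        "source": source,
    }


notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        # First cell: input variables only, tagged "parameters".
        code_cell("v = 3.14", ["parameters"]),
        # Assumed notebook body; the real notebook may differ.
        code_cell("square = v * v", []),
        # Cell tagged "outputs"; the variable name matches outputs=kwtypes(square=float).
        code_cell(
            "from flytekitplugins.papermill import record_outputs\n"
            "\n"
            "record_outputs(square=square)",
            ["outputs"],
        ),
    ],
}

# Write the notebook to a temporary directory and read the tags back.
path = pathlib.Path(tempfile.mkdtemp()) / "nb_sketch.ipynb"
path.write_text(json.dumps(notebook, indent=1))

tags = [cell["metadata"]["tags"] for cell in json.loads(path.read_text())["cells"]]
print(tags)
```

The variable names in the tagged cells (v and square) are what must line up with the inputs and outputs declared on the NotebookTask.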
Other tasks
You can declare other tasks and use them alongside notebook tasks. The example below declares a task that accepts the squared value from the notebook and returns its square root:
@task
def square_root_task(f: float) -> float:
    return math.sqrt(f)
Now treat the notebook task as a regular task:
@workflow
def nb_to_python_wf(f: float = 3.1415926535) -> float:
    out = nb(v=f)
    return square_root_task(f=out.square)
You can also execute the workflow locally:
if __name__ == "__main__":
    print(nb_to_python_wf(f=3.14))
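As a sanity check on the data flow, the workflow can be mimicked in plain Python. This sketch assumes the notebook computes square = v * v, which this tutorial does not show; notebook_square and square_root are hypothetical stand-ins for the notebook task and square_root_task, and no Flyte machinery is involved.

```python
import math


def notebook_square(v: float) -> float:
    # Stand-in for the notebook task; assumes the notebook computes v * v.
    return v * v


def square_root(f: float) -> float:
    # Mirrors square_root_task from the example above.
    return math.sqrt(f)


# Same chaining as nb_to_python_wf: notebook output feeds the Python task.
result = square_root(notebook_square(3.14))
print(result)
```

Under that assumption, squaring and then taking the square root returns approximately the original input for any non-negative value, so the local run should print a value close to 3.14.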
Why Are There 3 Outputs?
When you execute the workflow, the notebook task produces 3 outputs instead of the single declared one, because it generates 2 implicit outputs: the executed notebook (with its captured cell outputs) and an HTML rendering of the executed notebook. In this case they are called nb-simple-out.ipynb and nb-simple-out.html, respectively.
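The count can be pictured as the declared outputs merged with the implicit ones. The sketch below is only a model of that bookkeeping in plain Python, not the plugin's code: the keys out_nb and out_rendered_nb are assumed names for the implicit outputs, and the string descriptions are placeholders rather than real Flyte types.

```python
# Declared explicitly via outputs=kwtypes(square=float).
declared_outputs = {"square": float}

# Implicit outputs added by the notebook task.
# The key names and descriptions are assumptions for illustration.
implicit_outputs = {
    "out_nb": "executed notebook (.ipynb)",
    "out_rendered_nb": "rendered notebook (.html)",
}

# The task's effective outputs are the union of both groups.
all_outputs = {**declared_outputs, **implicit_outputs}
print(len(all_outputs), list(all_outputs))
```

Only the declared output (square) is consumed by square_root_task in the workflow above; the two implicit outputs are available for inspection after the run.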