Data input/output
Flyte being a data-aware orchestration platform, types play a vital role within it. This section provides an introduction to the wide range of data types that Flyte supports. These types serve a dual-purpose by not only validating the data but also enabling seamless transfer of data between local and cloud storage. They enable:
- Data lineage
- Memoization
- Auto parallelization
- Simplifying access to data
- Auto generated CLI and launch UI
For a more comprehensive understanding of how Flyte manages data, refer to Understand How Flyte Handles Data.
Mapping Python to Flyte types
Flytekit automatically translates most Python types into Flyte types. Here’s a breakdown of these mappings:
Python Type | Flyte Type | Conversion | Comment |
---|---|---|---|
int |
Integer |
Automatic | Use Python 3 type hints. |
float |
Float |
Automatic | Use Python 3 type hints. |
str |
String |
Automatic | Use Python 3 type hints. |
bool |
Boolean |
Automatic | Use Python 3 type hints. |
bytes /bytearray |
Binary |
Not Supported | You have the option to employ your own custom typetransformer. |
complex |
NA | Not Supported | You have the option to employ your own custom type transformer. |
datetime.timedelta |
Duration |
Automatic | Use Python 3 type hints. |
datetime.datetime |
Datetime |
Automatic | Use Python 3 type hints. |
datetime.date |
Datetime |
Automatic | Use Python 3 type hints. |
typing.List[T] / list[T] |
Collection [T] |
Automatic | Use typing.List[T] or list[T] , where T canrepresent one of the other supported types listed in the table. |
typing.Iterator[T] |
Collection [T] |
Automatic | Use typing.Iterator[T] , where T can represent one of the other supported types listed in the table. |
File / file-like / os.PathLike |
FlyteFile |
Automatic | If you’re using file or os.PathLike objects,Flyte will default to the binary protocol for the file. When using FlyteFile["protocol"] , it is assumedthat the file is in the specified protocol, such as ‘jpg’, ‘png’, ‘hdf5’, etc. |
Directory | FlyteDirectory |
Automatic | When using FlyteDirectory["protocol"] , it is assumed that all thefiles belong to the specified protocol. |
typing.Dict[str, V] / dict[str, V] |
Map[str, V] |
Automatic | Use typing.Dict[str, V] or dict[str, V , where V can be one of the other supported types in the table, including a nested dictionary. |
dict |
JSON (struct.pb ) |
Automatic | Use dict . It’s assumed that the untyped dictionary can beconverted to JSON. However, this may not always be possible and could result in a RuntimeError . |
@dataclass |
Struct |
Automatic | The class should be a pure value class annotated with the @dataclass decorator. |
np.ndarray |
File | Automatic | Use np.ndarray as a type hint. |
pandas.DataFrame |
Structured Dataset | Automatic | Use pandas.DataFrame as a type hint. Pandas columntypes aren’t preserved. |
polars.DataFrame |
Structured Dataset | Automatic | Use polars.DataFrame as a type hint. Polars columntypes aren’t preserved. |
polars.LazyFrame |
Structured Dataset | Automatic | Use polars.LazyFrame as a type hint. Polars columntypes aren’t preserved. |
pyspark.DataFrame |
Structured Dataset | To utilize the type, install the flytekitplugins-spark plugin. |
Use pyspark.DataFrame as a type hint. |
pydantic.BaseModel |
Map |
To utilize the type, install the pydantic module. |
Use pydantic.BaseModel as a type hint. |
torch.Tensor / torch.nn.Module |
File | To utilize the type, install the torch library. |
Use torchTensor or torch.nn.Module as a type hint, and you can use their derived types. |
tf.keras.Model |
File | To utilize the type, install the tensorflow library. |
Use tf.keras.Model andits derived types. |
sklearn.base.BaseEstimator |
File | To utilize the type, install the scikit-learn library. |
Use sklearnbase.BaseEstimator and its derived types. |
User defined types | Any | Custom transformers | The FlytePickle transformer is the default option, but youcan also define custom transformers. For instructions on building custom type transformers, please refer to
this section. |