Data input/output
Flyte being a data-aware orchestration platform, types play a vital role within it. This section provides an introduction to the wide range of data types that Flyte supports. These types serve a dual-purpose by not only validating the data but also enabling seamless transfer of data between local and cloud storage. They enable:
- Data lineage
- Memoization
- Auto parallelization
- Simplifying access to data
- Auto generated CLI and launch UI
For a more comprehensive understanding of how Flyte manages data, refer to Understand How Flyte Handles Data.
Mapping Python to Flyte types
Flytekit automatically translates most Python types into Flyte types. Here’s a breakdown of these mappings:
Python Type | Flyte Type | Conversion | Comment |
---|---|---|---|
int |
Integer |
Automatic | Use Python 3 type hints. |
float |
Float |
Automatic | Use Python 3 type hints. |
str |
String |
Automatic | Use Python 3 type hints. |
bool |
Boolean |
Automatic | Use Python 3 type hints. |
bytes /bytearray |
Binary |
Not Supported | You have the option to employ your own custom type transformer. |
complex |
NA | Not Supported | You have the option to employ your own custom type transformer. |
datetime.timedelta |
Duration |
Automatic | Use Python 3 type hints. |
datetime.datetime |
Datetime |
Automatic | Use Python 3 type hints. |
datetime.date |
Datetime |
Automatic | Use Python 3 type hints. |
typing.List[T] / list[T] |
Collection [T] |
Automatic | Use typing.List[T] or list[T] , where T can represent one of the other supported types listed in the table. |
typing.Iterator[T] |
Collection [T] |
Automatic | Use typing.Iterator[T] , where T can represent one of the other supported types listed in the table. |
File / file-like / os.PathLike |
FlyteFile |
Automatic | If you’re using file or os.PathLike objects, Flyte will default to the binary protocol for the file. When using FlyteFile["protocol"] , it is assumed that the file is in the specified protocol. |
Directory | FlyteDirectory |
Automatic | When using FlyteDirectory["protocol"] , it is assumed that all the files belong to the specified protocol. |
typing.Dict[str, V] / dict[str, V] |
Map[str, V] |
Automatic | Use typing.Dict[str, V] or dict[str, V] , where V can be one of the other supported types in the table, including a nested dictionary. |
dict |
JSON (struct.pb ) |
Automatic | Use dict . It’s assumed that the untyped dictionary can be converted to JSON. However, this may not always be possible and could result in a RuntimeError . |
@dataclass |
Struct |
Automatic | The class should be a pure value class annotated with the @dataclass decorator. |
np.ndarray |
File | Automatic | Use np.ndarray as a type hint. |
pandas.DataFrame |
Structured Dataset | Automatic | Use pandas.DataFrame as a type hint. Pandas column types aren’t preserved. |
pyspark.DataFrame |
Structured Dataset | Plugin Required | To utilize the type, install the flytekitplugins-spark plugin. Use pyspark.DataFrame as a type hint. |
pydantic.BaseModel |
Map |
Plugin Required | To utilize the type, install the flytekitplugins-pydantic plugin. Use pydantic.BaseModel as a type hint. |
torch.Tensor / torch.nn.Module |
File | Plugin Required | To utilize the type, install the torch library. Use torch.Tensor or torch.nn.Module as a type hint, and you can use their derived types. |
tf.keras.Model |
File | Plugin Required | To utilize the type, install the tensorflow library. Use tf.keras.Model and its derived types. |
sklearn.base.BaseEstimator |
File | Plugin Required | To utilize the type, install the scikit-learn library. Use sklearn.base.BaseEstimator and its derived types. |
User defined types | Any | Custom transformers | The FlytePickle transformer is the default option, but you can also define custom transformers. For instructions on building custom type transformers, please refer to this section. |