Data classification and residency
Data classification
Every data type in the Union.ai platform is classified by its residency and access pattern. This classification determines where data is stored and how it is accessed.
| Classification | Data types | At rest | In transit | Enters control plane? |
|---|---|---|---|---|
| Bulk Customer Data | Files, directories, DataFrames, code bundles, container images, reports | Customer infrastructure (S3 SSE / GCS / Azure SSE) | HTTPS via presigned URL | No: served via signed URL direct to/from object store |
| Inline Customer Data | Structured task inputs/outputs, secret values (during creation), execution log streams | Customer infrastructure (S3 SSE / GCS / Azure SSE; cloud secret managers) | Direct-to-DataPlane tunnel (default tier) or customer-managed internal LB (Sovereign Data Plane tier) | No: served from data plane |
| Orchestration Metadata | Task definitions (including env vars, default values, SQL, pod specs), run/action state, error messages, trigger specs | Control plane databases (AES-256/KMS) | TLS (API) + TLS (gRPC events) | Yes: read from DB into memory for API responses |
| Platform Metadata | User identity/RBAC records, cluster records | Control plane databases (AES-256/KMS) | TLS (API) | Yes: read from DB into memory for API responses |
In this section “metadata” means control-plane database records — task definitions, run state, identity records, and similar. Some deployment guides and Helm-chart configuration refer to a “metadata bucket” or metadataContainer; that is a legacy term for the data-plane object-store bucket, which holds task inputs, outputs, Decks, checkpoints, and code bundles. It does not hold any of the orchestration or platform metadata listed in the table above. For the developer-facing map of which is which, see
Where your data lives.
Bulk customer data (files, directories, DataFrames, code bundles, container images, and reports) is stored exclusively in the customer’s infrastructure and never enters the control plane. These objects are accessed via presigned URLs issued by the data-plane dataproxy.
Inline customer data (structured task inputs and outputs, secret values during creation/update, and execution log streams) is stored at rest in the customer’s infrastructure and served directly from the data plane through the Direct-to-DataPlane tunnel. No customer data enters Union.ai’s control plane in any form – not even transiently in memory.
Orchestration metadata is stored in the control plane databases (encrypted at rest). This includes task definitions, which contain structural information (container image references, typed interfaces) and fields that may be customer-sensitive: environment variables, default input literal values, SQL query statements, Kubernetes pod specs, plugin configuration, and config key-value pairs. Error messages from task executions (which may contain data from Python tracebacks) are also stored. A full task definition (TaskSpec) is stored on every run submission.
Data residency
All customer data resides in the customer’s own cloud account and region. The customer chooses the region for their data plane deployment, and all data plane resources (object storage, container registry, secrets backend, log aggregator, and compute) are provisioned within that region.
The control plane is available in the following regions: US West (us-west-2), US East (us-east-2), EU West-1 (Ireland), EU West-2 (London), and EU Central (eu-central-1). No customer data of any kind is replicated to, cached in, or routed through Union.ai infrastructure – inline data is served from the customer’s data plane through the Direct-to-DataPlane tunnel and never traverses the control plane region. Only orchestration metadata (run IDs, schedules, phase transitions, task definitions, error messages, and the RBAC graph) resides in the control plane region.
Customers whose data-residency policy extends to orchestration metadata (e.g., GDPR, or jurisdiction-specific policies that cover operational records about workflows in addition to the workflow data itself) should select a control plane region consistent with that policy. For EU-deployed data planes using an EU control plane region, all data and metadata (both at rest and in transit) stay within the EU, supporting GDPR data residency requirements.
For details on the architectural separation that enforces these residency guarantees, see Two-plane separation.
Verification
Data classification
Reviewer focus: Confirm that each data type resides where the classification table claims. Verify that bulk data is in the customer’s infrastructure, and that task definitions in the control plane contain only expected fields.
How to verify:
The task definition schema is derived from the open-source Flyte protobuf definitions in the
flyte-sdk repository. Review the TaskTemplate and RunSpec protobuf schemas and compare them to the field enumeration in the classification table above to confirm that the fields stored match the documented classifications.
Then run a workflow with recognizable data (e.g., a known string or file), and verify the location of each data type:
-
Inputs/outputs: confirm they are in the customer’s object store:
aws s3 ls s3://<customer-bucket>/org/project/domain/run-name/action-name/ -
Code bundle: confirm it is in the customer’s object store:
aws s3 ls s3://<customer-bucket>/org/project/domain/code-bundles/ -
Container image: confirm it is in the customer’s container registry:
aws ecr describe-images --repository-name <repo> --region <region> -
Logs: confirm they are in the customer’s log aggregator:
aws logs get-log-events --log-group-name <group> --log-stream-name <stream> -
Secrets: confirm they are in the customer’s secrets backend:
aws secretsmanager list-secrets --region <region> -
Task definition: confirm it contains the expected fields, stored in the control plane:
uctl get task <task-name> -o jsonThe response will contain resource requirements, typed interfaces, container image references, and potentially sensitive fields (environment variables, default values, etc.) as documented in Control plane. Bulk data content should not appear inline.
-
Run metadata: confirm it contains metadata and URI references, stored in the control plane:
uctl get execution <execution-id> -o jsonThe response should contain phase, timestamps, URIs, error messages, and task definition fields. Bulk data content should not appear inline.
Data residency
Reviewer focus: Confirm that all data plane resources reside in the customer’s chosen region and that no customer data is stored outside that region.
How to verify:
-
Confirm the object store region:
aws s3api get-bucket-location --bucket <customer-bucket>The output should match the customer’s chosen deployment region.
-
Verify all data plane resources in the cloud console. Compute, storage, registry, secrets, and log aggregator should all be in the same region.
-
Confirm the cluster region via the Union.ai API:
uctl get clusterThe cluster region should match the customer’s chosen deployment region.