Configuring logging links in the UI

To debug your workflows in production, you want to access logs from your tasks as they run. These logs are different from the core Flyte platform logs, are specific to execution, and may vary from plugin to plugin; for example, Spark may have driver and executor logs.

Every organization potentially uses a different log aggregator, making it hard to create a one-size-fits-all solution. Examples include cloud-hosted solutions such as AWS CloudWatch, GCP Stackdriver, Splunk, and Datadog.

Flyte provides a simplified interface to configure your log provider. The Flyte sandbox ships with the Kubernetes dashboard to visualize the logs. This may not be safe for production, so we recommend exploring other log aggregators.

How to configure?

To configure your log provider, the provider needs to support shareable URL links that can be templatized. Flyte generates a unique link to the logs of a specific task by filling in a templated URI with task-specific values. The templated URI has access to the following parameters:

{{ .podName }}: The pod name as it appears in the Kubernetes dashboard.
{{ .podUID }}: The pod UID generated by Kubernetes at runtime.
{{ .namespace }}: The Kubernetes namespace in which the pod runs.
{{ .containerName }}: The name of the container that generated the log.
{{ .containerId }}: The container ID generated at runtime by the container runtime (e.g., Docker or CRI-O).
{{ .logName }}: A deployment-specific name for the location where the logs are expected.
{{ .hostname }}: The value used to override the hostname the pod uses internally within its own network namespace (i.e., the pod’s .spec.hostname).
{{ .nodeName }}: The hostname of the node where the pod is running and the logs reside (i.e., the pod’s .spec.nodeName).
{{ .podRFC3339StartTime }}: The pod creation time in RFC3339 format (e.g., “2021-01-01T02:07:14Z”, also conforming to ISO 8601).
{{ .podRFC3339FinishTime }}: There is no reliable mechanism for the pod finish time yet; it is currently approximated with time.Now.
{{ .podUnixStartTime }}: The pod creation time in Unix seconds (not milliseconds).
{{ .podUnixFinishTime }}: There is no reliable mechanism for the pod finish time yet; it is currently approximated with time.Now.

The parameterization engine uses Go's native templating format, hence the {{ }} delimiters. An example configuration looks as follows:

task_logs:
  plugins:
    logs:
      templates:
        - displayName: <name-to-show>
          templateUris:
            - "https://console.aws.amazon.com/cloudwatch/home?region=us-east-1#logEventViewer:group=/flyte-production/kubernetes;stream=var.log.containers.{{.podName}}_{{.namespace}}_{{.containerName}}-{{.containerId}}.log"
            - "https://some-other-source/home?region=us-east-1#logEventViewer:group=/flyte-production/kubernetes;stream=var.log.containers.{{.podName}}_{{.namespace}}_{{.containerName}}-{{.containerId}}.log"
          messageFormat: 0 # optional; 0 = "unknown", 1 = "csv", 2 = "json"
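
For illustration, given a hypothetical pod named myworkflow-n0-0 in the namespace flytesnacks-development, with a container named main and container ID abc123, the first template above would render to:

https://console.aws.amazon.com/cloudwatch/home?region=us-east-1#logEventViewer:group=/flyte-production/kubernetes;stream=var.log.containers.myworkflow-n0-0_flytesnacks-development_main-abc123.log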

Since Helm charts use the same templating syntax for their arguments ({{ }}), rendering the chart would cause Helm to substitute the Flyte log link templates as well. To avoid this, use escaped templating for the Flyte log links in the Helm chart. This ensures that the Flyte log link templates remain intact during chart rendering. For example:

If your configuration looks like this:

https://someexample.com/app/podName={{ "{{" }} .podName {{ "}}" }}&containerName={{ "{{" }} .containerName {{ "}}" }}

After rendering the chart, Helm will generate:

https://someexample.com/app/podName={{.podName}}&containerName={{.containerName}}

At runtime, flytepropeller then fills in the task-specific values, producing a link such as:

https://someexample.com/app/podName=pname&containerName=cname
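
In context, the escaped template might appear in your Helm values like this (a sketch; the exact key path depends on the chart you deploy):

configmap:
  task_logs:
    plugins:
      logs:
        templates:
          - displayName: Custom logs
            templateUris:
              - 'https://someexample.com/app/podName={{ "{{" }} .podName {{ "}}" }}&containerName={{ "{{" }} .containerName {{ "}}" }}'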

The example configuration above produces two log links per task for task types that use the log plugin. However, not all task types use the log plugin; the Snowflake plugin, for example, uses a link to the Snowflake console instead.

By default, log links are shown once a task starts running and do not disappear when the task finishes. Certain log links might, however, be helpful when a task is still queued or initializing, for instance, to debug why a task might not be able to start. Other log links might not be valid anymore once the task terminates. You can configure the lifetime of log links in the following way:

task_logs:
  plugins:
    logs:
      templates:
        - displayName: <name-to-show>
          hideOnceFinished: true
          showWhilePending: true
          templateUris:
            - "https://..."

Dynamic log links are links that (1) are not shown by default for all tasks and (2) can use template variables provided during task registration.

Configure dynamic log links in the flytepropeller configuration as follows:

configmap:
  task_logs:
    plugins:
      logs:
        dynamic-log-links:
        - log_link_a:  # Name of the dynamic log link
            displayName: Custom dynamic log link A
            templateUris: 'https://some-service.com/{{ .taskConfig.custom_param }}'

In flytekit, dynamic log links are activated and configured using a so-called ClassDecorator. For instance, you can define such a custom decorator to control dynamic log links as follows:

from typing import Callable, Optional

from flytekit import task
from flytekit.core.utils import ClassDecorator


class configure_log_links(ClassDecorator):
    """
    Task function decorator to configure dynamic log links.
    """
    def __init__(
        self,
        task_function: Optional[Callable] = None,
        enable_log_link_a: Optional[bool] = False,
        custom_param: Optional[str] = None,
        **kwargs,
    ):
        """
        Configure dynamic log links for a task.

        Args:
            task_function (function, optional): The user function to be decorated. If the decorator is called
                with arguments, task_function will be None. If the decorator is called without arguments,
                task_function will be the function to be decorated.
            enable_log_link_a (bool, optional): Activate dynamic log link `log_link_a` configured in the backend.
            custom_param (str, optional): Custom parameter for log link templates configured in the backend.
        """
        self.enable_log_link_a = enable_log_link_a
        self.custom_param = custom_param

        super().__init__(
            task_function,
            enable_log_link_a=enable_log_link_a,
            custom_param=custom_param,
            **kwargs,
        )

    def execute(self, *args, **kwargs):
        # Run the wrapped task function unchanged; this decorator only attaches config.
        output = self.task_function(*args, **kwargs)
        return output

    def get_extra_config(self) -> dict[str, str]:
        """Return extra config for dynamic log links."""
        extra_config = {}

        log_link_types = []
        if self.enable_log_link_a:
            log_link_types.append("log_link_a")

        if self.custom_param:
            extra_config["custom_param"] = self.custom_param
        # Activate other dynamic log links as needed

        extra_config[self.LINK_TYPE_KEY] = ",".join(log_link_types)
        return extra_config


@task
@configure_log_links(
    enable_log_link_a=True,
    custom_param="test-value",
)
def my_task():
    ...
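
With this configuration, the decorator reports log_link_a and custom_param to the backend at registration time. Assuming the hypothetical template URI from the backend configuration above, the dynamic log link would render as:

https://some-service.com/test-value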

For inspiration, consider how the flytekit wandb, neptune or vscode plugins make use of dynamic log links.

Datadog integration

To send your Flyte workflow logs to Datadog, you can follow these steps:

  1. Enable collection of logs from containers and collection of logs using files. The precise configuration steps will vary depending on your specific setup.

For instance, if you’re using Helm, use the following config:

logs:
  enabled: true
  containerCollectAll: true
  containerCollectUsingFiles: true

If you’re using environment variables, use the following config:

DD_LOGS_ENABLED: "true"
DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL: "true"
DD_LOGS_CONFIG_K8S_CONTAINER_USE_FILE: "true"
DD_CONTAINER_EXCLUDE_LOGS: "name:datadog-agent" # This is to avoid tracking logs produced by the datadog agent itself
  2. The Datadog guide includes a section on mounting volumes. Mapping the “logpodpath” and “logcontainerpath” volumes, as illustrated in the linked example, is essential and a prerequisite for proper functioning. The “pointerdir” volume is optional, but mapping it is recommended to prevent the loss of container logs during restarts or network issues (as stated in the guide). A sketch of these mounts follows below.
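
A minimal sketch of the volume mounts described above, as they might appear in the Datadog agent DaemonSet (the names and host paths follow the Datadog Kubernetes installation guide; adapt them to your container runtime and setup):

volumeMounts:
  - name: logpodpath
    mountPath: /var/log/pods
  - name: logcontainerpath
    mountPath: /var/lib/docker/containers
  - name: pointerdir  # optional but recommended; persists log tail positions across restarts
    mountPath: /opt/datadog-agent/run
volumes:
  - name: logpodpath
    hostPath:
      path: /var/log/pods
  - name: logcontainerpath
    hostPath:
      path: /var/lib/docker/containers
  - name: pointerdir
    hostPath:
      path: /opt/datadog-agent/run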