Host: Niels Bantilan
Location: Virtual

Intro to Data Validation with Pandera

Join this workshop designed to help you get started with Pandera, a powerful Python library for data validation and quality assurance. Ensuring the integrity and reliability of your data is crucial in the world of data science and machine learning. Pandera makes this process seamless by providing tools to define, validate, and enforce data schemas directly in your workflows.

By the end of this workshop, you’ll have the skills to implement robust data validation checks that act like "unit tests" for your data, ensuring cleaner datasets and more trustworthy insights.

In this hands-on session, you’ll learn:

  • Why Data Quality Matters: Understand common data quality challenges and their impact on data-driven applications.
  • Introduction to Pandera: Explore the core concepts of Pandera, including schema definitions, checks, and validation strategies.
  • Hands-on Demo: Build real-world data validation pipelines to catch errors early and improve the reliability of your datasets.
  • Integrating with Workflows: See how Pandera fits into machine learning pipelines, data engineering workflows, and MLOps environments.

Who should attend

  • Data Scientists and Analysts
  • Machine Learning Engineers
  • Data Engineers
  • Anyone working with data who wants to improve data quality and reduce errors

Prerequisites

Basic knowledge of Python and some experience working with data in libraries like Pandas are recommended, but not required.

Let’s make data quality a first-class citizen in your projects with Pandera: github.com/unionai-oss/pandera

About the speaker

Niels is a machine learning engineer, a core maintainer of Flyte, an open source ML orchestration tool, and the author and maintainer of Pandera, a data testing tool for dataframes. He has a Master's in Public Health with a specialization in sociomedical science and public health informatics and, before that, a background in developmental biology and immunology. His research interests include reinforcement learning, AutoML, creative machine learning, and fairness, accountability, and transparency in automated systems. He enjoys developing open source tools to make data science and machine learning practitioners more productive.

linkedin.com/in/nbantilan

About Union.ai

Union is an AI platform that simplifies ML infrastructure so you can develop, deploy, and innovate faster. Write your code in Python, collaborate across departments, and enjoy full reproducibility and auditability. Union lets you focus on what matters.

💬 Join our AI and MLOps Slack Community: slack.flyte.org

⭐ Check out Flyte on GitHub: github.com/flyteorg/flyte

🤝 Learn about everything else we’re doing at union.ai
