Host: David Espejo
Location: Virtual

Build Production-Ready AI: Efficient Batch Serving Pipelines

A key stage in the MLOps lifecycle is delivering your model to users in a way that meets functional and business requirements. This stage is typically called Serving, because it is when your model starts delivering predictions on new, unseen data.

This workshop will walk you through the challenges of Serving for use cases like chatbots and code completion, and show how you can use Flyte together with model servers like vLLM to build a reproducible batch inference pipeline today.
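
To make this concrete, here is a minimal sketch of the kind of pipeline the workshop builds: a Flyte task that wraps vLLM's offline batch inference. It assumes flytekit and vllm are installed and a GPU is available; the model name my-org/finetuned-model is a hypothetical placeholder for your own fine-tuned checkpoint.

```python
from flytekit import task, workflow
from vllm import LLM, SamplingParams


@task
def batch_infer(prompts: list[str]) -> list[str]:
    # vLLM batches the prompts internally and manages the KV cache,
    # so a single generate() call serves the whole batch efficiently.
    llm = LLM(model="my-org/finetuned-model")  # hypothetical checkpoint
    params = SamplingParams(temperature=0.7, max_tokens=128)
    outputs = llm.generate(prompts, params)
    return [out.outputs[0].text for out in outputs]


@workflow
def batch_pipeline(prompts: list[str]) -> list[str]:
    # Flyte versions the workflow and its inputs,
    # which is what makes the batch run reproducible.
    return batch_infer(prompts=prompts)
```

You can execute the same workflow locally with pyflyte run or register it to a Flyte cluster; either way the run is versioned and repeatable.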

This workshop will cover

  • An overview of the challenges of Serving AI models
  • Developing a model pipeline, including fine-tuning a SOTA model
  • Integrating a model server into your pipeline
  • Serving your model in batch mode

What you’ll need to follow along

Who should attend

Anyone looking to productionize ML models and deliver them to their users efficiently.

About the Speaker

David is a platform engineer and developer advocate who has spent years building and supporting cloud platforms and cloud-native applications and, more recently, bootstrapping efficient ML platforms on Kubernetes.

Connect with David on LinkedIn

About Union.ai

Our AI workflow and inference platform unifies data, models, and compute, giving teams a single pane of glass over their execution workflows.

We also maintain Flyte, an open-source orchestrator that facilitates building production-grade data and ML pipelines.

💬 Join our AI and MLOps Slack Community: slack.flyte.org

⭐ Check out Flyte on GitHub: github.com/flyteorg/flyte

🤝 Learn about everything else we’re doing at union.ai
