Sage Elliott

Fireside Chat with Leo Dirac: The Future of Computer Vision and Robotics

In this fireside chat, we talk with Groundlight co-founder and CTO Leo Dirac about the future of computer vision and robotics. 

Groundlight is a computer vision platform that enables you to interpret images using plain English and minimal code. Groundlight combines traditional deep learning, expert human supervision, and real-time optimization to simplify the process of building robust vision solutions.

This Union AI Fireside Chat Covers

  • Leo Dirac Introduction & Getting Started in Machine Learning
  • What is Computer Vision?
  • What is the Current State of Robotics?
  • How Do Deep Learning and Reinforcement Learning (RL) Fit into Control Systems and Perception for Robotics?
  • Using Synthetic Data to Train Computer Vision Models and Robots
  • How Do Computer Vision and Robotics Work Together?
  • What is the Future of Computer Vision and Robotics?
  • How Groundlight Solves Computer Vision & Robotics

👋 Say hello to other AI and MLOps practitioners in the Flyte Community Slack.

⭐ Check out Flyte, the open-source ML orchestrator on GitHub.

AI Interview Transcript of ‘The Future of Computer Vision and Robotics’ 

(This transcript has been lightly edited for length and clarity.) 

Sage Elliott:
Welcome, everyone. My name is Sage Elliott. I'm going to be the host for today's Union fireside chat. At Union, we build tools around ML Orchestration and making data and AI pipelines more robust. You might know us from our open-source work on Flyte and Pandera.

You can check out everything we're doing at Union.ai. We just launched a serverless version of Union as well. So if you want to check that out, you can go sign up right away. But today's chat isn't about us. We have Leo Dirac from Groundlight with us. Welcome, Leo. How's it going?

Leo Dirac:
It's going great. Thanks for having me, Sage.

Sage Elliott:
Awesome. Thank you so much for coming on. We're going to hear about all the really cool stuff that you're working on. The topic for today's fireside chat is “the future of computer vision and robotics,” and you're doing a lot of really cool work in that area right now. In fact, if you go to the Groundlight AI site, which everyone should (the link is in the description), I think the headline right now is “The Future of Computer Vision.”

Leo Dirac Introduction & Getting Started in Machine Learning

Sage Elliott:
Before we get going, could you provide an introduction and let us know a little bit about your background and what you're currently working on? I'm sure we'll dive way deeper into that later as well.

Leo Dirac:
Yeah, totally. So I kind of cut my teeth in machine learning and AI at Amazon, starting in 2013, where I was one of the first deep learning engineers in the company. I joined to launch the very first AWS machine learning service called Amazon Machine Learning. Nobody remembers it or uses it now—it was this simple linear regression service that we poured our hearts and souls into. But it was a little early and wasn’t positioned exactly right.

But I learned a ton about MLOps and what it takes to build real machine learning systems at scale and production quality. Then I hopped over to the retail side of Amazon, figuring that with all this cool deep learning neural network stuff, it would be easy to make $1 billion for the company by improving their recommender systems. And that was right. It actually took a few years to make $1 billion for the company—it’s kind of ridiculous, the scale that they operate at. But a fraction of a percent improvement in recommender systems means a lot over there.

After that, I went back to AWS to build SageMaker. I was one of the first engineers there and designed all of the AutoML and tuning stuff for them. I ended up leading the robotics and reinforcement learning parts of SageMaker. After six years at Amazon, I decided I was ready for something new. So I teamed up with an old college friend, and for the last few years, we’ve been building Groundlight, which is pretty exciting. And we’ll get into that.

Sage Elliott:
And just to tack on another question to that, what got you interested in machine learning first?

Leo Dirac:
I was working at a social media company, and we had teams of undergrads being paid $10 an hour to label tweets and Facebook posts as good or bad, positive sentiment or negative sentiment. And I watched this company grow, and it was clearly a useful service—lots of people wanted and needed this. I was thinking to myself, "Wow, this is nuts that there isn’t an algorithmic way to do this."

So I started doing some research, learning about machine learning and sentiment analysis. Natural language processing, which does exactly that, is a pretty well-understood field at this point, I think. And around that time, Andrew Ng launched his very first course on machine learning. I learned the basics and was like, "Wow, this is super cool!" This was everything I had spent my career learning about computer science, plus my background in physics and math, coming together to do amazing stuff.

I dove in, and pretty shortly after, neural networks had their big breakout moment when the ImageNet competition was first won by AlexNet. I remember watching the results of this and all the online discussion about how this grad student, Alex Krizhevsky, beat the pants off every single computer vision expert. He’d openly say, "I don’t know what to think about computer vision, but I trained this neural network, and it got 40% right, whereas you all are getting 15 or 16%." The reaction from the computer vision community was like, "This is nothing, it doesn’t mean anything, it’s a fluke. This is going away." But I was like, "What are you talking about? The world is changing right in front of us." So I got super excited and started buying GPUs and learning CUDA, whatever it took.

Sage Elliott:
I feel like most of the people I talk to have different backgrounds, but they often discover machine learning by thinking, "There’s a problem I’m having, and I’m sure there’s a better way to solve it." That’s similar to how I discovered computer vision. I had a hardware quality problem, and I was like, "I’m sure I can use computer vision to look at this," even though I didn’t know anything about it at the time. I learned enough to build a visual QA system, which was really cool.

What is Computer Vision?

Sage Elliott:
Our main topic today is the future of computer vision and robotics. But before we get into the future of those technologies, we can talk a bit about what they are first. Most of the people listening probably know what computer vision is, and some have probably built things with it themselves, but could you share your definition and examples when describing it? What is computer vision?

Leo Dirac:
It’s a great thing to talk about because AI is taking the world by storm, and a lot of people equate AI with generative AI, like Stable Diffusion, which can render images and movies, or ChatGPT, which can generate text and answer questions. All of this is general machine learning, and machine learning has had generative capabilities for a long time—it just sucked before transformers came along, to be perfectly honest.

For me, computer vision is the part of machine learning that deals with visual input. It’s any algorithm that takes an image or video and tries to understand it and make sense of it. This could be answering binary yes/no questions, classifying objects, counting items, or figuring out depth—like blurring the background in Zoom or Slack during a video call. That’s a computer vision algorithm. Generally, computer vision is the set of algorithms for analyzing images and video and making sense of them.

Sage Elliott:
You made a really good point that most people now are thinking of generative AI or deep neural networks, but traditional computer vision algorithms are still very useful. Not every problem requires deep learning.

Leo Dirac:
Exactly. Scanning a barcode or QR code is computer vision. These are well-trodden, incredibly reliable techniques, but they used to be hard. Now they’re pretty easy, but the best computer vision algorithms today are still pretty bad at some simple tasks.
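
(Editor's note: as a concrete example of the kind of "classic," non-deep-learning computer vision Leo mentions, here is a minimal sketch of reading a QR code with OpenCV. The image path is just a placeholder.)

```python
# Reading a QR code with OpenCV -- a well-trodden, non-deep-learning
# computer vision technique that is now routine.
import cv2

image = cv2.imread("label.jpg")  # placeholder path
if image is None:
    raise FileNotFoundError("label.jpg not found")

detector = cv2.QRCodeDetector()
text, points, _ = detector.detectAndDecode(image)

if text:
    print(f"Decoded QR payload: {text}")
else:
    print("No QR code found")
```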

What is the Current State of Robotics?

Sage Elliott:
Now, a lot of people have ideas of robots in their head, maybe from sci-fi. But the reality is different. What’s your definition of robotics, and could you share some examples?

Leo Dirac:
There’s this funny thing that happens with robotics. We all know the sci-fi version: humanoid robots walking around, doing human-like tasks. We’re nowhere near that yet, but we’re starting to see robots that take that shape. The reality is, the most common robot in the real world is the Roomba—a robot vacuum cleaner. People don’t call it a robot anymore because it works so well, but when it didn’t work too well, that’s when they called it a robot.

Most robots today are in factories—robotic arms that weld things together or pick and place objects. Another example is in e-commerce fulfillment centers, where robots take items from one place and put them into another. It sounds simple, but it’s actually very hard, and robots struggle with it if the environment is chaotic. So, the key observation is that robots today are not smart enough to handle chaotic environments, so we build the environment around the robot. One of my favorite examples is Ocado in the UK. They have these massive robot-operated grocery warehouses where the entire building is basically a giant robot that picks up food and bread and puts them into bags for people. The facility is built entirely to accommodate robots.

“The hardware has made huge strides, but the software still lags.”

Sage Elliott:
It’s fascinating how much infrastructure is built to accommodate robots. People might think of things like Boston Dynamics’ Spot or humanoid robots doing parkour, but it seems like there’s still a lot of human oversight required. Is it fair to say the hardware is there, but the software is the challenge?

Leo Dirac:
I agree with that. The hardware has made huge strides, but the software still lags. For example, robots like Spot can walk into a kitchen, open a fridge, pull out a can of soda, and walk back to hand it to you, but they still need supervision. They’re not smart enough to handle a chaotic environment like an office, where things are constantly changing. The level of constraints you'd need for a robot to operate effectively would make the environment unworkable for humans. So, the hardware is there, but the software—especially around perception and control—still needs improvement.

How Do Deep Learning and Reinforcement Learning (RL) Fit into Control Systems and Perception for Robotics?

Sage Elliott:
And I think you touched on this a little bit, but we have a good question from the audience. How do deep learning and reinforcement learning (RL) fit into control systems and perception for robotics? This is a hot topic right now, especially with AI advancements.

Leo Dirac:
Yeah, this is literally what I spent the last couple of years working on at AWS with the reinforcement learning system for SageMaker. So, reinforcement learning is this ability to not just predict a single result, but an entire path or series of results. For example, you can train a robot using RL to do things like stand up, walk, or even run.

There are some pretty cool videos of robots learning to crawl or stand up from scratch—just like babies. With reinforcement learning, you don’t need to tell the system exactly what to do; you just provide a reward signal, a hint that it’s getting better than before. It can then learn more complex behaviors, and it’s much more adaptive to weird, unexpected environments compared to traditional control techniques.
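
(Editor's note: to make the "reward signal" idea concrete, here is a minimal sketch of the standard agent-environment loop using the Gymnasium API. The random policy and the environment choice are stand-ins; a real system would plug in a learning algorithm here.)

```python
# The agent-environment loop at the heart of reinforcement learning:
# the only feedback the agent gets is a scalar reward signal, not
# step-by-step instructions. The random policy below is a placeholder
# for a real learning algorithm (PPO, SAC, etc.); walking tasks like
# BipedalWalker use the same interface.
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)

total_reward = 0.0
for _ in range(500):
    action = env.action_space.sample()  # placeholder for a learned policy
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward              # the reward signal is all it gets
    if terminated or truncated:
        obs, info = env.reset()

print(f"Return with a random policy: {total_reward}")
env.close()
```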

But what’s surprising is that most robotic systems don’t use RL for low-level control right now. People see things like Boston Dynamics’ parkour robots and assume it’s all RL, but it’s not. RL is still unreliable and hard to train. The vast majority of robots today still rely on traditional, well-understood control systems like PID loops—techniques from the 20th century. These are hard to implement but can be made reliable with enough effort.
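
(Editor's note: for reference, a minimal sketch of the kind of PID loop Leo is describing. The gains, setpoint, and toy physics are illustrative only, not tuned for any real robot.)

```python
# A minimal PID controller: the workhorse of traditional low-level control.
# Gains and setpoint are illustrative; in practice they are tuned per joint.
class PID:
    def __init__(self, kp, ki, kd, setpoint):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self.integral = 0.0
        self.prev_error = None

    def update(self, measurement, dt):
        error = self.setpoint - measurement
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Toy simulation: drive a 1-D "joint position" toward the setpoint.
pid = PID(kp=2.0, ki=0.02, kd=3.0, setpoint=1.0)
position, velocity, dt = 0.0, 0.0, 0.01
for _ in range(1000):
    force = pid.update(position, dt)
    velocity += force * dt   # unit mass, no friction -- purely illustrative
    position += velocity * dt

print(f"Position after 10 simulated seconds: {position:.3f} (setpoint 1.0)")
```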

In some cases, we’re starting to see systems where traditional control techniques are used for a while, and if they start failing, the system switches to RL. RL will likely play an increasing role in robotic control systems in the future, but it’s not as prevalent as you might think by just looking at the flashy videos of robots doing tricks.

Sage Elliott:
That’s really interesting! I think a lot of people assume that the cutting-edge demos we see today are all powered by AI, but it's much more complicated than that.

Using Synthetic Data to Train Computer Vision Models and Robots

Sage Elliott:
I wanted to bring up another good question from the chat. People are asking about synthetic data and how it's being used to train robots. We know it's used a lot in the autonomous vehicle industry, especially before real-world data was widely available. What’s your take on using synthetic data for training robotic systems?

Leo Dirac:
Yeah, synthetic data is definitely useful, and we use it behind the scenes in several cases at Groundlight. There’s a continuum when it comes to synthetic data. On one end, you can use something like Stable Diffusion to generate completely novel scenes and use them as training data. On the other end, you have techniques like simple data augmentation, where you jitter or modify existing data in small ways to make it “synthetic.” Both approaches are part of the broader category of synthetic data.
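
(Editor's note: on the simple end of that continuum, jittering existing images with torchvision transforms looks roughly like this. The specific transforms, parameter values, and file path are arbitrary examples.)

```python
# "Synthetic" data on the cheap: jitter existing images with torchvision
# transforms. Parameter values are arbitrary examples, not recommendations.
from PIL import Image
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    transforms.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.2),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(degrees=10),
])

image = Image.open("part_photo.jpg")  # placeholder path
synthetic_variants = [augment(image) for _ in range(8)]  # 8 jittered copies
```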

But here’s the challenge. In robotic systems, you encounter something called the “sim-to-real” problem. You can train a robot in a simulated environment to work incredibly well, but the real world is much messier. For instance, when I worked on AWS DeepRacer, we trained cars in simulation to race around a track. They performed great in the simulator, but when we put them on a physical track at an event like AWS re:Invent, they struggled because simulation can only approximate reality so much.

Roboticists refer to this as system identification—you need to accurately model every single aspect of the system in simulation, like the exact weight of the robot’s joints or the friction on each surface it interacts with. And those details are hard to estimate. There are techniques, like domain randomization, which involve intentionally adding variability to your simulation to account for this, but even then, it’s not a perfect solution. Synthetic data is a critical tool, but it doesn't solve all the problems. You have to be aware of the limitations.
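
(Editor's note: a sketch of the domain randomization idea, resampling hard-to-measure physical parameters for each training episode. The parameter names and ranges are invented for illustration.)

```python
# Domain randomization sketch: instead of guessing one "true" value for
# hard-to-measure physical properties, resample them every episode so the
# policy learns to cope with the whole range. Names and ranges are made up.
import random

def randomized_sim_params():
    return {
        "link_mass_kg":      random.uniform(0.9, 1.3),
        "joint_friction":    random.uniform(0.01, 0.08),
        "floor_friction":    random.uniform(0.4, 1.0),
        "camera_latency_ms": random.uniform(5, 40),
    }

for episode in range(3):
    params = randomized_sim_params()
    # A real pipeline would rebuild or reset the simulator with these
    # parameters, then run one training episode in it.
    print(f"episode {episode}: {params}")
```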

Sage Elliott:
So, synthetic data can help with some use cases, but it brings its own set of challenges. It’s not a magic bullet, but more of a tool in the toolbox.

Leo Dirac:
Exactly. It’s a useful tool, but you can’t fully rely on it. Your simulation is never exactly the same as the real world. It helps you train models, but how well they perform depends on how closely your simulation matches reality.

How Do Computer Vision and Robotics Work Together?

Sage Elliott:
So, now that we’ve talked about robotics and computer vision, let’s talk about where the two come together. What are some of your favorite use cases where computer vision and robotics work well together today?

Leo Dirac:
One great example is mobile inspection. Companies like Boston Dynamics have robots that can walk around environments like oil rigs or warehouses, and they can inspect equipment for issues. We’ve integrated Groundlight’s system into Boston Dynamics’ Spot robot package. You can drive a robot around and use Groundlight to get high-quality answers to questions like, “Is the pressure gauge reading within safe limits?”

Another example is drones flying around and inspecting power lines or oil refineries—places where it’s hard to have humans walking around. People walking around with clipboards doing inspections are ripe for automation because robots are getting better at moving around and interacting with the world, even though they’re not perfect yet.

What is the Future of Computer Vision and Robotics?

Sage Elliott:
So, now that we’ve covered where things are today, what does the future of computer vision and robotics look like? Where do you see these technologies going in the next several years?

Leo Dirac:
That’s the big question. Everyone in the industry agrees that robots are going to coexist with us in our workplaces, homes, and streets at some point—it’s not a question of if, but when. Robots have been decades away for a long time, but now that gap is getting shorter because we’re seeing barriers come down.

The mechanical problems are mostly solved, although there’s still work to be done in areas like dexterous manipulation—the ability for a robot hand to pick up and manipulate objects is still really bad. Most robots use suction cups, which is the state of the art right now. But AI and deep learning give us a path to solving these software problems.

The challenge is moving AI into the physical world. If you ask an LLM like ChatGPT a question and it gets it wrong, the consequences are minimal. But if a robot makes a mistake, it could knock someone out. The future is going to involve heavy human oversight for a long time. At Groundlight, we’re building a platform that integrates human judgment with AI to handle these challenges. We know when to trust the models and when to ask for human input.

“The challenge is moving AI into the physical world.”

How Groundlight Solves Computer Vision & Robotics

Sage Elliott:
So, how does Groundlight help solve this problem?

Leo Dirac:
At Groundlight, we’ve built a computer vision platform that integrates human oversight with traditional computer vision algorithms. You can describe a task in natural language, like a binary question or object detection, and we build a model for you based on that description. If the AI isn’t sure about something, it asks a human for input, and that human feedback helps train the model in real-time.
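
(Editor's note: a small sketch of what that workflow can look like with Groundlight's Python SDK. The exact class and method names here are assumptions; check the official documentation for the current API.)

```python
# Sketch of the natural-language-query workflow described above, roughly in
# the spirit of Groundlight's Python SDK. Method names and arguments are
# assumptions -- consult the official docs before relying on them.
from groundlight import Groundlight

gl = Groundlight()  # assumes an API token is configured in the environment

detector = gl.get_or_create_detector(
    name="dumpster-overflow",
    query="Is the dumpster overflowing?",  # the task, described in English
)

# Submit a frame; if the model is unsure, Groundlight can escalate to a human.
image_query = gl.submit_image_query(detector=detector, image="dumpster.jpg")
print(image_query.result)
```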

Sage Elliott:
I know you recently launched a hardware product called the Groundlight Hub. Can you tell us more about what it does and how people are using it?

Leo Dirac:
The Groundlight Hub is aimed at commercial customers—people who aren’t necessarily developers but need solutions for real-world problems. You just buy the hub, plug it into your existing camera systems, and you can set up tasks or questions that you want the system to monitor.

For example, one use case is in restaurants. Imagine a restaurant owner wants to know if the dumpster behind their building is overflowing. Waste management companies often charge extra if the dumpster is too full, so this is a costly issue. Using the Groundlight Hub, the owner can set up a camera that monitors the dumpster. The system can send a text message whenever the dumpster is getting too full, helping them avoid extra fees.

Another use case is in fulfillment centers, where people often make mistakes packing the wrong items into boxes. A camera can snap a picture of the item before it goes into the box and compare it to the catalog image. This may sound simple, but it’s a tricky computer vision problem because the lighting in a warehouse is very different from a well-lit catalog image. The Groundlight Hub helps ensure that the right product is packed before shipping it off to a customer.

Sage Elliott:
So it sounds like the hub takes away some of the pain of setting up the cameras and doing all the computer vision work yourself. You can focus on the task at hand instead of troubleshooting OpenCV or configuring Raspberry Pis.

Leo Dirac:
Exactly. You just plug it in and get started. The hardest part is sometimes finding the password for your existing cameras, but once you’re connected, it works out of the box. You don’t have to spend hours setting up OpenCV or fiddling with hardware.

Sage Elliott:
Where can people go to try Groundlight?

Leo Dirac:
You can sign up for free at groundlight.ai, and there’s a free tier that’s good for hobby projects. Download the Python SDK, and we have tutorials to help you get started. If you have any questions, we have a chat on the site, and someone is usually available to help during business hours.

Sage Elliott:
This has been such a fantastic conversation. Thank you so much for coming on and sharing your insights. And thanks to everyone who joined and asked great questions in the chat. Be sure to connect with him if you have more questions. Thanks again, Leo!

Leo Dirac:
Thanks for having me. Stay tuned for some cool updates from us!
