Physical AI Needs Better Signals, Not Bigger Data Pipelines

The Data Wire - News Team

May 20, 2026

Andrew Ballard, Enterprise Data and Strategy specialist at Thoughtworks, draws a line between useful AI training data and unnecessary noise, arguing that teams should prioritize signal over pipeline size.


Physical AI doesn’t get smarter by treating every pixel as equally valuable. At the edge, that brute-force approach can turn AI training into an infrastructure problem: heavier pipelines, higher storage costs, and more compute spent sorting noise from signal. For robotics and image-processing teams, the real challenge is deciding what deserves attention in the first place. Push more data into the pipe, and the system risks over-provisioning. Filter too aggressively, and it may miss the one signal that matters.

Andrew Ballard navigates that exact tension from both the enterprise and startup sides of AI. He leads Enterprise Data & AI Strategy across APAC at Thoughtworks and is the Founder of SpatioTemporal, a startup building a motion-intelligence layer for autonomous systems like self-driving cars and AI robotics. That dual vantage point led him to focus on intent-aware modeling at the edge. By tokenizing human motion into roughly 10,000 kinematic primitives, his models recently cut near-collision events by about 90% in simulations on NVIDIA’s Cosmos platform. He is clear that treating every pixel as an AI signal is counterproductive.

"If your system depends on storing everything, your infrastructure becomes the bottleneck," he says. "The breakthrough is realizing you can reduce the data to kilobytes of meaning rather than terabytes of pixels and still make better decisions."

The case against pixel-heavy perception

For some robotics and spatial computing teams, relying entirely on raw pixels creates a massive hardware bottleneck. Trying to extract a physics-based understanding of the world from continuous video streams demands heavy processing and storage. Ballard bypasses the bloat by moving from raster images to mathematical vectors. By combining object recognition pipelines like YOLO with ego-motion, his system produces a lightweight "world state vector" that simplifies inference and can drastically reduce data volumes, helping cut cloud costs.
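To make the raster-to-vector idea concrete, here is a minimal sketch of what a "world state vector" could look like. It is not Ballard's implementation: the `Detection` and `EgoMotion` types, their fields, the eight-object limit, and the function names are all hypothetical, and the detections are simply assumed to arrive from an upstream detector such as YOLO.

```python
from dataclasses import dataclass
import struct


@dataclass
class Detection:
    """One tracked object, relative to the vehicle (hypothetical fields)."""
    class_id: int   # label from the upstream detector
    x: float        # metres ahead of the vehicle
    y: float        # metres to the left of the vehicle
    vx: float       # relative velocity, metres per second
    vy: float


@dataclass
class EgoMotion:
    """The vehicle's own motion (hypothetical fields)."""
    speed: float     # metres per second
    yaw_rate: float  # radians per second


def world_state_vector(ego, detections, max_objects=8):
    """Flatten ego-motion and the nearest objects into a fixed-length list."""
    vec = [ego.speed, ego.yaw_rate]
    # Keep only the closest objects; distant ones are dropped in this sketch.
    nearest = sorted(detections, key=lambda d: d.x * d.x + d.y * d.y)[:max_objects]
    for d in nearest:
        vec.extend([float(d.class_id), d.x, d.y, d.vx, d.vy])
    # Pad unused object slots with zeros so the length never changes.
    vec.extend([0.0] * (5 * (max_objects - len(nearest))))
    return vec


def pack_vector(vec):
    """Serialize the vector as little-endian 32-bit floats."""
    return struct.pack(f"<{len(vec)}f", *vec)
```

With eight object slots the vector holds 42 numbers, which pack into 168 bytes per update. The fixed length is a design choice for this sketch: it keeps every update the same size regardless of how busy the scene is.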

"I'm charting a much simpler story," Ballard says. "Almost everyone else is thinking about how to get data from a single still in a video. I'm looking at the massive objects, the things that are moving. It radically simplifies the input, inference, training, compute, and storage to the tune of five or six orders of magnitude less than handling video."

He compares the shift to a change in graphic formats. "A billboard, if it were done in pixels, would be massive. But the very same thing described in math can be a million times smaller and crystal clear, right up to in front of your nose."

Reframing data quality as prioritization

Ballard rejects the notion that usable truth in AI models comes from clean, exhaustive capture. Instead, he treats it as a question of segmentation. Mission-critical signals get real-time processing, secondary packets get queued, and the long tail gets reviewed overnight or flagged only when something breaks.

"You can pick and choose what to focus on," he says. "It's not about getting everything in real time. You can probably set 30% as the critical, and the rest as packets. I can look at it overnight, or someone will tell me if there's a problem. So segment your thinking into mission-critical, then tier two, and then tier three. That really makes a difference."
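The three tiers Ballard describes map naturally onto a small routing layer. The sketch below is one possible reading, not his system: the `Tier` enum, the `TieredRouter` class, and the choice of an in-memory queue and list are assumptions made for illustration.

```python
from collections import deque
from enum import IntEnum


class Tier(IntEnum):
    CRITICAL = 1   # handled in real time
    SECONDARY = 2  # queued for later processing
    LONG_TAIL = 3  # kept for overnight review


class TieredRouter:
    """Route each signal by tier (hypothetical names throughout)."""

    def __init__(self, handle_now):
        self.handle_now = handle_now  # callback for mission-critical signals
        self.queue = deque()          # tier two waits here
        self.overnight = []           # tier three is reviewed in batch

    def route(self, tier, payload):
        if tier == Tier.CRITICAL:
            self.handle_now(payload)
        elif tier == Tier.SECONDARY:
            self.queue.append(payload)
        else:
            self.overnight.append(payload)
```

Only the critical path calls the handler immediately. Everything else accumulates until something downstream decides to drain it, which is the point of the segmentation.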

That prioritization logic extends to model training itself. Ballard pushes back on the data-science instinct to throw every piece of data at a problem in the name of training. "If you're just going to burn 100 more cycles to prove that the last 200 columns did absolutely nothing, at some point in time, you're just burning cycles."

Disciplined ignorance, learned from the road

Ballard's motorcycle commute originally sparked the idea. Rather than treating perception as a monolithic vision problem, he wants to give machines the same fast, low-level reflexes humans rely on in traffic. He calls it the "missing middle" for physical AI, a brainstem for robots that formalizes the subconscious ways humans filter visual noise. Put differently, Ballard wants to recreate that subconscious prioritization, abstracting away background noise so the AI can focus on what matters most.

"When you're driving down the road, the leaves on the trees don't really matter," he says. "I'm on a motorbike a lot. I'm my own crumple zone. I don't worry about the road signs, the paraphernalia. I just worry about that two-ton truck next to me that's going to kill me."

That attention compression, Ballard argues, is how humans thrive and multitask. It is the architecture that physical AI should be borrowing from.

The missing layer: intent, not just position

Certain self-driving interfaces are highly proficient at modeling 3D physics. Yet interpreting the nuances of human behavior remains a stubborn hurdle. Many current systems represent humans primarily as moving obstacles, logging their physical coordinates without capturing intent. Ballard designed his kinematic primitives to add the missing behavioral layer, distinguishing a rushing businessperson from a distracted pedestrian.

"I put it to you right now that the Tesla-level interface tracks all of those in beautiful 3D space on the 17- or 15-inch screen," Ballard notes. "Yet all the humans are the same shade of gray. But I can go a step further. I can say that the jogger is competent, the pedestrian is distracted, or the businessperson is rushing. Because I know how they're moving and behaving, and I can add some 'color' to those objects."

The practical difference, he says, shows up at intersections. "A Tesla car just identifies a human as a grey puppet," he says. "I'd like to get to a stage where my models can say 'those three humans on the sidewalk; two of them have acknowledged that I'm turning, but the third one hasn't yet either slowed down or turned to see me. I'd better yield to them.'"
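The intersection scenario can be reduced to a toy rule for illustration. This is a deliberately simplified sketch, not how Ballard's models work: the `PedestrianCue` type, its two boolean fields, and both functions are hypothetical, and a real system would infer these cues from motion rather than receive them as flags.

```python
from dataclasses import dataclass


@dataclass
class PedestrianCue:
    """Behavioral cues for one pedestrian (hypothetical fields)."""
    slowed_down: bool
    turned_toward_vehicle: bool


def has_acknowledged(p):
    """Treat either cue as a sign the pedestrian has noticed the vehicle."""
    return p.slowed_down or p.turned_toward_vehicle


def should_yield(pedestrians):
    """Yield if any pedestrian shows no sign of having noticed the vehicle."""
    return any(not has_acknowledged(p) for p in pedestrians)
```

In the case Ballard describes, two pedestrians have acknowledged the turning car and a third has neither slowed nor turned, so `should_yield` returns `True`.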

Vectorized infrastructure for physical AI

That specific architecture also changes how Ballard approaches storage and data lifecycles. To navigate AI production challenges, he often recommends his three-tier system: high-priority streams, secondary packets, and a long tail of data for overnight review.

In his own startup, he uses a kilobyte-per-second data frame to describe salient motion, only capturing raw video when an anomaly triggers it. The structure reduces pressure on storage hardware and makes automated data governance much easier to manage. As a proof of concept for what comes next in edge infrastructure, he runs his models directly on a smartphone, transmitting tiny data packets to the cloud rather than a continuous video feed.
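One way to picture anomaly-triggered capture is a recorder that always logs the compact motion record but holds raw frames only in a short rolling buffer, saving them when something unusual happens. The sketch below rests on assumptions: the `AnomalyTriggeredRecorder` class, the 30-frame pre-roll, and the idea of a rolling buffer are illustrative choices, not details from Ballard's startup.

```python
from collections import deque


class AnomalyTriggeredRecorder:
    """Log compact motion data always; keep raw frames only on anomalies."""

    def __init__(self, pre_roll=30):
        # Rolling window of the most recent raw frames (assumed design).
        self.recent_frames = deque(maxlen=pre_roll)
        self.motion_log = []    # lightweight records, always kept
        self.saved_clips = []   # raw frames, kept only when triggered

    def step(self, frame, motion_record, is_anomaly):
        self.recent_frames.append(frame)
        self.motion_log.append(motion_record)
        if is_anomaly:
            # Persist the frames leading up to the anomaly, then start fresh.
            self.saved_clips.append(list(self.recent_frames))
            self.recent_frames.clear()
```

On an uneventful run, `saved_clips` stays empty and storage grows only with the motion log.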

"If that's long temporal, like a four-hour drive, you don't want to store that in bad video because what's the use? You don't want to store that in 4K video because you're going to take a storage hit," he explains. Instead, Ballard’s approach stores a lightweight mathematical reconstruction of the scene, preserving the signals that matter without carrying the full weight of raw video.
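A back-of-the-envelope calculation shows the scale of the gap for a four-hour drive. The figures below are assumptions chosen for illustration, not measurements from Ballard's system: uncompressed 4K video at 3840 × 2160 pixels, 3 bytes per pixel, and 30 frames per second, compared with a vector stream of 1,000 bytes per second.

```python
import math

# Assumed raw video parameters (uncompressed 4K, 24-bit color, 30 fps).
WIDTH, HEIGHT = 3840, 2160
BYTES_PER_PIXEL = 3
FPS = 30

# Assumed vector stream rate: one kilobyte (1,000 bytes) per second.
VECTOR_BYTES_PER_SECOND = 1_000

DRIVE_SECONDS = 4 * 60 * 60  # a four-hour drive

raw_bytes_per_second = WIDTH * HEIGHT * BYTES_PER_PIXEL * FPS
raw_total = raw_bytes_per_second * DRIVE_SECONDS
vector_total = VECTOR_BYTES_PER_SECOND * DRIVE_SECONDS

ratio = raw_total / vector_total
orders_of_magnitude = math.log10(ratio)

print(f"raw video:     {raw_total / 1e12:.1f} TB")
print(f"vector stream: {vector_total / 1e6:.1f} MB")
print(f"ratio:         about 10^{orders_of_magnitude:.1f}")
```

Under these assumptions the raw stream comes to roughly 10.7 TB against 14.4 MB for the vector stream, a gap of just under six orders of magnitude. Compressed video would narrow that gap considerably, so the figure is best read as an upper bound for this scenario.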

Trust as the final infrastructure problem

Engineers often find that building a statistically safe machine is entirely different from getting consumers to trust it. For Ballard, the work on motion and intent ultimately points toward that human acceptance. The next phase of robotics relies heavily on bridging the gap between machine physics and human intuition.

Ballard bets that getting there starts not with capturing more, but with capturing less and differently. "A lot of this is stuff that we just didn't realize we already knew," he says. "This is turning it from intuition and reflexes in the human world into what that would look like for a robot world."
