Building Meaningful Datasets for AI Is a Change in Culture, Not Technology
CTO and Chief AI Officer Sean G Muller discusses how AI leaders are building meaningful datasets with historical context, and why this task is best understood as pushing technology back into an organization's business culture.

Key Points
Organizations hoard massive amounts of historical data, but without real-world context, that information fails to make them AI-ready.
Sean G Muller, Chief AI Officer for AverCare and Avatars Global, noted that AI projects stall primarily because teams ignore human context and treat data as a technical problem, rather than a business one.
He says that true governance requires business units to own their inputs, rather than "data experts" managing datasets without knowing what actually lives inside them.
AI projects don't fail because the technology is broken. They fail because businesses try to force it back into old processes rather than changing how work gets done.

Enterprises are discovering that data volume alone does not translate into intelligence. Years of sales records without context such as timing, external events, or product evolution offer limited value for prediction or AI-driven decision making. This gap between stored data and real-world meaning is pushing organizations to rethink how data is structured and governed, shifting away from static repositories toward federated, metadata-rich environments that preserve context alongside content.
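To make that shift concrete, here is a minimal Python sketch, with hypothetical field names, of the difference between a bare historical record and one that carries its context and lineage along with it:

```python
from dataclasses import dataclass, field
from datetime import date

# A bare record: the kind of row that accumulates in a static repository.
@dataclass
class SaleRecord:
    sku: str
    quantity: int
    sale_date: date

# A context-preserving record: the same fact, carried with the metadata
# a future AI system would need to interpret it years later.
@dataclass
class ContextualSaleRecord(SaleRecord):
    product_version: str = "unknown"      # the product as it existed at sale time
    store_location: str = "unknown"
    external_events: list[str] = field(default_factory=list)  # e.g. promotions, local fixtures
    source_system: str = "unknown"        # lineage: which system produced this row

record = ContextualSaleRecord(
    sku="BEER-330ML", quantity=48, sale_date=date(2019, 9, 21),
    product_version="v2-rebrand", store_location="Wellington",
    external_events=["rugby_match"], source_system="pos_v3",
)
```

The extra fields are cheap to store; what is expensive is recovering them after the fact, which is exactly the problem a metadata-rich, federated approach tries to avoid.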
For Sean G Muller, CTO for possibl.ai and Chief AI Officer for Avatars Global and AverCare, the enterprise fixation on historically perfect data is a massive distraction. According to Muller, AI projects rarely stall because of the technology. They stall because teams ignore human context. "AI projects don't fail because the technology is broken," he says. "They fail because businesses try to force it back into old processes rather than changing how work gets done."
This is the challenge most platforms face: once AI starts hitting "corner cases," it struggles to perform. Teams often build a strong proof of concept in a sandbox, only for lab models to break the second they hit real-world workflows. When handed a working model, business units frequently try to force the new tool into their original, undocumented daily habits, exposing algorithmic limitations at the edges of production workflows. Such friction is familiar to anyone who has navigated the drag of over-customized ERP migrations, where bespoke tables and scripts act as long-term anchors.
Twisty, gummy reality: Muller sees the focus on AI's problems in niche situations as important, but neither disqualifying nor historically unique. "You actually expose that brittleness within the technology, but all technology is basically brittle once you start trying to force it into some kind of twisty, gummy reality that is not along the lines of the process."
Cooking the spreadsheets: The critical part of supporting software like agentic AI is ensuring you have both the data and the context that make it meaningful. As Muller says, "You may have tons of data on your ERP process, and then there may be one little slice of data, like a reconciliation spreadsheet, that's all handled on a spreadsheet inside the CFO's office that's not part of your ERP process. That means everything you do with that data is worthless without that reconciliation piece."
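A rough illustration of why that one slice matters, using invented invoice data and hypothetical column names: any analysis run on the ERP extract alone learns the wrong totals until the reconciliation adjustments are joined back in.

```python
import pandas as pd

# ERP extract: looks complete on its own.
erp = pd.DataFrame({
    "invoice_id": ["A1", "A2", "A3"],
    "amount": [1200.0, 560.0, 980.0],
})

# The CFO's reconciliation spreadsheet: adjustments that never enter the ERP.
recon = pd.DataFrame({
    "invoice_id": ["A2", "A3"],
    "adjustment": [-560.0, 150.0],  # e.g. a write-off and a correction
})

# Without this join, any model trained on `erp` learns the wrong totals.
merged = erp.merge(recon, on="invoice_id", how="left").fillna({"adjustment": 0.0})
merged["true_amount"] = merged["amount"] + merged["adjustment"]
print(merged)
```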
A similar disconnect happens with historical records. Muller notes that, despite many organizations boasting 10 or 15 years of data, they rarely understand how changes in products, services, or locations fundamentally affect the meaning of that data.
Scrums and sales: Muller cites his work designing an AI strategy for a New Zealand grocery business. The company had transaction data recorded for SKUs, quantities, and temperatures, but lacked the contextual indicators an AI needs to make meaning from it. "While you can track the sale of beer or wine, context is important. For example, is there a rugby match going on? Without that information, you'll see those numbers without the context needed to make predictions in the future."
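In forecasting terms, that context becomes a feature. A minimal sketch, with invented numbers and a hypothetical event calendar, of how joining match-day information lets a model attribute a demand spike rather than treat it as noise:

```python
import pandas as pd

# Transaction history as the grocer stored it: date, SKU, quantity.
sales = pd.DataFrame({
    "date": pd.to_datetime(["2024-06-01", "2024-06-08", "2024-06-15"]),
    "sku": ["BEER-330ML"] * 3,
    "units": [140, 520, 155],   # the spike is unexplained without context
})

# Context the business never captured: local fixtures on each date.
events = pd.DataFrame({
    "date": pd.to_datetime(["2024-06-08"]),
    "rugby_match": [True],
})

# Join the event calendar so a forecasting model can attribute the spike
# to the match instead of mistaking it for noise or a trend.
features = sales.merge(events, on="date", how="left").fillna({"rugby_match": False})
print(features)
```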
Muller points to Mark Zuckerberg’s recent comments about creators overestimating the value of individual pieces of content as a parallel idea. A single data point doesn't hold much value without the context in which it's used. Without that integration, deterministic expectations clash with probabilistic systems. LLMs rely on reinforcement learning from human feedback, a mathematical incentive structure that prioritizes user satisfaction over strict factual accuracy. When an organization's datasets lack the contextual information needed to support interpretation, this tendency to satisfy can paper over those gaps with confident-sounding yet ultimately incorrect answers.
The root cause of these gaps usually points straight back to the organizational chart. As Muller says, "80% of it is cultural business maturity, and 20% of it is data readiness. You have executive leaders that are asking the data analytics team every week to run completely new reports and show them in new ways because the executives don't know what they're seeing."
Crossed communication lines: Leadership disconnects often trickle down into daily operations. Muller recalls a scenario in which a CIO and an IT GM looked at the exact same quarterly report and drew opposite conclusions about their technical debt. "I did some work with a CIO, and he wanted to do AI. I told him he had massive tech debt to resolve, and he told me he gets a report from the GM of IT that says it's under control. The GM of IT is in the room with us, and he responds incredulously that the reports he shares show they are underinvesting in tech debt every three months."
Titles without traction: Muller also tells the story of asking the executive responsible for labelling a dataset to obtain security approvals to share it, only to have that executive ask what was actually inside it. "I said, 'You're the data owner. You own the data. You have ultimate responsibility for the data. And you don't even know what's in the data?'" Muller says that instances like this show that businesses don't see the value in data itself, just the intelligence they get from it, and this creates a fundamental disconnect.
Because of that disconnect, Muller notes that IT departments shouldn't be the ultimate owners of data assets; they simply don't live the daily workflows. Instead, the "technology must be pushed back into the business." That is, rather than receiving reports they can't fully understand, business leaders get reports on data integrity and accuracy, framed from a business perspective. Internal audit leaders such as Xin Tu point out that controls need to move upstream into day-one design, rather than being bolted on later.
That being said, it's crucial to collect data where it's created to make it meaningful in those business workflows. One example Muller provides is an insurance company that invested roughly $10 million and spent five years training sales agents to digitize handwritten client notes on tablets just to improve the richness of its data. In another case, a manufacturer deployed conversational avatars to interview retiring workers at the end of each shift to extract undocumented process knowledge before it disappears. Muller is crystal clear on this point: organizations that close the gap tend to push accountability back to the business units, making data a first-class part of process design. "Before you try to turn your data lake into some kind of monolithic know-everything about your business, know your business first. Look at your business processes, and the mapping of your business processes will tell you what data is useful, what data is not useful, and what you're missing."
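That process-first sequencing can be expressed almost literally: map what each process step produces, compare it against what the AI use case requires, and the gaps fall out. A toy sketch with hypothetical step and field names:

```python
# Hypothetical process map: each step declares the data it produces.
process_map = {
    "order_intake":   {"produces": {"order_id", "sku", "quantity"}},
    "fulfilment":     {"produces": {"ship_date", "carrier"}},
    "reconciliation": {"produces": {"adjustment"}},  # the CFO's spreadsheet
}

# Fields the AI use case actually needs to answer its question.
required = {"order_id", "sku", "quantity", "ship_date", "adjustment", "return_reason"}

captured = set().union(*(step["produces"] for step in process_map.values()))
print("useful:", sorted(required & captured))
print("missing:", sorted(required - captured))  # 'return_reason' is never captured
```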