4 reasons why AI pilots fail in CX and how to fix them

Here’s something worth sitting with: 88% of organizations are now using AI in at least one business function, and yet only 25% of AI initiatives are delivering the ROI they promised. Most brands aren’t struggling to get started. They’re struggling to move what they’ve started into production and sustain it.
That gap tends to get blamed on technology. Wrong model, wrong tool, wrong vendor. But in working with brands across financial services and beyond, the pattern we keep seeing at LivePerson is that the distance between a promising pilot and real production comes down to how we operationalize the work, not what we build it with.
There’s another layer to this. We’re collectively trying to apply old business cases and old success metrics to a technology that doesn’t behave like anything we’ve governed before. Generative AI requires a different kind of operating model, and most organizations haven’t built one yet. That shows up in four consistent ways.
1. Ownership without real accountability
The most familiar model goes like this: an enterprise buys a tool, hands it to IT or a specific team to figure out, and waits for productivity to improve. When results are anecdotal rather than quantifiable, it’s usually because no one defined what success looked like before they started.
McKinsey’s 2026 research found that CEO-level accountability for AI governance is one of the factors most strongly correlated with bottom-line impact. But a sponsor at the top isn’t sufficient on its own. The organizations seeing real returns are building shared accountability across business, technology, risk, and operations, and making sure that everyone is actively involved rather than just endorsing the initiative from a distance.
Going beyond sponsorship also means creating the conditions for people to actually engage. That looks like dedicated time for workflow redesign, tailored upskilling across teams with different technical starting points, and incentives that make innovation worth pursuing. In practice, some of the most effective approaches are also the most straightforward: hackathon-style sessions where teams compete to solve the same problem, or bot-breaking exercises that get people hands-on with what’s actually been built.
One major bank in EMEA did exactly this before launching a new routing AI agent, inviting people from across the business to try to break it before it went live. That kind of participation builds accountability in a way that a mandate never does.
2. AI layered onto unchanged work
Deloitte’s 2026 data shows that only about a third of organizations are using AI to deeply transform their core processes. The majority are making partial adjustments or simply placing new tools on top of existing workflows, which tends to generate noise rather than value. You can prove that a model responds correctly and still see zero meaningful improvement in how work gets done.
The approach used to map a customer journey end-to-end (identifying friction, redesigning touchpoints, and understanding where the technology actually touches people) needs to be applied internally to how employees work. That means using data to identify the highest-impact workflows rather than the most convenient ones, clearly defining where AI operates versus where humans stay in control, and measuring success through adoption and employee satisfaction alongside productivity metrics.
It also means being honest about what change management actually requires. Telling teams to use a new tool is not the same as helping them redesign the processes around it.
3. Integration deferred until it’s expensive
One in four organizations cites inadequate infrastructure and data as a barrier to ROI. In regulated industries like banking, where tech stacks are often heavily customized and difficult to integrate with, that number feels conservative.
The temptation is to treat integrations as a phase-two problem, something to solve once a pilot has proven its value. The reality is that retrofitting integration later almost always costs more in time, resources, and momentum than planning for it upfront would have. Teams shy away from the conversation because it’s a significant lift, but deferring it doesn’t make it smaller.
You don’t need to solve for every integration at the outset. Starting with a focused, high-value use case that doesn’t require heavy integration is a legitimate approach, as long as it’s building toward a defined end state rather than avoiding a harder decision. The planning for where you’re going needs to happen at the beginning, even if you’re not going there yet.
4. Evaluation built in too late
LangChain’s 2026 data identifies quality as the top barrier to production, and yet only about half of organizations are running offline evaluations before going live. Agentic AI is significantly harder to test than a single-turn prompt because it uses tools, multi-step reasoning, and integrations where errors compound across a workflow.
When teams wait until production to add tracing, human fallback mechanisms, and evaluation processes, the pilot stops functioning as a pilot and becomes a debugging exercise. The monitoring and evaluation plan needs to exist at the design phase, established before anything goes live, with clear policies for what gets reported, how often, and to whom.
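To make "offline evaluation before going live" concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than a description of any particular product: the toy routing agent, the three test cases, the EvalCase and EvalReport names, and the 95% release threshold are all hypothetical. The point is only that a fixed set of expected outcomes and a pass-rate gate can exist at design time, before any customer sees the agent.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class EvalCase:
    """One offline test: a customer message and the route we expect."""
    name: str
    user_message: str
    expected_route: str


@dataclass
class EvalReport:
    passed: int = 0
    failed: list[str] = field(default_factory=list)

    @property
    def pass_rate(self) -> float:
        total = self.passed + len(self.failed)
        return self.passed / total if total else 0.0


def run_offline_eval(agent: Callable[[str], str], cases: list[EvalCase]) -> EvalReport:
    """Run every case through the agent and record which ones fail."""
    report = EvalReport()
    for case in cases:
        try:
            actual = agent(case.user_message)
        except Exception:
            actual = None  # a crash counts as a failed case
        if actual == case.expected_route:
            report.passed += 1
        else:
            report.failed.append(case.name)
    return report


def toy_routing_agent(message: str) -> str:
    """Hypothetical stand-in for the real agent under test."""
    text = message.lower()
    if "card" in text:
        return "cards"
    if "loan" in text or "mortgage" in text:
        return "lending"
    return "general"


CASES = [
    EvalCase("lost card", "I lost my card yesterday", "cards"),
    EvalCase("mortgage rate", "What is your mortgage rate?", "lending"),
    EvalCase("fraud", "I see a charge I did not make", "fraud"),
]

RELEASE_THRESHOLD = 0.95  # assumed policy value, agreed before launch

report = run_offline_eval(toy_routing_agent, CASES)
ready_for_launch = report.pass_rate >= RELEASE_THRESHOLD
print(f"pass rate: {report.pass_rate:.2f}, failed: {report.failed}, ready: {ready_for_launch}")
```

In this sketch the toy agent misses the fraud case, so the gate blocks the launch. A real suite would cover multi-step and tool-using flows as well, but the reporting question stays the same: what gets measured, against what threshold, and who sees the result.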
Human oversight is not a concession to AI’s limitations. It’s how you create a system where people can approve or reject actions before they execute, intervene when conditions change, and audit for errors and bias before customers encounter them. Your people are your greatest safeguard against risk, and the organizations treating that as a design principle rather than a fallback are the ones scaling with confidence.
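One way to picture approval before execution is a small gate that sits between what the agent proposes and what actually runs. The Python sketch below is a simplified illustration under stated assumptions: the ProposedAction and AuditEntry types, the refund scenario, the 50.0 auto-approve limit, and the reviewer rule are all hypothetical, and a real deployment would route the review to a person rather than to a function.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable


@dataclass
class ProposedAction:
    """Something the agent wants to do on a customer's behalf."""
    kind: str
    amount: float
    customer_id: str


@dataclass
class AuditEntry:
    action: ProposedAction
    decision: str  # "auto_approved", "approved", or "rejected"
    decided_at: datetime


AUTO_APPROVE_LIMIT = 50.0  # assumed policy: small actions skip human review


def execute_with_oversight(
    action: ProposedAction,
    reviewer: Callable[[ProposedAction], bool],
    execute: Callable[[ProposedAction], None],
    audit_log: list[AuditEntry],
) -> bool:
    """Ask a human before running anything above the limit; log every decision."""
    if action.amount <= AUTO_APPROVE_LIMIT:
        decision = "auto_approved"
    elif reviewer(action):
        decision = "approved"
    else:
        decision = "rejected"
    audit_log.append(AuditEntry(action, decision, datetime.now(timezone.utc)))
    if decision == "rejected":
        return False
    execute(action)
    return True


# Hypothetical usage: the reviewer function stands in for a human decision.
executed: list[ProposedAction] = []
audit_log: list[AuditEntry] = []


def cautious_reviewer(action: ProposedAction) -> bool:
    return action.amount <= 500.0


small = ProposedAction("refund", 20.0, "c-001")
medium = ProposedAction("refund", 300.0, "c-002")
large = ProposedAction("refund", 2000.0, "c-003")

results = [
    execute_with_oversight(a, cautious_reviewer, executed.append, audit_log)
    for a in (small, medium, large)
]
decisions = [entry.decision for entry in audit_log]
print(results, decisions)
```

Every decision is logged whether or not the action runs, so the audit trail exists from the first pilot interaction instead of being retrofitted in production.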
These four problems tend to compound each other, which is why pilots that look solid in a demo can quietly die before they ever reach production. The webinar goes deeper into each of them with practical frameworks for closing the gaps.


