Financial services institutions are making unprecedented investments in AI, yet only 31% can actually track returns on that spending. According to a 2025 Harris Poll of 506 financial services executives, 78% express high confidence in future AI results despite lacking reliable methods to measure them.
Most organizations cannot distinguish between AI initiatives that are genuinely underperforming and those that are succeeding but failing to meet unrealistic expectations.
The gap no one is talking about
Only 12% of financial institutions have successfully implemented enterprise-wide AI strategies. The remaining 88% are cycling through pilots with no clear path to scale.
What separates the two groups is not budget or board support but execution discipline: robust data governance, end-to-end platform engineering, and measurement capabilities that connect AI activity to actual business outcomes.
Each of these traits works differently at the ground level. Data governance in financial services goes beyond access controls and into lineage: knowing where training data came from, how it has been transformed, and whether it can be reproduced for an auditor six months later. End-to-end platform engineering means AI agents operate on a consistent surface across planning, development, security review, and deployment, rather than context-switching between tools that do not share state. Measurement capability is the discipline of connecting AI activity to the outcomes leadership actually cares about, such as cycle time, defect escape rate, and time to remediate a vulnerability, not the usage metrics vendors ship by default.
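To make the lineage point concrete, here is a minimal sketch of what a reproducible audit trail for a training dataset might look like. The record type, field names, and dataset identifier are illustrative assumptions, not a reference to any specific governance product; the key idea is hashing a canonical serialization of the data so the snapshot can be re-verified months later.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
import hashlib
import json

@dataclass
class LineageRecord:
    """Minimal audit trail for a training dataset snapshot (illustrative)."""
    source: str                 # where the raw data came from
    transformations: list[str]  # ordered processing steps applied
    snapshot_hash: str          # content hash of the final dataset
    recorded_at: str            # UTC timestamp of the snapshot

def record_lineage(source: str, transformations: list[str],
                   rows: list[dict]) -> LineageRecord:
    # Hash a canonical (sorted-key) serialization so the same data
    # always yields the same hash, regardless of dict ordering.
    canonical = json.dumps(rows, sort_keys=True).encode()
    return LineageRecord(
        source=source,
        transformations=transformations,
        snapshot_hash=hashlib.sha256(canonical).hexdigest(),
        recorded_at=datetime.now(timezone.utc).isoformat(),
    )

# Hypothetical dataset and pipeline steps for illustration only
rec = record_lineage(
    source="core_banking.transactions_2024q4",
    transformations=["drop_pii", "normalize_amounts"],
    rows=[{"id": 1, "amount": 42.0}],
)
```

An auditor (or a later pipeline run) can re-serialize the same rows and confirm the hash matches, which is the reproducibility property the paragraph above describes.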
When organizations cannot measure what is working, they cannot make informed decisions about where to invest next. AI budgets grow, timelines slip, and confidence remains high while results stay elusive.
Why forecasts fail
Most AI revenue projections in financial services are built on vendor case studies, best-case pilot results, and competitive pressure. They assume optimal execution, static markets, and customer adoption that moves on the institution's timeline.
None of those assumptions hold at scale. And in a regulated environment, the gap between a controlled proof of concept and enterprise-wide deployment is wider than most strategic plans account for.
Here’s a common pattern: A firm runs a successful proof of concept in a sandboxed environment and projects enterprise-wide results based on those numbers. What the pilot never surfaced was the integration complexity with production systems, the change approval cycles required in a regulated environment, and the staff adoption curves that vary significantly across business lines. By the time those variables compound, the original projection bears little resemblance to the actual implementation timeline or cost.
The institutions that have broken through treat AI investment the way they treat any other major capital allocation decision: with clear success criteria, realistic timelines, and accountability frameworks that hold up to scrutiny.
What measurement actually looks like
Most organizations are measuring AI adoption. Very few are measuring AI impact. Those are not the same thing.
Seats licensed, prompts sent, and suggestions accepted tell you whether people are using the tools. They do not tell you whether the investment is moving the business. The gap between those two things is where most AI ROI narratives fall apart.
The metrics that matter to a CIO or CISO sit one layer deeper: cycle time from commit to production, defect escape rate, time to remediate a critical vulnerability, deployment frequency. Tracked consistently over several quarters, these connect AI activity to the outcomes leadership actually cares about. None of them require new instrumentation. They require the discipline to prioritize them over the usage metrics that are easier to report.
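As a sketch of why no new instrumentation is needed: two of these metrics fall out of timestamps most delivery pipelines already record. The event log below is invented for illustration; any CI/CD system that stamps commits and deployments can supply the real data.

```python
from datetime import datetime
from statistics import median

# Hypothetical event log: (commit_time, production_deploy_time) per change
changes = [
    (datetime(2025, 1, 6, 9, 0),  datetime(2025, 1, 7, 15, 0)),
    (datetime(2025, 1, 8, 10, 0), datetime(2025, 1, 8, 18, 0)),
    (datetime(2025, 1, 9, 11, 0), datetime(2025, 1, 13, 9, 0)),
]

# Cycle time: elapsed hours from commit to production, reported as a median
cycle_hours = [(deploy - commit).total_seconds() / 3600
               for commit, deploy in changes]
median_cycle = median(cycle_hours)

# Deployment frequency: deploys per week over the observed window
window_days = (changes[-1][1] - changes[0][0]).days or 1
deploys_per_week = len(changes) * 7 / window_days
```

Tracked quarter over quarter, the trend in these two numbers (alongside defect escape rate and remediation time from issue trackers) is what connects AI-assisted development to a business outcome rather than a usage statistic.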
Measurement as regulatory posture
For financial institutions, measurement discipline is also becoming a regulatory expectation. Supervisory frameworks have signaled that AI systems operating in regulated workflows need explainable governance, documented performance over time, and evidence that human accountability has been preserved. Institutions that cannot produce those artifacts on request are exposed, regardless of how well their models are performing.
For institutions operating under frameworks like DORA, SR 11-7, or the EU AI Act, the ability to document AI performance over time, demonstrate explainable governance, and evidence human accountability is both a competitive advantage and a supervisory expectation. Organizations that build that capability proactively are better positioned for the next board presentation and the next regulatory examination.
The foundation that makes scaling possible
Measurement discipline and governance frameworks matter, but they need something to operate on. In a regulated financial institution, where development, security review, compliance sign-off, and deployment touch different teams across different systems, the orchestration layer is what determines whether AI generates isolated productivity gains or institution-wide value.
Bringing human teams and AI agents together across the entire software development lifecycle is what converts AI investment into measurable AI value. Successfully scaling AI requires intelligent orchestration that spans DevOps, security, and compliance workflows on a unified platform. Point solutions stitched together after the fact cannot deliver that orchestration.
The $750 billion opportunity AI represents in financial services is real. Realizing it requires something the industry already knows how to do: measure what matters.
Next steps
Scaling AI investments in financial services: The framework for measuring ROI
Learn how leading financial institutions are moving from AI experimentation to measurable, scalable results.
Key takeaways
- Most AI revenue forecasts are built on ambition, not analysis, and the consequences compound over time.
- The institutions that have broken through share three traits: robust data governance, end-to-end platform engineering, and systematic measurement capabilities.
- Scaling AI requires intelligent orchestration across the entire software lifecycle, not point solutions stitched together after the fact.

