The full-stack cost of AI

Every AI vendor's proposal includes a price. The price is, in almost every case, the smallest component of what the deployment will actually cost the entity. The CFO who funds the proposal on the headline number is buying a fraction of a system. The remaining cost will arrive later, often without the same procurement discipline, often without the same business case.

The full-stack cost model below is what a senior finance leader uses to bring an AI investment inside the same capital discipline that governs every other large commitment.

The seven cost layers below the vendor line

A working AI deployment has seven cost layers. Each layer has its own cost driver, its own scalability profile, and its own way of being underestimated. A business case that addresses fewer than all seven is incomplete.

Model access — the vendor invoice, per token or per seat.
Compute — the inference cost at scale, often non-linear with usage.
Data infrastructure — pipelines, vector stores, embedding workflows, monitoring.
Integration — connecting the model to the systems that produce the inputs and consume the outputs.
Governance — the human review layer, model risk management, audit support.
Model refresh — the ongoing cost of keeping the deployed model current with the business and the underlying model landscape.
Incident response — the cost of investigating and remediating when the model fails materially.
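The seven layers can be held in a single structure so that none is left out of the business case. The sketch below is illustrative only: the class name, field names, and every figure are assumptions made for the example, not numbers from any real deployment.

```python
from dataclasses import dataclass, fields


@dataclass
class FullStackCost:
    """Annual cost per layer in one currency. All seven fields are required."""
    model_access: float
    compute: float
    data_infrastructure: float
    integration: float
    governance: float
    model_refresh: float
    incident_response: float

    def total(self) -> float:
        # Sum every layer, so a newly added field is never silently ignored.
        return sum(getattr(self, f.name) for f in fields(self))

    def vendor_share(self) -> float:
        # Fraction of total cost covered by the vendor invoice.
        return self.model_access / self.total()


# Hypothetical figures, chosen only to make the arithmetic easy to follow.
case = FullStackCost(
    model_access=50_000,
    compute=260_000,
    data_infrastructure=150_000,
    integration=220_000,
    governance=140_000,
    model_refresh=100_000,
    incident_response=80_000,
)

print(f"Total cost: {case.total():,.0f}")
print(f"Vendor invoice share: {case.vendor_share():.0%}")
```

Because the dataclass has no default values, a case that omits a layer fails at construction rather than producing a smaller total.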

Compute: the cost that scales non-linearly

The compute cost of an AI deployment is rarely linear with usage. It is also rarely flat. A pilot serving twenty users at moderate prompt complexity may cost very little; the same architecture serving twenty thousand users with longer prompts and larger context windows may cost an order of magnitude more per user.

Three drivers cause the non-linearity:

Context length. Vendors charge roughly in proportion to the total tokens processed, so operations that load large documents into context can dominate the cost.
Tool use and chaining. Agentic patterns that involve multiple model calls per user request multiply the cost per interaction.
Peak load. Inference compute, like any compute, scales to peak. A deployment that runs cheaply on average may require expensive headroom for peak demand.

The CFO should ask, for any AI investment case, what the cost looks like at expected load, at peak load, and at the load the deployment would experience if usage doubled unexpectedly. The answer is rarely in the original proposal. It should be in the funded business case.
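The three questions can be answered with very little arithmetic. The sketch below assumes simple per-token pricing; the function name, the price, the usage figures, and the load multipliers are all hypothetical. The peak multiplier stands in for capacity that has to be reserved at peak level, which is itself an assumption about how the deployment is provisioned.

```python
def monthly_compute_cost(users, requests_per_user, calls_per_request,
                         tokens_per_call, price_per_million_tokens,
                         load_multiplier=1.0):
    """Monthly inference cost under simple per-token pricing (an assumption)."""
    tokens = (users * requests_per_user * calls_per_request
              * tokens_per_call * load_multiplier)
    return tokens / 1_000_000 * price_per_million_tokens


PRICE = 10.0  # hypothetical price per million tokens

# Pilot: 20 users, one model call per request, short prompts.
pilot = monthly_compute_cost(20, 100, 1, 2_000, PRICE)

# Production: more users, chained calls, larger context per call.
production = dict(users=20_000, requests_per_user=100, calls_per_request=3,
                  tokens_per_call=8_000, price_per_million_tokens=PRICE)

scenarios = {
    "expected": monthly_compute_cost(**production),
    "peak": monthly_compute_cost(**production, load_multiplier=3.0),
    "usage doubled": monthly_compute_cost(**production, load_multiplier=2.0),
}

pilot_per_user = pilot / 20
production_per_user = scenarios["expected"] / 20_000

for name, cost in scenarios.items():
    print(f"{name}: {cost:,.0f} per month")
print(f"per-user cost multiple vs pilot: {production_per_user / pilot_per_user:.0f}x")
```

With these inputs the per-user cost rises twelvefold between pilot and production, driven entirely by the extra calls per request and the larger context per call. The user count cancels out of the per-user figure.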

Data infrastructure: the cost rarely in the proposal

An AI deployment is only as useful as the data it can access. The data infrastructure cost has identifiable components:

Pipelines that move data from the source systems to the model's reach, with the latency the use case requires.
Vector stores or equivalent for retrieval-augmented generation, with the storage and query cost they entail.
Embedding workflows that translate documents and structured data into the representations the model uses.
Monitoring and observability of the data layer, which is necessary for the audit trail and for incident response.
Data quality work (cleaning, deduplication, structured tagging) that is a necessary precondition for the AI to produce useful output.

None of this is exotic infrastructure. It is the data engineering bill that has been growing for a decade. The AI deployment makes it both more visible and more expensive, because the model's appetite for clean, current data is larger than the previous use cases.

Integration: the cost the technology team underestimates

Connecting a model to the systems that produce its inputs and consume its outputs is, in most enterprise deployments, the largest single cost line below compute. The systems are heterogeneous, often legacy, often poorly documented. The interfaces have to be built, tested, monitored, and maintained.

The technology team's estimate of integration cost is usually low. The reasons are structural: the team estimates the cost of building the first version of the interface, not the cost of operating it; the cost of integrating with the systems they know well, not the systems they discover are involved; the cost in their own time, not the cost of the cross-team coordination the integration requires.

A defensible integration estimate includes the build, the test, the deployment, the documentation, the monitoring, the on-call rotation, and the future maintenance. The headline build cost is, in most cases, less than half the total.

Governance: the cost finance must demand be modelled

Every AI output that flows into a financial process — a forecast, a reconciliation, a variance explanation, a regulatory submission — must be reviewed by a human with the authority to accept or reject it. The review layer is not a marginal cost. It is a structural cost of operating the system within an audited finance function.

The cost has four dimensions: the time spent reviewing outputs, the training of reviewers to do it well, the model risk management framework that supports the review, and the documentation that the audit team will inspect. Built well, the review layer scales sublinearly with the volume of AI output. Built badly, it scales linearly and becomes the bottleneck the deployment was supposed to remove.
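The difference between linear and sublinear scaling is easiest to see with numbers. In the sketch below every output is still reviewed in both cases. The power-law form, its exponent, and the minutes and hours used are assumptions chosen for illustration; they stand in for whatever makes a well-built review layer more efficient at volume, and are not a measured relationship.

```python
def review_hours_linear(outputs, minutes_per_output=5.0):
    """Each output takes the same reviewer time regardless of volume."""
    return outputs * minutes_per_output / 60


def review_hours_sublinear(outputs, hours_at_one=2.0, exponent=0.5):
    """Reviewer time grows more slowly than volume (assumed power law)."""
    return hours_at_one * outputs ** exponent


for volume in (100, 10_000):
    linear = review_hours_linear(volume)
    sublinear = review_hours_sublinear(volume)
    print(f"{volume:>6} outputs: linear {linear:,.1f} h, sublinear {sublinear:,.1f} h")
```

Under these assumptions a hundredfold rise in volume multiplies linear review time by one hundred and sublinear review time by ten. At low volume the sublinear design costs more, which reflects the upfront overhead of building it properly.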

Model refresh: the cost that does not stop

The model that performs well today will not necessarily perform well in eighteen months. The business will have changed. The underlying model the vendor offers will have changed. The integration patterns the vendor supports will have evolved. The deployment that does not refresh becomes the legacy system, with the same depreciation curve as any other legacy system.

The CFO should plan for periodic model refresh — re-evaluation, re-tuning, in some cases re-architecting — as a recurring cost, not a one-time event. The cadence depends on the use case; the existence of the cost does not.

Incident response: the cost that is invisible until it isn't

An AI system that produces a materially wrong output in a financial context creates a workstream that includes investigation, remediation, communication, and possible disclosure. The cost of a single significant incident can exceed the total run-rate cost of the deployment for the period.

The CFO does not budget for the incidents that have not occurred. The CFO does budget for the response capability that will be invoked when one does — the playbook, the people, the tools, the audit trail that allows the incident to be explained.

What the full-stack model produces

A full-stack cost model for an AI investment is not a complicated spreadsheet. It is the same business case template the entity uses for any other capital decision, with the seven layers above explicitly modelled and the assumptions on each layer documented.

The output is twofold. First, a defensible total cost figure that the board can rely on. Second — and more useful — a model in which the cost drivers are visible and individually editable, so that when assumptions move, the impact is calculable.
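A model with visible, individually editable drivers can be very small. The sketch below is one possible shape, not a prescribed template: the driver names and values are hypothetical, and four of the seven layers are lumped into a single fixed figure purely to keep the example short.

```python
from typing import Dict

Assumptions = Dict[str, float]

# Hypothetical annual assumptions; each one is a named, editable driver.
BASE: Assumptions = {
    "seats": 500,
    "price_per_seat": 240.0,
    "tokens_millions": 12_000,
    "price_per_million_tokens": 10.0,
    "review_hours": 2_000,
    "reviewer_hourly_rate": 60.0,
    # Data infrastructure, integration, refresh and incident response,
    # lumped together here only for brevity.
    "fixed_other_layers": 400_000.0,
}


def total_cost(a: Assumptions) -> float:
    """Total annual cost built up from the documented drivers."""
    model_access = a["seats"] * a["price_per_seat"]
    compute = a["tokens_millions"] * a["price_per_million_tokens"]
    governance = a["review_hours"] * a["reviewer_hourly_rate"]
    return model_access + compute + governance + a["fixed_other_layers"]


def impact_of_change(base: Assumptions, driver: str, new_value: float) -> float:
    """Change in total cost when one driver moves and the rest hold."""
    changed = {**base, driver: new_value}
    return total_cost(changed) - total_cost(base)


print(f"Base total: {total_cost(BASE):,.0f}")
print(f"If token volume doubles: {impact_of_change(BASE, 'tokens_millions', 24_000):+,.0f}")
```

Changing one entry and rerunning gives the impact of that assumption alone, which is the property the second output describes.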

The CFO who insists on the full-stack model will find that some AI investments that looked compelling on the vendor proposal look marginal at total cost, and some that looked marginal look compelling once the counterfactual is properly costed. That is the framework working as intended.

This piece sits inside the CFO in AI framework. See also the CFO as AI capital allocator and gross-margin maths in the age of AI. Lorna writes from practice at IMPT. The verified page records what is and isn't published here.

Lorna Mason is CFO of IMPT, Dublin. The verified public record is on the Verified page. Contact: lorna@impt.io

