Anomaly detection in finance has been with us for as long as auditors have sampled. What has changed is the cost of looking at every transaction rather than a statistical sample. A model that can score every line in real time, against learned patterns of the entity's behaviour, can identify candidates for review at a scale and speed that human sampling cannot match.
The capability is genuinely useful. It is also often deployed badly. The most common failure is not that the model misses things. It is that the model flags so many things that the review layer collapses under the volume, and the function goes back to sampling, only now with a more expensive tool.
What anomaly detection is for, in a finance function
Real-time anomaly detection serves several purposes in a finance function.
- Fraud and error prevention. Identifying transactions that fall outside the entity's normal pattern before they settle or enter financial reporting.
- Control monitoring. Surfacing breakdowns in expected control behaviour — duplicate payments, missed approvals, unusual journal entries — at the time they occur.
- Operational insight. Identifying business pattern changes that the regular reporting cycle would not surface until the next period.
- Audit support. Providing a continuous record of the entity's monitoring activity, which the external auditor can rely on for parts of the substantive procedures.
Each purpose has different design implications. A system tuned for fraud prevention is calibrated for high recall — catch everything suspicious, even at the cost of false positives. A system tuned for operational insight is calibrated for precision — fewer alerts, more meaningful ones. Deploying a single system across all purposes without distinguishing the calibration produces a system that serves none well.
The signal-to-noise problem
The structural challenge with anomaly detection is that, in a well-controlled business, the rate of true anomalies is low. If 0.1% of transactions are genuinely problematic and the model has a 1% false-positive rate, false positives outnumber true anomalies by roughly ten to one, even if the model catches every genuine case. The review layer then spends most of its time on false positives, which is both expensive and demoralising.
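The arithmetic behind the 0.1% and 1% example can be checked with a few lines of Python. This is a minimal sketch: the function name and the assumption of perfect recall (the model catches every true anomaly) are illustrative, not part of any particular tool.

```python
def alert_precision(base_rate: float, false_positive_rate: float, recall: float = 1.0) -> float:
    """Share of alerts that are true anomalies, given the three rates."""
    true_alerts = base_rate * recall                        # anomalies that are flagged
    false_alerts = (1.0 - base_rate) * false_positive_rate  # normal items that are flagged
    return true_alerts / (true_alerts + false_alerts)


# Figures from the text; recall = 1.0 is an optimistic assumption.
precision = alert_precision(base_rate=0.001, false_positive_rate=0.01)
false_per_true = (1.0 - precision) / precision
print(f"precision = {precision:.1%}, false alerts per true alert = {false_per_true:.1f}")
# prints: precision = 9.1%, false alerts per true alert = 10.0
```

Lower recall makes the ratio worse, since the false alerts stay the same while the true alerts shrink.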
The signal-to-noise problem is not solved by a better model alone. It is solved by:
- Tuning the model for the specific use case, with the calibration explicit.
- Defining the alert thresholds in line with the available review capacity, not in line with what the model can produce.
- Distinguishing alert severity, so that high-severity alerts get human attention immediately and lower-severity alerts are reviewed in batch or sampled.
- Closing the loop, so that every reviewed alert produces a label that retrains the model and improves the discrimination over time.
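The severity split can be sketched as a simple routing rule. Everything named here is hypothetical: the queue names, the cut-off scores of 0.95 and 0.80, and the assumption that the model emits a score between 0 and 1 would all need to be set by the entity.

```python
import random


def route_alert(score: float, immediate_at: float = 0.95, batch_at: float = 0.80) -> str:
    """Map a model score in [0, 1] to a review queue (cut-offs are illustrative)."""
    if score >= immediate_at:
        return "immediate"
    if score >= batch_at:
        return "batch"
    return "sampled"


def draw_sample(alert_ids: list[str], fraction: float, seed: int) -> list[str]:
    """Pick a reproducible random subset of low-severity alerts for review."""
    size = round(len(alert_ids) * fraction)
    return random.Random(seed).sample(alert_ids, size)


queues: dict[str, list[str]] = {"immediate": [], "batch": [], "sampled": []}
for alert_id, score in [("A1", 0.99), ("A2", 0.85), ("A3", 0.40), ("A4", 0.30)]:
    queues[route_alert(score)].append(alert_id)

to_review = draw_sample(queues["sampled"], fraction=0.5, seed=7)
```

Recording the seed alongside the sample means the selection can be reproduced later if the auditor asks how the sampled alerts were chosen.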
Calibration: thresholds, costs, and the discipline of tuning
The calibration of an anomaly detection system has two dimensions: the threshold at which the model raises an alert, and the cost the entity accepts for false positives and false negatives.
A system that flags 1% of transactions sends 1% of activity to review. If the business processes one million transactions a month, that is ten thousand alerts, likely more than the review function can address. The threshold must be tuned to the review capacity.
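One way to tie the threshold to capacity is to work backwards from the number of alerts the team can actually review. The sketch below assumes a recent month of model scores is available as a plain list; the function name and the sample figures are hypothetical.

```python
import math


def capacity_cutoff(scores: list[float], capacity: int) -> float:
    """Return a cut-off such that at most `capacity` scores are strictly above it."""
    if capacity <= 0:
        return math.inf            # no review capacity: nothing is flagged
    ranked = sorted(scores, reverse=True)
    if capacity >= len(ranked):
        return -math.inf           # capacity covers every transaction
    # The first score that does NOT fit in the queue becomes the cut-off.
    return ranked[capacity]


# Illustrative scores; ties at the cut-off are excluded, so the queue never overflows.
recent_scores = [0.1, 0.9, 0.5, 0.5, 0.7, 0.3]
cutoff = capacity_cutoff(recent_scores, capacity=3)
alerts = [s for s in recent_scores if s > cutoff]
```

Using a strict comparison means tied scores at the boundary are left out rather than pushing the queue over capacity, which is a design choice rather than a requirement.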
A higher threshold reduces the alert volume but increases the rate of true anomalies missed. The trade-off is unavoidable. The discipline is to make it deliberately, based on the relative cost of a missed anomaly and a reviewed false positive, rather than to leave it at the model's default.
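The trade-off can be made explicit by pricing both kinds of error and picking the threshold with the lowest total cost on past, labelled alerts. This is a sketch under stated assumptions: the cost figures, the history, and the function names are invented for illustration, and a real exercise would use far more data.

```python
def total_cost(history: list[tuple[float, bool]], threshold: float,
               cost_missed: float, cost_review: float) -> float:
    """Cost of missed anomalies plus reviewed false positives at a given threshold."""
    cost = 0.0
    for score, is_anomaly in history:
        flagged = score >= threshold
        if is_anomaly and not flagged:
            cost += cost_missed    # false negative
        elif flagged and not is_anomaly:
            cost += cost_review    # false positive that someone had to review
    return cost


def cheapest_threshold(history: list[tuple[float, bool]],
                       cost_missed: float, cost_review: float) -> float:
    """Try every observed score (and 'flag nothing') and keep the cheapest."""
    candidates = sorted({score for score, _ in history}) + [float("inf")]
    return min(candidates, key=lambda t: total_cost(history, t, cost_missed, cost_review))


# Hypothetical labelled history: (model score, was it a real anomaly?)
history = [(0.95, True), (0.80, False), (0.70, True), (0.40, False), (0.20, False)]
expensive_miss = cheapest_threshold(history, cost_missed=5000, cost_review=50)
cheap_miss = cheapest_threshold(history, cost_missed=40, cost_review=50)
```

When a miss is expensive the chosen threshold drops to catch the lower-scoring anomaly; when a miss is cheap relative to a review, the threshold rises and that anomaly is knowingly let go.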
The calibration is not a one-time decision. The business changes. The transaction mix changes. The model drifts as the data it was trained on becomes less representative of current behaviour. Periodic recalibration is required and should be scheduled, not reactive.
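A scheduled recalibration can start with a simple drift check on the model's inputs or scores. The sketch below uses the population stability index, a common drift measure chosen here for illustration; the text does not prescribe a method, and the bin count and the level at which to act are assumptions the entity would set.

```python
import math


def population_stability_index(baseline: list[float], current: list[float],
                               bins: int = 10) -> float:
    """Compare two samples bin by bin; 0.0 means identical binned distributions."""
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0    # avoid a zero width if baseline is constant

    def shares(values: list[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1   # clamp out-of-range values
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    b, c = shares(baseline), shares(current)
    return sum((ci - bi) * math.log(ci / bi) for bi, ci in zip(b, c))


# Illustrative data: training-period amounts versus a shifted current period.
training_period = [i / 100 for i in range(100)]
shifted_period = [0.5 + i / 200 for i in range(100)]
psi_same = population_stability_index(training_period, training_period)
psi_shifted = population_stability_index(training_period, shifted_period)
```

Running a check like this on a fixed schedule turns "the model may have drifted" into a number that can be tracked and escalated.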
The human review layer
The model identifies candidates. The human decides whether the candidate is an anomaly or a business event. The distinction is critical and is usually the place where deployments succeed or fail.
A useful review layer has several characteristics.
- Clear ownership. Every alert has a named reviewer with the authority and the context to make the assessment.
- Appropriate context. The reviewer is presented with the alert, the underlying transaction, the relevant historical context, and the model's reasoning where the model can articulate it. Reviewers presented with only the alert struggle to assess it.
- Defined disposition options. Confirmed anomaly, false positive, business event, requires investigation. The disposition determines the downstream action and the training signal for the model.
- Service-level expectations. Time to first review, time to disposition. Alerts that age in a queue without action defeat the purpose of real-time detection.
- Feedback to the model. Dispositions feed back to the training data, so that the model improves over time at distinguishing real anomalies from business events.
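The disposition options and the feedback step can be sketched as a small data structure. The four dispositions come from the list above; the mapping to training labels and the service-level helper are assumptions for illustration, and a real deployment might treat business events as their own class.

```python
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class Disposition(Enum):
    CONFIRMED_ANOMALY = "confirmed anomaly"
    FALSE_POSITIVE = "false positive"
    BUSINESS_EVENT = "business event"
    REQUIRES_INVESTIGATION = "requires investigation"


def training_label(disposition: Disposition) -> Optional[int]:
    """Assumed mapping: 1 = anomaly, 0 = not an anomaly, None = no label yet."""
    if disposition is Disposition.CONFIRMED_ANOMALY:
        return 1
    if disposition in (Disposition.FALSE_POSITIVE, Disposition.BUSINESS_EVENT):
        return 0
    return None  # still under investigation: keep it out of the training data


def is_overdue(raised_at: datetime, now: datetime, sla: timedelta) -> bool:
    """True when an alert has waited longer than the agreed service level."""
    return now - raised_at > sla


# Hypothetical four-hour time-to-first-review target.
late = is_overdue(datetime(2024, 1, 1, 9), datetime(2024, 1, 1, 14), timedelta(hours=4))
```

Keeping open investigations out of the training data avoids teaching the model from alerts whose outcome is not yet known.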
The control implications
Real-time anomaly detection is a control. It is, increasingly, a control that external auditors expect to find in a well-controlled finance function. That has implications for how it is designed and documented.
The control framework should specify:
- The transactions in scope for monitoring.
- The model used, its validation status, and the calibration in effect.
- The review workflow, the ownership, and the service levels.
- The escalation path for confirmed anomalies — to the controller, to the CFO, to the audit committee where material.
- The retention of alert records and dispositions, sufficient for the auditor to test the control by sampling.
A real-time monitoring control that does not produce an audit trail is, for compliance purposes, no more useful than no control at all. The documentation discipline is part of the deployment, not an afterthought.
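What an auditable alert record might contain can be sketched as follows. The field names and the JSON-lines format are hypothetical; the point is that each record carries the model version, the calibration in effect, the reviewer, and the disposition, so that a sampled alert can be traced end to end.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)  # frozen: a record is not edited after it is written
class AlertRecord:
    alert_id: str
    transaction_id: str
    model_version: str
    threshold: float      # calibration in effect when the alert was raised
    score: float
    raised_at: str        # ISO 8601 timestamp
    reviewer: str
    disposition: str
    disposed_at: str      # ISO 8601 timestamp


def to_audit_line(record: AlertRecord) -> str:
    """Serialise one record as a single JSON line for an append-only log."""
    return json.dumps(asdict(record), sort_keys=True)


# Illustrative values only.
record = AlertRecord("AL-001", "TX-123", "v1", 0.9, 0.97,
                     "2024-01-01T09:00:00", "reviewer-a",
                     "false positive", "2024-01-01T11:30:00")
line = to_audit_line(record)
```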
Where the technology fails, predictably
Several failure patterns are predictable enough that the CFO should plan for them.
Concept drift. The patterns the model learned are no longer the patterns the business produces. The model's performance degrades silently. Periodic re-validation catches this; assumption of stable performance does not.
Adversarial adaptation. If the anomaly detection is monitoring for fraud, fraud actors who become aware of the monitoring pattern will adapt. The monitoring must adapt in response, or it will detect only the unsophisticated cases.
Alert fatigue. When the review team has been seeing false positives for long enough, they begin to dismiss alerts by reflex. The genuine anomaly is then dismissed with the rest. Rotation of reviewers, periodic calibration, and regular review of the alert conversion rate (the share of alerts that turn out to be confirmed anomalies) mitigate this; doing nothing accelerates it.
Model overreach. The model flags something it cannot explain, and the human reviewer overrides the flag without understanding why the model raised it. A genuine anomaly is then dismissed because the model could not produce a story the reviewer could follow. Better explanation in the model output, and training in how to handle low-interpretability flags, both help.
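The conversion-rate review mentioned under alert fatigue can be reduced to a simple calculation. The sketch below assumes that "conversion rate" means the share of closed alerts that were confirmed as anomalies, and that dispositions are stored as the plain strings used earlier; both are assumptions for illustration.

```python
def conversion_rate(dispositions: list[str]) -> float:
    """Share of closed alerts confirmed as anomalies (open investigations excluded)."""
    closed = [d for d in dispositions if d != "requires investigation"]
    if not closed:
        return 0.0
    return closed.count("confirmed anomaly") / len(closed)


# Illustrative month of dispositions.
month = ["confirmed anomaly", "false positive", "business event",
         "requires investigation", "false positive"]
rate = conversion_rate(month)  # 1 confirmed out of 4 closed alerts
```

A rate that falls steadily suggests the calibration needs attention; a rate that falls for one reviewer but not others may point to reflexive dismissal rather than to the model.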
None of these failure patterns is a reason not to deploy anomaly detection. Each is a reason to deploy it deliberately, with the governance, the calibration, and the review capacity to match.
This piece sits inside the CFO in AI framework. See also AI in the financial close and AI governance for the finance function. Lorna writes from practice at IMPT. The verified page records what is and isn't published here.
Lorna Mason is CFO of IMPT, Dublin. The verified public record is on the Verified page. Contact: lorna@impt.io