AI in Controlling and Finance 2026: what mid-market CFOs can actually automate

Controlling teams still spend 60 to 70 percent of their time on manual aggregation in 2026. 7 use-cases, 3 anti-patterns, and a clear line between what runs autonomously and what needs human approval.

In 2026, controlling is the function with the highest AI leverage in the company and the smallest tolerance for error. Controlling teams spend 60 to 70 percent of their time on manual aggregation work that could in principle be largely automated, yet a single hallucinated number can derail a quarterly report or a banking conversation. Here is the path that resolves that tension: 7 use-cases, 3 anti-patterns, and a clear line between automating and approving.

Why finance is the trickiest AI area in the company

Across roughly 40 Sentient Dynamics workshops with DACH mid-market companies (machinery, IT services, B2B SaaS, industrial services), the pattern is remarkably stable: controlling teams spend 60 to 70 percent of their working time on data aggregation, consolidation, and report preparation. That means pulling tables from the ERP, merging them in Excel, reconciling DATEV exports, formatting reports, and building year-over-year views. Only after all that does the actual CFO work begin: judgment, interpretation, recommendation.

The 60 to 70 percent figure is not a study quote; it is an aggregate of our own workshop observations. The order of magnitude is consistent with the broader picture in the Bitkom AI study 2025 and the McKinsey State of AI report (November 2025): finance functions show high automation potential, low adoption depth, and high perceived risk. That mix is exactly why CFOs hesitate. The leverage is obvious; the trust is not.

And rightly so. Finance does not tolerate hallucinations. A typo in a marketing piece gets corrected. A wrong number in a quarterly report becomes a compliance problem, a derailed banking meeting, or an unhappy investor. The safety margin in finance is larger, and that has to be designed into every use-case.

The good news: the real leverage is not in "AI does the math" but in "AI fetches, sorts, checks, prepares". In 2026, aggregation and preparation can be automated to a large extent. Judgment and publication stay with humans. Draw that line cleanly, and you recover most of the 60 to 70 percent without giving up trust.

What actually runs in production in 2026: 7 use-cases

These seven use-cases have emerged from the workshops as the points where AI in controlling and finance creates real leverage. They follow the typical monthly rhythm of a mid-market CFO, from data gathering to stakeholder communication.

1. Data aggregation from ERP, Excel, and DATEV

The first use-case is the big one: the recurring consolidation of data from multiple sources. A typical Mittelstand company has an ERP (Sage, SAP, Microsoft Dynamics, Infor), a DATEV connection for bookkeeping, one to three pre-systems (inventory, time tracking, project controlling), and a collection of legacy Excel files. Monthly consolidation takes many teams two to four person-days, often via copy-paste, manual formula tweaks, and a nervous final check.

An AI agent can orchestrate that pipeline: it triggers the exports, checks completeness (are all accounts present, are all cost centers populated?), merges the data into the consolidation worksheet, flags gaps, and produces the result sheet. Important: the actual aggregation runs deterministically (SQL, Python, Excel formulas); the AI does not do the math. The agent only steers the flow and checks plausibility. That removes the hallucination risk from the numbers, because they come from rule-based steps. A typical workshop outcome: what used to take three days drops to half a day with a clean pipeline, and the controller spends time on analysis instead of clicking things together.
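To make the split concrete, here is a minimal Python sketch of the deterministic part the agent would trigger. It assumes the exports arrive as rows with hypothetical cost_center, account, and amount fields, and it uses a hypothetical list of expected cost centers; a real pipeline would read the ERP and DATEV exports instead of inline data.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical master data: every cost center that must appear in this month's close.
EXPECTED_COST_CENTERS = {"CC100", "CC200", "CC300"}

# Hypothetical export rows; amounts are strings so Decimal parses them exactly.
erp_rows = [
    {"cost_center": "CC100", "account": "4000", "amount": "1200.50"},
    {"cost_center": "CC200", "account": "4000", "amount": "800.00"},
]
datev_rows = [
    {"cost_center": "CC100", "account": "4000", "amount": "-200.50"},
]


def missing_cost_centers(rows, expected):
    """Return expected cost centers with no rows at all: gaps to flag, never to fill."""
    present = {row["cost_center"] for row in rows}
    return sorted(expected - present)


def aggregate(*sources):
    """Deterministically sum amounts per (cost_center, account) across all sources."""
    totals = defaultdict(Decimal)
    for rows in sources:
        for row in rows:
            totals[(row["cost_center"], row["account"])] += Decimal(row["amount"])
    return dict(totals)


gaps = missing_cost_centers(erp_rows + datev_rows, EXPECTED_COST_CENTERS)
totals = aggregate(erp_rows, datev_rows)
print("gaps:", gaps)      # reported to the controller
print("totals:", totals)  # written to the consolidation worksheet
```

Decimal rather than float keeps cent amounts exact, and the gap check only reports missing cost centers; filling them stays a human decision.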

2. Report preparation

Once the data is ready, the second step kicks in: building standard reports, year-over-year comparisons, KPI commentary. An AI system can produce the first draft: populate tables, prepare charts, flag outliers, propose commentary ("Q1 revenue plus 4.2 percent vs prior year, margin minus 1.1 percentage points, driver: steel input cost plus 7.8 percent"). The controller reads, corrects, adds their own observations. What was a full report-building day becomes half a day of content judgment.
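A minimal sketch of how the figures in such a commentary can be computed before any LLM sees them, assuming quarterly revenue and gross profit are already in the aggregation worksheet. The sample values are illustrative and chosen to reproduce the revenue and margin figures in the example above; the 1-percentage-point outlier threshold is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Period:
    revenue: float
    gross_profit: float

    @property
    def margin_pct(self) -> float:
        return 100.0 * self.gross_profit / self.revenue


def yoy_facts(current: Period, prior: Period) -> dict:
    """Compute the year-over-year figures deterministically, rounded for reporting."""
    return {
        "revenue_change_pct": round(100.0 * (current.revenue - prior.revenue) / prior.revenue, 1),
        "margin_change_pp": round(current.margin_pct - prior.margin_pct, 1),
    }


def facts_line(facts: dict) -> str:
    """Fixed-number fact line that the LLM may rephrase but not recalculate."""
    return (
        f"Revenue {facts['revenue_change_pct']:+.1f} percent vs prior year, "
        f"margin {facts['margin_change_pp']:+.1f} percentage points"
    )


def is_outlier(facts: dict, margin_threshold_pp: float = 1.0) -> bool:
    """Flag for the controller when the margin moves by more than the threshold."""
    return abs(facts["margin_change_pp"]) > margin_threshold_pp


prior_q1 = Period(revenue=10_000_000, gross_profit=3_000_000)
current_q1 = Period(revenue=10_420_000, gross_profit=3_011_380)
facts = yoy_facts(current_q1, prior_q1)
print(facts_line(facts), "| outlier:", is_outlier(facts))
```

The LLM prompt would then contain only the finished fact line plus context such as the cost driver, so its job is phrasing, not arithmetic.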

Again, important: the numbers in the report do not come from the LLM, they come from the aggregation worksheet. The LLM contribution is commentary phrasing and formatting. If you let the LLM "calculate" numbers, you build in exactly the hallucination risk that finance cannot absorb.
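One cheap guardrail, sketched here under simple assumptions: extract every number from the LLM draft and check it against the values in the worksheet. The regular expression assumes figures are written without thousands separators and ignores signs; it is a plausibility filter for the controller, not a complete parser.

```python
import re

# A number not glued to a preceding letter, digit, or dot (so "Q1" is ignored),
# with an optional sign and decimal part. Assumes no thousands separators.
NUMBER = re.compile(r"(?<![\w.])[-+]?\d+(?:[.,]\d+)?")


def unverified_numbers(draft: str, worksheet_values: set) -> list:
    """Return numbers in the LLM draft that do not appear in the worksheet.

    The comparison ignores the sign (commentary may say "minus 1.1") and
    accepts a decimal comma as well as a decimal point.
    """
    unverified = []
    for token in NUMBER.findall(draft):
        normalized = token.lstrip("+-").replace(",", ".")
        if normalized not in worksheet_values:
            unverified.append(token)
    return unverified


worksheet_values = {"4.2", "1.1", "7.8"}
draft = "Q1 revenue +4.2 % vs prior year, margin -1.1 pp, driver: steel input cost +7.9 %."
print(unverified_numbers(draft, worksheet_values))
```

Any token returned here goes back to the controller before the draft moves on; in this example the mistyped 7.9 would be caught.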

3. Sanity-checking forecast assumptions

The third use-case is subtler but valuable. Forecasts in mid-market companies usually emerge from sales estimates plus experience plus gut feel. An AI system can sanity-check the assumptions against external and internal signals: prior-year seasonality, generic industry indicators, the pipeline from CRM, public macro data. The AI does not say what the forecast should be. It surfaces where an assumption deviates strongly from prior-year patterns or market signals, and asks: is that on purpose?
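A minimal sketch of the seasonality check, assuming monthly revenue for the prior year and for the forecast. It treats the median implied growth as the "typical" value and uses a 10-percentage-point tolerance, both simplifying assumptions; CRM pipeline data and external signals would be further inputs in practice.

```python
from statistics import median


def assumption_questions(prior: dict, forecast: dict, tolerance_pp: float = 10.0) -> list:
    """Flag months whose implied YoY growth strays far from the typical (median) growth."""
    growth = {m: 100.0 * (forecast[m] - prior[m]) / prior[m] for m in forecast}
    typical = median(growth.values())
    return [
        f"{m}: forecast implies {g:+.1f} % vs prior year, typical is {typical:+.1f} %. Is that intentional?"
        for m, g in growth.items()
        if abs(g - typical) > tolerance_pp
    ]


prior_year = {"Jan": 100, "Feb": 100, "Mar": 100, "Apr": 100}
forecast = {"Jan": 105, "Feb": 104, "Mar": 106, "Apr": 130}
for question in assumption_questions(prior_year, forecast):
    print(question)
```

The output is a question, not a corrected forecast: the April jump may be a deliberate bet on a large order, and only the CFO can say so.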

In workshops, this often ends up being the use-case with the highest aha-moment. Not because the AI has the truth, but because it surfaces assumptions that never get discussed in the manual forecast process. The CFO gets a better discussion basis, not a "correct" number.

4. Anomaly detection in bookkeeping, receivables, and inventory

Anomaly detection is what auditors have had in their tools for years (data-analysis tools such as IDEA or ACL). With current LLM-based systems, that capability becomes accessible during ongoing operations: unusual posting patterns, duplicate supplier invoices, deviating payment terms, inventory levels that suddenly diverge from seasonality, receivables with inexplicably extended DSO. The agent flags outliers, the controller investigates.

Mandatory rule: no auto-actions. The agent reports; it does not send dunning letters or auto-reverse postings. A flagged duplicate invoice gets reviewed by a human, not auto-canceled. But visibility comes earlier, and in the workshops this is often the use-case that saved concrete euro amounts within one to two months (avoided double payments, earlier dunning, better inventory disposition).
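The duplicate-invoice check can be sketched as a flag-only function. The field names, the digits-only normalization of invoice numbers, and the 30-day window are assumptions for illustration; the output is a review list, and nothing is canceled or reversed.

```python
from datetime import date
from decimal import Decimal
from itertools import combinations


def invoice_digits(raw: str) -> str:
    """Normalize an invoice number to its digits without leading zeros ("INV-00123" -> "123")."""
    return "".join(ch for ch in raw if ch.isdigit()).lstrip("0")


def likely_duplicates(invoices: list, window_days: int = 30) -> list:
    """Return id pairs that look like duplicates. Flag only: nothing is canceled or reversed."""
    flagged = []
    for a, b in combinations(invoices, 2):
        if a["supplier"] != b["supplier"] or a["amount"] != b["amount"]:
            continue
        na, nb = invoice_digits(a["number"]), invoice_digits(b["number"])
        same_number = bool(na) and na == nb
        close_dates = abs((a["date"] - b["date"]).days) <= window_days
        if same_number or close_dates:
            flagged.append((a["id"], b["id"]))
    return flagged


invoices = [
    {"id": 1, "supplier": "S1", "number": "INV-00123", "amount": Decimal("4800.00"), "date": date(2026, 3, 2)},
    {"id": 2, "supplier": "S1", "number": "inv 123", "amount": Decimal("4800.00"), "date": date(2026, 5, 10)},
    {"id": 3, "supplier": "S1", "number": "INV-00124", "amount": Decimal("950.00"), "date": date(2026, 3, 5)},
    {"id": 4, "supplier": "S2", "number": "INV-00123", "amount": Decimal("4800.00"), "date": date(2026, 3, 2)},
]
print(likely_duplicates(invoices))  # goes to the controller's review queue
```

Invoices 1 and 2 are more than two months apart but share supplier, amount, and normalized number, which is exactly the pattern a manual check tends to miss.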

5. Contract and invoice review

Incoming supplier invoices, master agreements, service contracts: a typical Mittelstand company processes hundreds to thousands per year. AI-assisted review can match invoice against purchase order (quantity, price, terms), find critical clauses in new contracts (liability exclusions, penalties, termination notice), and write deadlines into a tracker. The bookkeeper or CFO gets a pre-checked list with annotations, rather than going through each document manually.
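A minimal sketch of the invoice-versus-purchase-order match, assuming both are already available as structured records (for example after extraction from the PDF). The field names and the 1 percent price tolerance are hypothetical; the function returns annotations for the bookkeeper, not an approval decision.

```python
from decimal import Decimal


def po_match_issues(invoice: dict, po: dict, price_tolerance_pct: Decimal = Decimal("1.0")) -> list:
    """Two-way match of an invoice against its purchase order; returns annotations, not decisions."""
    issues = []
    if invoice["quantity"] > po["quantity"]:
        issues.append(f"quantity {invoice['quantity']} exceeds PO quantity {po['quantity']}")
    max_price = po["unit_price"] * (1 + price_tolerance_pct / 100)
    if invoice["unit_price"] > max_price:
        issues.append(
            f"unit price {invoice['unit_price']} above PO price {po['unit_price']} "
            f"(tolerance {price_tolerance_pct} %)"
        )
    if invoice["payment_terms_days"] < po["payment_terms_days"]:
        issues.append(
            f"payment terms {invoice['payment_terms_days']} days, agreed {po['payment_terms_days']} days"
        )
    return issues


po = {"quantity": 100, "unit_price": Decimal("12.50"), "payment_terms_days": 30}
invoice = {"quantity": 100, "unit_price": Decimal("12.90"), "payment_terms_days": 14}
for issue in po_match_issues(invoice, po):
    print(issue)
```

The LLM's part in this use-case is reading the documents into such records and explaining the findings; the comparison itself stays rule-based.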

For mid-market companies with sensitive contracts, the data must stay in a GDPR-compliant system with a data processing agreement. Claude, ChatGPT Business/Enterprise, and Gemini Enterprise offer suitable setups, but with different defaults. Claude does not train on API or enterprise inputs by default. ChatGPT Business and Enterprise do not train on business data by default; training is opt-in only. Gemini for Workspace and Gemini Enterprise (Business/Standard/Plus) also do not use inputs for training by default; Gemini Enterprise Starter and the consumer Gemini app have different defaults, so verify the tenant edition. Either way, check the settings per tenant rather than trusting defaults blindly.

6. Liquidity forecast and cashflow scenarios

Liquidity is the discipline that keeps mid-market CFOs awake at night. An AI system can support weekly and monthly forecasts here: expected receipts from open items plus statistical DSO modeling, planned outflows, working-capital scenarios (what if the largest customer pays 14 days later). Again, the math itself runs deterministically in the liquidity tool or sheet. The AI contribution is scenario description, assumption checks, and identification of working-capital risks.
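The "largest customer pays 14 days later" scenario can be sketched deterministically in a few lines. This assumes open items already carry an expected payment date (for example due date plus the customer's historical delay), whole-euro amounts, and weekly buckets; the DSO modeling itself is out of scope here.

```python
from collections import defaultdict
from datetime import date, timedelta


def largest_customer(open_items: list) -> str:
    """Customer with the highest total of open receivables."""
    totals = defaultdict(int)
    for item in open_items:
        totals[item["customer"]] += item["amount"]
    return max(totals, key=totals.get)


def weekly_receipts(open_items, start, delayed_customer=None, delay_days=0):
    """Bucket expected receipts into weeks from `start`, optionally delaying one customer."""
    buckets = defaultdict(int)
    for item in open_items:
        expected = item["expected_date"]
        if item["customer"] == delayed_customer:
            expected += timedelta(days=delay_days)
        week = max(0, (expected - start).days // 7)  # overdue items land in week 0
        buckets[week] += item["amount"]
    return buckets


def cash_path(opening, receipts, outflows, weeks):
    """Closing balance per week: opening balance plus receipts minus planned outflows."""
    balance, path = opening, []
    for week in range(weeks):
        balance += receipts.get(week, 0) - outflows.get(week, 0)
        path.append(balance)
    return path


start = date(2026, 6, 1)
open_items = [
    {"customer": "A", "amount": 50_000, "expected_date": date(2026, 6, 3)},
    {"customer": "A", "amount": 30_000, "expected_date": date(2026, 6, 10)},
    {"customer": "B", "amount": 20_000, "expected_date": date(2026, 6, 9)},
]
outflows = {0: 40_000, 1: 40_000, 2: 40_000}
base = cash_path(10_000, weekly_receipts(open_items, start), outflows, 4)
stress = cash_path(
    10_000, weekly_receipts(open_items, start, largest_customer(open_items), 14), outflows, 4
)
print("base:", base)
print("largest customer +14 days:", stress)
```

The AI contribution would then be to describe the gap between the two paths (here a trough of minus 50,000 instead of minus 10,000 euros) and the working-capital risk behind it.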

Workshop experience: this is the second wave, not the first. It needs clean open-item data and a stable DSO model. Starting it without that data hygiene builds in false security. Building it after use-cases 1 and 2 are stable gives a very strong lever for the CFO's communication toward banks and shareholders.

7. Investor and bank communication

The seventh use-case is closer to stakeholders: preparing reporting packs for banks, shareholders, and (for PE-owned mid-market companies) the owner, drafting Q&A lists for banking meetings, drafting commentary for the annual report. LLMs work well here because they turn existing numbers into narrative drafts and structure answers to typical banker questions ("how are you responding to the input-cost situation?"). The CFO edits, adds tonality and strategic context, signs off.

The key: no auto-send. External communication carries the highest reputational risk, and that is exactly where humans belong in the loop. Use-case 7 is not about full autonomy; it accelerates preparation.

The line: automating versus approving

From these seven use-cases, a clean rule emerges that applies to every CFO mandate involving AI: not every step is fair game for the agent alone. A pragmatic three-tier split has worked well in the workshops.

Tier 1, purely aggregating, autonomous. Fetch data, merge, sanity-check, format, flag. The agent runs alone here, provided the aggregation is deterministic and the agent only triggers and verifies. Use-case 1 in pure form, parts of use-case 4 (flagging, not action).

Tier 2, judgmental with human in the loop. Sanity-check forecast assumptions, comment on reports, prioritize anomalies, run liquidity scenarios. The agent delivers a recommendation with reasoning, the human decides whether to accept, adjust, or reject. With audit trail: what did the agent propose, what was approved, what was changed. Use-cases 2, 3, 5, 6 typically live here.

Tier 3, external communication, always human. Reports to banks, investor letters, annual-report text, communication with auditors. The agent prepares, the human owns publication. Audit trail mandatory. Use-case 7 lives entirely on this tier.
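The three tiers can be enforced in code rather than by convention. A minimal sketch, assuming a hypothetical action catalogue and role names: tier-1 actions run autonomously, tier-2 actions need a named approver, tier-3 actions need an authorized publisher, and every step lands in an audit log with proposal, approver, and final version.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Tier(Enum):
    AGGREGATING = 1  # autonomous
    JUDGMENTAL = 2   # human in the loop
    EXTERNAL = 3     # human owns publication


# Hypothetical action catalogue and roles; adapt to your own process.
ACTION_TIERS = {
    "merge_exports": Tier.AGGREGATING,
    "flag_anomaly": Tier.AGGREGATING,
    "draft_report_commentary": Tier.JUDGMENTAL,
    "send_bank_report": Tier.EXTERNAL,
}
AUTHORIZED_PUBLISHERS = {"cfo"}


@dataclass
class AuditLog:
    entries: list = field(default_factory=list)

    def record(self, action, proposed, approved_by, final):
        self.entries.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "action": action,
            "proposed": proposed,        # what the agent suggested
            "approved_by": approved_by,  # who signed off (None for tier 1)
            "final": final,              # what was actually used
            "changed": proposed != final,
        })


def execute(action, proposed, log, approved_by=None, final=None):
    """Apply the tier rules, then record the outcome in the audit log."""
    tier = ACTION_TIERS.get(action, Tier.EXTERNAL)  # unknown actions get the strictest tier
    if tier is not Tier.AGGREGATING and approved_by is None:
        raise PermissionError(f"{action}: tier {tier.value} needs human approval")
    if tier is Tier.EXTERNAL and approved_by not in AUTHORIZED_PUBLISHERS:
        raise PermissionError(f"{action}: only an authorized publisher may release it")
    final = proposed if final is None else final
    log.record(action, proposed, approved_by, final)
    return final


log = AuditLog()
execute("merge_exports", "consolidation v1", log)
execute(
    "draft_report_commentary",
    "Margin down 1.1 pp.",
    log,
    approved_by="controller",
    final="Margin down 1.1 pp, driven by steel input costs.",
)
```

Defaulting unknown actions to tier 3 is a deliberate design choice: a new capability has to be classified explicitly before the agent can run it unattended.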

Drawing those three tiers cleanly avoids the two most common failures. First: trusting the agent too little and forcing tier-1 tasks through tier-2 review (lost speed). Second: trusting the agent too much and automating tier-3 tasks (lost trust, sometimes a lost job).

3 anti-patterns that destroy trust in 2026

The list of patterns to avoid matters as much as the use-cases. Three of them keep showing up in the workshops.

LLM calculates instead of deterministic aggregation. The most common failure. A controller dumps a data pack into ChatGPT and asks "what is the margin?". The model replies with a plausible-looking number that is occasionally 5 to 15 percent off, because the model rounds, misreads, or hallucinates arithmetic. LLMs are not calculators. Aggregations must happen in code, SQL, or Excel formula. The LLM agent orchestrates the call and checks the result, it does not compute itself. Internalizing that single rule makes finance pipelines dramatically more robust.
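The orchestration pattern looks roughly like this: the model emits a structured tool request, and deterministic code does the arithmetic. The JSON request shape and the tool registry are hypothetical and do not reproduce any vendor's function-calling API; agent platforms wrap the same idea in their own formats.

```python
import json
from decimal import Decimal, ROUND_HALF_UP


def gross_margin_pct(revenue: str, cost_of_sales: str) -> str:
    """Deterministic margin calculation, rounded half-up to one decimal."""
    rev, costs = Decimal(revenue), Decimal(cost_of_sales)
    if rev <= 0:
        raise ValueError("revenue must be positive")
    margin = (rev - costs) / rev * 100
    return str(margin.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP))


# Registry of deterministic tools the agent is allowed to call.
TOOLS = {"gross_margin_pct": gross_margin_pct}


def handle_tool_request(raw_request: str) -> str:
    """Execute a tool request emitted by the model; the model never does the arithmetic."""
    request = json.loads(raw_request)
    tool = TOOLS.get(request["tool"])
    if tool is None:
        raise ValueError(f"unknown tool: {request['tool']}")
    return tool(**request["args"])


# Hypothetical request as the model might emit it after reading the data pack.
request = '{"tool": "gross_margin_pct", "args": {"revenue": "10420000", "cost_of_sales": "7408620"}}'
print(handle_tool_request(request))
```

The model's answer to "what is the margin?" then quotes the returned string instead of a number it produced itself.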

Auto-sending investor or bank reports without CFO sign-off. The temptation is obvious: the monthly investor update could just go out automatically once the report is done. The damage when a wrong number, a misread phrasing, or an unaligned tone slips through outweighs the saved effort. External finance communication must go through the CFO or a clearly authorized delegate. The agent prepares, the human clicks send.

Customer or finance data in free tools. Some staff, for convenience, upload supplier invoices or receivables lists into a free ChatGPT account, a free Claude web interface, or a browser plugin. That is a GDPR problem (no data processing agreement), a data-protection problem (sensitive business data outside your control), and an audit finding waiting to happen. The answer is not "ban AI" but "set up a proper enterprise tenant and offer staff a better path". You cannot run use-case 5 or 7 in production without a clean tool tenant first.

What is realistic in a mid-market company with 200 to 1,500 employees

Sorting the roughly 40 workshop cases by maturity gives a clear order.

Production-ready right now: use-case 1 (data aggregation), use-case 2 (report preparation), use-case 4 (anomaly detection). All three pull time out of the controlling team immediately, allow clean risk management (tier 1 or tier 2), and are typically deliverable in eight to twelve weeks once a decent data connection exists.

Need data hygiene first: use-case 3 (forecast assumptions), use-case 5 (contract review), use-case 6 (liquidity forecast). These are high-value but usually fail not on the AI side but on data quality in the source systems. Building use-case 6 on a half-maintained open-items dataset gives false confidence. Hygiene first, then AI.

Pilot-suitable, not full autonomy: use-case 7 (investor and bank communication). The leverage is there, but the risk is the highest. Starting is fine, as a preparation tool for the CFO, not as an autonomous output channel.

Where a CFO starts in 2026

If only one recommendation survives, it is this: start with report preparation and anomaly detection. These two use-cases have the highest leverage-to-risk ratio. Both are productive in two to three months, both relieve the controlling team visibly, both have a clear human-in-the-loop step baked in.

Important for the first weeks: do not try to build the mega-cockpit. A single, well-chosen point in the monthly routine where AI removes 60 percent of the preparation time is worth more than a full-stack project that aims for go-live in nine months. Pilot for three months, then move on to the next use-case. The second wave (liquidity, contract review) then benefits from the data investments that the first wave needed anyway.

A briefing with your auditor at the start is highly recommended. Not as an approval step but as early clarification: how will the auditor evaluate the aggregation logic, which audit trails do they expect, how do you document changes. That saves expensive rework later.

Where the EU AI Act touches the finance use-case

The EU AI Act is a mandatory topic in 2026, but it weighs far less on finance than on HR. As of today (May 2026), finance applications are generally not high-risk. Standard obligations apply (transparency, labeling for external communication where relevant, documented processing).

There is one exception every CFO should keep in mind: credit scoring and creditworthiness assessment of natural persons can be high-risk. Annex III, point 5(b), of the AI Act addresses exactly that case. If your company uses an AI system to assess the creditworthiness of natural persons (not only B2B customers but private individuals or sole traders), check before the application date of 2 August 2026 whether you fall within the Annex III scope. If yes, the high-risk regime applies: risk-management system, data governance, technical documentation, and human oversight, with the exact split of duties depending on whether you are the provider or the deployer of the system.

For the classic mid-market CFO with B2B customers and no automated creditworthiness decisions, the picture is more relaxed, but the check itself is mandatory. On fines: the headline 35 million euros or 7 percent of annual turnover applies only to prohibited practices under Article 5. Violations of high-risk obligations sit at 15 million euros or 3 percent, supplying incorrect information to authorities at 7.5 million euros or 1 percent. No high-risk application means standard obligations; a high-risk application means full obligations from 2 August 2026.

Simple rule of thumb for 2026: HR AI is routinely high-risk, finance AI is not blanket high-risk, and credit scoring and creditworthiness assessment of natural persons are the finance topics that deserve a closer look.

FAQ

Does the AI hallucinate on numbers? If you let it calculate, yes. LLMs are not calculators. Aggregations must run deterministically (code, SQL, Excel formulas). The AI agent triggers the aggregation and checks the result; it does not perform the calculation itself. With that split, hallucination risk for the figures is practically eliminated because no number originates in the LLM. What remains is checking that the commentary quotes those numbers correctly.

Do I need a new ERP integration? Usually not, or at least not right away. The first use-cases (aggregation, report preparation, anomaly detection) often work with existing exports and interfaces. A cleaner integration is worth investing in once the pilot has shown that the leverage is real and that data delivery is genuinely the bottleneck.

What will my auditor say? Auditors are usually not against AI in controlling, they are against poorly documented changes. Documenting changes to aggregation logic, AI-assisted steps, and human approvals cleanly (audit trail, versioning, roles) tends to produce a constructive stance. Bringing the auditor in early is cheaper than retrofitting later.

What does this cost realistically? Enterprise LLM licenses in 2026 typically sit in the single- to low-double-digit euro range per user per month. The real cost is not the license but the setup: data interfaces, aggregation logic, audit trail, process adjustment. Realistic mid-market ranges are 30,000 to 150,000 euros for the first two productive use-cases including data hygiene, deliverable in two to four months. Set against the controller time saved, the twelve-month return is clearly positive, provided you do not try to build everything at once.

Which tools are suitable for finance use-cases today? There is no single answer, but the setup pattern is stable: an enterprise LLM with a clear data-protection configuration (Claude or ChatGPT Business/Enterprise or Gemini Enterprise) plus a deterministic aggregation layer (Excel/Python/SQL) plus an orchestration layer (a simple scripting tool or an agent platform). Vendors offering "finance-specific AI tools" exist in 2026, but as of today the most robust path is the pattern above with clear responsibilities.

Sources

Bitkom AI Study 2025
McKinsey State of AI Report (November 2025)
Gartner press release (June 2025)
MIT NANDA Report 2025
Sentient Dynamics workshop aggregate (40 DACH mid-market workshops 2025/2026)

Next step

If you want to pilot report preparation or anomaly detection in your controlling function, let us talk for 30 minutes. We will look together at which of the seven use-cases is closest to production-ready in your setup, and what the first pilot looks like without risking the quarterly close or the auditor conversation.
