From AI Pilot to AI Program: the Scaling Leap for the Mittelstand 2026
The first AI pilot is the easy exercise. The leap to the second, third, tenth use case is where most DACH Mittelstand companies stall in 2026. Not because the technology does not scale, but because the organization does not scale. Here are the six building blocks that actually make the leap.
Why a pilot is not a program
A pilot is one use case, one team, one sponsor, one ad-hoc chosen tool, one budget out of the strategy pot. It works. It even works well when the sponsor is loud enough and the use case is scoped tightly enough. That describes most AI showpiece projects in 2025.
A program is something different. It is not the same pilot times ten. It is N parallel use cases, a shared platform, a governance layer, a budget logic, a clear capability model. The jump from pilot to program is a phase change, not a scaling factor.
The problem: in 2026, many Mittelstand companies are sitting exactly at this threshold. The first pilot is running, a second is being prepared, a third is being discussed in the board meeting. And then what Gartner warned about in its June 2025 press release happens: more than 40 percent of agentic-AI projects will be canceled by the end of 2027, often for cost and value reasons, not because the models do not work. The MIT NANDA Report 2025 tells the same story from the other side: 95 percent of GenAI pilots show no measurable P&L impact. Neither is a model problem. Both are an organization problem.
Anyone who understands this treats the leap from pilot to program for what it is: a structural matter.
Practical litmus test: if the question "who decides which use case starts next" cannot be answered in one sentence in your company, you are still in the pilot phase. If the answer is "the AI owner with the backlog and the quarterly review", you are in the program. The leap is between those two states. And right between those two states, most DACH Mittelstand companies in 2026 lose tempo, money, and trust.
The six building blocks for the leap
1. AI owner role
A named person for the AI program, not "the board takes care of it". The AI owner is the person responsible for scaling AI in the company: backlog, platform decisions, escalations, reporting to the executive board. Typical profiles in the Mittelstand: ex-strategy, ex-digitalization, ex-IT lead. Rarely a pure tech person, almost never an external hire. This is a reskilling path from the existing workforce, not additional headcount.
A concrete Mittelstand example: a 400-employee mechanical engineering company turns its former head of digitalization into the AI owner at 60 percent of her time. She does not sit in IT, she sits next to the executive board, with a mandate across all business functions. What she is not: the sole decision-maker for every use case. What she is: the person who resolves prioritization conflicts, sets platform standards, and reports the AI program to the board and advisory council. Anyone making the owner into the "AI doer" overloads the role. Anyone making the owner into the "AI orchestrator" has understood the leap.
2. Use-case backlog
A visible, prioritized list instead of scattered PowerPoints in email inboxes. Each use case rated by effort, impact, and risk. The backlog replaces the usual "whoever has the loudest idea" logic with traceable ranking.
In the Mittelstand, three columns are enough: expected effort (small/medium/large), expected impact (small/medium/large), and risk (compliance, vendor, data). This is not academic. It is what ensures the next slot goes to the use case with the best impact-to-effort ratio, not to the loudest department head. Anyone who wants more detail documents the assumptions next to each rating. Anyone without a backlog runs AI by acclamation.
Practical note: the backlog does not belong in a PowerPoint that gets updated every three months. It belongs in a tool every stakeholder can see (Notion, Linear, Jira, Confluence, whatever you already run). Visibility is half the discipline. A backlog only the AI owner maintains is not a backlog, it is a private note with roadmap pretensions.
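To make the three-column ranking tangible, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a prescribed method: the numeric mapping of small/medium/large, the flat penalty per risk flag, and the example use cases are all hypothetical.

```python
from dataclasses import dataclass

# Ordinal mapping for the three-step scale; the numbers are an assumption.
SCALE = {"small": 1, "medium": 2, "large": 3}


@dataclass(frozen=True)
class UseCase:
    name: str
    effort: str        # "small" | "medium" | "large"
    impact: str        # "small" | "medium" | "large"
    risks: tuple = ()  # any of "compliance", "vendor", "data"


def priority(uc: UseCase, risk_penalty: float = 0.25) -> float:
    """Impact-to-effort ratio, reduced by a flat penalty per flagged risk."""
    ratio = SCALE[uc.impact] / SCALE[uc.effort]
    return ratio - risk_penalty * len(uc.risks)


def rank(backlog):
    """Return the backlog ordered from best to worst priority."""
    return sorted(backlog, key=priority, reverse=True)


# Hypothetical backlog entries, for illustration only.
backlog = [
    UseCase("Quote drafting", effort="medium", impact="large", risks=("data",)),
    UseCase("HR screening", effort="large", impact="large",
            risks=("compliance", "data")),
    UseCase("Service ticket triage", effort="small", impact="medium"),
]

for uc in rank(backlog):
    print(f"{uc.name}: {priority(uc):.2f}")
```

The point is not the formula. The point is that the ranking is written down, so a prioritization conflict becomes a discussion about ratings and assumptions instead of volume.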
3. Platform, not tool zoo
A shared technical foundation for all use cases: LLM access (Claude, ChatGPT Enterprise, or Gemini Enterprise, usually two of them), RAG components, identity integration (SSO), logging, cost monitoring. So not every use case builds its own stack.
The difference in daily operations: without a platform, use case A has its own vector store, use case B uses a different RAG setup, use case C runs on a ChatGPT workspace nobody has full visibility into. With a platform, these four or five components are built once, secured once, and made observable once. A vendor note in passing: in the B2B editions (Claude API and Claude for Work, ChatGPT Business and Enterprise, Gemini for Workspace and Gemini Enterprise in the Business, Standard and Plus tiers), training on customer data is off by default. Gemini Enterprise Starter and the consumer editions of all three vendors have different defaults, so check those during vendor selection. That makes the platform choice calmer than it is often presented.
A warning from practice: "platform" does not mean "we build everything ourselves". Platform means a deliberate decision about what is central (LLM access, SSO, logging, eval tooling) and what stays use-case specific (domain prompts, data integration, UI). Anyone answering the platform question with "we build everything ourselves" creates a second shadow IT. Anyone answering it with "we buy everything off the shelf" creates vendor lock-in. The right answer lies between the two extremes and is decided with the architect (see building block 6), not by the executive board.
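The split between central and use-case-specific components can be written down as a simple check. The sketch below is one possible way to do that, assuming each use case declares which components it builds itself. The component keys and the function `check_use_case` are hypothetical names, not part of any product.

```python
# Components named above as central; everything in the second set stays
# with the individual use case. The key names are illustrative assumptions.
CENTRAL = {"llm_access", "sso", "logging", "eval_tooling"}
USE_CASE_SPECIFIC = {"domain_prompts", "data_integration", "ui"}


def check_use_case(name: str, components: set[str]) -> list[str]:
    """Return findings for the components a use case wants to build itself."""
    findings = []
    # Central components should be consumed from the platform, not rebuilt.
    for comp in sorted(components & CENTRAL):
        findings.append(f"{name}: '{comp}' should come from the platform")
    # Anything not classified yet needs an explicit decision.
    for comp in sorted(components - CENTRAL - USE_CASE_SPECIFIC):
        findings.append(f"{name}: '{comp}' is unclassified, decide with the architect")
    return findings


# Hypothetical use case that rebuilds logging and adds an unknown component.
for finding in check_use_case(
    "Quote drafting",
    {"domain_prompts", "ui", "logging", "translation_service"},
):
    print(finding)
```

A list like this is only useful if the architect owns it and updates it when a new component type shows up.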
4. Eval governance
How do you measure across use cases what is working? Without eval governance, every project has its own test, or none, or the intern's test. With eval governance there is a shared test-set principle, a template for eval sets per use case, drift monitoring, and a rule book for critical use cases (HR, finance, customer-facing).
The typical Mittelstand mistake: eval gets reduced to "accuracy". Reality: eval covers correctness, completeness, hallucination rate, latency, cost per request, data-protection violations. Weighted differently per use case. That is exactly why you need governance, not a perfect Excel sheet.
A note on test sets: they are the most valuable asset an AI program builds. Six months of use-case operation produce a test set that makes a model switch or vendor switch decidable in the first place. Anyone who does not maintain test sets is, at the next model update, dependent on the gut feeling of the business unit. That does not scale, and it is not audit-ready.
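What "weighted differently per use case" can look like in practice is sketched below. This is an assumption-laden illustration: all metrics are taken to be normalized to a 0 to 1 range where higher is better (so a hallucination rate would be passed in as 1 minus the rate), and the weights and scores are invented for the example.

```python
# Metric names follow the dimensions listed above. Every score is assumed
# to be normalized to 0..1 with higher meaning better.
METRICS = ("correctness", "completeness", "hallucination",
           "latency", "cost", "privacy")


def weighted_score(scores: dict, weights: dict) -> float:
    """Weighted average over all metrics; weights need not sum to 1."""
    missing = [m for m in METRICS if m not in scores or m not in weights]
    if missing:
        raise ValueError(f"missing metrics: {missing}")
    total_weight = sum(weights[m] for m in METRICS)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[m] * weights[m] for m in METRICS) / total_weight


# Hypothetical eval result for one model version.
scores = {"correctness": 0.9, "completeness": 0.8, "hallucination": 0.95,
          "latency": 0.6, "cost": 0.5, "privacy": 1.0}

# A critical HR use case weights correctness and privacy heavily;
# an internal search use case cares more about latency and cost.
hr_weights = {"correctness": 3, "completeness": 1, "hallucination": 3,
              "latency": 1, "cost": 1, "privacy": 3}
search_weights = {"correctness": 2, "completeness": 2, "hallucination": 2,
                  "latency": 3, "cost": 3, "privacy": 1}

print(f"HR use case:     {weighted_score(scores, hr_weights):.3f}")
print(f"Search use case: {weighted_score(scores, search_weights):.3f}")
```

The same raw results lead to different verdicts depending on the weights. That is why the weights belong in the governance rule book, not in an individual project's spreadsheet.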
5. Budget logic
Separate budgets for pilot ramp and production operations. Pilot budgets are allowed to be small and experimental, with high risk tolerance and many expected failures. Production budgets are run-rate matters, coming from business-unit budgets or a central AI budget, plannable, with clear TCO assumptions.
Anyone paying both from one pot gets two effects. First, the reflex to demand production-grade requirements slows down every pilot. Second, every pilot that scales up suddenly costs more than planned because run-rate costs were underestimated. The separation is a question of discipline, not of structure. A Mittelstand company with 600 employees typically works with a pilot pot around 80 to 120 thousand euros per year and a separate run-rate budget that comes from business units.
6. Capability building
10 to 15 power users per 200 employees as a rule of thumb for Mittelstand companies in active rollout. Plus three roles in the middle: AI architect (platform decisions, security architecture), AI operator (operations, eval runs, monitoring), governance lead (rule book, AI-Act obligations, internal audits). This is consistent with the skills shift we described in Post 56.
Important: all three roles are reskilling paths. The architect comes from senior engineering or solution architecture, the operator from DevOps or application operations, the governance lead from compliance, data protection, or risk. Power users come from the business units that use the use case daily. Headcount expansion is not the goal here.
What capability building means in practice: four to six weeks of structured onboarding for the three middle roles, two to three days of power-user training per wave, then ongoing office hours and an internal forum. Without that routine, capability stays on the slide. Anyone booking capability only as a training budget without building ongoing mechanics is buying a certificate, not a capability.
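The power-user rule of thumb from this building block translates into a quick calculation. The sketch below assumes the ratio scales linearly with headcount and rounds up, which is a simplification for illustration; the function name is hypothetical.

```python
def power_user_range(employees: int) -> tuple[int, int]:
    """Apply 10 to 15 power users per 200 employees, scaled linearly.

    Uses integer ceiling division so partial users round up.
    """
    if employees < 0:
        raise ValueError("employees must be non-negative")
    low = (employees * 10 + 199) // 200
    high = (employees * 15 + 199) // 200
    return low, high


for headcount in (250, 400, 600):
    low, high = power_user_range(headcount)
    print(f"{headcount} employees: {low} to {high} power users")
```

For the 400-employee company from the AI owner example, this gives 20 to 30 power users, which is the group the training waves and office hours have to be planned for.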
What happens without these six building blocks
First, shadow IT with LLMs. Without a platform, every department buys its own ChatGPT workspace, every use case its own API keys, every business unit sends data to tools nobody has vetted. Bitkom's KI-Studie 2025 reported that 41 percent of companies with 20 or more employees use GenAI, rising to 89 percent at 500 or more employees. The question is no longer whether AI is in the building. The question is whether it is in the building under control.
Second, eval sprawl. Every project builds its own test, often only just before go-live, often without drift monitoring afterwards. Six months later, it is no longer clear whether use case A still works as well as on go-live day. Model updates, data shifts, new edge cases go unnoticed. McKinsey's State of AI from November 2025 names missing measurement discipline as one of the main drivers for the absence of P&L impact.
Third, vendor lock-in out of convenience. If the first pilot runs on provider X, the second one often runs there too because the integration work is already done. Three use cases later, the dependency is so deep that switching would be a six-month project. A platform decision up front would have kept a second or third model option open without every use case having to build it individually.
Fourth, the pilot owner as bottleneck. The person who invented the first use case often becomes the de facto key person for all further use cases. That is humanly understandable and structurally damaging. The pilot owner was right for the pilot. For the program, you need an owner with a mandate across all use cases, not a champion of a single topic. Without that separation, the first success becomes the second one's bottleneck.
Fifth, indirect but expensive: loss of board trust. Anyone who does not make the leap and instead lines up more pilot use cases next to each other ends up after 12 to 18 months with many half-finished initiatives, little run-rate effect, and results that are hard to communicate. The next pitch for AI budget then gets disproportionately harder. This is exactly where the Gartner cancel effect kicks in: not because the technology failed, but because the story is no longer tellable.
The honest timeline: 12 to 18 months for the leap
The Sentient Dynamics workshop aggregate from roughly 40 DACH Mittelstand workshops shows a consistent value: anyone who cleanly jumps from pilot to program needs 12 to 18 months. Not 6, not 24. 12 to 18.
The distribution in practice looks like this: months 1 to 3 are AI owner role plus backlog plus initial platform decisions. Months 4 to 9 are platform build-out in parallel with two or three use cases going into production. Months 10 to 18 are eval governance, capability building, budget logic consolidation. Only from month 18 onwards is the program mature enough that new use cases can be added without special approval.
Anyone who wants to be "AI-first" in six months is building theater. Anyone who plans for 12 to 18 months and puts the six building blocks in place cleanly comes out of 2026 and 2027 as an organized adopter. Not as a frontrunner; that is not the goal. As someone who has AI in the building, without shadow IT, without eval sprawl, without vendor lock-in, with capability instead of hero stories.
And yes, that is slower than the board presentation would prefer. It is also more realistic. Anyone who gives in to the six-month pressure and drops building blocks pays 18 months later in cleanup costs and lost trust. The honest timeline pays off.
What is different in 2026 compared to 2024: the models are mature enough that the pilot phase actually gets shorter. What does not get shorter is the organizational leap. If anything, the opposite. The faster pilot results become available, the sooner the Mittelstand faces the question "and now what" and the more expensive it is not to have the six building blocks prepared in the background.
Where you start when the first pilot is running
Recommendation from the workshops: AI owner role and use-case backlog first. Lowest effort, highest effect on everything else. Both can be set up in four to six weeks, both clarify responsibility and prioritization. Once that is in place, the platform question gets concrete, then eval and budget, then capability.
Anyone starting with platform or eval is building infrastructure for use cases that have not yet been prioritized. That leads to platform decisions that get revised later, and to eval templates nobody uses. Owner and backlog first. The rest flows after.
If the second use case is already being discussed, that is exactly the moment. If it is already running without an owner and without a backlog, this is catch-up time, not an emergency. Nobody got the leap clean on the first iteration. The point is getting it clean by the third iteration.
One last note: the leap is a board topic, not an IT topic. Delegate it to IT and you get a clean but isolated solution. Delegate it to strategy and you get a slide without run-rate. The interface between the two, driven by a named AI owner with a board mandate, is where the program actually emerges.
FAQ
Do I need an "AI Center of Excellence"? Not in that packaging. In the Mittelstand the term often sounds inflated, raises board expectations, and ties up hiring budgets. What you need are the six building blocks. If someone wants to call that a center of excellence later, no problem. Up front, the term often leads to the wrong structures.
Who in the Mittelstand takes on the AI owner role? Typical profiles: former head of digitalization, former IT strategy, former strategy or business development. Rarely a pure engineering lead, rarely external. What matters is a mandate from the executive board and enough understanding of business functions to evaluate use cases. 50 to 70 percent time allocation is normal in the Mittelstand.
What does the program cost annually? Heavily dependent on size and maturity. A 400 to 700 employee Mittelstand company typically works with a pilot pot of 80 to 120 thousand euros, a platform investment of 150 to 300 thousand euros in the first year, run-rate budgets of 100 to 250 thousand euros depending on the number of use cases. Capability building is mostly reskilling, so training budget plus time, not classic hire cost.
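For a rough sense of the total, the ranges above can simply be added up. The sketch below assumes that all three items fall into the same first year and that the platform investment drops out entirely afterwards. Both are simplifications, since platform upkeep does not vanish in practice.

```python
# All figures in thousand euros, taken from the ranges quoted above.
BUDGET_RANGES = {
    "pilot_pot": (80, 120),
    "platform_first_year": (150, 300),
    "run_rate": (100, 250),
}


def total_range(ranges: dict) -> tuple[int, int]:
    """Sum the lower and upper bounds of all budget items."""
    low = sum(lo for lo, _ in ranges.values())
    high = sum(hi for _, hi in ranges.values())
    return low, high


first_year = total_range(BUDGET_RANGES)

# Assumption: from year two on, only pilot pot and run-rate remain.
later_years = total_range(
    {k: v for k, v in BUDGET_RANGES.items() if k != "platform_first_year"}
)

print(f"First year:  {first_year[0]} to {first_year[1]} thousand euros")
print(f"Later years: {later_years[0]} to {later_years[1]} thousand euros")
```

Under these assumptions, the first year lands between 330 and 670 thousand euros, and later years between 180 and 370 thousand euros, before counting the time spent on reskilling.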
When are you done with the leap? When new use cases enter the backlog without special approval, the platform carries them without custom setup, eval runs automatically, the budget comes from business-function logic, and the AI owner is no longer operationally involved in every individual project. Rule of thumb: after 12 to 18 months. Before that, it is build-up, not program.
Does this also help with EU AI Act obligations from 2 August 2026? Indirectly, yes. The governance building blocks (eval governance, AI owner, platform logging) are exactly the structures you need for risk classification, logging obligations, and audit readiness anyway. Anyone who has the six building blocks in place already has half the AI Act work done.
Sources
- Gartner Press Release, June 2025: "Over 40 percent of Agentic AI Projects Will Be Canceled by End of 2027".
- MIT NANDA Report 2025: 95 percent of GenAI pilots without measurable P&L impact.
- Bitkom, KI-Studie 2025: GenAI adoption of 41 percent among companies with 20 or more employees, 89 percent among companies with 500 or more employees.
- McKinsey State of AI, November 2025.
- Sentient Dynamics workshop aggregate from roughly 40 DACH Mittelstand workshops 2024 to 2026.
Next step
If you are sitting on the threshold between pilot and program and want to walk through the six building blocks for your company concretely, book a demo. We go through owner role, backlog setup, and platform decision in 60 minutes, focused on your specific use-case list.
About the author
Sebastian Lang
Co-Founder · Business & Content Lead
Co-founder of Sentient Dynamics. 15+ years in business strategy (including at SAP), MBA. Writes about AI Act compliance, ROI measurement, and how Mittelstand CTOs actually introduce agentic AI.