30 days as an AI Mittelstand exec: what you actually do day by day, week by week

Four weeks, twelve concrete actions, a clear Go/No-Go on day 31. The operational onboarding plan for the DACH Mittelstand, without consulting-PowerPoint theater.

Sebastian Lang · May 12, 2026 · 11 min read

Your first 30 days of AI adoption in the DACH Mittelstand: what you actually do, not the consulting-PowerPoint fairy tale. 30 days, 4 weeks, 12 concrete actions. On day 31 you either have a use case with first output, or you know on 6 pages why it does not work for you (also valuable).

This plan is for managing directors, strategy leads, and division heads in the DACH Mittelstand who want to start without an external 80-day consulting plan. You need no steering committee, no 12-week discovery phase, no tool RFP. You need 4 weeks of discipline and one person who writes things down.

What 30 days realistically can do, and what they cannot

30 days is enough for a first productive use case with measurable output. 30 days is NOT enough for scaling, a group-wide rollout, or data-governance reform. Keep expectations realistic, otherwise you will be disappointed on day 25 and abort.

Week | Days | Phase | Output by weekend
Week 1 | Day 1-7 | Discovery | Top-3 use cases, one owner per case
Week 2 | Day 8-14 | Tool setup | Platform live, data access clarified, 20 test cases
Week 3 | Day 15-21 | Pilot build | First prompt iterated, guardrails set
Week 4 | Day 22-30 | Test, decide | Go/No-Go verdict with numbers, 5 users tested

Do the math: 7+7+7+9 = 30 days. The last week is deliberately 9 days because tests and decisions need more buffer than discovery or setup. Anyone selling you a clean 7-7-7-7 plan has never run a real pilot-decide gate themselves.

[Figure: 30-day plan timeline with 12 action items: week 1 discovery, week 2 tool setup, week 3 pilot build, week 4 test and decide]

Week 1, day 1-7: Discovery (what 80 percent of Mittelstand pilots skip)

Discovery is the underrated part. Most teams jump straight to "let us buy ChatGPT Enterprise", land three months later with 47 licenses without a use case, and wonder why nothing scales. Invest the 7 days. They are cheap.

Day 1-2: Use-case inventory

No workshop. One spreadsheet. One column for processes, one for monthly volume, one for current handling time, one for "does this work with text and data or does it need specialist knowledge". Target: 25 to 40 entries. In our workshops at SHD and comparable Mittelstand companies, a joint morning with 4 division heads routinely produces 30 candidates. You do not need an external survey.

What counts as a candidate: anything where a human today reads text, writes text, or copies between systems. Order intake by email. Classify supplier inquiries. Answer complaint emails. Check compliance documents. Match travel-expense receipts. Pre-screen recruiting CVs.

Day 3-4: Stakeholder match

Per top candidate you need three roles filled: one owner (operational, knows the process), one sponsor (budget and politics), one skeptic (will have to use this later, but has concerns). If you cannot find a skeptic for a use case, the use case is probably not real but wishful thinking.

Skip this and you land in the pilot graveyard. We have written up in detail why pilots never reach production: the most common reason is not the tech but a missing production owner.

Day 5-7: Top-3 selection with criteria

You will not be able to build all 25 to 40 candidates. Score them by three hard criteria: monthly volume (higher is better, ROI scales with volume), ROI potential in hours per month, and data availability (do we already have the data digitally somewhere?). Multiply, do not add: a use case with high volume but no data is zero, not half as good. If you want help with the structured scoring, the 15-minute maturity check covers the selection grid.
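
A minimal sketch of the multiplicative scoring, assuming the day 1-2 inventory is exported as rows; the column names and example figures are hypothetical, adapt them to your own spreadsheet:

```python
# Minimal sketch of the multiplicative top-3 scoring. Column names
# and example figures are hypothetical; adapt to your inventory sheet.
candidates = [
    {"name": "Order intake by email",       "volume": 900, "roi_hours": 80, "data_available": 1},
    {"name": "Classify supplier inquiries", "volume": 400, "roi_hours": 60, "data_available": 1},
    {"name": "Pre-screen recruiting CVs",   "volume": 120, "roi_hours": 30, "data_available": 0},
]

for c in candidates:
    # Multiply, do not add: high volume with no digital data scores zero.
    c["score"] = c["volume"] * c["roi_hours"] * c["data_available"]

top3 = sorted(candidates, key=lambda c: c["score"], reverse=True)[:3]
for c in top3:
    print(f"{c['name']}: {c['score']}")
```

Note how the CV case scores zero despite decent volume: that is the point of multiplying instead of adding.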

By end of day 7: three use cases on paper, three owner-sponsor-skeptic triplets, three documented volume and ROI estimates. If you only have the resources to build one use case, build the one with the highest volume.

Week 2, day 8-14: Tool setup

This week is the most boring one, but it decides whether you build anything at all in week 3. Keep it short. Do not plan a 6-week RFP; we are building a pilot, not an SAP replacement.

Day 8-9: Platform choice

Three realistic options for the first pilot: ChatGPT Enterprise (fast, low technical depth), Claude (comparable, often stronger on long documents), or in-house build on API basis (more effort, more control). For the first 30-day pilot we almost always recommend one of the first two options. In-house build comes after the first successful pilot, not before. We have discussed this extensively in Make, Buy or Partner for AI agents.

Important: get an enterprise contract with a data processing agreement, otherwise you will not get to the productive data in week 3. A standard consumer account is fine for playing around, not for a pilot on real data.

Day 10-12: Data access

There are three ways for the AI agent to reach your data. First, file upload: you copy documents in manually (works for 50 cases, not for 5,000). Second, a connector to SharePoint, OneDrive, or Confluence (most platforms have this ready). Third, RAG: you build a vector index over your documents (what RAG is and when you need it is in the 7-terms glossary).

Practical tip: start with a connector or file upload. Most Mittelstand teams do not build RAG in week 2; it burns time. If your first pilot in week 4 shows that you actually need structured access to 10,000 documents, you build RAG in weeks 5 through 8.

Day 13-14: Eval-set setup

This is the point where the Mittelstand most often gets sloppy. You need 20 real test cases with expected output. Take 20 historical cases and write next to each what the correct answer would have been; that is your eval set. Without one, in week 4 you will not know whether the pilot is good or bad; you will only have a gut feeling. We explain what an eval set is and why 20 cases are enough in the terms glossary.
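
A minimal sketch of the eval set as a file; the file name, column names, and example case are hypothetical, and any two-column spreadsheet export works just as well:

```python
# Minimal sketch of an eval set: 20 historical cases next to the answer
# a human would have given. File name, columns, and the example case
# are hypothetical.
import csv

rows = [
    ("Complaint email: pallet arrived damaged, order 4711 ...",
     "Bucket: transport damage. Action: replacement shipment, request photos."),
    # ... 19 more historical cases go here
]

with open("eval_set.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["input", "expected_output"])
    writer.writerows(rows)
```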

End of week 2: platform live, data access works, 20 test cases with target answers sitting in the spreadsheet.

Week 3, day 15-21: Pilot build

Now you build. The temptation to stretch this week is strong; resist it. If you cannot produce usable first output in 7 days, the use case or the data is wrong, not the schedule.

Day 15-17: First prompt

Three iterations are enough. Iteration 1: a simple prompt; you explain the task, feed in an example case, and see what comes out. Iteration 2: you show the model two or three gold-standard examples (few-shot) and rephrase what you actually want. Iteration 3: you structure the output (JSON, fixed fields, clear sections) and define what happens on uncertainty. Anthropic documents the approach well in Building Effective Agents; the core idea: start simple, add complexity only when the simple baseline prompt fails.
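
A sketch of what an iteration-3 prompt can look like; the task, fields, and wording are hypothetical, not a template from the Anthropic post:

```python
# Sketch of an iteration-3 prompt: structured output plus an explicit
# uncertainty rule. Task, fields, and wording are hypothetical.
# Insert the case text at {inquiry_text} via simple string replacement
# (not str.format, because the JSON example also contains braces).
PROMPT_TEMPLATE = """You classify incoming supplier inquiries.

Return ONLY valid JSON with exactly these fields:
{
  "category": "offer | complaint | delivery_date | other",
  "summary": "one sentence",
  "confidence": "high | low"
}

If you are unsure about the category, set confidence to "low" and
category to "other". Do not guess.

Inquiry:
{inquiry_text}
"""

prompt = PROMPT_TEMPLATE.replace("{inquiry_text}", "Hello, when will order 4711 ship?")
```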

Day 18-19: RAG integration (only if needed)

If your pilot really needs structured document access, you now build a small vector index over 200 to 500 documents. If not (and in ~60 percent of Mittelstand pilots the answer is: not needed), you skip this block and gain two days of buffer. Do not build RAG on principle; build it only when the use case does not work without structured document access.
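
To show the mechanics, here is a toy retrieval sketch, not a production vector index; it assumes scikit-learn is installed and the example documents are hypothetical. Swap in a real embedding model and vector store once the use case proves it needs one:

```python
# Toy retrieval sketch: TF-IDF over a handful of documents, cosine
# similarity to pick the top matches. Example documents are hypothetical;
# in practice this would be your 200 to 500 pilot documents as plain text.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "Return policy: transport damage must be reported within 48 hours ...",
    "Supplier framework contract: delivery dates are binding once confirmed ...",
]

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(documents)

def retrieve(query: str, k: int = 3) -> list[str]:
    """Return the k documents most similar to the query."""
    query_vec = vectorizer.transform([query])
    scores = cosine_similarity(query_vec, doc_matrix)[0]
    top = scores.argsort()[::-1][:k]
    return [documents[i] for i in top]

print(retrieve("Who pays for transport damage?"))
```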

Day 20-21: Guardrails

Three basic rules are enough for the pilot. First: what the agent must not do (e.g., never send customer emails without approval, never propose price changes). Second: what happens on uncertainty (escalate to a human instead of guessing). Third: what is logged (every call, with input and output, for the eval). What guardrails are and why three rules are enough at the start is covered in the 7-terms glossary.
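
A minimal sketch of the three rules as a thin wrapper; the agent call, the blocked actions, and the confidence field are hypothetical, the pattern (block, escalate, log) is the point:

```python
# Sketch of the three pilot guardrails as a thin wrapper. The agent
# interface, blocked actions, and confidence field are hypothetical.
import json
import logging

logging.basicConfig(filename="pilot_calls.log", level=logging.INFO)

BLOCKED_ACTIONS = {"send_customer_email", "change_price"}  # rule 1: never do these

def guarded_call(agent, case_input: str) -> dict:
    result = agent(case_input)  # assumed to return a dict with "action" and "confidence"

    # Rule 3: log every call with input and output, for the week-4 eval.
    logging.info(json.dumps({"input": case_input, "output": result}))

    # Rule 1: hard-block forbidden actions.
    if result.get("action") in BLOCKED_ACTIONS:
        return {"status": "blocked", "reason": result["action"]}

    # Rule 2: escalate to a human on uncertainty instead of guessing.
    if result.get("confidence") == "low":
        return {"status": "escalate_to_human", "draft": result}

    return {"status": "ok", "output": result}

# Example with a stand-in agent:
fake_agent = lambda text: {"action": "draft_reply", "confidence": "low"}
print(guarded_call(fake_agent, "Where is order 4711?"))
```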

End of week 3: a working pilot that processes the 20 test cases from the eval set, with logging, with three guardrails.

Week 4, day 22-30: Test, decide

Now comes the part where the Mittelstand usually celebrates too early or gives up too quickly. Keep the tests structured.

Day 22-25: User tests with 5 real employees

Not the sponsor, not the owner. Five real operational employees who do the process daily today. They get 30 to 60 minutes, run 5 real cases each, and document every reaction ("that is wrong", "that is slow", "that is good", "that I do not understand"). Five testers are not statistically significant; we know that. They are practically significant, though: if 4 out of 5 say "that is wrong", you do not need to test further.

Day 26-28: Output review against the eval set

Run the pilot over the 20 eval cases. Compare target output with actual output. Count cases in three buckets: fully correct, partially correct (acceptable with correction), wrong. If "fully correct" is below 50 percent, you have a data or prompt problem, not a platform problem. If "fully correct" is above 80 percent, you have a productive pilot.
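
A minimal sketch of that reading; the grade counts are hypothetical, and in a 20-case pilot the owner fills the buckets by hand:

```python
# Sketch of the day 26-28 reading: count the three buckets and apply
# the 50/80 percent thresholds. The grades below are hypothetical
# manual grading of the 20 eval cases.
from collections import Counter

grades = (["fully_correct"] * 13
          + ["partially_correct"] * 4
          + ["wrong"] * 3)

buckets = Counter(grades)
fully_pct = buckets["fully_correct"] / len(grades) * 100  # 65.0 here

if fully_pct < 50:
    print("Data or prompt problem, not a platform problem.")
elif fully_pct > 80:
    print("Productive pilot.")
else:
    print("In between: iterate the prompt or the data before the Go/No-Go.")
```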

Day 29-30: Go/No-Go with concrete criteria

Decision on day 30, documented on one page. Four criteria: user acceptance (yes; yes, with reservations; no), eval score (percent fully correct), ROI estimate (hours saved per month, assumptions explicit), and a 12-month TCO estimate (license, ops, maintenance; the TCO post gives the structure). If three of the four criteria are green, you move into the next 60 days of scaling. If two or fewer are green, you document why on 2 pages and decide whether to drop the use case or invest the next 30 days in data or process groundwork.
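
The gate itself fits in a few lines; the criterion values here are hypothetical:

```python
# Sketch of the day-30 gate: three of four criteria green means Go.
# Criterion values are hypothetical.
criteria = {
    "user_acceptance": True,   # "yes" or "yes, with reservations"
    "eval_score": True,        # percent fully correct above your bar
    "roi_estimate": True,      # hours saved per month, assumptions explicit
    "tco_12_months": False,    # license + ops + maintenance within budget
}

greens = sum(criteria.values())
print("Go: scale over the next 60 days." if greens >= 3
      else "No-Go: document why on 2 pages, then fix data/process or drop.")
```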

The 5 typical traps in the 30 days

We have run this plan often enough to list the most common mistakes.

Trap 1: The workshop-industry trap. Three days of workshop, a hundred post-its, zero actions. If the first 7 days do not end in a spreadsheet with top-3 cases, you have over-workshopped. Workshops are a tool, not an output.

Trap 2: The data-cleanup-first trap. The classic: "we have to consolidate all data first, then AI". That is the expensive variant of doing nothing. For a pilot you do not need all your data clean; you need the 200 documents the use case touches clean. We have debunked the data-cleanup-first myth elsewhere.

Trap 3: Too many pilots in parallel. Mittelstand division heads often want to start three pilots in parallel, "so we gather experience". In 30 days you can finish exactly one cleanly. Start two and both end up half-baked; start three and all three end up trash.

Trap 4: Production owner missing. You have defined sponsor, owner, skeptic, but not specified who takes operational responsibility on day 31. That is the real reason why pilots never reach production in the Mittelstand, see pilot graveyard analysis.

Trap 5: AI only, no human interface. An agent that decides fully automatically without an escalation point produces either trust problems or real damage in week 4. Plan human-in-the-loop from day 15 onward, not retroactively (the concept is laid out in the 7-terms glossary).

What day 31 decides, what day 91 decides

On day 31 you have an answer to "does this work technically, and do users accept it?". That is not scaling; that is a proof of concept. The next 60 days decide whether the pilot becomes a productive system, with monitoring, escalation, workforce training, and integration with the real tool landscape. The 90-day use-case matrix is the natural continuation of this plan: the first 30 days are the hypothesis test, days 31 to 90 are the operationalization.

Workforce training, however, does NOT start on day 91. Training starts in parallel from week 2, because your user testers in week 4 need a basic understanding. How to build training pyramid-style, without sending all 500 employees to a 3-day workshop, is documented in the workforce pyramid post.

FAQ

Do we need external consulting for the 30 days? Not necessarily. You need one person who can reserve 50 percent of their time for 30 days (owner role), three division heads who put in 4 hours per week each (sponsor and stakeholder), and one technically capable employee for weeks 2 and 3 (platform setup, prompt iteration). If that is not in-house, external support pays off. If it is in-house, do it yourself.

What if we cannot decide after day 30? Then your eval set is too weak or your use case is too vaguely defined. Do not let yourself be told you need "another 30 days to mature"; that is usually the pilot-graveyard pattern. Either you have data for a decision or you have to sharpen the test criteria, not stretch the timeframe.

Which use case is best for the first 30 days? Text-to-text use cases with high volume and low compliance risk. Examples: answering internal FAQ, classifying supplier inquiries, structuring complaint emails. Avoid at the start: use cases with direct customer interaction (too much pilot risk), use cases that need regulatory clearance (release cycles too long), use cases with complex tool integration (setup time too long).

When are 30 days too short? In heavily regulated domains (medical devices, financial services with audit duties, critical infrastructure) the release logic takes longer than the pilot. There, 60 or 90 days make sense, but that is the exception. The typical Mittelstand company from manufacturing, trade, or services gets a long way with 30 days. What Agentic AI actually is and where the limits sit is summarized in the executive crash course.

Sources and next step

Sources: Anthropic Building Effective Agents as the basis for prompt iteration and tool-use patterns. Own observations from Sentient Dynamics workshops at SHD (Mittelstand, 650 employees, Andernach) and comparable DACH Mittelstand companies, anecdotal and clearly labeled. Cross-references to our other posts as linked above.

We run a 30-day pilot sprint with your team. You start next Monday; on day 30 you have a Go/No-Go verdict with numbers. Book a session.

About the author

Sebastian Lang

Co-Founder · Business & Content Lead

Co-Founder of Sentient Dynamics. 15+ years of business strategy (including SAP), MBA. Writes about AI Act compliance, ROI measurement, and how Mittelstand CTOs actually adopt agentic AI.
