Your AI Tool Never Says "I'm Not Sure"

30 Jun

Written By Eszter Rapanos, Quality Assurance, Public Sector and Publication Manager, Chartered Institute for Business Accountants (CIBA)

This article will count 0.25 units (15 minutes) of unverifiable CPD. Remember to log these units under your membership profile.

A system that gets things right 98% of the time is also wrong 2% of the time. The problem is that it sounds just as confident either way. In a firm running hundreds of returns a year, that 2% is not a small number. At 300 returns, for example, it is six. That is a handful of real deliverables, sitting quietly among the rest, and the firm still has to own every one of them.

That is the core point made in a recent piece by Chris Farrell on AT Think, which argues that firms already know how to manage this kind of risk. They have been doing it for years with junior staff.

The Same Problem, A New Name

A first-year staff member is not perfectly consistent either. They have good days and bad days. The profession never solved this by demanding perfect juniors. It solved it with layered review, clear sign-off, and a partner who takes responsibility for the final output, no matter who did the work.

The article makes the case that this structure applies directly to AI. The difference is that a junior who is unsure will usually ask a question. AI does not. It does not hedge or flag uncertainty on its own. Firms have to build that signal into the system themselves.

Three Practical Shifts For Firms

The article sets out three ways firms can apply existing review habits to AI tools. Here is what each one looks like in practice.

Build in a confidence signal
An AI tool will not tell you when it is guessing, so you have to force that information out of it.
✅ Ask the tool to rate its own confidence on every output, on a simple scale such as high, medium, or low.
✅ Set a rule that anything below "high" goes to a staff member before it is used or sent to a client.
✅ Ask the same question twice, in two slightly different ways, and compare the two answers. If they do not match, treat that as a red flag and review manually.
✅ Keep a simple log of how often low-confidence answers turn out to be wrong. Use this to decide if the threshold needs to move.
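
For firms that script their AI workflows, the routing rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `AIOutput` record, the `needs_human_review` function and the three confidence labels are hypothetical names, not part of any particular AI product, and comparing two answers as normalised text is a deliberately crude stand-in for a proper comparison.

```python
from dataclasses import dataclass


@dataclass
class AIOutput:
    """One answer from the tool, with the confidence it reported for itself."""
    task: str
    answer: str
    confidence: str  # assumed scale: "high", "medium" or "low"


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def needs_human_review(first: AIOutput, second: AIOutput) -> bool:
    """Return True if the output must go to a staff member before it is used.

    Rule 1: anything below "high" confidence (or an unknown label) is reviewed.
    Rule 2: if the same question asked two ways gives two answers, it is reviewed.
    """
    if first.confidence != "high" or second.confidence != "high":
        return True
    return normalise(first.answer) != normalise(second.answer)
```

In use, the same question would be put to the tool twice in different words, both answers wrapped in `AIOutput`, and anything for which `needs_human_review` returns True sent to a person. Logging each of those cases alongside the eventual human verdict gives the data needed to decide whether the threshold should move.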
Set the rules before the work starts
Treat the AI tool the way you would treat a new staff member on day one. Do not hand it open-ended work. Give it boundaries first.
📌 Write a short, plain-language brief for each task you let AI touch. State exactly what it is allowed to do, and what it must always escalate to a person.
📌 Sort your tasks into two groups: those with a fixed, rule-based answer, and those that need judgement. Let a simple rule-based system or checklist handle the first group. Only use AI judgement on the second group, with review attached.
📌 For any task involving numbers, build in an automatic check that the output has to pass before it goes further. Examples include a reconciliation against another total, a range test that flags anything outside a normal limit, or a check that every figure has a supporting reference.
📌 Document these rules somewhere your whole team can see them, not just in one person's head.
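
The three number checks mentioned above (reconciliation, range test, supporting reference) could look something like the following in Python. It is a minimal sketch, not a prescribed method: the `Figure` record, the `check_figures` function, the tolerance and the limits are all assumptions made for illustration, and a real firm would set its own.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List, Optional


@dataclass
class Figure:
    """One number produced by the AI tool (hypothetical structure)."""
    label: str
    amount: Decimal
    reference: Optional[str] = None  # e.g. an invoice or working-paper ID


def check_figures(figures: List[Figure], control_total: Decimal,
                  lower: Decimal, upper: Decimal,
                  tolerance: Decimal = Decimal("0.01")) -> List[str]:
    """Return a list of plain-language failures; an empty list means a pass."""
    failures = []

    # Reconciliation: line items must add up to an independently known total.
    total = sum((f.amount for f in figures), Decimal("0"))
    if abs(total - control_total) > tolerance:
        failures.append(f"total {total} does not reconcile to {control_total}")

    for f in figures:
        # Range test: flag anything outside the normal limits for this task.
        if not (lower <= f.amount <= upper):
            failures.append(f"{f.label}: {f.amount} is outside {lower} to {upper}")
        # Reference check: every figure needs a supporting document.
        if not f.reference:
            failures.append(f"{f.label}: no supporting reference")

    return failures
```

The output only moves forward when the returned list is empty. Anything else goes to a person together with the list of failures, so the reviewer knows exactly where to look.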
Create a memory the system does not have on its own

A person remembers a mistake after one correction. AI does not, unless you build that memory in by hand.

✅ Every time you catch an AI error, write it down as a short rule: what went wrong, and what should happen instead.

✅ Turn that rule into a test case. Before you trust the tool with similar work again, run the test case past it and confirm it now gets it right.

✅ Add a new check or gate to your process based on what you learned, so the same type of mistake cannot slip through unnoticed next time.

✅ Review this list of logged corrections every few months. If the same kind of error keeps coming up, it usually means the task needs a firmer rule, not another reminder.
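
One way to turn logged corrections into repeatable test cases is sketched below in Python. The `Correction` record and `run_regression` function are hypothetical, and `ask_tool` stands in for whatever function a firm uses to send a prompt to its AI tool and get text back. Checking that the answer merely contains an expected phrase is a simplifying assumption; many tasks will need a stricter comparison.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Correction:
    """One logged AI error, written as a rule plus a repeatable test."""
    what_went_wrong: str
    prompt: str    # the input that triggered the error
    expected: str  # text the correct answer must contain


def run_regression(corrections: List[Correction],
                   ask_tool: Callable[[str], str]) -> List[Correction]:
    """Re-run every logged correction and return the ones the tool still fails."""
    still_failing = []
    for c in corrections:
        answer = ask_tool(c.prompt)
        # Case-insensitive containment check (an illustrative simplification).
        if c.expected.lower() not in answer.lower():
            still_failing.append(c)
    return still_failing
```

Running this before the tool is trusted with similar work again gives a direct answer to the question "does it now get it right?". If the same correction keeps appearing in the returned list, that is the signal that the task needs a firmer rule.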

Where To Start

Do not roll AI out across the whole practice at once. Start small and build up.

Pick one task. Choose something repetitive, with a clear right answer and a low cost if it goes wrong, such as sorting documents, drafting first-pass reminders, or summarising routine correspondence.
Write the check before you start. Decide in advance what the output has to pass before a person signs off on it.
Name an owner. One person is responsible for that task's AI output, checks it on a set schedule, and answers for it if something goes wrong.
Keep a trail. Save a record of what the AI produced and what check it passed, so you can show your working if a client or SARS ever questions a number.
Prove it, then move on. Only add a second task once the first one has been running cleanly for a reasonable period. Each successful task builds the case for the next.
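
For the "keep a trail" step, a record could be as simple as one line per output in a log file. The Python sketch below assumes a JSON Lines file and hypothetical field names (`task`, `owner`, `checks_passed`); it is one possible shape for such a record, not a required format.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Iterable


def trail_entry(task: str, owner: str, ai_output: str,
                checks_passed: Iterable[str]) -> dict:
    """Build one audit-trail record as a JSON-serialisable dict."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "task": task,
        "owner": owner,
        # A hash of the text makes a later change to the saved output detectable.
        "output_sha256": hashlib.sha256(ai_output.encode("utf-8")).hexdigest(),
        "output": ai_output,
        "checks_passed": list(checks_passed),
    }


def append_to_trail(path: str, entry: dict) -> None:
    """Append the record as one line of JSON, adding to the file rather than overwriting it."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
```

Each record names the task, the owner, what the AI produced and which checks it passed, which is the information needed to show your working later.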

This connects to similar themes raised in CIBA's earlier coverage of AI risks accountants should understand, which points out that weak AI controls can lead to incorrect advice or regulatory exposure. It also echoes the governance expectations set out in King V's new guidance on AI, which calls for accountability and human oversight wherever AI is used.

The Takeaway

The signature at the bottom of a return is still yours, no matter what produced the work above it. If you are using AI in your practice, the safest move this week is to pick one repetitive task, decide what check that output must pass before it goes out, and write that check down. That single step turns a guess into a control.

Source article: Accounting Today
