Xybern
Xybern Research
2026-06-19
You Cannot Audit What You Did Not Authorise

When a regulator examines a financial firm after an incident, they do not open with "what happened." They already know what happened. They ask a different question, and it is the one that decides whether the firm is liable: what controls were in place to prevent it, and can you prove they were operating.

This distinction is the whole of compliance. An audit is not a search for what occurred. It is a demand for evidence that the right controls existed, applied to the events in question, at the time they happened. A log that records what an actor did is necessary but nowhere near sufficient, because it answers the question the regulator did not ask while leaving the one they did ask unanswered.

AI agents have made this gap acute. Enterprises are deploying agents that take thousands of consequential actions, capturing detailed logs of those actions, and mistaking those logs for an audit trail. They are not the same thing, and the difference is not academic. It is the difference between demonstrating governance and admitting you had none.

This piece is about why you cannot audit what you did not authorise, what separates a log from an evidentiary record, and what an actual audit trail for agent actions requires.


Two Questions That Look the Same

Start with the two questions, because almost everyone conflates them, and the conflation is the root error.

The first question is observational. What did the agent do. This is answered by logging. You record each action the agent took, with a timestamp, and you can replay the sequence later. Most agent platforms do this reasonably well. You can see that the agent called the payments API at a certain time with certain parameters.

The second question is evidentiary. Was the agent permitted to do that, by what rule, evaluated against what context, and who is accountable for the decision to allow it. This is answered by authorisation, and only if the authorisation decision was made and recorded at the moment of the action.

These look similar enough that teams assume the first answers the second. It does not. Knowing that the agent issued a refund tells you nothing about whether it was allowed to, what policy governed the decision, what context was considered, or whether a human approved it. The log captured the event. It did not capture the judgment, because in most deployments no judgment was made. The action simply executed, and the log dutifully recorded that it did.

|                            | Observational log        | Evidentiary record                         |
|----------------------------|--------------------------|--------------------------------------------|
| Answers                    | What did the agent do    | Was it authorised, on what basis, by whom  |
| Produced by                | Logging the action       | Evaluating policy at the moment of action  |
| Exists when                | After the action, always | Only if a decision was made and recorded   |
| Tells a regulator          | The event occurred       | The control operated                       |
| Can be reconstructed later | Yes, from logs           | No, the decision moment has passed         |

The last row is the one that matters most, and it is the one teams discover too late.


You Cannot Reconstruct a Decision That Was Never Made

Here is the trap. Teams assume that if something goes wrong, they can reconstruct the authorisation story after the fact from the logs. They cannot, and the reason is fundamental rather than a tooling shortfall.

An authorisation decision is a function of the context that existed at the moment the action was attempted. What policy was active then. What the agent had already done in that session. What the agent's trust level was. Where the triggering instruction originated. Whether a human was asked and what they said. This context is ephemeral. It exists at the instant of the action and then it is gone, overwritten by the next action and the next session.

If you did not evaluate and record the authorisation decision at that instant, the inputs to it no longer exist in recoverable form later. You can see from the log that the agent transferred funds. You cannot reconstruct whether, at that moment, the policy permitted it, because policy changes, sessions end, and the contextual state that would have informed the decision was never captured. You are left inferring intent and permission from an action record that contains neither.

What actually happened (no authorisation layer):

   action attempted ──► executed ──► logged
                        (no decision made; context evaluated by nothing)

   later: auditor asks "was this authorised?"
          you have the action. you do not have the decision,
          because there was no decision, and the context that
          would have informed one is gone.

This is why "we have logs" is not an answer to "can you prove this was governed." The logs prove the action happened. They are evidence of the event and silence on the control. A regulator reads that silence correctly: if you cannot show the decision, there was no decision, and an action taken with no authorisation decision is an ungoverned action regardless of how thoroughly you logged it.

The decision has to be made and recorded at the moment of the action, or it does not exist. There is no retroactive path to it.
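
As an illustration of deciding and recording at the moment of the action, here is a minimal Python sketch. Every name in it (`Policy`, `Decision`, `authorise`, `attempt`, the approval limit, the `"operator"` provenance label) is a hypothetical stand-in chosen for this example, not Xybern's API. The point it tries to show is narrow: the record freezes the policy version and session state that existed when the action was attempted, so a later policy change cannot erase them.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Policy:
    name: str
    version: int
    approval_limit: float  # hypothetical: refunds above this need a human


@dataclass(frozen=True)
class Decision:
    action: str
    amount: float
    policy_name: str
    policy_version: int
    session_history: tuple   # snapshot of what the agent had already done
    instruction_source: str  # provenance of the triggering instruction
    verdict: str             # "allow", "escalate" or "block"
    decided_at: str


def authorise(action, amount, policy, session_history, instruction_source):
    """Evaluate the action against the policy and context that exist right now."""
    if instruction_source != "operator":
        verdict = "block"
    elif amount > policy.approval_limit:
        verdict = "escalate"
    else:
        verdict = "allow"
    return Decision(
        action=action,
        amount=amount,
        policy_name=policy.name,
        policy_version=policy.version,
        session_history=tuple(session_history),  # copy, so later changes do not leak in
        instruction_source=instruction_source,
        verdict=verdict,
        decided_at=datetime.now(timezone.utc).isoformat(),
    )


def attempt(action, amount, policy, session_history, instruction_source, execute, records):
    """Record the decision first; execute only if the verdict is 'allow'."""
    decision = authorise(action, amount, policy, session_history, instruction_source)
    records.append(decision)
    if decision.verdict == "allow":
        execute(amount)
        session_history.append(action)
    return decision


records, executed = [], []
history = ["lookup_customer"]

policy = Policy("refund-approval", version=1, approval_limit=100.0)
attempt("issue_refund", 40.0, policy, history, "operator", executed.append, records)

# The policy is tightened afterwards. The first record still shows what applied then.
policy = Policy("refund-approval", version=2, approval_limit=20.0)
attempt("issue_refund", 40.0, policy, history, "operator", executed.append, records)
```

In this sketch the same refund is allowed under version 1 and escalated under version 2, and each record carries the version and session snapshot it was judged against. A plain execution log of the first refund would hold neither.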


What an Evidentiary Record Contains

If a log of the action is not enough, what does a real audit record contain. The answer follows directly from the question the regulator asks. Each action that matters produces a record with the elements that demonstrate a control operated.

The intended action. Not just what executed, but what the agent intended to do at the point of interception, before the layer allowed or blocked it. The intent is part of the evidence, because a blocked action is as important to the record as an allowed one. It shows the control catching something.

The policy that applied. Which specific rule governed this action. A record that says "refund above threshold, approval required" ties the action to the named control, which is exactly what an auditor is verifying existed.

The context evaluated. The signals the decision was made against: session history, provenance of the triggering instruction, the agent's identity and trust level, the time. This shows the decision was made on real grounds, not arbitrarily.

The verdict. Allowed, blocked, or escalated. The decision itself, recorded as a decision, not inferred from whether the action later appears in an execution log.

The human, if one was involved. When an action escalated to a person, who reviewed it and what they decided. This is the accountability anchor. It attaches a named human to the decision, which is what turns a machine action into something an organisation can stand behind.

A tamper-evident signature. The record is signed so that it can be shown not to have been altered after the fact. An audit record that could have been edited is not evidence. The integrity of the record is part of what makes it admissible as proof of control.

| Element                  | What it proves to an auditor                            |
|--------------------------|---------------------------------------------------------|
| Intended action          | What the agent tried to do, including blocked attempts  |
| Policy applied           | The named control existed and governed this action      |
| Context evaluated        | The decision was made on real grounds                   |
| Verdict                  | A decision was actually made, not inferred              |
| Human reviewer           | A named person is accountable for the call              |
| Tamper-evident signature | The record was not altered after the fact               |

A log captures only the event: the action that executed. An evidentiary record contains every element in the first column and, with it, the proof in the second. The gap between them is the gap between observability and governance, and it is the gap a regulator is built to find.
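
The six elements above can be sketched as a signed record. The following Python is a minimal illustration under stated assumptions, not a description of any particular product: the field names, the `sign` and `verify` helpers, and the demo key are all hypothetical. It uses an HMAC from the standard library for brevity; a real deployment would more plausibly use an asymmetric signature with keys held in a key management service, so that an auditor can verify records without being able to forge them.

```python
import hashlib
import hmac
import json

# Assumption for the sketch only: a hard-coded demo key.
SIGNING_KEY = b"demo-key-not-for-production"


def canonical(record: dict) -> bytes:
    """Serialise deterministically so the same record always signs the same way."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(record: dict, prev_signature: str = "") -> dict:
    """Attach a signature that also covers the previous record's signature."""
    body = dict(record, prev_signature=prev_signature)
    signature = hmac.new(SIGNING_KEY, canonical(body), hashlib.sha256).hexdigest()
    return dict(body, signature=signature)


def verify(signed: dict) -> bool:
    """Recompute the signature over everything except the signature itself."""
    body = {k: v for k, v in signed.items() if k != "signature"}
    expected = hmac.new(SIGNING_KEY, canonical(body), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signed.get("signature", ""))


record = {
    "intended_action": "issue_refund amount=250.00",
    "policy_applied": "refund above threshold, approval required",
    "context": {"instruction_source": "operator", "trust_level": "standard"},
    "verdict": "escalated",
    "human_reviewer": {"id": "reviewer-017", "decision": "approved"},
}

first = sign(record)
second = sign(dict(record, intended_action="issue_refund amount=30.00"),
              prev_signature=first["signature"])

# Editing any field after signing breaks verification.
tampered = dict(first, verdict="allowed")
```

Here `verify(first)` and `verify(second)` succeed while `verify(tampered)` fails. Because each signature covers the previous one, changing or removing an earlier record also stops matching the `prev_signature` stored in the record that follows it.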


Why Agent Logs Specifically Fall Short

It is worth being precise about why standard agent logging does not produce this record, because the shortfall is structural, not a matter of logging more fields.

A typical agent execution log records tool calls. The agent called this function with these arguments at this time and got this result. This is genuinely useful for debugging and observability. But notice what it structurally cannot contain.

It cannot contain the policy that applied, because in a logging-only architecture no policy was evaluated. There is nothing to record. It cannot contain the verdict, because no verdict was reached; the action just ran. It cannot contain the human decision, because there was no interception point at which to involve a human. It cannot contain provenance reasoning, because the log records the call, not the causal chain that led the agent to make it. And it cannot be meaningfully tamper-evident as proof of control, because it is a record of execution, not a record of authorisation.

The log is downstream of the action. The evidentiary record is produced at the moment of the decision, upstream of the action, by a layer that actually made a decision. You cannot generate the second by enriching the first, because the information in the second was never created. Adding fields to an execution log does not conjure a decision that no component ever made.

This is the precise sense in which you cannot audit what you did not authorise. The audit record is a byproduct of the authorisation decision. No decision, no record. No record, no audit. The logging was never the missing piece. The decision was.


A Worked Audit

Make it concrete with the situation every regulated enterprise eventually faces: an incident, followed by an examination.

An agent at a financial firm processed a transaction that should not have gone through. Perhaps it was manipulated, perhaps it hit a policy edge case; the cause is not the point. Three months later the regulator arrives and asks the firm to demonstrate that its agent controls were operating at the time.

Trace the examination for a firm with logging only. The firm produces its execution logs. They show the agent called the transaction API at the relevant time with the relevant parameters. The regulator asks: what control evaluated this transaction before it executed. The firm has no answer, because none did. The regulator asks: what policy applied, and can you show it was active and enforced at that moment. The firm can produce a policy document, but nothing tying it to the action, nothing showing it was evaluated, nothing showing it was anything more than a document. The regulator asks: who authorised this. No one did. The transaction executed on standing permission with no decision point. The firm logged a breach in perfect detail and can prove only that it happened.

Now trace the examination for a firm with an authorisation layer. The firm produces the evidentiary record for the transaction. It shows the intended action, the specific policy that was evaluated, the context including the provenance of the trigger, the verdict, and, because this transaction crossed a threshold, the named human who reviewed and approved it, with a tamper-evident signature on the whole record. The regulator can see exactly what control operated, on what basis, and who is accountable. If the approval was a mistake, that is a different and far more defensible position: a control existed and operated, and a named human made a judgment, which is what due diligence looks like. The firm is demonstrating governance, not admitting its absence.

| Examination question                     | Logging only                       | Authorisation layer                   |
|------------------------------------------|------------------------------------|---------------------------------------|
| What did the agent do                    | Answered in detail                 | Answered in detail                    |
| What control evaluated it                | None existed                       | The recorded policy decision          |
| What policy applied, and was it enforced | A document, not tied to the action | The named rule, evaluated and recorded |
| Who is accountable                       | No one decided                     | The named human reviewer              |
| Can you prove the control operated       | No                                 | Yes, with a signed record             |

The agent did the same wrong thing in both columns. The difference is that one firm can prove it had governance and one cannot, and in a regulatory examination that difference is the entire outcome.


Detective Controls Are Not Preventive Controls

There is a deeper reason logging fails the audit, and it is worth naming in the language regulators themselves use. Controls come in two kinds, and auditors care about the difference.

A detective control tells you that something happened after it happened. A preventive control stops the wrong thing from happening in the first place. Logging is a detective control. It is, in fact, a particularly weak one, because it does not even detect in real time; it produces a record that someone must later read to detect anything at all. By the time the log is read, the action may be months old.

Mature compliance frameworks do not treat detective and preventive controls as interchangeable. They expect preventive controls on the actions that matter, with detective controls as a complement, not a substitute. A firm that tells an auditor its only control over agent transactions is that it logs them is describing a detective control where a preventive one is required. The auditor's response is not satisfaction. It is a finding.

|                                   | Detective control (logging)    | Preventive control (authorisation) |
|-----------------------------------|--------------------------------|------------------------------------|
| Acts                              | After the action               | Before the action                  |
| Effect on a bad action            | Records it                     | Stops it                           |
| Timing of detection               | Whenever someone reads the log | At the moment of the action        |
| Satisfies a "prevent" requirement | No                             | Yes                                |
| Produces evidence of control      | Weak, the event only           | Strong, the decision               |

The authorisation layer is a preventive control that produces a detective record as a byproduct. It does both jobs at once: it stops the action when policy says stop, and it records the decision either way. Logging does only the weaker half of one of those jobs. When a framework requires a preventive control, and regulatory frameworks increasingly do for consequential automated actions, a log is not a smaller version of the right answer. It is the wrong category of thing.
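
The difference in category can be made concrete with two small Python decorators. This is a sketch under assumptions, with hypothetical names throughout (`logged`, `authorised`, `under_limit`, the 1000 limit, the in-memory `ledger`): one wrapper records after the action runs, the other evaluates a policy and records the verdict before the action is allowed to run.

```python
import functools


class Blocked(Exception):
    """Raised when policy refuses an action before it executes."""


audit_log = []  # detective: written after the action has run
decisions = []  # preventive: written before the action may run
ledger = []     # stands in for the real side effect (money moving)


def logged(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        result = fn(*args, **kwargs)  # the action has already happened
        audit_log.append({"action": fn.__name__, "args": args})
        return result
    return wrapper


def authorised(policy):
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            allowed = policy(*args, **kwargs)
            verdict = "allow" if allowed else "block"
            decisions.append({"action": fn.__name__, "args": args, "verdict": verdict})
            if not allowed:
                raise Blocked(fn.__name__)  # the action never runs
            return fn(*args, **kwargs)
        return wrapper
    return decorate


def under_limit(amount):
    return amount <= 1000  # hypothetical policy


@logged
def transfer_logged(amount):
    ledger.append(amount)


@authorised(under_limit)
def transfer_gated(amount):
    ledger.append(amount)


transfer_logged(5000)      # executes, then gets recorded
try:
    transfer_gated(5000)   # recorded as blocked, never executes
except Blocked:
    pass
transfer_gated(200)        # recorded as allowed, then executes
```

After this runs, the ledger holds the 5000 transfer from the logged path and the 200 transfer from the gated path. The gated 5000 transfer appears only in `decisions`, as a blocked attempt, which is the record of a control catching something.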

The Compliance Drivers Are Already Here

This is not a hypothetical concern waiting on future regulation. The accountability requirement is already written into the frameworks enterprises are subject to today, and they were written for exactly this distinction.

The EU AI Act requires record-keeping and human oversight for high-risk AI systems, framed around the ability to demonstrate that the system operated under control, not merely that it operated. Financial regimes, from the Sarbanes-Oxley Act (SOX) in the United States to the rules of the Saudi Central Bank (SAMA) in Saudi Arabia, require demonstrable controls over systems that move money, with the emphasis on demonstrable. The GDPR accountability principle requires that you can show how decisions affecting individuals were governed, not just that they were made. SOC 2 is built entirely around evidence that controls are designed and operating effectively, which is precisely a demand for evidentiary records rather than activity logs.

Every one of these frameworks asks the regulator's question, not the observer's. They ask you to prove the control operated. None of them is satisfied by a log that proves only that an action occurred. As agents take on more of the actions these frameworks govern, the gap between what enterprises can log and what they must prove widens, and it widens fastest in exactly the regulated industries where agents are most valuable.

   the widening gap

   agent action volume        ████████████████████  rising fast
   what logging can prove      ████                  the action happened
   what compliance requires    ████████████████████  the control operated
                               └────────────────────┘
                                the unproven middle is liability

The enterprises deploying agents into regulated workflows are accumulating a liability they cannot see, because the logs look comprehensive. The logs are comprehensive about the wrong thing. They are a complete record of events and a complete silence on controls, and the silence is what gets examined.


Accountability Across Agent Chains

The problem compounds in multi-agent systems, and so does the value of getting the record right.

When an orchestrator delegates to sub-agents, and one of those agents takes a consequential action, the accountability question becomes a chain question. Who initiated the task. Which agent decided on this action. What did each intermediate agent contribute. Was the action traceable to the operator's original intent or to something that entered through an untrusted path along the way.

A logging-only architecture cannot answer this, because each agent logs its own calls in isolation and nothing records the chain. You have fragments from each agent and no thread connecting them into a single accountable story. When the regulator asks who is responsible for the final action, the honest answer is a shrug across several disconnected logs.

An authorisation layer that records decisions across the chain produces a connected account. Each action's record references the originating task, the delegation path that led to it, and the decision made at each step. The final action is traceable back through every hop to the human who initiated the task, with the authorisation verdict at each stage attached. The chain becomes a single accountable narrative rather than a pile of fragments, and accountability has a defined owner rather than dissolving across agents.
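
One way to picture a connected account is a set of decision records that each point at the record that delegated to them. The Python below is a minimal sketch assuming a hypothetical `ChainRecord` shape and `trace` helper; the actor names and record IDs are invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ChainRecord:
    record_id: str
    actor: str                # human operator or agent that made this step
    action: str
    verdict: str              # authorisation verdict recorded at this step
    parent_id: Optional[str]  # the record that delegated to this one, if any


def trace(store: Dict[str, ChainRecord], record_id: str) -> List[ChainRecord]:
    """Walk from a final action back through every hop to the initiator."""
    path, seen = [], set()
    current: Optional[str] = record_id
    while current is not None:
        if current in seen:
            raise ValueError("delegation chain contains a cycle")
        seen.add(current)
        record = store[current]
        path.append(record)
        current = record.parent_id
    return path


store = {
    r.record_id: r
    for r in [
        ChainRecord("r1", "operator:j.doe", "start reconciliation task", "allow", None),
        ChainRecord("r2", "agent:orchestrator", "delegate payment step", "allow", "r1"),
        ChainRecord("r3", "agent:payments", "transfer funds", "escalate", "r2"),
    ]
}

chain = trace(store, "r3")
```

In this sketch `chain` runs from the payments agent's transfer back through the orchestrator to the human operator, with the verdict at each hop attached. A record whose `parent_id` is missing from the store raises a `KeyError`, which is itself useful: it marks an action that cannot be tied back to an initiating task.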

This is the difference between a system where responsibility is locatable and one where it evaporates the moment more than one agent is involved. Regulators do not accept evaporation. They assign responsibility to the firm, and a firm that cannot locate it internally absorbs all of it.


The Record Is a Byproduct of the Decision

The throughline of all of this is a single structural fact. The audit trail you need is not something you add alongside your agents. It is produced, automatically, as the byproduct of authorising each action.

When every action passes through a layer that evaluates it against policy, in context, and returns a decision before execution, that decision is the audit record. The intent, the policy, the context, the verdict, the human if there was one, signed and tamper-evident, all of it exists because a decision was made and decisions leave records. You do not build the audit trail separately. It falls out of doing the authorisation correctly.

This is why the two problems, governance and audit, are actually one problem. An enterprise that authorises every agent action has, as a free consequence, a complete evidentiary trail of every one of those decisions. An enterprise that only logs has neither governance nor, despite appearances, an audit trail. It has a detailed account of ungoverned events.

You cannot audit what you did not authorise, because the audit record and the authorisation decision are the same artifact seen from two angles. Make the decision and you have the record. Skip the decision and no amount of logging will manufacture it, because the thing an auditor wants to see, a control operating at the moment of action, either happened or it did not. The log can describe the action forever. Only the decision can prove it was governed.


Xybern is the authorisation layer for enterprise AI agents. Every agent action is enforced, audited, and governed before it executes. Learn more at xybern.com or read the technical documentation at docs.xybern.com.
