Trust · SLA and reliability

Xybern is inline. Here is what 99.9% uptime means when you are.

Because every AI action in your organisation passes through Xybern before it executes, our availability is your AI availability. We treat uptime as an operational obligation, not a marketing number. This page explains our availability architecture, our fail behaviour, and what happens to your AI operations during any Xybern incident.

99.9% · Uptime SLA
0 · Enforcement gaps ever
<50ms · P99 enforcement latency
<1s · Incident detection time

Section 01 · The inline availability problem

For a middleware system, uptime is a convenience. For an inline enforcement system, it is operational continuity.

If your email platform goes down, emails queue. If your analytics platform goes down, dashboards are unavailable. These are inconveniences. If Xybern goes down and you are using it as your mandatory AI execution pathway, the question is different: do your AI systems continue operating without enforcement? The answer to that question defines your risk exposure during any Xybern incident. We have designed the system so that, by default, the answer is no: AI systems do not operate without enforcement, even during incidents. Here is how.

Section 02 · Fail behaviour

Fail-closed by default. Configurable. Always explicit.

Xybern's default fail behaviour is closed: if the enforcement layer is unreachable, AI actions queue at the enforcement boundary and do not execute. This is a deliberate architectural decision, not a fallback. We set out the full consequences of each option here because you should choose your fail behaviour consciously.

Fail-open · not the Xybern default

Xybern unreachable → AI actions execute anyway

AI actions bypass the enforcement boundary
Enforcement skipped during the outage window
No record of what executed without verification
Regulatory exposure for the unrecorded period
Policy violations possible during the gap

Fail-closed · Xybern default

Xybern unreachable → AI actions queue. None execute.

AI actions held at the enforcement boundary
Enforcement never bypassed under any circumstance
Every queued action processed on recovery in sequence
Full enforcement record maintained with no gaps
Zero regulatory exposure during the outage window

Default for all deployments. The queue TTL is configurable: actions not processed within the TTL are rejected, and the rejection is permanently recorded.

Queue TTL default: 5 minutes · Configurable per deployment
Fail-open opt-in requires: written acknowledgement + audit record of the decision
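
The hold-then-reject behaviour described above can be sketched in a few lines. This is an illustrative model only, not Xybern's implementation: the class and field names (FailClosedQueue, hold, expire, rejected) are hypothetical, and the in-memory list stands in for the permanent record.

```python
from __future__ import annotations

import time
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List


@dataclass
class QueuedAction:
    action_id: str
    enqueued_at: float  # seconds, read from the injected clock


@dataclass
class FailClosedQueue:
    """Holds actions while enforcement is unreachable. Nothing executes from here."""

    ttl_seconds: float = 300.0  # the 5-minute default stated above
    clock: Callable[[], float] = time.monotonic
    _held: Deque[QueuedAction] = field(default_factory=deque)
    # Hypothetical stand-in for the permanent rejection record.
    rejected: List[Dict[str, str]] = field(default_factory=list)

    def hold(self, action_id: str) -> None:
        """Queue an action at the boundary instead of executing it."""
        self._held.append(QueuedAction(action_id, self.clock()))

    def expire(self) -> List[str]:
        """Reject every held action older than the TTL and record the rejection."""
        now = self.clock()
        expired: List[str] = []
        # Actions are appended in arrival order, so the oldest is always at the head.
        while self._held and now - self._held[0].enqueued_at > self.ttl_seconds:
            action = self._held.popleft()
            self.rejected.append({"action_id": action.action_id, "reason": "ttl_expired"})
            expired.append(action.action_id)
        return expired

    def pending(self) -> List[str]:
        """Actions still waiting for an enforcement evaluation."""
        return [a.action_id for a in self._held]
```

Injecting the clock keeps the sketch testable without waiting out a real TTL. A production queue would also need durable storage so that held actions and rejection records survive a restart.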

Section 03 · Uptime history

90 days. One incident. Zero enforcement gaps.

This is the operational record, not the SLA promise. Every bar represents one day. Green is fully operational. The single amber bar represents the one incident in the last 90 days: a 23-minute window of degraded performance during which enforcement continued operating at reduced throughput. No enforcement gaps occurred.

System uptime · Last 90 days · Updated daily
[Chart: one bar per day, from 90 days ago to today]
Dec 02 · Degraded performance · 23 min · Enforcement continued · 0 gaps

99.97% · Uptime last 90 days
1 · Incidents last 90 days
23 min · Total incident duration
0 · Enforcement gaps

Enforcement gaps (AI actions that executed without passing through Xybern) have been zero across all 90 days, including the incident window. Fail-closed architecture maintained enforcement continuity throughout.
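
To put the 99.9% SLA and the 23-minute incident on the same scale, the downtime budget can be worked out directly. The sketch below is plain arithmetic under one conservative assumption: that the degraded window is counted as if it were full downtime. How Xybern actually measures uptime is not specified on this page, and the function names are illustrative.

```python
MINUTES_PER_DAY = 24 * 60


def downtime_budget_minutes(sla: float, window_days: int) -> float:
    """Minutes of downtime an SLA allows over a window of the given length."""
    return (1.0 - sla) * window_days * MINUTES_PER_DAY


def budget_consumed(incident_minutes: float, sla: float, window_days: int) -> float:
    """Fraction of the downtime budget used by the given incident minutes."""
    return incident_minutes / downtime_budget_minutes(sla, window_days)


budget = downtime_budget_minutes(0.999, 90)
share = budget_consumed(23, 0.999, 90)
print(f"{budget:.1f} min budget, {share:.1%} consumed")
# prints "129.6 min budget, 17.7% consumed"
```

A 99.9% SLA allows roughly 129.6 minutes of downtime in any 90-day window, so even under this worst-case reading the single incident used less than a fifth of the budget.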

Section 04 · Incident response

How we respond when something goes wrong.

For an inline enforcement system, incident response is not just about restoring the service. It is about ensuring enforcement continuity throughout the incident and providing a complete record of what happened to every AI action during the window. This is how we handle it.

01 · Detect

Automated detection in under 1 second.

Continuous health checks across all enforcement layer components. Anomaly detection fires within 1 second of any degradation. No manual monitoring required for initial detection.

02 · Classify

Incident classified within 2 minutes.

Every incident is classified by impact level: Degraded (enforcement operating at reduced throughput), Partial outage (some regions affected), or Full outage (enforcement boundary closed, fail-closed active). Classification determines the response path.

03 · Contain

Fail-closed engaged if enforcement is at risk.

If incident classification indicates enforcement may be compromised, fail-closed is engaged automatically. AI actions queue. No enforcement gap opens. The queue TTL timer starts. This happens before any human is paged.

04 · Resolve

Service restored. Queue processed in sequence.

On resolution, the enforcement queue is processed in the order actions arrived, oldest first. Every queued action receives a full enforcement evaluation. No action is skipped. The vault records the queued status and the enforcement timestamp for each.
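
As a rough illustration of the recovery step, the sketch below drains a queue oldest first and writes one record per action. It is not Xybern's code: drain_in_arrival_order, the evaluate callback, and the record field names are assumptions made for the example.

```python
from collections import deque
from datetime import datetime, timezone
from typing import Callable, Deque, Dict, List


def utc_now() -> datetime:
    return datetime.now(timezone.utc)


def drain_in_arrival_order(
    queue: Deque[str],
    evaluate: Callable[[str], str],
    now: Callable[[], datetime] = utc_now,
) -> List[Dict[str, str]]:
    """Evaluate every queued action, oldest first, and return one record per action."""
    records: List[Dict[str, str]] = []
    while queue:
        action_id = queue.popleft()  # left end holds the oldest arrival
        records.append(
            {
                "action_id": action_id,
                "status": "queued",  # the action was held during the incident
                "decision": evaluate(action_id),  # full evaluation, never skipped
                "enforced_at": now().isoformat(),  # enforcement timestamp
            }
        )
    return records
```

Because the loop only ends when the queue is empty, every held action is either evaluated or the drain fails loudly. No action can be dropped silently.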

05 · Report

Post-incident report within 24 hours.

Every incident receives a written post-mortem within 24 hours, covering root cause, timeline, enforcement continuity record, and remediation. Enterprise customers receive the report directly. The enforcement continuity record shows every AI action that queued, every action that was processed on recovery, and confirms zero enforcement gaps.

Severity · Response time · Enforcement behaviour
Degraded performance · 15 minutes · Enforcement continues at reduced throughput
Partial outage · 5 minutes · Fail-closed engaged for affected regions
Full outage · 2 minutes · Fail-closed engaged globally, queue active
Security incident · Immediate · All enforcement suspended, security team engaged
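
If you mirror these severities in your own runbooks or alert routing, the table maps naturally onto a small lookup structure. The sketch below is one possible encoding. The type names are hypothetical, and representing "Immediate" as a zero-length interval is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class Severity(Enum):
    DEGRADED = "degraded_performance"
    PARTIAL_OUTAGE = "partial_outage"
    FULL_OUTAGE = "full_outage"
    SECURITY_INCIDENT = "security_incident"


@dataclass(frozen=True)
class ResponsePolicy:
    response_time: timedelta
    enforcement_behaviour: str


# Values copied from the severity table above.
RESPONSE_POLICIES = {
    Severity.DEGRADED: ResponsePolicy(
        timedelta(minutes=15), "Enforcement continues at reduced throughput"
    ),
    Severity.PARTIAL_OUTAGE: ResponsePolicy(
        timedelta(minutes=5), "Fail-closed engaged for affected regions"
    ),
    Severity.FULL_OUTAGE: ResponsePolicy(
        timedelta(minutes=2), "Fail-closed engaged globally, queue active"
    ),
    # Assumption: "Immediate" is modelled as a zero-length response window.
    Severity.SECURITY_INCIDENT: ResponsePolicy(
        timedelta(0), "All enforcement suspended, security team engaged"
    ),
}


def respond_by(severity: Severity, classified_at: datetime) -> datetime:
    """Latest time a response is due for an incident classified at the given time."""
    return classified_at + RESPONSE_POLICIES[severity].response_time
```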

Section 05 · Health monitoring API

Integrate Xybern health into your own monitoring stack.

Programmatic access to real-time system health via the health endpoint. Monitor enforcement layer status, uptime metrics, incident history and, most importantly, enforcement gap count from your own infrastructure. Integrate with PagerDuty, Datadog, Grafana or any monitoring tool that accepts a JSON health response.

GET /api/v1/health
Authorization: Bearer xb_prod_...

// Example response
{
  "status": "operational",
  "timestamp": "2026-03-19T14:23:01Z",
  "services": {
    "enforcement_layer": "healthy",
    "verification_engine": "healthy",
    "provenance_vault": "healthy",
    "identity_layer": "healthy",
    "api_gateway": "healthy"
  },
  "uptime_30d": "99.97%",
  "uptime_90d": "99.94%",
  "last_incident": "2025-12-02T08:15:00Z",
  "response_time_p99": 43,
  "response_time_p50": 12,
  "enforcement_gaps_30d": 0,  // zero while fail-closed is active
  "fail_behaviour": "closed",  // the default; configurable
  "queue_depth": 0,
  "region": "eu-west-1"
}

enforcement_gaps_30d

Always zero if fail-closed is active.

fail_behaviour

Current fail mode. closed is the default. open means fail-open has been explicitly configured.

queue_depth

Actions queued at the enforcement boundary. Non-zero only during incidents.

response_time_p99

99th-percentile enforcement latency in milliseconds: 99% of enforcement decisions complete at or below this value. The SLA commits to under 100ms P99.
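
A monitoring integration will usually turn these fields into alert conditions. The sketch below shows one way to do that against an already-parsed health response. The function name, the alert wording, and the choice to alert on any non-zero queue depth are illustrative assumptions, while the 100 ms threshold is the SLA figure quoted above.

```python
from __future__ import annotations

from typing import Any, Dict, List


def evaluate_health(health: Dict[str, Any], p99_limit_ms: int = 100) -> List[str]:
    """Return human-readable alerts for a parsed /api/v1/health response."""
    alerts: List[str] = []

    if health.get("enforcement_gaps_30d", 0) != 0:
        alerts.append("enforcement gaps reported in the last 30 days")

    if health.get("fail_behaviour") != "closed":
        alerts.append("fail behaviour is not closed")

    if health.get("queue_depth", 0) > 0:
        alerts.append("actions are queued at the enforcement boundary")

    if health.get("response_time_p99", 0) >= p99_limit_ms:
        alerts.append("P99 enforcement latency at or above the SLA limit")

    # Each subsystem reports independently, so check them one by one.
    for name, state in health.get("services", {}).items():
        if state != "healthy":
            alerts.append(f"service {name} is {state}")

    return alerts
```

An empty list means nothing needs attention. Anything else can be forwarded as-is to PagerDuty, Datadog, or a similar tool.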

Service-level health per component

Each Xybern subsystem reports health independently. A degraded verification_engine with a healthy provenance_vault means decisions are still being recorded even if throughput is reduced.

Region-scoped responses

The health response includes a region field identifying the infrastructure region this endpoint reports on. For multi-region deployments, query each region's endpoint independently.
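
For a multi-region deployment, querying each region independently might look like the sketch below. It uses only the Python standard library. The per-region base URLs you pass in are your own (the page does not specify how regional endpoints are addressed), and the "unreachable" marker is a convention invented for this example.

```python
from __future__ import annotations

import json
import urllib.request
from typing import Any, Callable, Dict


def fetch_health(base_url: str, token: str, timeout: float = 5.0) -> Dict[str, Any]:
    """GET {base_url}/api/v1/health with a bearer token and parse the JSON body."""
    request = urllib.request.Request(
        f"{base_url}/api/v1/health",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def poll_regions(
    region_urls: Dict[str, str],
    token: str,
    fetch: Callable[[str, str], Dict[str, Any]] = fetch_health,
) -> Dict[str, Dict[str, Any]]:
    """Query every region on its own so one failing region cannot hide the others."""
    results: Dict[str, Dict[str, Any]] = {}
    for region, base_url in region_urls.items():
        try:
            results[region] = fetch(base_url, token)
        except (OSError, ValueError) as exc:
            # OSError covers connection errors and timeouts; ValueError covers bad JSON.
            results[region] = {"status": "unreachable", "error": str(exc)}
    return results
```

Treating an unreachable health endpoint as its own state, rather than as "healthy by omission", keeps the monitoring side consistent with the fail-closed stance of the enforcement layer.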

Enterprise reliability. Guaranteed.

99.9% uptime SLA, fail-closed architecture and zero enforcement gaps in production. SOC 2 certification is currently in progress. If you are deploying AI systems in a regulated environment, this is the reliability standard you need.
