If you’ve followed distributed databases for any length of time, you’ve probably read a Jepsen analysis. If you’ve read one, you know the feeling: a database vendor claims linearizability, Kyle Kingsbury introduces some network partitions, and a few weeks later we all learn what the database actually does under failure.
That feeling is the reason we wrote 34 Jepsen tests for ArcadeDB. We wanted to know what we actually do under failure, before we ask anyone else to trust us.
Today we’re publishing the full test suite, the methodology, and the results.
First, the disclaimer. This is not an official Jepsen analysis. Jepsen LLC did not commission, run, review, or certify these tests. We wrote them in-house using the open-source Jepsen framework (the same framework Kyle uses for his official analyses), but the design, execution, and results are entirely ours. We’re publishing everything so the community can scrutinize the methodology, and we’d genuinely love a real analysis from Jepsen LLC one day. Hi Kyle, if you’re reading this, please tear it apart.
Summary
- Database under test: ArcadeDB on the `apache-ratis` branch, with high availability built on Apache Ratis (Raft consensus).
- Cluster: 5 Debian nodes in Docker, controlled by a Jepsen 0.3.11 control node.
- Workloads (6): `bank`, `set`, `elle`, `register`, `register-follower`, `register-bookmark`.
- Faults (7 nemeses): `none`, `partition`, `kill`, `pause`, `clock`, `all`, `all+clock`.
- Total runs: 34 (20 leader-workload runs + 14 follower-workload runs).
- Result: 34 / 34 PASS. Zero linearizability violations, zero lost writes, zero ACID anomalies.
- Source code: github.com/ArcadeData/arcadedb-jepsen (Apache 2.0).
- Caveat: This is in-house testing, not a Jepsen LLC certification. Independent review welcome.
What is Jepsen?
Jepsen is the gold-standard open-source framework for testing distributed systems. Created by Kyle Kingsbury (better known as aphyr), it became famous through the Call Me Maybe blog series, which methodically dismantled the consistency claims of databases like MongoDB, Redis, Cassandra, Elasticsearch, and many others.
What makes Jepsen special isn’t just the fault injection (network partitions via iptables, process kills with SIGKILL, GC-style pauses with SIGSTOP/SIGCONT, clock skew via date -s). It’s the checkers:
- Knossos: a linearizability checker that takes the history of operations and tries to find a serial ordering consistent with each client’s observed responses. If no such ordering exists, your “linearizable” register isn’t.
- Elle: a black-box transaction-isolation checker that builds a dependency graph from the transaction history and looks for cycles. Cycles map to specific anomalies: G0 (dirty write), G1a (aborted read), G1b (intermediate read), G1c (circular information flow), G2 (anti-dependency cycle), and lost updates.
You can’t bluff your way past either of them. They either find a counterexample, or they certify the history.
What we tested
The tests run against the ArcadeDB apache-ratis branch, where high availability is implemented on top of Apache Ratis (the production-grade Raft library that also powers Apache Ozone). The cluster is 5 Debian nodes in Docker, plus a control node running Leiningen and Jepsen 0.3.11. Each test gets a fresh cluster to eliminate cross-test contamination.
Six workloads
| Workload | What it checks | Checker |
|---|---|---|
| bank | ACID balance conservation across 5 accounts during concurrent transfers | Custom conservation invariant |
| set | No acknowledged write is ever lost during replication | Custom set checker |
| elle | Transaction isolation: G0, G1a, G1b, G2, lost updates | Elle |
| register | Linearizability of single-key read/write/CAS, leader reads | Knossos |
| register-follower | Linearizability when reads are routed to a non-leader (ReadIndex path) | Knossos |
| register-bookmark | Read-your-writes via commit-index bookmarks on follower reads | Knossos |
Seven nemeses
| Nemesis | Description |
|---|---|
| none | Baseline, no faults |
| partition | Random network partitions via iptables |
| kill | SIGKILL random nodes (crash) |
| pause | SIGSTOP/SIGCONT random nodes (long GC pause) |
| clock | Random ±60s clock shifts via date -s |
| all | partition + kill + pause concurrently |
| all+clock | all + clock skew |
The leader workloads run against 5 nemeses (we omit clock and all+clock because leader-only reads aren’t sensitive to follower clock drift). The follower workloads run the full 7. That’s 20 + 14 = 34 tests.
The Results
Behind every green check is a 90-second run (30 seconds for the most expensive Knossos workloads) of concurrent client operations against the cluster while the chosen nemesis hammers the nodes. Then the checker takes the recorded history and either says :valid? true or hands you a counterexample.
The Faults, Visually
The interesting Jepsen tests aren't the none baseline; they're what happens while the cluster is actively being broken. Here's what we throw at the 5-node cluster: partitions, process kills, pauses, and clock skew, with the all and all+clock nemeses applying them concurrently.

What Each Workload Actually Proves
Passing 34 tests sounds nice in a header, but each workload is asking a specific question. Here’s what we’re actually claiming.
bank: ACID under partitions
Five accounts, 1000 each, total 5000. Concurrent clients transfer random amounts between random pairs of accounts inside multi-statement transactions. After every operation the checker sums the balances. The total must always equal 5000. If a transfer is partially applied (debit succeeds, credit fails, or vice versa), the sum drifts and the test fails. Under partitions, kills, pauses, and the combined all nemesis: conservation holds.
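The invariant itself fits in a few lines. Here is a minimal sketch in Java of what the checker asserts after each read of all five balances (the actual checker in the repo is written in Clojure; this just shows the shape of the assertion):

```java
import java.util.Map;

public class BankInvariant {
    static final long EXPECTED_TOTAL = 5 * 1000L; // 5 accounts, 1000 each

    // Called after every read of all balances; a partially applied transfer
    // (debit without credit, or vice versa) makes the sum drift and fails here.
    static void checkConservation(Map<String, Long> balances) {
        long total = balances.values().stream().mapToLong(Long::longValue).sum();
        if (total != EXPECTED_TOTAL) {
            throw new AssertionError("conservation violated: expected "
                    + EXPECTED_TOTAL + ", read " + total + " from " + balances);
        }
    }
}
```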
set: no acknowledged write is lost
Insert unique integers, periodically read them all back. Every integer for which the server returned a successful write must appear in subsequent reads. This is the cleanest test for replication completeness: it doesn’t matter how the cluster reorders things, only that nothing acknowledged is silently dropped. Zero lost writes across all five nemeses.
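In code, the property reduces to a set difference: acknowledged minus observed must be empty. Again a small Java sketch standing in for the real Clojure checker:

```java
import java.util.HashSet;
import java.util.Set;

public class SetChecker {
    // Elements the server acknowledged but that never appeared in the final
    // read. An empty result means no acknowledged write was lost.
    static Set<Integer> lostWrites(Set<Integer> acknowledged, Set<Integer> finalRead) {
        Set<Integer> lost = new HashSet<>(acknowledged);
        lost.removeAll(finalRead);
        return lost;
    }
}
```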
elle: real transaction isolation, checked by cycles
This is where we throw multi-key read/write transactions at the cluster and let Elle build the dependency graph. Elle then looks for cycles that correspond to specific anomalies: G0 (dirty write), G1a (read of an aborted write), G1b (read of an intermediate value), G2 (anti-dependency cycle), and lost updates. We exclude G1c because, in our HTTP-based harness, reads after commit happen as separate calls; that creates a test-implementation pattern that Elle correctly flags as a “circular information flow” but which doesn’t reflect a real isolation violation. Every other anomaly class: none observed.
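To make one of those anomaly classes concrete, here is what a lost update looks like: two transactions read the same value, each writes back an increment, and one increment silently disappears. The snippet below is plain in-memory Java with nothing ArcadeDB-specific in it; it only illustrates the interleaving Elle would flag if the database ever produced it:

```java
public class LostUpdateExample {
    public static void main(String[] args) {
        long x = 100;

        // T1 and T2 both read x = 100 inside their own transactions.
        long readByT1 = x;
        long readByT2 = x;

        // Each writes back its read plus 10. T2 overwrites T1's update.
        x = readByT1 + 10;   // x = 110
        x = readByT2 + 10;   // still 110: T1's increment is lost

        // Any serial order of the two transactions would end at 120.
        System.out.println("final x = " + x + " (a serial execution gives 120)");
    }
}
```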
register: leader-side linearizability
A single integer, hammered with concurrent reads, writes, and compare-and-swap operations, all routed to the Raft leader. Knossos then attempts to find a serial ordering of those operations consistent with each client's observed responses. Knossos is brutal: it'll happily spend minutes searching, and if your "linearizable" register isn't, it'll hand you the exact interleaving that can't be serialized. All five leader-side nemeses certified linearizable.
register-follower: linearizability when reads go to a follower
Writes still go to the leader, but reads are deliberately routed to a non-leader with the X-ArcadeDB-Read-Consistency: LINEARIZABLE header. This exercises the ReadIndex path on followers (RaftHAServer.ensureLinearizableFollowerRead()): the follower issues sendReadOnly() to the leader, the leader confirms it still holds quorum and returns its current commit index, the follower waits for its local state machine to catch up, then serves the read. Without that round-trip, a lagging follower would serve stale data and Knossos would catch it instantly. With it: linearizable across all 7 nemeses, including clock skew and all+clock.
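A minimal sketch of that round-trip, assuming Ratis's RaftClient API and two hypothetical helpers (awaitApplied() and read()) standing in for the follower's local state machine. The actual logic in RaftHAServer.ensureLinearizableFollowerRead() differs in detail, and treating the reply's getLogIndex() as the leader's confirmed commit index is a simplification:

```java
import org.apache.ratis.client.RaftClient;
import org.apache.ratis.protocol.Message;
import org.apache.ratis.protocol.RaftClientReply;

public class FollowerReadSketch {

    // Hypothetical view of the follower's local state machine.
    interface LocalState {
        void awaitApplied(long index) throws InterruptedException; // block until applied >= index
        String read(String key);                                   // read from local storage
    }

    // Sketch of the ReadIndex path: ask the leader for a quorum-confirmed
    // index, wait until this follower has applied it, then read locally.
    static String linearizableRead(RaftClient client, LocalState local, String key)
            throws Exception {
        RaftClientReply reply = client.io().sendReadOnly(Message.EMPTY); // leader confirms quorum
        long readIndex = reply.getLogIndex();                            // index the leader answered at
        local.awaitApplied(readIndex);                                   // catch up locally
        return local.read(key);                                          // cannot be stale
    }
}
```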
register-bookmark: read-your-writes via commit-index bookmarks
Same follower-read setup, but instead of a full ReadIndex round-trip on every read, the client captures X-ArcadeDB-Commit-Index from each write response and echoes it back as X-ArcadeDB-Read-After on subsequent reads. The follower waits for its local apply to reach that index before serving. This is cheaper than ReadIndex but only guarantees read-your-writes for the issuing client, not global linearizability across clients. All 7 nemeses pass.
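From the client's side the bookmark handshake looks roughly like this. A sketch with java.net.http: the endpoint path, database name, SQL, and omitted authentication are placeholders, while the two headers are the ones described above:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class BookmarkReadExample {
    public static void main(String[] args) throws Exception {
        HttpClient http = HttpClient.newHttpClient();

        // 1. Write through the leader and capture the commit-index bookmark.
        HttpRequest write = HttpRequest
                .newBuilder(URI.create("http://leader:2480/api/v1/command/mydb"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(
                        "{\"language\":\"sql\",\"command\":\"UPDATE Register SET value = 42 WHERE id = 0\"}"))
                .build();
        HttpResponse<String> writeResp = http.send(write, HttpResponse.BodyHandlers.ofString());
        String bookmark = writeResp.headers().firstValue("X-ArcadeDB-Commit-Index").orElseThrow();

        // 2. Read from a follower, echoing the bookmark. The follower holds the
        //    request until its local apply index reaches the bookmark, so this
        //    client is guaranteed to see its own write.
        HttpRequest read = HttpRequest
                .newBuilder(URI.create("http://follower:2480/api/v1/command/mydb"))
                .header("Content-Type", "application/json")
                .header("X-ArcadeDB-Read-After", bookmark)
                .POST(HttpRequest.BodyPublishers.ofString(
                        "{\"language\":\"sql\",\"command\":\"SELECT value FROM Register WHERE id = 0\"}"))
                .build();
        System.out.println(http.send(read, HttpResponse.BodyHandlers.ofString()).body());
    }
}
```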
The two follower modes matter because most real applications don't need global linearizability; they need their own writes to be visible to their own subsequent reads. The bookmark path gives that property at much lower cost than ReadIndex.
How read consistency works in ArcadeDB
The follower-read tests are the most novel piece, and they map directly to a configurable knob in the database:
| Level | Performance | Consistency | Use case |
|---|---|---|---|
| eventual | Fastest | May read stale data on followers | Analytics, dashboards |
| read_your_writes (default) | Fast | Leader reads from local DB; followers wait for client's last write | Most OLTP workloads |
| linearizable | +1 RTT when lease expired | Full linearizability even under process pauses | Financial transactions, coordination |
You set it globally via arcadedb.ha.readConsistency or per request via the X-ArcadeDB-Read-Consistency HTTP header. The Jepsen runs use linearizable for the follower workloads (the most demanding setting) and the default read_your_writes for the leader workloads.
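For example, a single request can opt in to the strongest level while the rest of the application keeps the default. A sketch with java.net.http (endpoint, query, and omitted authentication are placeholders; the header is the real one):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class LinearizableReadRequest {
    public static void main(String[] args) throws Exception {
        // Only this read opts in to the linearizable path; everything else in
        // the application keeps the global arcadedb.ha.readConsistency default.
        HttpRequest read = HttpRequest
                .newBuilder(URI.create("http://any-node:2480/api/v1/command/mydb"))
                .header("Content-Type", "application/json")
                .header("X-ArcadeDB-Read-Consistency", "LINEARIZABLE")
                .POST(HttpRequest.BodyPublishers.ofString(
                        "{\"language\":\"sql\",\"command\":\"SELECT balance FROM Account WHERE id = 1\"}"))
                .build();
        HttpResponse<String> resp = HttpClient.newHttpClient()
                .send(read, HttpResponse.BodyHandlers.ofString());
        System.out.println(resp.body());
    }
}
```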
In linearizable mode, the leader checks its Raft lease before every read via Ratis’s sendReadOnly() API (Section 6.4 of the Raft paper). When the lease is valid (the common case), this is a local timestamp check with no network round-trip. When the lease has expired (e.g., after a long VM suspend or extreme GC pause), Ratis sends heartbeats to a majority before serving the read. About 1 extra RTT in the worst case, which is exactly the cost you’d expect for a correctness guarantee under arbitrary process pauses.
Reproduce it yourself
The full test suite is open source and Apache 2.0 licensed: github.com/ArcadeData/arcadedb-jepsen.
The repository includes the Docker setup, all six workloads, the nemesis implementations, and the run-all-tests.sh script that reproduces the entire 34-test sweep on your own hardware. A full sweep takes about 60 minutes on a modern laptop.
```bash
git clone https://github.com/ArcadeData/arcadedb-jepsen
cd arcadedb-jepsen
./build-local.sh /path/to/your/arcadedb
cd docker && docker compose up -d
docker exec jepsen-control sh /jepsen/docker/setup-ssh.sh
./run-all-tests.sh 90
```
Inspect the recorded histories, the Knossos and Elle outputs, the timeline plots: everything Jepsen produces is in store/ after each run.
What we did not test
Honest disclosure matters more than the green checkmarks, so here’s what these 34 tests do not cover:
- Long-duration runs. Each nemesis combination ran on the order of minutes, not hours. Slow-burn anomalies (memory leaks, file-handle exhaustion, Raft log compaction edge cases that only surface after millions of entries) are out of scope.
- Disk corruption, fsync lying, and Byzantine faults. We assume the kernel honors `fsync()` and that nodes are non-malicious. We do not inject bit-flips, truncate WAL files, or simulate filesystems that ack writes without persisting.
- Geo-replication scenarios. All five nodes live in the same Docker network with single-digit-millisecond latencies. We have not tested cross-region links, asymmetric latency, or sustained high jitter.
- Compounded worst-case for follower reads. We exercised expired Raft lease, clock skew, and partitions individually (and clock + partition + kill + pause together via `all+clock`), but we did not run the specific stack of expired lease + clock skew + active partition simultaneously against the linearizable follower-read path.
Some of these (longer runs, Byzantine fsync, geo-replication) are on the roadmap. Others (true Byzantine resilience) are explicitly out of scope for a CFT (crash-fault-tolerant) Raft system. If you think any of these should be in the next pass, open an issue or send a PR.
Help us break it
We’re publishing this for two reasons.
One: we want the upcoming Ratis-based HA release to be the most thoroughly tested HA stack ArcadeDB has ever shipped. Internal tests pass; that’s the floor, not the ceiling.
Two: we’d love independent scrutiny. We’re open to PRs that add workloads, tighter checkers, more aggressive nemeses, or just better failure modes we haven’t thought of. If you find a real linearizability violation, a lost write, or an isolation anomaly, please open an issue. And Kyle, if you ever want to run a real Jepsen analysis on ArcadeDB, our doors are wide open. We’d love to read it. Even if (especially if) it turns up things our in-house tests missed.
Until then: 34 tests in, 34 tests passed, every line of the framework and every line of the test suite open for your inspection.
Further reading
- ArcadeDB Client-Server architecture and HA cluster
- ArcadeDB use cases: graph, document, key-value, search, vector, time-series in one engine
- Neo4j alternatives in 2026
- GraphBatch: up to 8x faster graph ingestion
- Apache Ratis - the Raft library powering ArcadeDB HA
- Raft consensus paper (Ongaro & Ousterhout)