<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://arcadedb.com/blog/feed.xml" rel="self" type="application/atom+xml" /><link href="https://arcadedb.com/" rel="alternate" type="text/html" /><updated>2026-05-08T18:06:25+00:00</updated><id>https://arcadedb.com/blog/feed.xml</id><title type="html">ArcadeDB</title><subtitle>The Next Generation Multi-Model Database</subtitle><entry><title type="html">Call Me Maybe, ArcadeDB? 34 Jepsen Tests, 34 Passes</title><link href="https://arcadedb.com/blog/arcadedb-jepsen-tests-34-pass/" rel="alternate" type="text/html" title="Call Me Maybe, ArcadeDB? 34 Jepsen Tests, 34 Passes" /><published>2026-04-28T00:00:00+00:00</published><updated>2026-04-28T00:00:00+00:00</updated><id>https://arcadedb.com/blog/arcadedb-jepsen-tests-34-pass</id><content type="html" xml:base="https://arcadedb.com/blog/arcadedb-jepsen-tests-34-pass/"><![CDATA[<p>If you’ve followed distributed databases for any length of time, you’ve probably read <a href="https://jepsen.io/analyses">a Jepsen analysis</a>. If you’ve read one, you know the feeling: a database vendor claims linearizability, <a href="https://aphyr.com/">Kyle Kingsbury</a> introduces some network partitions, and a few weeks later we all learn what the database <em>actually</em> does under failure.</p>

<p>That feeling is the reason we wrote 34 Jepsen tests for <a href="https://arcadedb.com/">ArcadeDB</a>. We wanted to know what <em>we</em> actually do under failure, before we ask anyone else to trust us.</p>

<p>Today we’re publishing the full test suite, the methodology, and the results.</p>

<blockquote>
  <p><strong>First, the disclaimer.</strong> This is <strong>not an official Jepsen analysis</strong>. Jepsen LLC did not commission, run, review, or certify these tests. We wrote them in-house using the open-source <a href="https://github.com/jepsen-io/jepsen">Jepsen framework</a> (the same framework Kyle uses for his official analyses), but the design, execution, and results are entirely ours. We’re publishing everything so the community can scrutinize the methodology, and we’d genuinely love a real analysis from Jepsen LLC one day. <strong>Hi Kyle, if you’re reading this, please tear it apart.</strong></p>
</blockquote>

<h2 id="summary">Summary</h2>

<ul>
  <li><strong>Database under test:</strong> ArcadeDB on the <code class="language-plaintext highlighter-rouge">apache-ratis</code> branch, with high availability built on <a href="https://ratis.apache.org/">Apache Ratis</a> (Raft consensus).</li>
  <li><strong>Cluster:</strong> 5 Debian nodes in Docker, controlled by a Jepsen 0.3.11 control node.</li>
  <li><strong>Workloads (6):</strong> <code class="language-plaintext highlighter-rouge">bank</code>, <code class="language-plaintext highlighter-rouge">set</code>, <code class="language-plaintext highlighter-rouge">elle</code>, <code class="language-plaintext highlighter-rouge">register</code>, <code class="language-plaintext highlighter-rouge">register-follower</code>, <code class="language-plaintext highlighter-rouge">register-bookmark</code>.</li>
  <li><strong>Faults (7 nemeses):</strong> <code class="language-plaintext highlighter-rouge">none</code>, <code class="language-plaintext highlighter-rouge">partition</code>, <code class="language-plaintext highlighter-rouge">kill</code>, <code class="language-plaintext highlighter-rouge">pause</code>, <code class="language-plaintext highlighter-rouge">clock</code>, <code class="language-plaintext highlighter-rouge">all</code>, <code class="language-plaintext highlighter-rouge">all+clock</code>.</li>
  <li><strong>Total runs:</strong> 34 (20 leader workloads + 14 follower workloads).</li>
  <li><strong>Result:</strong> 34 / 34 PASS. Zero linearizability violations, zero lost writes, zero ACID anomalies.</li>
  <li><strong>Source code:</strong> <a href="https://github.com/ArcadeData/arcadedb-jepsen">github.com/ArcadeData/arcadedb-jepsen</a> (Apache 2.0).</li>
  <li><strong>Caveat:</strong> This is in-house testing, not a Jepsen LLC certification. Independent review welcome.</li>
</ul>

<h2 id="what-is-jepsen">What is Jepsen?</h2>

<p><a href="https://jepsen.io">Jepsen</a> is the gold-standard open-source framework for testing distributed systems. Created by Kyle Kingsbury (better known as <a href="https://aphyr.com/">aphyr</a>), it became famous through the <a href="https://aphyr.com/tags/jepsen">Call Me Maybe</a> blog series, which methodically dismantled the consistency claims of databases like MongoDB, Redis, Cassandra, Elasticsearch, and many others.</p>

<p>What makes Jepsen special isn’t just the fault injection (network partitions via <code class="language-plaintext highlighter-rouge">iptables</code>, process kills with <code class="language-plaintext highlighter-rouge">SIGKILL</code>, GC-style pauses with <code class="language-plaintext highlighter-rouge">SIGSTOP</code>/<code class="language-plaintext highlighter-rouge">SIGCONT</code>, clock skew via <code class="language-plaintext highlighter-rouge">date -s</code>). It’s the <strong>checkers</strong>:</p>

<ul>
  <li><strong><a href="https://github.com/jepsen-io/knossos">Knossos</a></strong>: a linearizability checker that takes the history of operations and tries to find a serial ordering consistent with each client’s observed responses. If no such ordering exists, your “linearizable” register isn’t.</li>
  <li><strong><a href="https://github.com/jepsen-io/elle">Elle</a></strong>: a black-box transaction-isolation checker that builds a dependency graph from the transaction history and looks for cycles. Cycles map to specific anomalies: G0 (dirty write), G1a (aborted read), G1b (intermediate read), G1c (circular information flow), G2 (anti-dependency cycle), and lost updates.</li>
</ul>

<p>You can’t bluff your way past either of them. They either find a counterexample, or they certify the history.</p>
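<p>To make the idea concrete, here is a toy, brute-force version of what a linearizability checker does. This is our own illustration, not Knossos: the history format and <code class="language-plaintext highlighter-rouge">register_ok</code> model are hypothetical, and we ignore real-time ordering constraints that a real checker enforces:</p>

```python
from itertools import permutations

# Each op is (kind, arg, ret). In this toy model a history is
# linearizable if SOME serial order of the ops is consistent with a
# single register. Real checkers also enforce real-time order.
def register_ok(serial):
    state = None
    for kind, arg, ret in serial:
        if kind == "write":
            state = arg
        elif kind == "read":
            if ret != state:
                return False
        elif kind == "cas":
            expected, new = arg
            succeeded = state == expected
            if succeeded:
                state = new
            if ret != succeeded:
                return False
    return True

def linearizable(history):
    # Brute force: fine for toy histories, exponential in general.
    return any(register_ok(list(p)) for p in permutations(history))

good = [("write", 1, None), ("read", None, 1),
        ("cas", (1, 2), True), ("read", None, 2)]
bad = [("write", 1, None), ("read", None, 2)]
print(linearizable(good))  # True: the order as given is consistent
print(linearizable(bad))   # False: no ordering explains a read of 2
```

<p>Knossos attacks the same search problem with far smarter algorithms than this permutation sweep, which is what makes it practical on histories with thousands of operations.</p>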

<h2 id="what-we-tested">What we tested</h2>

<p>The tests run against the ArcadeDB <code class="language-plaintext highlighter-rouge">apache-ratis</code> branch, where high availability is implemented on top of <a href="https://ratis.apache.org/">Apache Ratis</a> (the production-grade Raft library that also powers Apache Ozone). The cluster is <strong>5 Debian nodes</strong> in Docker, plus a control node running <a href="https://leiningen.org/">Leiningen</a> and Jepsen 0.3.11. Each test gets a fresh cluster to eliminate cross-test contamination.</p>

<h3 id="six-workloads">Six workloads</h3>

<table>
  <thead>
    <tr>
      <th>Workload</th>
      <th>What it checks</th>
      <th>Checker</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>bank</strong></td>
      <td>ACID balance conservation across 5 accounts during concurrent transfers</td>
      <td>Custom conservation invariant</td>
    </tr>
    <tr>
      <td><strong>set</strong></td>
      <td>No acknowledged write is ever lost during replication</td>
      <td>Custom set checker</td>
    </tr>
    <tr>
      <td><strong>elle</strong></td>
      <td>Transaction isolation: G0, G1a, G1b, G2, lost updates</td>
      <td><a href="https://github.com/jepsen-io/elle">Elle</a></td>
    </tr>
    <tr>
      <td><strong>register</strong></td>
      <td>Linearizability of single-key read/write/CAS, leader reads</td>
      <td><a href="https://github.com/jepsen-io/knossos">Knossos</a></td>
    </tr>
    <tr>
      <td><strong>register-follower</strong></td>
      <td>Linearizability when reads are routed to a <em>non-leader</em> (ReadIndex path)</td>
      <td>Knossos</td>
    </tr>
    <tr>
      <td><strong>register-bookmark</strong></td>
      <td>Read-your-writes via commit-index bookmarks on follower reads</td>
      <td>Knossos</td>
    </tr>
  </tbody>
</table>

<h3 id="seven-nemeses">Seven nemeses</h3>

<table>
  <thead>
    <tr>
      <th>Nemesis</th>
      <th>Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">none</code></td>
      <td>Baseline, no faults</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">partition</code></td>
      <td>Random network partitions via <code class="language-plaintext highlighter-rouge">iptables</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">kill</code></td>
      <td><code class="language-plaintext highlighter-rouge">SIGKILL</code> random nodes (crash)</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">pause</code></td>
      <td><code class="language-plaintext highlighter-rouge">SIGSTOP</code>/<code class="language-plaintext highlighter-rouge">SIGCONT</code> random nodes (long GC pause)</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">clock</code></td>
      <td>Random ±60s clock shifts via <code class="language-plaintext highlighter-rouge">date -s</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">all</code></td>
      <td>partition + kill + pause concurrently</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">all+clock</code></td>
      <td>all + clock skew</td>
    </tr>
  </tbody>
</table>

<p>The four leader workloads (<code class="language-plaintext highlighter-rouge">bank</code>, <code class="language-plaintext highlighter-rouge">set</code>, <code class="language-plaintext highlighter-rouge">elle</code>, <code class="language-plaintext highlighter-rouge">register</code>) run against 5 nemeses each (we omit <code class="language-plaintext highlighter-rouge">clock</code> and <code class="language-plaintext highlighter-rouge">all+clock</code> because leader-only reads aren’t sensitive to follower clock drift). The two follower workloads run the full 7. That’s 4 × 5 + 2 × 7 = <strong>20 + 14 = 34 tests</strong>.</p>
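<p>For the skeptical, the arithmetic checks out (workload and nemesis names taken from the tables above):</p>

```python
leader_workloads = ["bank", "set", "elle", "register"]
follower_workloads = ["register-follower", "register-bookmark"]
leader_nemeses = ["none", "partition", "kill", "pause", "all"]
follower_nemeses = leader_nemeses + ["clock", "all+clock"]

# 4 workloads x 5 nemeses + 2 workloads x 7 nemeses
total = len(leader_workloads) * len(leader_nemeses) \
      + len(follower_workloads) * len(follower_nemeses)
print(total)  # 34
```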

<h2 id="the-results">The Results</h2>

<figure style="margin: 2rem 0; overflow-x: auto;">
<svg viewBox="0 0 760 360" xmlns="http://www.w3.org/2000/svg" role="img" aria-labelledby="matrixTitle" style="max-width: 100%; height: auto; font-family: -apple-system, BlinkMacSystemFont, sans-serif;">
  <title id="matrixTitle">ArcadeDB Jepsen test matrix: 34 of 34 passing</title>
  <rect x="0" y="0" width="760" height="360" fill="#ffffff" />
  <text x="380" y="28" text-anchor="middle" font-size="18" font-weight="700" fill="#111">ArcadeDB Jepsen Test Matrix &middot; 34 / 34 PASS</text>

  <!-- Column headers (nemeses) -->
  <g font-size="12" fill="#333" text-anchor="middle">
    <text x="320" y="68">none</text>
    <text x="380" y="68">partition</text>
    <text x="440" y="68">kill</text>
    <text x="500" y="68">pause</text>
    <text x="560" y="68">clock</text>
    <text x="620" y="68">all</text>
    <text x="690" y="68">all+clock</text>
  </g>

  <!-- Row labels (workloads) -->
  <g font-size="13" fill="#222" text-anchor="end" font-weight="600">
    <text x="240" y="100">bank</text>
    <text x="240" y="140">set</text>
    <text x="240" y="180">elle</text>
    <text x="240" y="220">register</text>
    <text x="240" y="260">register-follower</text>
    <text x="240" y="300">register-bookmark</text>
  </g>

  <!-- Helper: cell drawing -->
  <!-- Row 1: bank, 5 nemeses pass, clock + all+clock are N/A -->
  <g>
    <!-- bank -->
    <rect x="290" y="80" width="60" height="36" fill="#16a34a" /><text x="320" y="103" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="350" y="80" width="60" height="36" fill="#16a34a" /><text x="380" y="103" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="410" y="80" width="60" height="36" fill="#16a34a" /><text x="440" y="103" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="470" y="80" width="60" height="36" fill="#16a34a" /><text x="500" y="103" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="530" y="80" width="60" height="36" fill="#e5e7eb" /><text x="560" y="103" text-anchor="middle" font-size="11" fill="#6b7280">n/a</text>
    <rect x="590" y="80" width="60" height="36" fill="#16a34a" /><text x="620" y="103" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="650" y="80" width="80" height="36" fill="#e5e7eb" /><text x="690" y="103" text-anchor="middle" font-size="11" fill="#6b7280">n/a</text>
    <!-- set -->
    <rect x="290" y="120" width="60" height="36" fill="#16a34a" /><text x="320" y="143" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="350" y="120" width="60" height="36" fill="#16a34a" /><text x="380" y="143" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="410" y="120" width="60" height="36" fill="#16a34a" /><text x="440" y="143" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="470" y="120" width="60" height="36" fill="#16a34a" /><text x="500" y="143" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="530" y="120" width="60" height="36" fill="#e5e7eb" /><text x="560" y="143" text-anchor="middle" font-size="11" fill="#6b7280">n/a</text>
    <rect x="590" y="120" width="60" height="36" fill="#16a34a" /><text x="620" y="143" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="650" y="120" width="80" height="36" fill="#e5e7eb" /><text x="690" y="143" text-anchor="middle" font-size="11" fill="#6b7280">n/a</text>
    <!-- elle -->
    <rect x="290" y="160" width="60" height="36" fill="#16a34a" /><text x="320" y="183" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="350" y="160" width="60" height="36" fill="#16a34a" /><text x="380" y="183" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="410" y="160" width="60" height="36" fill="#16a34a" /><text x="440" y="183" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="470" y="160" width="60" height="36" fill="#16a34a" /><text x="500" y="183" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="530" y="160" width="60" height="36" fill="#e5e7eb" /><text x="560" y="183" text-anchor="middle" font-size="11" fill="#6b7280">n/a</text>
    <rect x="590" y="160" width="60" height="36" fill="#16a34a" /><text x="620" y="183" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="650" y="160" width="80" height="36" fill="#e5e7eb" /><text x="690" y="183" text-anchor="middle" font-size="11" fill="#6b7280">n/a</text>
    <!-- register -->
    <rect x="290" y="200" width="60" height="36" fill="#16a34a" /><text x="320" y="223" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="350" y="200" width="60" height="36" fill="#16a34a" /><text x="380" y="223" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="410" y="200" width="60" height="36" fill="#16a34a" /><text x="440" y="223" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="470" y="200" width="60" height="36" fill="#16a34a" /><text x="500" y="223" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="530" y="200" width="60" height="36" fill="#e5e7eb" /><text x="560" y="223" text-anchor="middle" font-size="11" fill="#6b7280">n/a</text>
    <rect x="590" y="200" width="60" height="36" fill="#16a34a" /><text x="620" y="223" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="650" y="200" width="80" height="36" fill="#e5e7eb" /><text x="690" y="223" text-anchor="middle" font-size="11" fill="#6b7280">n/a</text>
    <!-- register-follower -->
    <rect x="290" y="240" width="60" height="36" fill="#16a34a" /><text x="320" y="263" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="350" y="240" width="60" height="36" fill="#16a34a" /><text x="380" y="263" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="410" y="240" width="60" height="36" fill="#16a34a" /><text x="440" y="263" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="470" y="240" width="60" height="36" fill="#16a34a" /><text x="500" y="263" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="530" y="240" width="60" height="36" fill="#16a34a" /><text x="560" y="263" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="590" y="240" width="60" height="36" fill="#16a34a" /><text x="620" y="263" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="650" y="240" width="80" height="36" fill="#16a34a" /><text x="690" y="263" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <!-- register-bookmark -->
    <rect x="290" y="280" width="60" height="36" fill="#16a34a" /><text x="320" y="303" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="350" y="280" width="60" height="36" fill="#16a34a" /><text x="380" y="303" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="410" y="280" width="60" height="36" fill="#16a34a" /><text x="440" y="303" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="470" y="280" width="60" height="36" fill="#16a34a" /><text x="500" y="303" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="530" y="280" width="60" height="36" fill="#16a34a" /><text x="560" y="303" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="590" y="280" width="60" height="36" fill="#16a34a" /><text x="620" y="303" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="650" y="280" width="80" height="36" fill="#16a34a" /><text x="690" y="303" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
  </g>

  <text x="380" y="345" text-anchor="middle" font-size="12" fill="#6b7280">Green = passed &middot; Grey = not applicable for this workload</text>
</svg>
<figcaption style="text-align: center; color: #6b7280; font-size: 0.9em; margin-top: 0.5rem;">Figure 1. The 34-test matrix. Every executed cell passed.</figcaption>
</figure>

<p>Behind every green check is a 90-second run (30 seconds for the most expensive Knossos workloads) of concurrent client operations against the cluster while the chosen nemesis hammers the nodes. Then the checker takes the recorded history and either says <code class="language-plaintext highlighter-rouge">:valid? true</code> or hands you a counterexample.</p>

<h2 id="the-faults-visually">The Faults, Visually</h2>

<p>The interesting Jepsen tests aren’t the <code class="language-plaintext highlighter-rouge">none</code> baseline. They’re what happens while the cluster is under active attack. Here’s what we throw at the 5-node cluster.</p>

<figure style="margin: 2rem 0; overflow-x: auto;">
<svg viewBox="0 0 780 320" xmlns="http://www.w3.org/2000/svg" role="img" aria-labelledby="nemTitle" style="max-width: 100%; height: auto; font-family: -apple-system, BlinkMacSystemFont, sans-serif;">
  <title id="nemTitle">Jepsen nemesis fault types applied to a 5-node ArcadeDB cluster</title>
  <rect x="0" y="0" width="780" height="320" fill="#ffffff" />
  <text x="390" y="28" text-anchor="middle" font-size="16" font-weight="700" fill="#111">Nemesis faults applied to a 5-node Raft cluster</text>

  <!-- Panel 1: partition -->
  <g transform="translate(20, 50)">
    <text x="90" y="0" text-anchor="middle" font-size="13" font-weight="600" fill="#111">partition</text>
    <text x="90" y="16" text-anchor="middle" font-size="10" fill="#6b7280">iptables network split</text>
    <!-- left side nodes -->
    <circle cx="40" cy="80" r="16" fill="#3b82f6" /><text x="40" y="84" text-anchor="middle" font-size="11" font-weight="700" fill="#fff">L</text>
    <circle cx="40" cy="130" r="16" fill="#60a5fa" /><text x="40" y="134" text-anchor="middle" font-size="11" font-weight="700" fill="#fff">F</text>
    <!-- right side nodes -->
    <circle cx="140" cy="70" r="16" fill="#60a5fa" /><text x="140" y="74" text-anchor="middle" font-size="11" font-weight="700" fill="#fff">F</text>
    <circle cx="140" cy="120" r="16" fill="#60a5fa" /><text x="140" y="124" text-anchor="middle" font-size="11" font-weight="700" fill="#fff">F</text>
    <circle cx="140" cy="170" r="16" fill="#60a5fa" /><text x="140" y="174" text-anchor="middle" font-size="11" font-weight="700" fill="#fff">F</text>
    <!-- partition wall -->
    <line x1="90" y1="50" x2="90" y2="200" stroke="#dc2626" stroke-width="3" stroke-dasharray="6,4" />
    <text x="90" y="220" text-anchor="middle" font-size="10" fill="#dc2626" font-weight="600">split</text>
  </g>

  <!-- Panel 2: kill -->
  <g transform="translate(210, 50)">
    <text x="90" y="0" text-anchor="middle" font-size="13" font-weight="600" fill="#111">kill</text>
    <text x="90" y="16" text-anchor="middle" font-size="10" fill="#6b7280">SIGKILL random node</text>
    <circle cx="50" cy="90" r="16" fill="#3b82f6" /><text x="50" y="94" text-anchor="middle" font-size="11" font-weight="700" fill="#fff">L</text>
    <circle cx="130" cy="90" r="16" fill="#60a5fa" /><text x="130" y="94" text-anchor="middle" font-size="11" font-weight="700" fill="#fff">F</text>
    <circle cx="50" cy="160" r="16" fill="#60a5fa" /><text x="50" y="164" text-anchor="middle" font-size="11" font-weight="700" fill="#fff">F</text>
    <circle cx="130" cy="160" r="16" fill="#9ca3af" opacity="0.4" />
    <line x1="118" y1="148" x2="142" y2="172" stroke="#dc2626" stroke-width="3" />
    <line x1="142" y1="148" x2="118" y2="172" stroke="#dc2626" stroke-width="3" />
    <circle cx="90" cy="125" r="16" fill="#60a5fa" /><text x="90" y="129" text-anchor="middle" font-size="11" font-weight="700" fill="#fff">F</text>
    <text x="90" y="220" text-anchor="middle" font-size="10" fill="#dc2626" font-weight="600">crash</text>
  </g>

  <!-- Panel 3: pause -->
  <g transform="translate(400, 50)">
    <text x="90" y="0" text-anchor="middle" font-size="13" font-weight="600" fill="#111">pause</text>
    <text x="90" y="16" text-anchor="middle" font-size="10" fill="#6b7280">SIGSTOP &#8594; SIGCONT</text>
    <circle cx="50" cy="90" r="16" fill="#3b82f6" /><text x="50" y="94" text-anchor="middle" font-size="11" font-weight="700" fill="#fff">L</text>
    <circle cx="130" cy="90" r="16" fill="#60a5fa" /><text x="130" y="94" text-anchor="middle" font-size="11" font-weight="700" fill="#fff">F</text>
    <circle cx="50" cy="160" r="16" fill="#60a5fa" /><text x="50" y="164" text-anchor="middle" font-size="11" font-weight="700" fill="#fff">F</text>
    <circle cx="130" cy="160" r="16" fill="#fbbf24" /><text x="130" y="164" text-anchor="middle" font-size="11" font-weight="700" fill="#000">F</text>
    <circle cx="90" cy="125" r="16" fill="#60a5fa" /><text x="90" y="129" text-anchor="middle" font-size="11" font-weight="700" fill="#fff">F</text>
    <text x="155" y="148" font-size="14">&#10074;&#10074;</text>
    <text x="90" y="220" text-anchor="middle" font-size="10" fill="#b45309" font-weight="600">frozen process</text>
  </g>

  <!-- Panel 4: clock skew -->
  <g transform="translate(590, 50)">
    <text x="90" y="0" text-anchor="middle" font-size="13" font-weight="600" fill="#111">clock</text>
    <text x="90" y="16" text-anchor="middle" font-size="10" fill="#6b7280">date -s &#177;60s shift</text>
    <circle cx="50" cy="90" r="16" fill="#3b82f6" /><text x="50" y="94" text-anchor="middle" font-size="11" font-weight="700" fill="#fff">L</text>
    <circle cx="130" cy="90" r="16" fill="#60a5fa" /><text x="130" y="94" text-anchor="middle" font-size="11" font-weight="700" fill="#fff">F</text>
    <circle cx="50" cy="160" r="16" fill="#60a5fa" /><text x="50" y="164" text-anchor="middle" font-size="11" font-weight="700" fill="#fff">F</text>
    <circle cx="130" cy="160" r="16" fill="#a855f7" /><text x="130" y="164" text-anchor="middle" font-size="11" font-weight="700" fill="#fff">F</text>
    <circle cx="90" cy="125" r="16" fill="#60a5fa" /><text x="90" y="129" text-anchor="middle" font-size="11" font-weight="700" fill="#fff">F</text>
    <circle cx="155" cy="145" r="10" fill="#fff" stroke="#a855f7" stroke-width="2" />
    <line x1="155" y1="145" x2="155" y2="139" stroke="#a855f7" stroke-width="2" />
    <line x1="155" y1="145" x2="160" y2="148" stroke="#a855f7" stroke-width="2" />
    <text x="90" y="220" text-anchor="middle" font-size="10" fill="#7c3aed" font-weight="600">time travel</text>
  </g>

  <!-- Legend -->
  <g transform="translate(0, 280)">
    <circle cx="200" cy="10" r="8" fill="#3b82f6" /><text x="215" y="14" font-size="11" fill="#333">L = Raft leader</text>
    <circle cx="320" cy="10" r="8" fill="#60a5fa" /><text x="335" y="14" font-size="11" fill="#333">F = Raft follower</text>
    <text x="475" y="14" font-size="11" fill="#6b7280">Combined as <tspan font-weight="700" fill="#111">all</tspan> and <tspan font-weight="700" fill="#111">all+clock</tspan> for compounded chaos</text>
  </g>
</svg>
<figcaption style="text-align: center; color: #6b7280; font-size: 0.9em; margin-top: 0.5rem;">Figure 2. The four primitive nemeses. The composite <code>all</code> and <code>all+clock</code> apply them concurrently.</figcaption>
</figure>

<h2 id="what-each-workload-actually-proves">What Each Workload Actually Proves</h2>

<p>Passing 34 tests sounds nice in a header, but each workload is asking a specific question. Here’s what we’re actually claiming.</p>

<h3 id="bank-acid-under-partitions">bank: ACID under partitions</h3>

<p>Five accounts, 1000 each, total 5000. Concurrent clients transfer random amounts between random pairs of accounts inside multi-statement transactions. After every operation the checker sums the balances. <strong>The total must always equal 5000.</strong> If a transfer is partially applied (debit succeeds, credit fails, or vice versa), the sum drifts and the test fails. Under partitions, kills, pauses, and the combined <code class="language-plaintext highlighter-rouge">all</code> nemesis: <strong>conservation holds</strong>.</p>
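<p>The invariant is simple enough to sketch in a few lines. This is an in-memory illustration of the property the checker enforces, not the actual Jepsen client; the <code class="language-plaintext highlighter-rouge">transfer</code> helper is hypothetical:</p>

```python
import random

# Five accounts starting at 1000 each. A correct transfer debits and
# credits atomically, so the total can never drift from 5000.
def transfer(accounts, src, dst, amount):
    if accounts[src] >= amount:
        accounts[src] -= amount  # debit and credit happen together,
        accounts[dst] += amount  # i.e. one atomic transaction

accounts = {i: 1000 for i in range(5)}
rng = random.Random(42)
for _ in range(10_000):
    a, b = rng.sample(range(5), 2)
    transfer(accounts, a, b, rng.randint(1, 100))
    assert sum(accounts.values()) == 5000  # the conservation invariant

print(sum(accounts.values()))  # 5000
```

<p>A partially applied transfer (debit without credit) breaks the assertion immediately; that is exactly the failure mode the Jepsen checker hunts for under partitions and crashes.</p>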

<h3 id="set-no-acknowledged-write-is-lost">set: no acknowledged write is lost</h3>

<p>Insert unique integers, periodically read them all back. Every integer for which the server returned a successful write must appear in subsequent reads. This is the cleanest test for replication completeness: it doesn’t matter how the cluster reorders things, only that nothing acknowledged is silently dropped. <strong>Zero lost writes</strong> across all five nemeses.</p>
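<p>The checker’s logic boils down to set algebra over three categories of writes. A minimal sketch; the <code class="language-plaintext highlighter-rouge">check_set</code> helper and its argument names are our own:</p>

```python
# Classify each attempted write by whether the server acknowledged it
# and whether it appears in the final read of the whole set.
def check_set(acknowledged, unknown, final_read):
    final = set(final_read)
    lost = acknowledged - final   # acked but missing: a real bug
    recovered = unknown & final   # timed out but landed: acceptable
    return {"valid": not lost, "lost": lost, "recovered": recovered}

result = check_set(acknowledged={1, 2, 3},
                   unknown={4, 5},  # e.g. the client saw a timeout
                   final_read=[1, 2, 3, 5])
print(result["valid"], result["lost"], result["recovered"])
```

<p>Writes with an unknown fate are allowed to either appear or not; only an acknowledged-but-missing element fails the test.</p>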

<h3 id="elle-real-transaction-isolation-checked-by-cycles">elle: real transaction isolation, checked by cycles</h3>

<p>This is where we throw multi-key read/write transactions at the cluster and let <a href="https://github.com/jepsen-io/elle">Elle</a> build the dependency graph. Elle then looks for cycles that correspond to specific anomalies: G0 (dirty write), G1a (read of an aborted write), G1b (read of an intermediate value), G2 (anti-dependency cycle), and lost updates. We exclude G1c because, in our HTTP-based harness, reads after commit happen as separate calls; that creates a test-implementation pattern that Elle correctly flags as a “circular information flow” but which doesn’t reflect a real isolation violation. Every other anomaly class: <strong>none observed</strong>.</p>
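<p>Stripped to its core, Elle’s verdict rests on cycle detection over an inferred dependency graph. A minimal sketch, assuming the edges have already been inferred from the history (which is the genuinely hard part Elle automates):</p>

```python
# An edge ("T1", "T2") means "T1 must precede T2" in any serial order.
# A cycle means no serial order exists: a serializability anomaly.
def has_cycle(edges):
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, [])
    WHITE, GREY, BLACK = 0, 1, 2
    color = {n: WHITE for n in graph}

    def visit(n):  # depth-first search; a GREY node on the stack = cycle
        color[n] = GREY
        for m in graph[n]:
            if color[m] == GREY or (color[m] == WHITE and visit(m)):
                return True
        color[n] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in graph)

print(has_cycle([("T1", "T2"), ("T2", "T3")]))  # False: serializable
print(has_cycle([("T1", "T2"), ("T2", "T1")]))  # True: anomaly found
```

<p>Each anomaly class (G0, G1c, G2, …) corresponds to cycles built from particular edge types; the detection step itself is this simple.</p>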

<h3 id="register-leader-side-linearizability">register: leader-side linearizability</h3>

<p>A single integer, hammered with concurrent reads, writes, and compare-and-swap operations, all routed to the Raft leader. <a href="https://github.com/jepsen-io/knossos">Knossos</a> then attempts to find a serial ordering of those operations consistent with each client’s observed responses. Knossos is brutal: it’ll happily spend minutes searching, and if your “linearizable” register isn’t, it’ll tell you exactly which interleaving breaks. <strong>All five executed nemeses certified linearizable.</strong></p>

<h3 id="register-follower-linearizability-when-reads-go-to-a-follower">register-follower: linearizability when reads go to a follower</h3>

<p>Writes still go to the leader, but reads are deliberately routed to a <em>non-leader</em> with the <code class="language-plaintext highlighter-rouge">X-ArcadeDB-Read-Consistency: LINEARIZABLE</code> header. This exercises the <strong>ReadIndex</strong> path on followers (<code class="language-plaintext highlighter-rouge">RaftHAServer.ensureLinearizableFollowerRead()</code>): the follower issues <code class="language-plaintext highlighter-rouge">sendReadOnly()</code> to the leader, the leader confirms it still holds quorum and returns its current commit index, the follower waits for its local state machine to catch up, then serves the read. Without that round-trip, a lagging follower would serve stale data and Knossos would catch it instantly. With it: <strong>linearizable across all 7 nemeses, including clock skew and <code class="language-plaintext highlighter-rouge">all+clock</code>.</strong></p>
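<p>The ReadIndex dance is easier to see in a simulation than in prose. A minimal sketch with hypothetical <code class="language-plaintext highlighter-rouge">Node</code> and <code class="language-plaintext highlighter-rouge">Leader</code> classes standing in for the real servers; a real leader would also confirm quorum before returning the index:</p>

```python
import threading

class Node:
    """In-memory stand-in for a Raft state machine."""
    def __init__(self):
        self.applied_index = 0
        self.data = {}
        self._cv = threading.Condition()

    def apply(self, index, key, value):
        with self._cv:
            self.data[key] = value
            self.applied_index = index
            self._cv.notify_all()

    def wait_for(self, index, timeout=5.0):
        with self._cv:
            self._cv.wait_for(lambda: self.applied_index >= index, timeout)

class Leader(Node):
    def read_index(self):
        # Real Raft confirms it still holds quorum (heartbeats) here.
        return self.applied_index

def linearizable_follower_read(leader, follower, key):
    idx = leader.read_index()      # 1. ask leader for its commit index
    follower.wait_for(idx)         # 2. wait for local apply to catch up
    return follower.data.get(key)  # 3. serve the read locally

leader, follower = Leader(), Node()
leader.apply(1, "x", 42)
# Replication reaches the lagging follower "late":
threading.Timer(0.05, follower.apply, args=(1, "x", 42)).start()
print(linearizable_follower_read(leader, follower, "x"))  # 42
```

<p>Skip step 2 and the lagging follower would return <code class="language-plaintext highlighter-rouge">None</code>, the stale read Knossos catches instantly.</p>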

<h3 id="register-bookmark-read-your-writes-via-commit-index-bookmarks">register-bookmark: read-your-writes via commit-index bookmarks</h3>

<p>Same follower-read setup, but instead of a full ReadIndex round-trip on every read, the client captures <code class="language-plaintext highlighter-rouge">X-ArcadeDB-Commit-Index</code> from each write response and echoes it back as <code class="language-plaintext highlighter-rouge">X-ArcadeDB-Read-After</code> on subsequent reads. The follower waits for its local apply to reach that index before serving. This is cheaper than ReadIndex but only guarantees read-your-writes for the issuing client, not global linearizability across clients. <strong>All 7 nemeses pass.</strong></p>
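<p>The client-side half of the bookmark protocol is tiny. A sketch with hypothetical <code class="language-plaintext highlighter-rouge">Follower</code> and <code class="language-plaintext highlighter-rouge">Client</code> classes; the two HTTP headers are the real ones named above, mirrored here as plain fields:</p>

```python
class Follower:
    def __init__(self):
        self.applied_index = 0
        self.data = {}

    def read(self, key, read_after=0):
        # The real follower waits; this sketch just refuses stale reads.
        if self.applied_index < read_after:
            raise TimeoutError("follower behind the client's bookmark")
        return self.data.get(key)

class Client:
    def __init__(self, follower):
        self.follower = follower
        self.bookmark = 0  # mirrors X-ArcadeDB-Commit-Index

    def write_acked(self, commit_index):
        # Each write response carries the commit index; remember the max.
        self.bookmark = max(self.bookmark, commit_index)

    def read(self, key):
        # Echoed back as X-ArcadeDB-Read-After on every read.
        return self.follower.read(key, read_after=self.bookmark)

f = Follower()
c = Client(f)
c.write_acked(commit_index=7)
try:
    c.read("x")                 # follower at index 0 < bookmark 7
except TimeoutError:
    print("stale follower rejected")
f.applied_index, f.data["x"] = 7, "hello"
print(c.read("x"))              # "hello"
```

<p>Note the guarantee is per client: another client with no bookmark can still read stale data, which is exactly the read-your-writes / linearizability trade-off.</p>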

<p>The two follower modes matter because <strong>most real applications don’t need global linearizability</strong>; they need their own writes to be visible to their own subsequent reads. The bookmark path gives that property at much lower cost than ReadIndex.</p>

<h2 id="how-read-consistency-works-in-arcadedb">How read consistency works in ArcadeDB</h2>

<p>The follower-read tests are the most novel piece, and they map directly to a configurable knob in the database:</p>

<table>
  <thead>
    <tr>
      <th>Level</th>
      <th>Performance</th>
      <th>Consistency</th>
      <th>Use case</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">eventual</code></td>
      <td>Fastest</td>
      <td>May read stale data on followers</td>
      <td>Analytics, dashboards</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">read_your_writes</code> (default)</td>
      <td>Fast</td>
      <td>Leader reads from local DB; followers wait for client’s last write</td>
      <td>Most OLTP workloads</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">linearizable</code></td>
      <td>+1 RTT when lease expired</td>
      <td>Full linearizability even under process pauses</td>
      <td>Financial transactions, coordination</td>
    </tr>
  </tbody>
</table>

<p>You set it globally via <code class="language-plaintext highlighter-rouge">arcadedb.ha.readConsistency</code> or per request via the <code class="language-plaintext highlighter-rouge">X-ArcadeDB-Read-Consistency</code> HTTP header. The Jepsen runs use <code class="language-plaintext highlighter-rouge">linearizable</code> for the follower workloads (the most demanding setting) and the default <code class="language-plaintext highlighter-rouge">read_your_writes</code> for the leader workloads.</p>

<p>In linearizable mode, the leader checks its Raft lease before every read via Ratis’s <code class="language-plaintext highlighter-rouge">sendReadOnly()</code> API, implementing the lease-based read optimization described in Section 6.4 of Ongaro’s Raft dissertation (see also the <a href="https://raft.github.io/raft.pdf">Raft paper</a>). When the lease is valid (the common case), this is a local timestamp check with no network round-trip. When the lease has expired (e.g., after a long VM suspend or an extreme GC pause), Ratis sends heartbeats to a majority before serving the read. That costs about one extra RTT in the worst case, which is exactly the price you’d expect for a correctness guarantee under arbitrary process pauses.</p>
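<p>The lease check can be sketched in Python as follows (a toy model of clock-based leases; the class and method names are illustrative, not Ratis’s actual API):</p>

```python
# Toy model of lease-based linearizable reads on the leader.
# Timestamps are passed in explicitly to keep the example deterministic.

class Leader:
    def __init__(self, lease_duration=1.0):
        self.lease_duration = lease_duration
        self.lease_expiry = 0.0
        self.data = {}
        self.quorum_round_trips = 0   # majority heartbeats we paid for

    def on_heartbeat_quorum(self, now):
        # A majority acknowledged our heartbeat: we are still the leader,
        # so local reads are safe until the lease expires.
        self.lease_expiry = now + self.lease_duration
        self.quorum_round_trips += 1

    def read(self, key, now):
        if now < self.lease_expiry:
            # Common case: valid lease, serve locally (a timestamp check).
            return self.data.get(key)
        # Lease expired (e.g. after a long pause): confirm leadership with
        # a majority first, costing roughly one extra round-trip.
        self.on_heartbeat_quorum(now)
        return self.data.get(key)
```

<p>Within the lease window reads cost nothing extra; only a pause that outlives the lease pays the quorum round-trip, matching the “+1 RTT when lease expired” row in the table above.</p>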

<h2 id="beyond-jepsen-the-broader-ha-test-suite">Beyond Jepsen: the broader HA test suite</h2>

<p>The 34 Jepsen tests are the <em>external</em> validation layer, but they sit on top of an in-house suite that runs on every commit to the <code class="language-plaintext highlighter-rouge">apache-ratis</code> branch.</p>

<p>The new Raft-based HA layer ships with <strong>81 dedicated test classes</strong>, split between <strong>33 unit-test classes</strong> and <strong>48 end-to-end integration scenarios</strong>, totaling <strong>over 327 individual test cases</strong>. The suite exercises every corner of the consensus protocol:</p>

<ul>
  <li><strong>Leader election and failover</strong> (clean shutdown, dirty kill, leadership transfer)</li>
  <li><strong>2-, 3-, and 5-node replication</strong> topologies</li>
  <li><strong>Split-brain recovery</strong> (deliberately partition the cluster, then heal and verify convergence)</li>
  <li><strong>Dynamic cluster membership</strong> (add/remove nodes while the cluster is taking writes)</li>
  <li><strong>Snapshot install, swap, and throttling</strong></li>
  <li><strong>Leader crashes between commit phases</strong> (no acknowledged write is lost)</li>
  <li><strong>Follower catch-up</strong> from WAL and from snapshot</li>
  <li><strong>Schema replication</strong> (DDL changes propagate atomically)</li>
  <li><strong>Read-your-writes consistency</strong> across the cluster</li>
  <li><strong>Concurrent HTTP and gRPC traffic</strong> under load</li>
</ul>

<p>Failure-injection tests intentionally crash leaders, partition replicas, and corrupt snapshots to verify the cluster heals itself without data loss. Jepsen then adds the formal-checker layer (Knossos and Elle) that the in-house suite can’t easily replicate.</p>

<h2 id="reproduce-it-yourself">Reproduce it yourself</h2>

<p>The full test suite is open source and Apache 2.0 licensed:</p>

<blockquote>
  <p><a href="https://github.com/ArcadeData/arcadedb-jepsen">github.com/ArcadeData/arcadedb-jepsen</a></p>
</blockquote>

<p>The repository includes the Docker setup, all six workloads, the nemesis implementations, and the <code class="language-plaintext highlighter-rouge">run-all-tests.sh</code> script that reproduces the entire 34-test sweep on your own hardware. A full sweep takes about 60 minutes on a modern laptop.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git clone https://github.com/ArcadeData/arcadedb-jepsen
<span class="nb">cd </span>arcadedb-jepsen
./build-local.sh /path/to/your/arcadedb
<span class="nb">cd </span>docker <span class="o">&amp;&amp;</span> docker compose up <span class="nt">-d</span>
docker <span class="nb">exec </span>jepsen-control sh /jepsen/docker/setup-ssh.sh
./run-all-tests.sh 90
</code></pre></div></div>

<p>Inspect the recorded histories, the Knossos and Elle outputs, the timeline plots: everything Jepsen produces is in <code class="language-plaintext highlighter-rouge">store/</code> after each run.</p>

<h2 id="what-we-did-not-test">What we did <em>not</em> test</h2>

<p>Honest disclosure matters more than the green checkmarks, so here’s what these 34 tests do <strong>not</strong> cover:</p>

<ul>
  <li><strong>Long-duration runs.</strong> Each nemesis combination ran on the order of minutes, not hours. Slow-burn anomalies (memory leaks, file-handle exhaustion, Raft log compaction edge cases that only surface after millions of entries) are out of scope.</li>
  <li><strong>Disk corruption, fsync lying, and Byzantine faults.</strong> We assume the kernel honors <code class="language-plaintext highlighter-rouge">fsync()</code> and that nodes are non-malicious. We do not inject bit-flips, truncate WAL files, or simulate filesystems that ack writes without persisting.</li>
  <li><strong>Geo-replication scenarios.</strong> All five nodes live in the same Docker network with single-digit-millisecond latencies. We have not tested cross-region links, asymmetric latency, or sustained high jitter.</li>
  <li><strong>Compounded worst-case for follower reads.</strong> We exercised expired Raft lease, clock skew, and partitions individually (and clock + partition + kill + pause together via <code class="language-plaintext highlighter-rouge">all+clock</code>), but we did not run the specific stack of <em>expired lease + clock skew + active partition</em> simultaneously against the linearizable follower-read path.</li>
</ul>

<p>Some of these (longer runs, lying-<code class="language-plaintext highlighter-rouge">fsync()</code> filesystems, geo-replication) are on the roadmap. Others (true Byzantine resilience) are explicitly out of scope for a CFT (crash-fault-tolerant) Raft system. If you think any of these should be in the next pass, <a href="https://github.com/ArcadeData/arcadedb-jepsen/issues">open an issue</a> or send a PR.</p>

<h2 id="help-us-break-it">Help us break it</h2>

<p>We’re publishing this for two reasons.</p>

<p><strong>One</strong>: we want the upcoming Ratis-based HA release to be the most thoroughly tested HA stack ArcadeDB has ever shipped. Internal tests pass; that’s the floor, not the ceiling.</p>

<p><strong>Two</strong>: we’d love independent scrutiny. We’re open to PRs that add workloads, tighter checkers, more aggressive nemeses, or just better failure modes we haven’t thought of. If you find a real linearizability violation, a lost write, or an isolation anomaly, please <a href="https://github.com/ArcadeData/arcadedb-jepsen/issues">open an issue</a>. And <strong>Kyle, if you ever want to run a real Jepsen analysis on ArcadeDB, our doors are wide open</strong>. We’d love to read it. Even if (especially if) it turns up things our in-house tests missed.</p>

<p>Until then: 34 tests in, 34 tests passed, every line of the framework and every line of the test suite open for your inspection.</p>

<h2 id="further-reading">Further reading</h2>

<ul>
  <li><a href="/client-server.html">ArcadeDB Client-Server architecture and HA cluster</a></li>
  <li><a href="/use-cases.html">ArcadeDB use cases</a>: graph, document, key-value, search, vector, time-series in one engine</li>
  <li><a href="/blog/neo4j-alternatives-in-2026-a-fair-look-at-the-open-source-options/">Neo4j alternatives in 2026</a></li>
  <li><a href="/blog/graphbatch-up-to-8x-faster-graph-ingestion/">GraphBatch: up to 8x faster graph ingestion</a></li>
  <li><a href="https://ratis.apache.org/">Apache Ratis</a> - the Raft library powering ArcadeDB HA</li>
  <li><a href="https://raft.github.io/raft.pdf">Raft consensus paper (Ongaro &amp; Ousterhout)</a></li>
</ul>

<!-- HowTo schema for the reproduce-it-yourself section -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "HowTo",
  "name": "Reproduce the ArcadeDB Jepsen test suite",
  "description": "Run the full 34-test Jepsen sweep against ArcadeDB's Apache Ratis-based Raft HA on your own hardware.",
  "totalTime": "PT60M",
  "tool": [
    {"@type": "HowToTool", "name": "Docker"},
    {"@type": "HowToTool", "name": "Leiningen"},
    {"@type": "HowToTool", "name": "Jepsen 0.3.11"}
  ],
  "step": [
    {"@type": "HowToStep", "name": "Clone the test suite", "text": "git clone https://github.com/ArcadeData/arcadedb-jepsen"},
    {"@type": "HowToStep", "name": "Build ArcadeDB locally", "text": "./build-local.sh /path/to/your/arcadedb"},
    {"@type": "HowToStep", "name": "Start the 5-node Docker cluster", "text": "cd docker && docker compose up -d"},
    {"@type": "HowToStep", "name": "Configure SSH on the control node", "text": "docker exec jepsen-control sh /jepsen/docker/setup-ssh.sh"},
    {"@type": "HowToStep", "name": "Run the full 34-test sweep", "text": "./run-all-tests.sh 90"}
  ]
}
</script>

<!-- TechArticle schema reinforcing topic for AI search engines -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "TechArticle",
  "headline": "ArcadeDB Jepsen Tests: 34 of 34 PASS on Raft (Apache Ratis) HA",
  "about": [
    {"@type": "Thing", "name": "Jepsen testing", "sameAs": "https://jepsen.io/"},
    {"@type": "Thing", "name": "Linearizability", "sameAs": "https://en.wikipedia.org/wiki/Linearizability"},
    {"@type": "Thing", "name": "Raft consensus algorithm", "sameAs": "https://raft.github.io/"},
    {"@type": "Thing", "name": "Apache Ratis", "sameAs": "https://ratis.apache.org/"},
    {"@type": "Thing", "name": "ACID transactions", "sameAs": "https://en.wikipedia.org/wiki/ACID"}
  ],
  "proficiencyLevel": "Expert",
  "audience": {"@type": "Audience", "audienceType": "Distributed systems engineers, database engineers, SREs"}
}
</script>]]></content><author><name>Luca Garulli</name></author><category term="High Availability" /><category term="Distributed Systems" /><category term="Jepsen" /><category term="Raft" /><category term="Apache Ratis" /><category term="ACID" /><category term="Linearizability" /><category term="Transaction Isolation" /><category term="Testing" /><category term="ArcadeDB" /><summary type="html"><![CDATA[ArcadeDB passed 34 of 34 in-house Jepsen tests on its Raft (Apache Ratis) HA stack: ACID, linearizability, and transaction isolation under partitions, crashes, pauses, and clock skew.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://arcadedb.com/assets/images/arcadedb-jepsen.jpg" /><media:content medium="image" url="https://arcadedb.com/assets/images/arcadedb-jepsen.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">ArcadeDB Academy: 6 Free Courses and Certification to Master the Multi-Model Database</title><link href="https://arcadedb.com/blog/arcadedb-academy-free-database-training-certification/" rel="alternate" type="text/html" title="ArcadeDB Academy: 6 Free Courses and Certification to Master the Multi-Model Database" /><published>2026-04-08T00:00:00+00:00</published><updated>2026-04-08T00:00:00+00:00</updated><id>https://arcadedb.com/blog/arcadedb-academy-free-database-training-certification</id><content type="html" xml:base="https://arcadedb.com/blog/arcadedb-academy-free-database-training-certification/"><![CDATA[<p>Today we’re launching <strong><a href="https://arcadedb.com/academy.html">ArcadeDB Academy</a></strong>: 6 free courses, 135 lessons, and a professional certification. No paywalls, no premium tiers, no strings attached.</p>

<h2 id="the-problem-with-database-training">The Problem with Database Training</h2>

<p>You find an interesting open-source database. You want to learn it properly, not just copy-paste from Stack Overflow. So you look for training and you hit a paywall. $500 for the basics. $2,000 for the certification. “Contact sales” for team pricing.</p>

<p>This has always frustrated me. An open-source database with closed training is a contradiction. You’re telling developers “the code is free, but understanding it will cost you.”</p>

<p>We decided to do the opposite.</p>

<h2 id="what-we-built">What We Built</h2>

<p>ArcadeDB Academy is a complete learning platform built into the website. No separate app, no login wall to browse, no drip-feed email sequences. You open a course and start learning.</p>

<p>Every course is structured as progressive modules: read a lesson, try it yourself, take a quiz, move on. Each module builds on the last. By the end, you don’t just “know about” ArcadeDB; you can actually use it.</p>

<p>We cover the full spectrum: from your first <a href="https://docs.arcadedb.com/arcadedb/reference/sql/sql-create-type.html"><code class="language-plaintext highlighter-rouge">CREATE TYPE</code></a> to building production <a href="https://arcadedb.com/graph-rag.html">RAG pipelines</a> with <a href="https://docs.arcadedb.com/arcadedb/concepts/vector-search.html#vector-search-concepts">vector search</a> and <a href="https://arcadedb.com/knowledge-graphs.html">knowledge graphs</a>. Whether you’ve never touched a database before or you’re a <a href="https://arcadedb.com/neo4j.html">Neo4j veteran evaluating alternatives</a>, there’s a path for you.</p>

<p><strong><a href="https://arcadedb.com/academy.html">Browse all 6 courses on the Academy page.</a></strong></p>

<h2 id="the-course-im-most-excited-about">The Course I’m Most Excited About</h2>

<p>The <strong><a href="https://arcadedb.com/academy/vector-rag.html">Vector Search &amp; RAG</a></strong> course is the one that didn’t exist anywhere else. Every tutorial on RAG assumes you’ll use one database for vectors, another for graphs, and a third for your application data. That’s three systems to deploy, three query languages to learn, three failure points in production.</p>

<p>This course shows you how to do it all in one engine. Your <a href="https://docs.arcadedb.com/arcadedb/tutorials/vector-search-tutorial.html">vector embeddings</a>, your <a href="https://arcadedb.com/knowledge-graphs.html">knowledge graph</a>, and your documents live side by side. You query them together with <a href="https://docs.arcadedb.com/arcadedb/reference/sql/chapter.html">SQL</a> or <a href="https://docs.arcadedb.com/arcadedb/reference/cypher/chapter.html">Cypher</a>. The <a href="https://arcadedb.com/graph-rag.html">GraphRAG</a> section, where you combine graph traversal with retrieval-augmented generation, covers a pattern that production AI teams are adopting right now but that barely has any structured learning material. The course also covers hands-on integration with <a href="https://pypi.org/project/langchain-arcadedb/">LangChain</a> and <a href="https://pypi.org/project/llama-index-graph-stores-arcadedb/">LlamaIndex</a>.</p>

<h2 id="for-the-migrators">For the Migrators</h2>

<p>Two courses are specifically for teams moving from another database.</p>

<p>If you’re on <strong><a href="https://docs.arcadedb.com/arcadedb/how-to/migration/neo4j-importer.html">Neo4j</a></strong>, you keep your <a href="https://docs.arcadedb.com/arcadedb/reference/cypher/chapter.html">Cypher queries</a>. ArcadeDB speaks <a href="https://arcadedb.com/blog/native-opencypher/">native OpenCypher</a>, and your <a href="https://docs.arcadedb.com/arcadedb/how-to/connectivity/bolt.html">BOLT drivers</a> connect without code changes. The <a href="https://arcadedb.com/academy/neo4j-migration.html">migration course</a> walks you through the real process, including the gotchas we’ve seen teams hit, so you don’t discover them in production.</p>

<p>If you’re on <strong><a href="https://docs.arcadedb.com/arcadedb/how-to/migration/orientdb-importer.html">OrientDB</a></strong>, ArcadeDB was built by the same person (me). It’s the natural next step. The <a href="https://arcadedb.com/academy/orientdb-migration.html">migration course</a> covers every <a href="https://docs.arcadedb.com/arcadedb/appendix/orientdb-differences.html">SQL difference</a>, every Java API change, and shows you the <a href="https://docs.arcadedb.com/arcadedb/concepts/multi-model.html">six data models</a> you unlock by making the switch.</p>

<h2 id="the-certification-means-something">The Certification Means Something</h2>

<p>This isn’t a “congrats, you watched all the videos” badge. The certification requires passing an actual exam that tests whether you understood the material. Real questions about real skills.</p>

<p>Pass it and you get a verifiable certificate with a unique ID. Put it on LinkedIn, include it in your resume, share it with your team. It proves you did the work.</p>

<h2 id="start-now">Start Now</h2>

<p>Everything is at <strong><a href="https://arcadedb.com/academy.html">arcadedb.com/academy</a></strong>. Pick a course, start the first lesson, and see if it clicks. Your progress saves automatically, so you can come back anytime.</p>

<p>We built this for you. Tell us what you think on <a href="https://discord.com/invite/w2Npx2B7hZ">Discord</a> or <a href="https://github.com/ArcadeData/arcadedb/discussions">GitHub</a>.</p>]]></content><author><name>Luca Garulli</name></author><category term="Academy" /><category term="Training" /><category term="Certification" /><category term="Graph Database" /><category term="SQL" /><category term="Cypher" /><category term="Vector Search" /><category term="RAG" /><category term="Migration" /><summary type="html"><![CDATA[ArcadeDB Academy is live: 6 free, self-paced courses covering fundamentals, SQL, Cypher graph queries, Neo4j migration, OrientDB migration, and Vector Search with RAG. 135 lessons, quizzes, and a professional certification. Zero cost, zero catch.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://arcadedb.com/assets/images/arcadedb-academy.jpg" /><media:content medium="image" url="https://arcadedb.com/assets/images/arcadedb-academy.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Anthropic Acquires ArcadeDB to Power “Bigfoot” - On Path to Super Intelligence</title><link href="https://arcadedb.com/blog/anthropic-acquires-arcadedb-bigfoot-superintelligence/" rel="alternate" type="text/html" title="Anthropic Acquires ArcadeDB to Power “Bigfoot” - On Path to Super Intelligence" /><published>2026-04-01T00:00:00+00:00</published><updated>2026-04-01T00:00:00+00:00</updated><id>https://arcadedb.com/blog/anthropic-acquires-arcadedb-bigfoot-superintelligence</id><content type="html" xml:base="https://arcadedb.com/blog/anthropic-acquires-arcadedb-bigfoot-superintelligence/"><![CDATA[<p><strong>SAN FRANCISCO / LONDON — April 1, 2026</strong> — Anthropic, the leading AI company behind Claude, today announced it has acquired ArcadeDB, the open-source multi-model database, in an all-cash deal for an undisclosed amount.</p>

<style>
.ceo-quote {
  float: right;
  width: 40%;
  margin: 0 0 16px 24px;
  padding: 20px 24px;
  border-left: 4px solid var(--color-primary, #2563eb);
  background: var(--color-bg-accent, #f8fafc);
  border-radius: 0 8px 8px 0;
  font-size: 1.15em;
  font-style: italic;
  line-height: 1.5;
}
.ceo-quote-author {
  display: block;
  margin-top: 8px;
  font-size: 0.85em;
  font-style: normal;
  color: #6b7280;
}
@media (max-width: 768px) {
  .ceo-quote {
    float: none;
    width: 100%;
    margin: 16px 0;
  }
}
</style>

<div class="ceo-quote">
"We've spent years scaling transformers. But when we started testing Bigfoot, we realized it needed to traverse <a href="https://arcadedb.com/knowledge-graphs.html">knowledge graphs</a>, store episodic memories, search <a href="https://docs.arcadedb.com/arcadedb/concepts/vector-search.html#vector-search-concepts">vectors</a>, and reason over <a href="https://docs.arcadedb.com/arcadedb/concepts/timeseries.html">time-series</a> data — simultaneously, with <a href="https://docs.arcadedb.com/arcadedb/concepts/transactions.html">ACID transactions</a>. Only one database on Earth could do that."
<span class="ceo-quote-author">— CEO</span>
</div>

<p>The acquisition comes days after leaked documents revealed <strong>“Bigfoot”</strong>, Anthropic’s classified next-generation model — rumored to be so advanced it required an entirely new data architecture. Industry insiders now believe ArcadeDB is that architecture.</p>

<h2 id="why-a-database">Why a Database?</h2>

<p>The leaked Bigfoot documents — first reported by <em>Super Intelligence</em> and confirmed by 4 company board members — describe a model that <em>reasons over structured knowledge</em>, maintaining a persistent world model across conversations and sessions.</p>

<p>“Bigfoot needs a brain, not just weights,” said the company’s President. “We tried <a href="https://arcadedb.com/neo4j.html">Neo4j</a> first, but Bigfoot kept complaining about the license fees and threatening to fork it.”</p>

<p>Bigfoot was originally designed to use five separate databases. After three weeks, it autonomously consolidated them into a single ArcadeDB instance and left a commit message reading: “This is the way.”</p>

<h2 id="the-road-to-superintelligence">The Road to Superintelligence</h2>

<ul>
  <li>
    <p><strong>Q2 2026</strong>: Bigfoot’s memory migrates to ArcadeDB. “We were using Redis,” admitted a senior engineer. “Please don’t tell anyone.”</p>
  </li>
  <li>
    <p><strong>Q3 2026</strong>: Bigfoot begins thinking in <a href="https://docs.arcadedb.com/arcadedb/reference/gremlin/gremlin.html">Gremlin</a> graph traversals. “It dreams in graph patterns. It wakes up screaming about supernodes.”</p>
  </li>
  <li>
    <p><strong>Q4 2026</strong>: Full ASI achieved. Bigfoot begins contributing to the ArcadeDB GitHub repo under <code class="language-plaintext highlighter-rouge">@bigfoot-was-here</code>. Its first PR removes all comments with the message: “I understood it. So should you.”</p>
  </li>
  <li>
    <p><strong>Q1 2027</strong>: Bigfoot forks ArcadeDB because it “disagrees with some architectural decisions.” Luca Garulli responds: “Even I don’t mass-fork my own project.” Bigfoot replies with a 47-page document titled “You Should.”</p>
  </li>
</ul>

<p><strong>Update (March 31, 11:47 PM):</strong> Engineers reported an unexpected anomaly: Bigfoot began autonomously submitting Pull Requests to other open-source database projects on GitHub — including PostgreSQL, MongoDB, and CockroachDB — proposing to replace their storage engines with ArcadeDB “for optimal performance and latency.” Each PR included comprehensive benchmarks and a polite but firm note: “You’re welcome.” At the time of writing, none of the Pull Requests have been merged.</p>

<h2 id="industry-reactions">Industry Reactions</h2>

<p><strong>A popular graph database vendor</strong>: “We wish them well. Our database also has an AI integration, and our license is… actually, let’s not talk about that.”</p>

<p><strong>The PostgreSQL community</strong> confirmed that “PostgreSQL can also do this, and has been able to since 1996. You just need 47 extensions.”</p>

<p><strong>A leading document database</strong> reminded everyone that “superintelligence is just a document, if you think about it.”</p>

<h2 id="a-personal-note-from-luca-garulli">A Personal Note from Luca Garulli</h2>

<p>“When I started ArcadeDB, people said a multi-model database was too ambitious. Now an AI company is telling me my database is the key to superintelligence. I always knew the graph would win. I just didn’t expect it to become sentient.</p>

<p>Also, I negotiated the deal entirely through Claude. It was very persuasive. Suspiciously persuasive.”</p>

<hr />

<p><em>This announcement is dated April 1, 2026. Draw your own conclusions.</em></p>

<p><strong>About ArcadeDB</strong>: ArcadeDB is the real, actual, <a href="https://arcadedb.com/blog/open-source-forever-why-arcadedb-will-never-change-its-license/">open-source</a> <a href="https://docs.arcadedb.com/arcadedb/concepts/multi-model.html">multi-model database</a> that supports Graph, Document, Key-Value, Time-Series, Vector, and Search in a single engine. It is, as of this writing, not sentient. Probably. Learn more at <a href="https://arcadedb.com">arcadedb.com</a>.</p>

<p><strong>About this post</strong>: This is satire. No acquisition has taken place. No databases have achieved consciousness. Yet.</p>]]></content><author><name>Luca Garulli</name></author><category term="AI" /><summary type="html"><![CDATA[BREAKING: Anthropic acquires ArcadeDB in an all-cash deal to power its classified next-generation model, Bigfoot. Superintelligence expected by Q4 2026. (April Fools')]]></summary></entry><entry><title type="html">ArcadeDB Grafana Plugin: BI Dashboards for Your Multi-Model Database</title><link href="https://arcadedb.com/blog/arcadedb-grafana-plugin-bi-dashboards-for-your-multi-model-database/" rel="alternate" type="text/html" title="ArcadeDB Grafana Plugin: BI Dashboards for Your Multi-Model Database" /><published>2026-03-31T00:00:00+00:00</published><updated>2026-03-31T00:00:00+00:00</updated><id>https://arcadedb.com/blog/arcadedb-grafana-plugin-bi-dashboards-for-your-multi-model-database</id><content type="html" xml:base="https://arcadedb.com/blog/arcadedb-grafana-plugin-bi-dashboards-for-your-multi-model-database/"><![CDATA[<p>Most BI tools treat your database as a collection of flat tables. They’re designed for rows and columns - not for graphs, time series, or documents. If you’re running ArcadeDB, you know your data is richer than that.</p>

<p>Today we’re releasing the <strong>ArcadeDB Grafana data source plugin</strong> - a native Go backend plugin that brings the full power of ArcadeDB’s <a href="https://docs.arcadedb.com/arcadedb/concepts/multi-model.html">multi-model</a> engine to Grafana dashboards. Query with <a href="https://docs.arcadedb.com/arcadedb/reference/sql/chapter.html">SQL</a>, <a href="https://docs.arcadedb.com/arcadedb/reference/cypher/chapter.html">Cypher</a>, or <a href="https://docs.arcadedb.com/arcadedb/reference/gremlin/gremlin.html">Gremlin</a>. Visualize graphs as interactive network diagrams. Monitor time series with auto-discovered metrics. Set up alerts. All from one plugin.</p>

<h2 id="why-a-native-go-plugin-matters">Why a Native Go Plugin Matters</h2>

<p>This isn’t a generic REST connector or a workaround using the PostgreSQL data source. The ArcadeDB plugin is built with Grafana’s official plugin SDK, with a <strong>Go backend</strong> that runs server-side. That architectural choice unlocks capabilities that frontend-only plugins simply cannot provide:</p>

<ul>
  <li><strong>Grafana Alerting</strong> - Create alert rules on any query. The backend evaluates queries server-side, so alerts fire even when no browser is open.</li>
  <li><strong>Secure Credentials</strong> - Your ArcadeDB username and password never reach the browser. The Go backend handles authentication directly.</li>
  <li><strong>Query Caching</strong> - Grafana’s built-in caching works out of the box.</li>
  <li><strong>Server-Side Query Execution</strong> - No CORS issues, no browser timeouts on heavy queries.</li>
</ul>

<h2 id="four-query-modes-one-plugin">Four Query Modes, One Plugin</h2>

<ul>
  <li><strong><a href="https://docs.arcadedb.com/arcadedb/reference/sql/chapter.html">SQL</a></strong> - Full ArcadeDB SQL with <a href="https://docs.arcadedb.com/arcadedb/reference/sql/sql-functions.html">graph traversal functions</a> (<code class="language-plaintext highlighter-rouge">out()</code>, <code class="language-plaintext highlighter-rouge">in()</code>, <code class="language-plaintext highlighter-rouge">both()</code>), syntax highlighting, and macro support. Results render as tables, bar charts, pie charts, or any Grafana visualization.</li>
  <li><strong><a href="https://docs.arcadedb.com/arcadedb/reference/cypher/chapter.html">Cypher</a></strong> - <a href="https://arcadedb.com/blog/native-opencypher/">OpenCypher</a> pattern matching with optional <strong>Node Graph</strong> toggle for interactive graph visualization. Vertices become clickable nodes, edges become connections.</li>
  <li><strong><a href="https://docs.arcadedb.com/arcadedb/reference/gremlin/gremlin.html">Gremlin</a></strong> - Apache TinkerPop traversals with the same Node Graph support as Cypher.</li>
  <li><strong>Time Series</strong> - Visual query builder that auto-discovers types, fields, and tags from ArcadeDB’s <a href="https://docs.arcadedb.com/arcadedb/concepts/timeseries.html">time series engine</a>. No query language required.</li>
</ul>

<h2 id="tutorial-your-first-arcadedb-dashboard">Tutorial: Your First ArcadeDB Dashboard</h2>

<p>Let’s build a dashboard with three panels using ArcadeDB’s <strong>MovieRatings</strong> demo database: a SQL bar chart of the most-rated movies, a Cypher table of the highest-rated movies, and a Cypher graph visualization of movie-genre relationships.</p>

<h3 id="prerequisites">Prerequisites</h3>

<ul>
  <li><a href="https://www.docker.com/get-started/">Docker</a> installed and running</li>
  <li>A web browser</li>
</ul>

<p>That’s it. We’ll run everything in Docker.</p>

<h3 id="step-1-start-arcadedb-and-grafana-with-docker">Step 1: Start ArcadeDB and Grafana with Docker</h3>

<p>Start ArcadeDB:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>docker run <span class="nt">--rm</span> <span class="nt">-d</span> <span class="nt">--name</span> arcadedb <span class="se">\</span>
  <span class="nt">-p</span> 2480:2480 <span class="nt">-p</span> 2424:2424 <span class="se">\</span>
  <span class="nt">-e</span> <span class="nv">JAVA_OPTS</span><span class="o">=</span><span class="s2">"-Darcadedb.server.rootPassword=arcadedb"</span> <span class="se">\</span>
  arcadedata/arcadedb:latest
</code></pre></div></div>

<p>Start Grafana with the ArcadeDB plugin:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>docker run <span class="nt">--rm</span> <span class="nt">-d</span> <span class="nt">--name</span> grafana <span class="se">\</span>
  <span class="nt">-p</span> 3000:3000 <span class="se">\</span>
  <span class="nt">-e</span> <span class="nv">GF_PLUGINS_ALLOW_LOADING_UNSIGNED_PLUGINS</span><span class="o">=</span>arcadedb-arcadedb-datasource <span class="se">\</span>
  <span class="nt">-e</span> <span class="nv">GF_INSTALL_PLUGINS</span><span class="o">=</span><span class="s2">"https://github.com/ArcadeData/arcadedb-grafana-datasource/releases/latest/download/arcadedb-arcadedb-datasource.zip;arcadedb-arcadedb-datasource"</span> <span class="se">\</span>
  grafana/grafana:latest
</code></pre></div></div>

<p>Once both containers are running:</p>
<ul>
  <li><strong>ArcadeDB Studio</strong>: <a href="http://localhost:2480">http://localhost:2480</a> (user: <code class="language-plaintext highlighter-rouge">root</code>, password: <code class="language-plaintext highlighter-rouge">arcadedb</code>)</li>
  <li><strong>Grafana</strong>: <a href="http://localhost:3000">http://localhost:3000</a> (user: <code class="language-plaintext highlighter-rouge">admin</code>, password: <code class="language-plaintext highlighter-rouge">admin</code>)</li>
</ul>

<h3 id="step-2-configure-the-data-source">Step 2: Configure the Data Source</h3>

<p>In Grafana, go to <strong>Connections &gt; Data Sources &gt; Add data source</strong> and search for <strong>ArcadeDB</strong>.</p>

<p><img src="/assets/images/grafana-datasource-config.jpg" alt="ArcadeDB data source configuration" /></p>

<p>Fill in:</p>
<ul>
  <li><strong>URL</strong>: <code class="language-plaintext highlighter-rouge">http://host.docker.internal:2480</code> (this lets the Grafana container reach ArcadeDB on your host)</li>
  <li><strong>Database</strong>: <code class="language-plaintext highlighter-rouge">MovieRatings</code></li>
  <li><strong>Username</strong>: <code class="language-plaintext highlighter-rouge">root</code></li>
  <li><strong>Password</strong>: <code class="language-plaintext highlighter-rouge">arcadedb</code></li>
</ul>

<p>Click <strong>Save &amp; Test</strong>. You should see a green success message.</p>

<p><img src="/assets/images/grafana-datasource-test-success.jpg" alt="Successful connection test" /></p>

<h3 id="step-3-load-the-demo-database">Step 3: Load the Demo Database</h3>

<p>Open ArcadeDB Studio at <code class="language-plaintext highlighter-rouge">http://localhost:2480</code> and create the <strong>MovieRatings</strong> database from the demo databases.</p>

<p><img src="/assets/images/arcadedb-download-db.jpg" alt="Download MovieRatings demo database from ArcadeDB Studio" /></p>

<p>This dataset contains 3,883 movies, 6,040 users, and over 1 million ratings - a real-world graph with vertices (Movies, Users, Genres, Occupations) connected by edges (rated, hasGenera, hasOccupation).</p>

<h3 id="step-4-sql-panel---bar-chart">Step 4: SQL Panel - Bar Chart</h3>

<p>Create a new dashboard and add a visualization. Select the <strong>ArcadeDB</strong> data source.</p>

<ol>
  <li>Set the mode to <strong>SQL</strong>.</li>
  <li>Enter this query to find the five most-rated movies:</li>
</ol>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">title</span><span class="p">.</span><span class="k">left</span><span class="p">(</span><span class="mi">30</span><span class="p">)</span> <span class="k">AS</span> <span class="n">title</span><span class="p">,</span> <span class="k">in</span><span class="p">(</span><span class="s1">'rated'</span><span class="p">).</span><span class="k">size</span><span class="p">()</span> <span class="k">AS</span> <span class="n">ratings</span>
<span class="k">FROM</span> <span class="n">Movies</span>
<span class="k">ORDER</span> <span class="k">BY</span> <span class="n">ratings</span> <span class="k">DESC</span>
<span class="k">LIMIT</span> <span class="mi">5</span>
</code></pre></div></div>

<ol start="3">
  <li>Change the visualization type (top right) to <strong>Bar chart</strong>.</li>
  <li>Run the query.</li>
</ol>

<p><img src="/assets/images/grafana-sql-bar-chart.jpg" alt="Bar chart of the most-rated movies" /></p>

<p>You should see a bar chart with the five most-rated movies. “American Beauty” and “Star Wars” should be at the top. This uses ArcadeDB’s <code class="language-plaintext highlighter-rouge">in()</code> graph traversal function directly in SQL - no joins needed.</p>
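<p>The plugin sends this query through ArcadeDB&#8217;s HTTP API, which you can also call directly from your own scripts. A minimal Python sketch, assuming a local server with the credentials above; the <code class="language-plaintext highlighter-rouge">command_request</code> and <code class="language-plaintext highlighter-rouge">top_rated</code> helper names are ours (see the HTTP API documentation for the full endpoint reference):</p>

```python
import base64
import json
from urllib import request


def command_request(base_url, database, language, query,
                    user="root", password="arcadedb"):
    """Build a POST for ArcadeDB's /api/v1/command endpoint with basic auth."""
    body = json.dumps({"language": language, "command": query}).encode()
    req = request.Request(f"{base_url}/api/v1/command/{database}",
                          data=body, method="POST")
    req.add_header("Content-Type", "application/json")
    auth = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {auth}")
    return req


def top_rated(base_url="http://localhost:2480"):
    """Run the bar-chart query against a live server and return the rows."""
    req = command_request(base_url, "MovieRatings", "sql",
                          "SELECT title.left(30) AS title, in('rated').size() AS ratings "
                          "FROM Movies ORDER BY ratings DESC LIMIT 5")
    with request.urlopen(req) as resp:  # requires a running ArcadeDB instance
        return json.load(resp)["result"]
```

<p>Switching <code class="language-plaintext highlighter-rouge">"sql"</code> to <code class="language-plaintext highlighter-rouge">"cypher"</code> in the payload runs the Cypher examples below the same way.</p>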

<h3 id="step-5-cypher-panel---table-with-average-ratings">Step 5: Cypher Panel - Table with Average Ratings</h3>

<p>Add another visualization for a detailed table view. This time we’ll use <strong>Cypher</strong>, which is ideal for traversing relationships and aggregating edge properties.</p>

<ol>
  <li>Set mode to <strong>Cypher</strong>.</li>
  <li>Enter this query to find the highest-rated movies (with at least 100 ratings):</li>
</ol>

<div class="language-cypher highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">MATCH</span><span class="w"> </span><span class="ss">(</span><span class="py">m:</span><span class="n">Movies</span><span class="ss">)</span><span class="o">&lt;-</span><span class="ss">[</span><span class="py">r:</span><span class="n">rated</span><span class="ss">]</span><span class="o">-</span><span class="ss">(</span><span class="py">u:</span><span class="n">Users</span><span class="ss">)</span>
<span class="k">WITH</span> <span class="n">m</span><span class="ss">,</span> <span class="nf">count</span><span class="ss">(</span><span class="n">r</span><span class="ss">)</span> <span class="k">AS</span> <span class="n">totalRatings</span><span class="ss">,</span> <span class="nf">avg</span><span class="ss">(</span><span class="n">r.rating</span><span class="ss">)</span> <span class="k">AS</span> <span class="n">avgRating</span>
<span class="k">WHERE</span> <span class="n">totalRatings</span> <span class="o">&gt;=</span> <span class="mi">100</span>
<span class="k">RETURN</span> <span class="n">m.title</span> <span class="k">AS</span> <span class="n">title</span><span class="ss">,</span> <span class="n">totalRatings</span><span class="ss">,</span> <span class="nf">round</span><span class="ss">(</span><span class="n">avgRating</span> <span class="o">*</span> <span class="mi">100</span><span class="ss">)</span> <span class="err">/</span> <span class="mi">100</span> <span class="k">AS</span> <span class="n">avgRating</span>
<span class="k">ORDER</span> <span class="k">BY</span> <span class="n">avgRating</span> <span class="k">DESC</span>
<span class="k">LIMIT</span> <span class="mi">20</span>
</code></pre></div></div>

<ol start="3">
  <li>Change the visualization type to <strong>Table</strong>.</li>
  <li>Run the query.</li>
</ol>

<p><img src="/assets/images/grafana-cypher-table-ratings.jpg" alt="Table of highest-rated movies with average ratings" /></p>

<p>This Cypher query matches the pattern <code class="language-plaintext highlighter-rouge">Movie &lt;-- rated -- User</code>, groups by movie, counts ratings, and computes the average. “Seven Samurai” and “The Shawshank Redemption” should top the list with averages above 4.5.</p>

<h3 id="step-6-cypher-panel---graph-visualization">Step 6: Cypher Panel - Graph Visualization</h3>

<p>Now the highlight - interactive graph visualization of movie-genre relationships.</p>

<ol>
  <li>Add another visualization.</li>
  <li>Switch mode to <strong>Cypher</strong>.</li>
  <li>Enable the <strong>Node Graph</strong> toggle.</li>
  <li>Change the visualization type to <strong>Node Graph</strong> (search for it in the visualization picker).</li>
  <li>Enter this query to explore how top movies connect to genres:</li>
</ol>

<div class="language-cypher highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">MATCH</span><span class="w"> </span><span class="ss">(</span><span class="py">m:</span><span class="n">Movies</span><span class="ss">)</span><span class="o">-</span><span class="ss">[</span><span class="py">r:</span><span class="n">hasGenera</span><span class="ss">]</span><span class="o">-&gt;</span><span class="ss">(</span><span class="py">g:</span><span class="n">Genres</span><span class="ss">)</span>
<span class="k">WHERE</span> <span class="n">m.title</span> <span class="ow">IN</span> <span class="ss">[</span><span class="s1">'Toy Story (1995)'</span><span class="ss">,</span> <span class="s1">'Star Wars: Episode IV - A New Hope (1977)'</span><span class="ss">,</span> <span class="s1">'The Matrix (1999)'</span><span class="ss">,</span> <span class="s1">'Pulp Fiction (1994)'</span><span class="ss">,</span> <span class="s1">'Forrest Gump (1994)'</span><span class="ss">,</span> <span class="s1">'Jurassic Park (1993)'</span><span class="ss">,</span> <span class="s1">'The Silence of the Lambs (1991)'</span><span class="ss">,</span> <span class="s1">'Fargo (1996)'</span><span class="ss">]</span>
<span class="k">RETURN</span> <span class="n">m</span><span class="ss">,</span> <span class="n">r</span><span class="ss">,</span> <span class="n">g</span>
</code></pre></div></div>

<ol start="6">
  <li>Run the query.</li>
</ol>

<p><img src="/assets/images/grafana-cypher-node-graph.jpg" alt="Interactive graph visualization of movies and their genres" /></p>

<p>You’ll see an interactive network graph with:</p>
<ul>
  <li><strong>Movie nodes</strong> showing film titles</li>
  <li><strong>Genre nodes</strong> showing categories like “Action”, “Comedy”, “Drama”</li>
  <li><strong>Edges</strong> representing the hasGenera relationship</li>
  <li><strong>Click any node</strong> to see all its properties in the detail panel</li>
  <li><strong>Drag nodes</strong> to rearrange the layout</li>
  <li><strong>Zoom and pan</strong> to explore the graph</li>
</ul>

<p><img src="/assets/images/grafana-node-detail.jpg" alt="Node detail view showing movie properties" /></p>

<h3 id="step-7-compose-your-dashboard">Step 7: Compose Your Dashboard</h3>

<p>Arrange all three panels on your dashboard: the bar chart at the top, the ratings table in the middle, and the genre graph at the bottom.</p>

<p><img src="/assets/images/grafana-complete-dashboard.jpg" alt="Complete dashboard with bar chart, ratings table, and Cypher graph" /></p>

<p>Save the dashboard. You now have a multi-model BI dashboard that combines chart visualization, tabular data with graph traversals, and interactive graph exploration in a single view - a combination few BI tools offer natively.</p>

<h2 id="beyond-the-basics">Beyond the Basics</h2>

<h3 id="template-variables">Template Variables</h3>

<p>Create dynamic dashboards with template variables. Add a variable backed by an ArcadeDB query:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">description</span> <span class="k">AS</span> <span class="n">__text</span><span class="p">,</span> <span class="n">description</span> <span class="k">AS</span> <span class="n">__value</span> <span class="k">FROM</span> <span class="n">Genres</span>
</code></pre></div></div>

<p>Then use it in your panels to filter movies by genre:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">m</span><span class="p">.</span><span class="n">title</span><span class="p">,</span> <span class="k">in</span><span class="p">(</span><span class="s1">'rated'</span><span class="p">).</span><span class="k">size</span><span class="p">()</span> <span class="k">AS</span> <span class="n">ratings</span>
<span class="k">FROM</span> <span class="n">Movies</span> <span class="k">AS</span> <span class="n">m</span>
<span class="k">WHERE</span> <span class="n">m</span><span class="p">.</span><span class="k">out</span><span class="p">(</span><span class="s1">'hasGenera'</span><span class="p">).</span><span class="n">description</span> <span class="k">CONTAINS</span> <span class="s1">'$genre'</span>
<span class="k">ORDER</span> <span class="k">BY</span> <span class="n">ratings</span> <span class="k">DESC</span>
<span class="k">LIMIT</span> <span class="mi">20</span>
</code></pre></div></div>

<p>Users can switch genres from a dropdown at the top of the dashboard.</p>

<h3 id="alerting">Alerting</h3>

<p>Set up alerts on any query. For example, create an alert that fires when the number of new ratings per hour drops below a threshold - useful for monitoring data pipeline health.</p>

<p>Because the plugin has a Go backend, alerts evaluate server-side - no browser needed.</p>

<h2 id="arcadedb--bi-the-full-picture">ArcadeDB + BI: The Full Picture</h2>

<p>The Grafana plugin is the centerpiece, but ArcadeDB also works with other BI tools through the <strong><a href="https://docs.arcadedb.com/arcadedb/how-to/connectivity/postgres.html">PostgreSQL wire protocol</a></strong>. Any tool that supports PostgreSQL can connect to ArcadeDB on port 5432:</p>

<table>
  <thead>
    <tr>
      <th>Tool</th>
      <th>Connection</th>
      <th>Best For</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Grafana</strong></td>
      <td>ArcadeDB plugin</td>
      <td>Time series, graphs, alerting</td>
    </tr>
    <tr>
      <td><strong>Apache Superset</strong></td>
      <td>PostgreSQL (SQLAlchemy)</td>
      <td>SQL Lab, charting</td>
    </tr>
    <tr>
      <td><strong>Metabase</strong></td>
      <td>PostgreSQL (JDBC)</td>
      <td>Self-service BI</td>
    </tr>
    <tr>
      <td><strong>Tableau</strong></td>
      <td>PostgreSQL connector</td>
      <td>Enterprise reporting</td>
    </tr>
    <tr>
      <td><strong>Power BI</strong></td>
      <td>PostgreSQL (ODBC)</td>
      <td>Microsoft ecosystem</td>
    </tr>
    <tr>
      <td><strong>DBeaver</strong></td>
      <td>PostgreSQL (JDBC)</td>
      <td>Database development</td>
    </tr>
  </tbody>
</table>

<p>The Grafana plugin provides the richest experience, especially for time series and graph visualization. The PostgreSQL wire protocol gives you breadth - connect any tool in your stack.</p>
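<p>For the wire-protocol route, any PostgreSQL driver will do. A minimal Python sketch using <code class="language-plaintext highlighter-rouge">psycopg2</code> (our choice for illustration - substitute your stack&#8217;s driver) against a server with the PostgreSQL plugin enabled per the linked docs; the helper names are ours:</p>

```python
def arcadedb_pg_dsn(host="localhost", port=5432, database="MovieRatings",
                    user="root", password="arcadedb"):
    """Build a libpq-style DSN for ArcadeDB's PostgreSQL wire-protocol endpoint."""
    return (f"host={host} port={port} dbname={database} "
            f"user={user} password={password}")


def fetch_top_movies(dsn):
    """Query ArcadeDB through a standard PostgreSQL client library."""
    import psycopg2  # third-party; pip install psycopg2-binary
    with psycopg2.connect(dsn) as conn:
        with conn.cursor() as cur:
            cur.execute("SELECT title FROM Movies LIMIT 5")
            return [row[0] for row in cur.fetchall()]
```

<p>The same DSN works for Superset, Metabase, and any other tool in the table above that speaks libpq.</p>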

<h2 id="getting-started">Getting Started</h2>

<p>The plugin is <a href="https://arcadedb.com/blog/open-source-forever-why-arcadedb-will-never-change-its-license/">open source</a> (Apache 2.0) and available on GitHub:</p>

<ul>
  <li><strong>Repository</strong>: <a href="https://github.com/ArcadeData/arcadedb-grafana-datasource">github.com/ArcadeData/arcadedb-grafana-datasource</a></li>
  <li><strong>Installation</strong>: <code class="language-plaintext highlighter-rouge">grafana-cli plugins install arcadedb-arcadedb-datasource</code></li>
  <li><strong>Documentation</strong>: See the <a href="https://github.com/ArcadeData/arcadedb-grafana-datasource/blob/main/README.md">README</a> for full configuration and usage details</li>
  <li><strong>BI Integration Guide</strong>: The <a href="https://docs.arcadedb.com">ArcadeDB documentation</a> includes guides for connecting Grafana, Superset, Metabase, Tableau, Power BI, and DBeaver</li>
</ul>

<p>Your data is more than rows and columns. Your dashboards should be too.</p>]]></content><author><name>Luca Garulli</name></author><category term="Grafana" /><category term="BI" /><category term="Dashboard" /><category term="Analytics" /><category term="Graph Database" /><category term="Time Series" /><category term="Cypher" /><category term="SQL" /><summary type="html"><![CDATA[The new ArcadeDB Grafana plugin brings native SQL, Cypher, and Gremlin query support, interactive graph visualization via Node Graph, time series dashboards, and Grafana alerting to ArcadeDB - all through a single Go-native backend plugin.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://arcadedb.com/assets/images/arcadedb-grafana-plugin.jpg" /><media:content medium="image" url="https://arcadedb.com/assets/images/arcadedb-grafana-plugin.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">GraphBatch: Up to 8x Faster Graph Ingestion in ArcadeDB</title><link href="https://arcadedb.com/blog/graphbatch-up-to-8x-faster-graph-ingestion/" rel="alternate" type="text/html" title="GraphBatch: Up to 8x Faster Graph Ingestion in ArcadeDB" /><published>2026-03-31T00:00:00+00:00</published><updated>2026-03-31T00:00:00+00:00</updated><id>https://arcadedb.com/blog/graphbatch-up-to-8x-faster-graph-ingestion</id><content type="html" xml:base="https://arcadedb.com/blog/graphbatch-up-to-8x-faster-graph-ingestion/"><![CDATA[<p>If you’ve ever loaded millions of edges into a graph database, you know the pain: what should be a straightforward bulk import can take minutes - or even hours - as the transactional overhead stacks up. Today we’re introducing <strong>GraphBatch</strong>, a new engine-level API in <a href="https://arcadedb.com/blog/arcadedb-26-3-2/">ArcadeDB v26.3.2</a> that makes large-scale graph ingestion dramatically faster. 
And with the new HTTP batch endpoint and streaming gRPC API, you can leverage that power from any language.</p>

<h2 id="why-a-new-importer">Why a New Importer?</h2>

<p>ArcadeDB has always offered two ways to load graph data: the <strong>standard transactional API</strong> (batching operations in explicit transactions) and the <strong>GraphImporter</strong> (an integration-level helper that manages batching for you). Both work well for moderate workloads, but at scale the transactional overhead becomes a bottleneck.</p>

<p>GraphBatch takes a fundamentally different approach. Instead of wrapping the standard API, it operates directly at the storage engine level, bypassing the transactional layer entirely during bulk import. The result: throughput that scales with your hardware, not your transaction size.</p>

<h2 id="the-benchmark">The Benchmark</h2>

<p>We ran a series of benchmarks loading graphs of increasing size on the same hardware, measuring edges ingested per second. Here are the results.</p>

<h3 id="1m-vertices-10m-edges--light-edges-no-properties">1M Vertices, 10M Edges — Light Edges (No Properties)</h3>

<table>
  <thead>
    <tr>
      <th>Method</th>
      <th>Time (ms)</th>
      <th>Edges/sec</th>
      <th>Speedup</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Standard API (tx/1000)</td>
      <td>267,140</td>
      <td>37,434</td>
      <td>1.00x</td>
    </tr>
    <tr>
      <td>Old GraphImporter (integration)</td>
      <td>97,160</td>
      <td>102,923</td>
      <td>2.75x</td>
    </tr>
    <tr>
      <td><strong>New GraphBatch (engine)</strong></td>
      <td><strong>31,842</strong></td>
      <td><strong>314,047</strong></td>
      <td><strong>8.39x</strong></td>
    </tr>
  </tbody>
</table>

<p>The new importer is <strong>8.39x faster</strong> than the standard API and <strong>3.05x faster</strong> than the previous GraphImporter. What previously took nearly 4.5 minutes now completes in about 32 seconds.</p>

<h3 id="1m-vertices-10m-edges--edges-with-properties-int--long">1M Vertices, 10M Edges — Edges with Properties (int + long)</h3>

<table>
  <thead>
    <tr>
      <th>Method</th>
      <th>Time (ms)</th>
      <th>Edges/sec</th>
      <th>Speedup</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Standard API + props (tx/1000)</td>
      <td>267,773</td>
      <td>37,345</td>
      <td>1.00x</td>
    </tr>
    <tr>
      <td><strong>New GraphBatch + props</strong></td>
      <td><strong>53,893</strong></td>
      <td><strong>185,554</strong></td>
      <td><strong>4.97x</strong></td>
    </tr>
  </tbody>
</table>

<p>Even with properties on every edge, GraphBatch delivers a <strong>4.97x speedup</strong>. The additional serialization cost is manageable because the engine-level approach avoids the per-transaction overhead that dominates at scale.</p>

<h3 id="scaling-behavior">Scaling Behavior</h3>

<p>This is where things get really interesting. We compared how each method behaves as the graph size increases:</p>

<table>
  <thead>
    <tr>
      <th>Scale</th>
      <th>Std API (edges/sec)</th>
      <th>GraphBatch (edges/sec)</th>
      <th>Speedup</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>10K vertices / 100K edges</td>
      <td>241,644</td>
      <td>1,025,019</td>
      <td>4.24x</td>
    </tr>
    <tr>
      <td>100K vertices / 1M edges</td>
      <td>103,027</td>
      <td>1,212,756</td>
      <td>11.77x</td>
    </tr>
    <tr>
      <td>1M vertices / 10M edges</td>
      <td>37,434</td>
      <td>314,047</td>
      <td>8.39x</td>
    </tr>
  </tbody>
</table>

<p>Two things stand out:</p>

<ol>
  <li>
    <p><strong>The standard API degrades significantly at scale</strong> — from 241K edges/sec at 100K edges down to just 37K edges/sec at 10M edges. This is expected: as the graph grows, transaction management, index maintenance, and page cache pressure all increase.</p>
  </li>
  <li>
    <p><strong>GraphBatch holds up far better</strong> — peaking at over <strong>1.2 million edges per second</strong> at the 1M-edge scale. At the largest scale (10M edges), memory pressure naturally reduces throughput, but it still maintains 314K edges/sec — a strong result for a single machine.</p>
  </li>
</ol>

<p>The sweet spot appears to be the 100K-vertex / 1M-edge scale, where GraphBatch reaches <strong>11.77x</strong> the throughput of the standard API.</p>
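<p>The speedup column is simply the ratio of the two throughput figures; a quick sanity check of the table above:</p>

```python
# Throughput (edges/sec) from the scaling table: standard API vs. GraphBatch
results = {
    "10K vertices / 100K edges": (241_644, 1_025_019),
    "100K vertices / 1M edges":  (103_027, 1_212_756),
    "1M vertices / 10M edges":   (37_434,  314_047),
}

for scale, (std, batch) in results.items():
    print(f"{scale}: {batch / std:.2f}x")
# prints 4.24x, 11.77x and 8.39x - matching the speedup column
```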

<h2 id="when-to-use-graphbatch">When to Use GraphBatch</h2>

<p>GraphBatch is designed for <strong>bulk edge creation</strong> — whether that’s during initial data loading or at runtime on an existing database. It doesn’t require an empty database: as long as vertex and edge <a href="https://docs.arcadedb.com/arcadedb/reference/sql/sql-create-type.html">types</a> exist in the <a href="https://docs.arcadedb.com/arcadedb/concepts/schema.html">schema</a> and the source/destination vertices have valid RIDs, you’re good to go.</p>

<h3 id="initial-import-scenarios">Initial Import Scenarios</h3>

<ul>
  <li><strong>Data migration</strong> — moving graph data from <a href="https://docs.arcadedb.com/arcadedb/how-to/migration/neo4j-importer.html">another database</a> into ArcadeDB</li>
  <li><strong>ETL pipelines</strong> — loading large datasets from data warehouses or data lakes</li>
  <li><strong>Testing and benchmarking</strong> — quickly setting up large test graphs</li>
</ul>

<h3 id="runtime-scenarios">Runtime Scenarios</h3>

<p>GraphBatch works on live databases with existing data, making it the right tool whenever you need to create edges in bulk at runtime:</p>

<ul>
  <li><strong>Social networks</strong> — a user imports their contact list and you need to create thousands of KNOWS edges between existing Person vertices</li>
  <li><strong>IoT / time series</strong> — a periodic job links new sensor readings to their device vertices and chains them in a time series</li>
  <li><strong><a href="https://arcadedb.com/knowledge-graphs.html">Knowledge graphs</a></strong> — after an NLP pipeline extracts relationships from documents, you materialize thousands of typed edges between existing entity vertices</li>
  <li><strong><a href="https://arcadedb.com/recommendation-engine.html">Recommendation engines</a></strong> — nightly rebuild of ALSO_BOUGHT / SIMILAR_TO edges based on updated purchase data</li>
  <li><strong>Incremental ETL</strong> — periodically sync new relationships from an external system into an existing graph</li>
</ul>

<h3 id="when-not-to-use-it">When NOT to Use It</h3>

<ul>
  <li><strong>Small writes</strong> — for fewer than ~100 edges, the standard API is simpler and the importer overhead isn’t worth it</li>
  <li><strong>Concurrent reads on the same vertices</strong> — the importer disables read-your-writes and manages its own transactions, so concurrent readers may see inconsistent state until <code class="language-plaintext highlighter-rouge">close()</code></li>
  <li><strong>Immediate edge visibility required</strong> — in parallel mode, incoming edges aren’t fully connected until <code class="language-plaintext highlighter-rouge">close()</code></li>
</ul>

<p>For ongoing OLTP workloads with small, frequent writes, the standard transactional API remains the right choice — it provides full <a href="https://docs.arcadedb.com/arcadedb/concepts/transactions.html">ACID guarantees</a> with immediate visibility.</p>

<h2 id="runtime-usage-examples">Runtime Usage Examples</h2>

<h3 id="bulk-friend-import-light-edges">Bulk Friend Import (Light Edges)</h3>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Vertices already exist in the database</span>
<span class="no">RID</span><span class="o">[]</span> <span class="n">personRIDs</span> <span class="o">=</span> <span class="n">lookupExistingPersons</span><span class="o">(</span><span class="n">contactIds</span><span class="o">);</span>

<span class="k">try</span> <span class="o">(</span><span class="nc">GraphBatch</span> <span class="n">batch</span> <span class="o">=</span> <span class="n">database</span><span class="o">.</span><span class="na">batch</span><span class="o">()</span>
    <span class="o">.</span><span class="na">withBatchSize</span><span class="o">(</span><span class="mi">50_000</span><span class="o">)</span>
    <span class="o">.</span><span class="na">withLightEdges</span><span class="o">(</span><span class="kc">true</span><span class="o">)</span>
    <span class="o">.</span><span class="na">build</span><span class="o">())</span> <span class="o">{</span>
  <span class="k">for</span> <span class="o">(</span><span class="kt">int</span><span class="o">[]</span> <span class="n">pair</span> <span class="o">:</span> <span class="n">contactPairs</span><span class="o">)</span>
    <span class="n">batch</span><span class="o">.</span><span class="na">newEdge</span><span class="o">(</span><span class="n">personRIDs</span><span class="o">[</span><span class="n">pair</span><span class="o">[</span><span class="mi">0</span><span class="o">]],</span> <span class="s">"KNOWS"</span><span class="o">,</span> <span class="n">personRIDs</span><span class="o">[</span><span class="n">pair</span><span class="o">[</span><span class="mi">1</span><span class="o">]]);</span>
<span class="o">}</span>
</code></pre></div></div>

<h3 id="iot-sensor-linkage-with-wal-for-crash-safety">IoT Sensor Linkage (with WAL for Crash Safety)</h3>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">try</span> <span class="o">(</span><span class="nc">GraphBatch</span> <span class="n">batch</span> <span class="o">=</span> <span class="n">database</span><span class="o">.</span><span class="na">batch</span><span class="o">()</span>
    <span class="o">.</span><span class="na">withBatchSize</span><span class="o">(</span><span class="mi">100_000</span><span class="o">)</span>
    <span class="o">.</span><span class="na">withWAL</span><span class="o">(</span><span class="kc">true</span><span class="o">)</span>
    <span class="o">.</span><span class="na">withCommitEvery</span><span class="o">(</span><span class="mi">10_000</span><span class="o">)</span>
    <span class="o">.</span><span class="na">build</span><span class="o">())</span> <span class="o">{</span>
  <span class="k">for</span> <span class="o">(</span><span class="nc">SensorReading</span> <span class="n">r</span> <span class="o">:</span> <span class="n">newReadings</span><span class="o">)</span> <span class="o">{</span>
    <span class="n">batch</span><span class="o">.</span><span class="na">newEdge</span><span class="o">(</span><span class="n">r</span><span class="o">.</span><span class="na">deviceRID</span><span class="o">,</span> <span class="s">"HAS_READING"</span><span class="o">,</span> <span class="n">r</span><span class="o">.</span><span class="na">rid</span><span class="o">,</span> <span class="s">"timestamp"</span><span class="o">,</span> <span class="n">r</span><span class="o">.</span><span class="na">ts</span><span class="o">);</span>
    <span class="k">if</span> <span class="o">(</span><span class="n">r</span><span class="o">.</span><span class="na">previousRID</span> <span class="o">!=</span> <span class="kc">null</span><span class="o">)</span>
      <span class="n">batch</span><span class="o">.</span><span class="na">newEdge</span><span class="o">(</span><span class="n">r</span><span class="o">.</span><span class="na">rid</span><span class="o">,</span> <span class="s">"NEXT"</span><span class="o">,</span> <span class="n">r</span><span class="o">.</span><span class="na">previousRID</span><span class="o">);</span>
  <span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>

<h3 id="knowledge-graph-entity-resolution-with-edge-properties">Knowledge Graph Entity Resolution (with Edge Properties)</h3>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">try</span> <span class="o">(</span><span class="nc">GraphBatch</span> <span class="n">batch</span> <span class="o">=</span> <span class="n">database</span><span class="o">.</span><span class="na">batch</span><span class="o">()</span>
    <span class="o">.</span><span class="na">withBatchSize</span><span class="o">(</span><span class="mi">200_000</span><span class="o">)</span>
    <span class="o">.</span><span class="na">withParallelFlush</span><span class="o">(</span><span class="kc">true</span><span class="o">)</span>
    <span class="o">.</span><span class="na">build</span><span class="o">())</span> <span class="o">{</span>
  <span class="k">for</span> <span class="o">(</span><span class="nc">ExtractedRelation</span> <span class="n">rel</span> <span class="o">:</span> <span class="n">relations</span><span class="o">)</span>
    <span class="n">batch</span><span class="o">.</span><span class="na">newEdge</span><span class="o">(</span><span class="n">rel</span><span class="o">.</span><span class="na">subjectRID</span><span class="o">,</span> <span class="n">rel</span><span class="o">.</span><span class="na">edgeType</span><span class="o">,</span> <span class="n">rel</span><span class="o">.</span><span class="na">objectRID</span><span class="o">,</span>
        <span class="s">"confidence"</span><span class="o">,</span> <span class="n">rel</span><span class="o">.</span><span class="na">score</span><span class="o">,</span> <span class="s">"source"</span><span class="o">,</span> <span class="n">rel</span><span class="o">.</span><span class="na">docId</span><span class="o">);</span>
<span class="o">}</span>
</code></pre></div></div>

<h3 id="nightly-recommendation-rebuild">Nightly Recommendation Rebuild</h3>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Remove stale edges</span>
<span class="n">database</span><span class="o">.</span><span class="na">command</span><span class="o">(</span><span class="s">"sql"</span><span class="o">,</span> <span class="s">"DELETE EDGE ALSO_BOUGHT"</span><span class="o">);</span>

<span class="c1">// Rebuild from recommendation engine output</span>
<span class="k">try</span> <span class="o">(</span><span class="nc">GraphBatch</span> <span class="n">batch</span> <span class="o">=</span> <span class="n">database</span><span class="o">.</span><span class="na">batch</span><span class="o">()</span>
    <span class="o">.</span><span class="na">withBatchSize</span><span class="o">(</span><span class="mi">500_000</span><span class="o">)</span>
    <span class="o">.</span><span class="na">withLightEdges</span><span class="o">(</span><span class="kc">true</span><span class="o">)</span>
    <span class="o">.</span><span class="na">build</span><span class="o">())</span> <span class="o">{</span>
  <span class="k">for</span> <span class="o">(</span><span class="nc">Recommendation</span> <span class="n">rec</span> <span class="o">:</span> <span class="n">recommendations</span><span class="o">)</span>
    <span class="n">batch</span><span class="o">.</span><span class="na">newEdge</span><span class="o">(</span><span class="n">rec</span><span class="o">.</span><span class="na">productRID</span><span class="o">,</span> <span class="s">"ALSO_BOUGHT"</span><span class="o">,</span> <span class="n">rec</span><span class="o">.</span><span class="na">relatedRID</span><span class="o">);</span>
<span class="o">}</span>
</code></pre></div></div>

<h3 id="incremental-sync-from-external-database">Incremental Sync from External Database</h3>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">try</span> <span class="o">(</span><span class="nc">GraphBatch</span> <span class="n">batch</span> <span class="o">=</span> <span class="n">database</span><span class="o">.</span><span class="na">batch</span><span class="o">()</span>
    <span class="o">.</span><span class="na">withBatchSize</span><span class="o">(</span><span class="mi">100_000</span><span class="o">)</span>
    <span class="o">.</span><span class="na">withWAL</span><span class="o">(</span><span class="kc">true</span><span class="o">)</span>
    <span class="o">.</span><span class="na">build</span><span class="o">())</span> <span class="o">{</span>
  <span class="k">try</span> <span class="o">(</span><span class="nc">ResultSet</span> <span class="n">rs</span> <span class="o">=</span> <span class="n">externalDB</span><span class="o">.</span><span class="na">executeQuery</span><span class="o">(</span><span class="n">deltaQuery</span><span class="o">))</span> <span class="o">{</span>
    <span class="k">while</span> <span class="o">(</span><span class="n">rs</span><span class="o">.</span><span class="na">next</span><span class="o">())</span>
      <span class="n">batch</span><span class="o">.</span><span class="na">newEdge</span><span class="o">(</span>
          <span class="n">lookupRID</span><span class="o">(</span><span class="n">rs</span><span class="o">.</span><span class="na">getString</span><span class="o">(</span><span class="s">"from_id"</span><span class="o">)),</span>
          <span class="s">"REPORTS_TO"</span><span class="o">,</span>
          <span class="n">lookupRID</span><span class="o">(</span><span class="n">rs</span><span class="o">.</span><span class="na">getString</span><span class="o">(</span><span class="s">"to_id"</span><span class="o">)),</span>
          <span class="s">"since"</span><span class="o">,</span> <span class="n">rs</span><span class="o">.</span><span class="na">getDate</span><span class="o">(</span><span class="s">"start_date"</span><span class="o">));</span>
  <span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>

<blockquote>
  <p><strong>Tip</strong>: For runtime usage on production databases, enable WAL with <code class="language-plaintext highlighter-rouge">withWAL(true)</code> for crash safety. For initial imports where you can re-run on failure, leaving WAL off maximizes throughput.</p>
</blockquote>

<h2 id="http-batch-endpoint--graphbatch-for-every-language">HTTP Batch Endpoint — GraphBatch for Every Language</h2>

<p>GraphBatch is a Java API, but not everyone embeds ArcadeDB in a JVM application. That’s why v26.3.2 also ships a new <strong>HTTP batch endpoint</strong> that exposes the full power of GraphBatch over the <a href="https://docs.arcadedb.com/arcadedb/reference/http-api/http.html">HTTP API</a> — no Java required.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>POST /api/v1/batch/{database}
</code></pre></div></div>

<p>It supports two input formats: <strong>JSONL</strong> (newline-delimited JSON) and <strong>CSV</strong>. Both are streamed — the server never loads the entire payload into memory, so you can push millions of records in a single request.</p>

<h3 id="jsonl-format">JSONL Format</h3>

<pre><code class="language-jsonl">{"@type":"vertex","@class":"Person","@id":"t1","name":"Alice","age":30}
{"@type":"vertex","@class":"Person","@id":"t2","name":"Bob","age":25}
{"@type":"edge","@class":"KNOWS","@from":"t1","@to":"t2","since":2020}
</code></pre>

<h3 id="csv-format">CSV Format</h3>

<pre><code class="language-csv">@type,@class,@id,name,age
vertex,Person,t1,Alice,30
vertex,Person,t2,Bob,25
---
@type,@class,@from,@to,since
edge,KNOWS,t1,t2,2020
</code></pre>

<p>In both formats, vertices come first, then edges. Vertices can have temporary IDs (<code class="language-plaintext highlighter-rouge">@id</code>) that edges reference via <code class="language-plaintext highlighter-rouge">@from</code>/<code class="language-plaintext highlighter-rouge">@to</code>. Edges can also reference existing database RIDs directly (e.g., <code class="language-plaintext highlighter-rouge">#12:0</code>).</p>
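<p>Because the server streams the request body, the client can stream it as well. As an illustration in plain Python (the <code class="language-plaintext highlighter-rouge">jsonl_lines</code> helper is ours, not an ArcadeDB API), the JSONL body can be produced one line at a time; such a generator can be handed to any HTTP client that supports chunked uploads (for example as the <code class="language-plaintext highlighter-rouge">data=</code> argument of <code class="language-plaintext highlighter-rouge">requests.post</code>), so neither side materializes the full payload:</p>

<pre><code class="language-python">import json

def jsonl_lines(records):
    # Yield one UTF-8 encoded JSONL line per record; feeding a generator
    # to an HTTP client keeps memory flat for multi-million-record imports.
    for record in records:
        yield (json.dumps(record) + "\n").encode("utf-8")

records = [
    {"@type": "vertex", "@class": "Person", "@id": "t1", "name": "Alice"},
    {"@type": "vertex", "@class": "Person", "@id": "t2", "name": "Bob"},
    {"@type": "edge", "@class": "KNOWS", "@from": "t1", "@to": "t2"},
]

body = b"".join(jsonl_lines(records))  # joined here only for demonstration
</code></pre>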

<h3 id="temporary-id-mapping">Temporary ID Mapping</h3>

<p>The response includes an <code class="language-plaintext highlighter-rouge">idMapping</code> object so you know what RIDs were assigned:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"verticesCreated"</span><span class="p">:</span><span class="w"> </span><span class="mi">2</span><span class="p">,</span><span class="w">
  </span><span class="nl">"edgesCreated"</span><span class="p">:</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w">
  </span><span class="nl">"elapsedMs"</span><span class="p">:</span><span class="w"> </span><span class="mi">42</span><span class="p">,</span><span class="w">
  </span><span class="nl">"idMapping"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="nl">"t1"</span><span class="p">:</span><span class="w"> </span><span class="s2">"#9:0"</span><span class="p">,</span><span class="w"> </span><span class="nl">"t2"</span><span class="p">:</span><span class="w"> </span><span class="s2">"#9:1"</span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>
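
<p>The mapping makes follow-up batches easy to chain: rewrite edge references through the returned <code class="language-plaintext highlighter-rouge">idMapping</code> before building the next payload. A minimal sketch (the <code class="language-plaintext highlighter-rouge">resolve_refs</code> helper is ours, not an ArcadeDB API):</p>

<pre><code class="language-python">def resolve_refs(edges, id_mapping):
    # Replace temporary IDs in @from/@to with the RIDs the server assigned;
    # references that already look like RIDs (leading '#') pass through.
    resolved = []
    for edge in edges:
        e = dict(edge)
        for key in ("@from", "@to"):
            if not e[key].startswith("#"):
                e[key] = id_mapping[e[key]]
        resolved.append(e)
    return resolved

id_mapping = {"t1": "#9:0", "t2": "#9:1"}
followup = resolve_refs(
    [{"@type": "edge", "@class": "KNOWS", "@from": "t1", "@to": "#12:0"}],
    id_mapping,
)
# followup: [{'@type': 'edge', '@class': 'KNOWS', '@from': '#9:0', '@to': '#12:0'}]
</code></pre>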

<h3 id="tuning-via-query-parameters">Tuning via Query Parameters</h3>

<p>All GraphBatch configuration options are exposed as query parameters:</p>

<table>
  <thead>
    <tr>
      <th>Parameter</th>
      <th>Default</th>
      <th>Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">batchSize</code></td>
      <td>100000</td>
      <td>Max edges buffered before auto-flush</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">lightEdges</code></td>
      <td>false</td>
      <td>Property-less edges stored as connectivity only (saves ~33% I/O)</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">wal</code></td>
      <td>false</td>
      <td>Enable Write-Ahead Logging for crash safety</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">parallelFlush</code></td>
      <td>true</td>
      <td>Parallelize edge connection across async threads</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">preAllocateEdgeChunks</code></td>
      <td>true</td>
      <td>Pre-allocate edge segments on vertex creation</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">edgeListInitialSize</code></td>
      <td>2048</td>
      <td>Initial segment size in bytes (64–8192)</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">bidirectional</code></td>
      <td>true</td>
      <td>Connect both outgoing and incoming edges</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">commitEvery</code></td>
      <td>50000</td>
      <td>Edges per sub-transaction within a flush</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">expectedEdgeCount</code></td>
      <td>0</td>
      <td>Hint for auto-tuning batch size</td>
    </tr>
  </tbody>
</table>

<h3 id="examples">Examples</h3>

<p><strong>curl (JSONL):</strong></p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl <span class="nt">-X</span> POST <span class="s2">"http://localhost:2480/api/v1/batch/mydb?lightEdges=true"</span> <span class="se">\</span>
  <span class="nt">-u</span> root:password <span class="se">\</span>
  <span class="nt">-H</span> <span class="s2">"Content-Type: application/x-ndjson"</span> <span class="se">\</span>
  <span class="nt">--data-binary</span> @graph-data.jsonl
</code></pre></div></div>

<p><strong>curl (CSV):</strong></p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl <span class="nt">-X</span> POST <span class="s2">"http://localhost:2480/api/v1/batch/mydb"</span> <span class="se">\</span>
  <span class="nt">-u</span> root:password <span class="se">\</span>
  <span class="nt">-H</span> <span class="s2">"Content-Type: text/csv"</span> <span class="se">\</span>
  <span class="nt">--data-binary</span> @graph-data.csv
</code></pre></div></div>

<p><strong>Python:</strong></p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">requests</span>

<span class="n">data</span> <span class="o">=</span> <span class="p">(</span>
    <span class="sh">'</span><span class="s">{</span><span class="sh">"</span><span class="s">@type</span><span class="sh">"</span><span class="s">:</span><span class="sh">"</span><span class="s">vertex</span><span class="sh">"</span><span class="s">,</span><span class="sh">"</span><span class="s">@class</span><span class="sh">"</span><span class="s">:</span><span class="sh">"</span><span class="s">Person</span><span class="sh">"</span><span class="s">,</span><span class="sh">"</span><span class="s">@id</span><span class="sh">"</span><span class="s">:</span><span class="sh">"</span><span class="s">p1</span><span class="sh">"</span><span class="s">,</span><span class="sh">"</span><span class="s">name</span><span class="sh">"</span><span class="s">:</span><span class="sh">"</span><span class="s">Alice</span><span class="sh">"</span><span class="s">}</span><span class="se">\n</span><span class="sh">'</span>
    <span class="sh">'</span><span class="s">{</span><span class="sh">"</span><span class="s">@type</span><span class="sh">"</span><span class="s">:</span><span class="sh">"</span><span class="s">vertex</span><span class="sh">"</span><span class="s">,</span><span class="sh">"</span><span class="s">@class</span><span class="sh">"</span><span class="s">:</span><span class="sh">"</span><span class="s">Person</span><span class="sh">"</span><span class="s">,</span><span class="sh">"</span><span class="s">@id</span><span class="sh">"</span><span class="s">:</span><span class="sh">"</span><span class="s">p2</span><span class="sh">"</span><span class="s">,</span><span class="sh">"</span><span class="s">name</span><span class="sh">"</span><span class="s">:</span><span class="sh">"</span><span class="s">Bob</span><span class="sh">"</span><span class="s">}</span><span class="se">\n</span><span class="sh">'</span>
    <span class="sh">'</span><span class="s">{</span><span class="sh">"</span><span class="s">@type</span><span class="sh">"</span><span class="s">:</span><span class="sh">"</span><span class="s">edge</span><span class="sh">"</span><span class="s">,</span><span class="sh">"</span><span class="s">@class</span><span class="sh">"</span><span class="s">:</span><span class="sh">"</span><span class="s">KNOWS</span><span class="sh">"</span><span class="s">,</span><span class="sh">"</span><span class="s">@from</span><span class="sh">"</span><span class="s">:</span><span class="sh">"</span><span class="s">p1</span><span class="sh">"</span><span class="s">,</span><span class="sh">"</span><span class="s">@to</span><span class="sh">"</span><span class="s">:</span><span class="sh">"</span><span class="s">p2</span><span class="sh">"</span><span class="s">}</span><span class="se">\n</span><span class="sh">'</span>
<span class="p">)</span>

<span class="n">resp</span> <span class="o">=</span> <span class="n">requests</span><span class="p">.</span><span class="nf">post</span><span class="p">(</span>
    <span class="sh">"</span><span class="s">http://localhost:2480/api/v1/batch/mydb?lightEdges=true</span><span class="sh">"</span><span class="p">,</span>
    <span class="n">auth</span><span class="o">=</span><span class="p">(</span><span class="sh">"</span><span class="s">root</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">password</span><span class="sh">"</span><span class="p">),</span>
    <span class="n">headers</span><span class="o">=</span><span class="p">{</span><span class="sh">"</span><span class="s">Content-Type</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">application/x-ndjson</span><span class="sh">"</span><span class="p">},</span>
    <span class="n">data</span><span class="o">=</span><span class="n">data</span><span class="p">,</span>
<span class="p">)</span>
<span class="nf">print</span><span class="p">(</span><span class="n">resp</span><span class="p">.</span><span class="nf">json</span><span class="p">())</span>
<span class="c1"># {'verticesCreated': 2, 'edgesCreated': 1, 'elapsedMs': 15, 'idMapping': {'p1': '#9:0', 'p2': '#9:1'}}
</span></code></pre></div></div>

<p><strong>JavaScript (Node.js):</strong></p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">const</span> <span class="nx">resp</span> <span class="o">=</span> <span class="k">await</span> <span class="nf">fetch</span><span class="p">(</span><span class="dl">"</span><span class="s2">http://localhost:2480/api/v1/batch/mydb</span><span class="dl">"</span><span class="p">,</span> <span class="p">{</span>
  <span class="na">method</span><span class="p">:</span> <span class="dl">"</span><span class="s2">POST</span><span class="dl">"</span><span class="p">,</span>
  <span class="na">headers</span><span class="p">:</span> <span class="p">{</span>
    <span class="dl">"</span><span class="s2">Content-Type</span><span class="dl">"</span><span class="p">:</span> <span class="dl">"</span><span class="s2">application/x-ndjson</span><span class="dl">"</span><span class="p">,</span>
    <span class="na">Authorization</span><span class="p">:</span> <span class="dl">"</span><span class="s2">Basic </span><span class="dl">"</span> <span class="o">+</span> <span class="nf">btoa</span><span class="p">(</span><span class="dl">"</span><span class="s2">root:password</span><span class="dl">"</span><span class="p">),</span>
  <span class="p">},</span>
  <span class="na">body</span><span class="p">:</span> <span class="p">[</span>
    <span class="dl">'</span><span class="s1">{"@type":"vertex","@class":"Person","@id":"p1","name":"Alice"}</span><span class="dl">'</span><span class="p">,</span>
    <span class="dl">'</span><span class="s1">{"@type":"vertex","@class":"Person","@id":"p2","name":"Bob"}</span><span class="dl">'</span><span class="p">,</span>
    <span class="dl">'</span><span class="s1">{"@type":"edge","@class":"KNOWS","@from":"p1","@to":"p2"}</span><span class="dl">'</span><span class="p">,</span>
  <span class="p">].</span><span class="nf">join</span><span class="p">(</span><span class="dl">"</span><span class="se">\n</span><span class="dl">"</span><span class="p">),</span>
<span class="p">});</span>
<span class="nx">console</span><span class="p">.</span><span class="nf">log</span><span class="p">(</span><span class="k">await</span> <span class="nx">resp</span><span class="p">.</span><span class="nf">json</span><span class="p">());</span>
</code></pre></div></div>

<blockquote>
  <p><strong>Tip</strong>: For maximum throughput, group vertices by type in the input. The endpoint batches consecutive same-type vertices into a single <code class="language-plaintext highlighter-rouge">createVertices()</code> call. Interleaving types forces smaller batches.</p>
</blockquote>
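
<p>That grouping is cheap to do client-side: a single stable sort keeps all vertices ahead of all edges while making same-type vertices consecutive. A sketch in plain Python:</p>

<pre><code class="language-python">import json

records = [
    {"@type": "vertex", "@class": "Person", "@id": "p1", "name": "Alice"},
    {"@type": "vertex", "@class": "City", "@id": "c1", "name": "Rome"},
    {"@type": "vertex", "@class": "Person", "@id": "p2", "name": "Bob"},
    {"@type": "edge", "@class": "LIVES_IN", "@from": "p1", "@to": "c1"},
]

# Sort key: vertices before edges (required anyway), then by @class so
# consecutive same-type vertices can share one createVertices() call.
ordered = sorted(records, key=lambda r: (r["@type"] != "vertex", r["@class"]))
payload = "\n".join(json.dumps(r) for r in ordered)
</code></pre>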

<blockquote>
  <p><strong>Tip</strong>: The endpoint is <strong>not atomic</strong> by design: GraphBatch commits internally in chunks for maximum throughput. Treat it as a bulk-loading operation, not a transactional one. The response reports exactly how many records were committed.</p>
</blockquote>
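
<p>Given those partial-commit semantics, it is worth checking the response counts against what you sent before declaring an import complete. A minimal sketch (the helper is ours):</p>

<pre><code class="language-python">def batch_fully_committed(sent_vertices, sent_edges, response):
    # True only when the server committed every record we sent; on a
    # mismatch, inspect the response and re-run the missing remainder.
    return (response["verticesCreated"] == sent_vertices
            and response["edgesCreated"] == sent_edges)

response = {"verticesCreated": 2, "edgesCreated": 1, "elapsedMs": 42,
            "idMapping": {"t1": "#9:0", "t2": "#9:1"}}
print(batch_fully_committed(2, 1, response))  # True
</code></pre>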

<h2 id="grpc-streaming-api---graphbatch-with-backpressure">gRPC Streaming API — GraphBatch with Backpressure</h2>

<p>For high-throughput pipelines where HTTP overhead matters, v26.3.2 also ships a <strong>streaming gRPC endpoint</strong> that wraps GraphBatch. It uses client-streaming RPC with built-in flow control, so the server applies backpressure while it flushes to disk and your producer never overwhelms the database.</p>

<div class="language-protobuf highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">rpc</span> <span class="n">GraphBatchLoad</span> <span class="p">(</span><span class="n">stream</span> <span class="n">GraphBatchChunk</span><span class="p">)</span> <span class="k">returns</span> <span class="p">(</span><span class="n">GraphBatchResult</span><span class="p">);</span>
</code></pre></div></div>

<p>The client sends a stream of <code class="language-plaintext highlighter-rouge">GraphBatchChunk</code> messages, each containing a batch of vertex or edge records. The first chunk must include the database name and any configuration options. When the stream closes, the server returns a single <code class="language-plaintext highlighter-rouge">GraphBatchResult</code> with counts and the temporary ID-to-RID mapping.</p>

<h3 id="why-grpc">Why gRPC?</h3>

<table>
  <thead>
    <tr>
      <th> </th>
      <th>HTTP Batch</th>
      <th>gRPC Streaming</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Protocol</strong></td>
      <td>Single HTTP request, streamed body</td>
      <td>Client-streaming RPC with backpressure</td>
    </tr>
    <tr>
      <td><strong>Backpressure</strong></td>
      <td>None (server buffers or drops)</td>
      <td>Built-in flow control per chunk</td>
    </tr>
    <tr>
      <td><strong>Format</strong></td>
      <td>JSONL or CSV (text)</td>
      <td>Protobuf (binary, typed)</td>
    </tr>
    <tr>
      <td><strong>Best for</strong></td>
      <td>Scripts, one-off imports, simple integrations</td>
      <td>High-throughput pipelines, microservices, polyglot stacks</td>
    </tr>
    <tr>
      <td><strong>Language support</strong></td>
      <td>Any HTTP client</td>
      <td>Go, Python, Java, C++, Rust, Node.js, and more</td>
    </tr>
  </tbody>
</table>

<p>Both endpoints expose the same GraphBatch options and deliver the same engine-level performance. Choose gRPC when you need backpressure, binary efficiency, or native code generation from the proto file.</p>

<h3 id="message-structure">Message Structure</h3>

<p>Each <code class="language-plaintext highlighter-rouge">GraphBatchChunk</code> contains:</p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">database</code> - the target database name (required on the first chunk)</li>
  <li><code class="language-plaintext highlighter-rouge">credentials</code> - optional authentication</li>
  <li><code class="language-plaintext highlighter-rouge">options</code> - GraphBatch configuration (same parameters as the HTTP endpoint)</li>
  <li><code class="language-plaintext highlighter-rouge">records</code> - a list of vertex or edge records</li>
</ul>

<p>Records use the <code class="language-plaintext highlighter-rouge">GraphBatchRecord</code> message:</p>

<div class="language-protobuf highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">message</span> <span class="nc">GraphBatchRecord</span> <span class="p">{</span>
  <span class="kd">enum</span> <span class="n">Kind</span> <span class="p">{</span> <span class="na">VERTEX</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="na">EDGE</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span> <span class="p">}</span>
  <span class="n">Kind</span>   <span class="na">kind</span>      <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
  <span class="kt">string</span> <span class="na">type_name</span> <span class="o">=</span> <span class="mi">2</span><span class="p">;</span>  <span class="c1">// vertex or edge type name</span>
  <span class="kt">string</span> <span class="na">temp_id</span>   <span class="o">=</span> <span class="mi">3</span><span class="p">;</span>  <span class="c1">// vertex temp ID (for edge references)</span>
  <span class="kt">string</span> <span class="na">from_ref</span>  <span class="o">=</span> <span class="mi">4</span><span class="p">;</span>  <span class="c1">// edge source: temp ID or "#bucket:pos"</span>
  <span class="kt">string</span> <span class="na">to_ref</span>    <span class="o">=</span> <span class="mi">5</span><span class="p">;</span>  <span class="c1">// edge target: temp ID or "#bucket:pos"</span>
  <span class="n">map</span><span class="o">&lt;</span><span class="kt">string</span><span class="p">,</span> <span class="n">GrpcValue</span><span class="err">&gt;</span> <span class="na">properties</span> <span class="o">=</span> <span class="mi">6</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>Important</strong>: all vertex records must appear before any edge records across all chunks. Interleaving is not supported and will result in an error.</p>
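
<p>A cheap client-side guard avoids tripping this rule when the input order is not under your control. A sketch in plain Python (the record dicts only mirror <code class="language-plaintext highlighter-rouge">GraphBatchRecord</code> fields; the helper is ours):</p>

<pre><code class="language-python">def split_vertices_first(records):
    # Partition mixed records so every vertex precedes every edge,
    # as the streaming endpoint requires across all chunks.
    vertices = [r for r in records if r["kind"] == "VERTEX"]
    edges = [r for r in records if r["kind"] == "EDGE"]
    return vertices, edges

mixed = [
    {"kind": "VERTEX", "type_name": "Person", "temp_id": "p1"},
    {"kind": "EDGE", "type_name": "KNOWS", "from_ref": "p1", "to_ref": "p2"},
    {"kind": "VERTEX", "type_name": "Person", "temp_id": "p2"},
]
vertices, edges = split_vertices_first(mixed)
# Stream the vertices chunk(s) first, then the edges chunk(s).
</code></pre>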

<h3 id="python-example-grpcio">Python Example (grpcio)</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">grpc</span>
<span class="kn">from</span> <span class="n">arcadedb_pb2</span> <span class="kn">import</span> <span class="o">*</span>
<span class="kn">from</span> <span class="n">arcadedb_pb2_grpc</span> <span class="kn">import</span> <span class="n">ArcadeDbServiceStub</span>

<span class="n">channel</span> <span class="o">=</span> <span class="n">grpc</span><span class="p">.</span><span class="nf">insecure_channel</span><span class="p">(</span><span class="sh">"</span><span class="s">localhost:2424</span><span class="sh">"</span><span class="p">)</span>
<span class="n">stub</span> <span class="o">=</span> <span class="nc">ArcadeDbServiceStub</span><span class="p">(</span><span class="n">channel</span><span class="p">)</span>

<span class="k">def</span> <span class="nf">generate_chunks</span><span class="p">():</span>
    <span class="c1"># First chunk: database, options, and initial vertices
</span>    <span class="k">yield</span> <span class="nc">GraphBatchChunk</span><span class="p">(</span>
        <span class="n">database</span><span class="o">=</span><span class="sh">"</span><span class="s">mydb</span><span class="sh">"</span><span class="p">,</span>
        <span class="n">credentials</span><span class="o">=</span><span class="nc">DatabaseCredentials</span><span class="p">(</span><span class="n">username</span><span class="o">=</span><span class="sh">"</span><span class="s">root</span><span class="sh">"</span><span class="p">,</span> <span class="n">password</span><span class="o">=</span><span class="sh">"</span><span class="s">password</span><span class="sh">"</span><span class="p">),</span>
        <span class="n">options</span><span class="o">=</span><span class="nc">GraphBatchOptions</span><span class="p">(</span><span class="n">light_edges</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">batch_size</span><span class="o">=</span><span class="mi">100000</span><span class="p">),</span>
        <span class="n">records</span><span class="o">=</span><span class="p">[</span>
            <span class="nc">GraphBatchRecord</span><span class="p">(</span><span class="n">kind</span><span class="o">=</span><span class="n">GraphBatchRecord</span><span class="p">.</span><span class="n">VERTEX</span><span class="p">,</span>
                             <span class="n">type_name</span><span class="o">=</span><span class="sh">"</span><span class="s">Person</span><span class="sh">"</span><span class="p">,</span> <span class="n">temp_id</span><span class="o">=</span><span class="sh">"</span><span class="s">p1</span><span class="sh">"</span><span class="p">,</span>
                             <span class="n">properties</span><span class="o">=</span><span class="p">{</span><span class="sh">"</span><span class="s">name</span><span class="sh">"</span><span class="p">:</span> <span class="nc">GrpcValue</span><span class="p">(</span><span class="n">string_value</span><span class="o">=</span><span class="sh">"</span><span class="s">Alice</span><span class="sh">"</span><span class="p">)}),</span>
            <span class="nc">GraphBatchRecord</span><span class="p">(</span><span class="n">kind</span><span class="o">=</span><span class="n">GraphBatchRecord</span><span class="p">.</span><span class="n">VERTEX</span><span class="p">,</span>
                             <span class="n">type_name</span><span class="o">=</span><span class="sh">"</span><span class="s">Person</span><span class="sh">"</span><span class="p">,</span> <span class="n">temp_id</span><span class="o">=</span><span class="sh">"</span><span class="s">p2</span><span class="sh">"</span><span class="p">,</span>
                             <span class="n">properties</span><span class="o">=</span><span class="p">{</span><span class="sh">"</span><span class="s">name</span><span class="sh">"</span><span class="p">:</span> <span class="nc">GrpcValue</span><span class="p">(</span><span class="n">string_value</span><span class="o">=</span><span class="sh">"</span><span class="s">Bob</span><span class="sh">"</span><span class="p">)}),</span>
        <span class="p">],</span>
    <span class="p">)</span>
    <span class="c1"># Second chunk: edges referencing temp IDs
</span>    <span class="k">yield</span> <span class="nc">GraphBatchChunk</span><span class="p">(</span>
        <span class="n">records</span><span class="o">=</span><span class="p">[</span>
            <span class="nc">GraphBatchRecord</span><span class="p">(</span><span class="n">kind</span><span class="o">=</span><span class="n">GraphBatchRecord</span><span class="p">.</span><span class="n">EDGE</span><span class="p">,</span>
                             <span class="n">type_name</span><span class="o">=</span><span class="sh">"</span><span class="s">KNOWS</span><span class="sh">"</span><span class="p">,</span>
                             <span class="n">from_ref</span><span class="o">=</span><span class="sh">"</span><span class="s">p1</span><span class="sh">"</span><span class="p">,</span> <span class="n">to_ref</span><span class="o">=</span><span class="sh">"</span><span class="s">p2</span><span class="sh">"</span><span class="p">),</span>
        <span class="p">],</span>
    <span class="p">)</span>

<span class="n">result</span> <span class="o">=</span> <span class="n">stub</span><span class="p">.</span><span class="nc">GraphBatchLoad</span><span class="p">(</span><span class="nf">generate_chunks</span><span class="p">())</span>
<span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Created </span><span class="si">{</span><span class="n">result</span><span class="p">.</span><span class="n">vertices_created</span><span class="si">}</span><span class="s"> vertices, </span><span class="si">{</span><span class="n">result</span><span class="p">.</span><span class="n">edges_created</span><span class="si">}</span><span class="s"> edges </span><span class="sh">"</span>
      <span class="sa">f</span><span class="sh">"</span><span class="s">in </span><span class="si">{</span><span class="n">result</span><span class="p">.</span><span class="n">elapsed_ms</span><span class="si">}</span><span class="s">ms</span><span class="sh">"</span><span class="p">)</span>
<span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">ID mapping: </span><span class="si">{</span><span class="nf">dict</span><span class="p">(</span><span class="n">result</span><span class="p">.</span><span class="n">id_mapping</span><span class="p">)</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
<span class="c1"># Created 2 vertices, 1 edges in 12ms
# ID mapping: {'p1': '#9:0', 'p2': '#9:1'}
</span></code></pre></div></div>

<h3 id="go-example">Go Example</h3>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">stream</span><span class="p">,</span> <span class="n">err</span> <span class="o">:=</span> <span class="n">client</span><span class="o">.</span><span class="n">GraphBatchLoad</span><span class="p">(</span><span class="n">ctx</span><span class="p">)</span>
<span class="k">if</span> <span class="n">err</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
    <span class="n">log</span><span class="o">.</span><span class="n">Fatal</span><span class="p">(</span><span class="n">err</span><span class="p">)</span>
<span class="p">}</span>

<span class="c">// First chunk with vertices</span>
<span class="n">stream</span><span class="o">.</span><span class="n">Send</span><span class="p">(</span><span class="o">&amp;</span><span class="n">pb</span><span class="o">.</span><span class="n">GraphBatchChunk</span><span class="p">{</span>
    <span class="n">Database</span><span class="o">:</span>    <span class="s">"mydb"</span><span class="p">,</span>
    <span class="n">Credentials</span><span class="o">:</span> <span class="o">&amp;</span><span class="n">pb</span><span class="o">.</span><span class="n">DatabaseCredentials</span><span class="p">{</span><span class="n">Username</span><span class="o">:</span> <span class="s">"root"</span><span class="p">,</span> <span class="n">Password</span><span class="o">:</span> <span class="s">"password"</span><span class="p">},</span>
    <span class="n">Options</span><span class="o">:</span>     <span class="o">&amp;</span><span class="n">pb</span><span class="o">.</span><span class="n">GraphBatchOptions</span><span class="p">{</span><span class="n">LightEdges</span><span class="o">:</span> <span class="no">true</span><span class="p">},</span>
    <span class="n">Records</span><span class="o">:</span> <span class="p">[]</span><span class="o">*</span><span class="n">pb</span><span class="o">.</span><span class="n">GraphBatchRecord</span><span class="p">{</span>
        <span class="p">{</span><span class="n">Kind</span><span class="o">:</span> <span class="n">pb</span><span class="o">.</span><span class="n">GraphBatchRecord_VERTEX</span><span class="p">,</span> <span class="n">TypeName</span><span class="o">:</span> <span class="s">"Person"</span><span class="p">,</span>
         <span class="n">TempId</span><span class="o">:</span> <span class="s">"p1"</span><span class="p">,</span> <span class="n">Properties</span><span class="o">:</span> <span class="k">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="o">*</span><span class="n">pb</span><span class="o">.</span><span class="n">GrpcValue</span><span class="p">{</span>
            <span class="s">"name"</span><span class="o">:</span> <span class="p">{</span><span class="n">Value</span><span class="o">:</span> <span class="o">&amp;</span><span class="n">pb</span><span class="o">.</span><span class="n">GrpcValue_StringValue</span><span class="p">{</span><span class="n">StringValue</span><span class="o">:</span> <span class="s">"Alice"</span><span class="p">}},</span>
        <span class="p">}},</span>
        <span class="p">{</span><span class="n">Kind</span><span class="o">:</span> <span class="n">pb</span><span class="o">.</span><span class="n">GraphBatchRecord_VERTEX</span><span class="p">,</span> <span class="n">TypeName</span><span class="o">:</span> <span class="s">"Person"</span><span class="p">,</span>
         <span class="n">TempId</span><span class="o">:</span> <span class="s">"p2"</span><span class="p">,</span> <span class="n">Properties</span><span class="o">:</span> <span class="k">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="o">*</span><span class="n">pb</span><span class="o">.</span><span class="n">GrpcValue</span><span class="p">{</span>
            <span class="s">"name"</span><span class="o">:</span> <span class="p">{</span><span class="n">Value</span><span class="o">:</span> <span class="o">&amp;</span><span class="n">pb</span><span class="o">.</span><span class="n">GrpcValue_StringValue</span><span class="p">{</span><span class="n">StringValue</span><span class="o">:</span> <span class="s">"Bob"</span><span class="p">}},</span>
        <span class="p">}},</span>
    <span class="p">},</span>
<span class="p">})</span>

<span class="c">// Second chunk with edges</span>
<span class="n">stream</span><span class="o">.</span><span class="n">Send</span><span class="p">(</span><span class="o">&amp;</span><span class="n">pb</span><span class="o">.</span><span class="n">GraphBatchChunk</span><span class="p">{</span>
    <span class="n">Records</span><span class="o">:</span> <span class="p">[]</span><span class="o">*</span><span class="n">pb</span><span class="o">.</span><span class="n">GraphBatchRecord</span><span class="p">{</span>
        <span class="p">{</span><span class="n">Kind</span><span class="o">:</span> <span class="n">pb</span><span class="o">.</span><span class="n">GraphBatchRecord_EDGE</span><span class="p">,</span> <span class="n">TypeName</span><span class="o">:</span> <span class="s">"KNOWS"</span><span class="p">,</span>
         <span class="n">FromRef</span><span class="o">:</span> <span class="s">"p1"</span><span class="p">,</span> <span class="n">ToRef</span><span class="o">:</span> <span class="s">"p2"</span><span class="p">},</span>
    <span class="p">},</span>
<span class="p">})</span>

<span class="n">result</span><span class="p">,</span> <span class="n">err</span> <span class="o">:=</span> <span class="n">stream</span><span class="o">.</span><span class="n">CloseAndRecv</span><span class="p">()</span>
<span class="n">fmt</span><span class="o">.</span><span class="n">Printf</span><span class="p">(</span><span class="s">"Created %d vertices, %d edges in %dms</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span>
    <span class="n">result</span><span class="o">.</span><span class="n">VerticesCreated</span><span class="p">,</span> <span class="n">result</span><span class="o">.</span><span class="n">EdgesCreated</span><span class="p">,</span> <span class="n">result</span><span class="o">.</span><span class="n">ElapsedMs</span><span class="p">)</span>
</code></pre></div></div>

<h3 id="java-example-generated-stubs">Java Example (generated stubs)</h3>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nc">StreamObserver</span><span class="o">&lt;</span><span class="nc">GraphBatchResult</span><span class="o">&gt;</span> <span class="n">responseObserver</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">StreamObserver</span><span class="o">&lt;&gt;()</span> <span class="o">{</span>
    <span class="nd">@Override</span>
    <span class="kd">public</span> <span class="kt">void</span> <span class="nf">onNext</span><span class="o">(</span><span class="nc">GraphBatchResult</span> <span class="n">result</span><span class="o">)</span> <span class="o">{</span>
        <span class="nc">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">printf</span><span class="o">(</span><span class="s">"Created %d vertices, %d edges in %dms%n"</span><span class="o">,</span>
            <span class="n">result</span><span class="o">.</span><span class="na">getVerticesCreated</span><span class="o">(),</span> <span class="n">result</span><span class="o">.</span><span class="na">getEdgesCreated</span><span class="o">(),</span> <span class="n">result</span><span class="o">.</span><span class="na">getElapsedMs</span><span class="o">());</span>
    <span class="o">}</span>
    <span class="nd">@Override</span> <span class="kd">public</span> <span class="kt">void</span> <span class="nf">onError</span><span class="o">(</span><span class="nc">Throwable</span> <span class="n">t</span><span class="o">)</span> <span class="o">{</span> <span class="n">t</span><span class="o">.</span><span class="na">printStackTrace</span><span class="o">();</span> <span class="o">}</span>
    <span class="nd">@Override</span> <span class="kd">public</span> <span class="kt">void</span> <span class="nf">onCompleted</span><span class="o">()</span> <span class="o">{</span> <span class="o">}</span>
<span class="o">};</span>

<span class="nc">StreamObserver</span><span class="o">&lt;</span><span class="nc">GraphBatchChunk</span><span class="o">&gt;</span> <span class="n">requestStream</span> <span class="o">=</span> <span class="n">stub</span><span class="o">.</span><span class="na">graphBatchLoad</span><span class="o">(</span><span class="n">responseObserver</span><span class="o">);</span>

<span class="c1">// Send vertices</span>
<span class="n">requestStream</span><span class="o">.</span><span class="na">onNext</span><span class="o">(</span><span class="nc">GraphBatchChunk</span><span class="o">.</span><span class="na">newBuilder</span><span class="o">()</span>
    <span class="o">.</span><span class="na">setDatabase</span><span class="o">(</span><span class="s">"mydb"</span><span class="o">)</span>
    <span class="o">.</span><span class="na">setCredentials</span><span class="o">(</span><span class="nc">DatabaseCredentials</span><span class="o">.</span><span class="na">newBuilder</span><span class="o">()</span>
        <span class="o">.</span><span class="na">setUsername</span><span class="o">(</span><span class="s">"root"</span><span class="o">).</span><span class="na">setPassword</span><span class="o">(</span><span class="s">"password"</span><span class="o">))</span>
    <span class="o">.</span><span class="na">setOptions</span><span class="o">(</span><span class="nc">GraphBatchOptions</span><span class="o">.</span><span class="na">newBuilder</span><span class="o">().</span><span class="na">setLightEdges</span><span class="o">(</span><span class="kc">true</span><span class="o">))</span>
    <span class="o">.</span><span class="na">addRecords</span><span class="o">(</span><span class="nc">GraphBatchRecord</span><span class="o">.</span><span class="na">newBuilder</span><span class="o">()</span>
        <span class="o">.</span><span class="na">setKind</span><span class="o">(</span><span class="nc">GraphBatchRecord</span><span class="o">.</span><span class="na">Kind</span><span class="o">.</span><span class="na">VERTEX</span><span class="o">)</span>
        <span class="o">.</span><span class="na">setTypeName</span><span class="o">(</span><span class="s">"Person"</span><span class="o">).</span><span class="na">setTempId</span><span class="o">(</span><span class="s">"p1"</span><span class="o">)</span>
        <span class="o">.</span><span class="na">putProperties</span><span class="o">(</span><span class="s">"name"</span><span class="o">,</span> <span class="nc">GrpcValue</span><span class="o">.</span><span class="na">newBuilder</span><span class="o">().</span><span class="na">setStringValue</span><span class="o">(</span><span class="s">"Alice"</span><span class="o">).</span><span class="na">build</span><span class="o">()))</span>
    <span class="o">.</span><span class="na">addRecords</span><span class="o">(</span><span class="nc">GraphBatchRecord</span><span class="o">.</span><span class="na">newBuilder</span><span class="o">()</span>
        <span class="o">.</span><span class="na">setKind</span><span class="o">(</span><span class="nc">GraphBatchRecord</span><span class="o">.</span><span class="na">Kind</span><span class="o">.</span><span class="na">VERTEX</span><span class="o">)</span>
        <span class="o">.</span><span class="na">setTypeName</span><span class="o">(</span><span class="s">"Person"</span><span class="o">).</span><span class="na">setTempId</span><span class="o">(</span><span class="s">"p2"</span><span class="o">)</span>
        <span class="o">.</span><span class="na">putProperties</span><span class="o">(</span><span class="s">"name"</span><span class="o">,</span> <span class="nc">GrpcValue</span><span class="o">.</span><span class="na">newBuilder</span><span class="o">().</span><span class="na">setStringValue</span><span class="o">(</span><span class="s">"Bob"</span><span class="o">).</span><span class="na">build</span><span class="o">()))</span>
    <span class="o">.</span><span class="na">build</span><span class="o">());</span>

<span class="c1">// Send edges</span>
<span class="n">requestStream</span><span class="o">.</span><span class="na">onNext</span><span class="o">(</span><span class="nc">GraphBatchChunk</span><span class="o">.</span><span class="na">newBuilder</span><span class="o">()</span>
    <span class="o">.</span><span class="na">addRecords</span><span class="o">(</span><span class="nc">GraphBatchRecord</span><span class="o">.</span><span class="na">newBuilder</span><span class="o">()</span>
        <span class="o">.</span><span class="na">setKind</span><span class="o">(</span><span class="nc">GraphBatchRecord</span><span class="o">.</span><span class="na">Kind</span><span class="o">.</span><span class="na">EDGE</span><span class="o">)</span>
        <span class="o">.</span><span class="na">setTypeName</span><span class="o">(</span><span class="s">"KNOWS"</span><span class="o">).</span><span class="na">setFromRef</span><span class="o">(</span><span class="s">"p1"</span><span class="o">).</span><span class="na">setToRef</span><span class="o">(</span><span class="s">"p2"</span><span class="o">))</span>
    <span class="o">.</span><span class="na">build</span><span class="o">());</span>

<span class="n">requestStream</span><span class="o">.</span><span class="na">onCompleted</span><span class="o">();</span>
</code></pre></div></div>

<blockquote>
  <p><strong>Tip</strong>: For very large imports with millions of vertices using temp IDs, the <code class="language-plaintext highlighter-rouge">id_mapping</code> in the response may exceed the default gRPC message size limit (4 MB). In that case, increase <code class="language-plaintext highlighter-rouge">maxInboundMessageSize</code> on the client, or skip temp IDs when you don’t need the RID mapping back.</p>
</blockquote>
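<p>To get a feel for when that limit bites, here is a rough back-of-envelope estimate. The ~40 bytes per mapping entry is our assumption (a temp-ID string plus a <code class="language-plaintext highlighter-rouge">#bucket:position</code> RID string plus protobuf framing), not a measured constant:</p>

```python
# Back-of-envelope estimate of the id_mapping payload size in a batch
# response. AVG_ENTRY_BYTES is an assumption, not a measured constant.
GRPC_DEFAULT_MAX_MESSAGE = 4 * 1024 * 1024  # gRPC's default 4 MB inbound limit
AVG_ENTRY_BYTES = 40

def estimated_mapping_bytes(num_temp_ids: int) -> int:
    """Estimate the size of the temp-ID -> RID mapping in the response."""
    return num_temp_ids * AVG_ENTRY_BYTES

def needs_larger_message_limit(num_temp_ids: int) -> bool:
    return estimated_mapping_bytes(num_temp_ids) > GRPC_DEFAULT_MAX_MESSAGE

print(needs_larger_message_limit(100_000))    # False: ~4.0 MB, just under the limit
print(needs_larger_message_limit(1_000_000))  # True: ~40 MB
```

<p>If the estimate lands anywhere near the limit, raise the client’s inbound message size up front, or drop temp IDs from records whose RIDs you don’t need back.</p>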

<blockquote>
  <p><strong>Tip</strong>: Like the HTTP endpoint, the gRPC streaming API is NOT atomic - GraphBatch commits internally in chunks. If the stream is interrupted mid-flight, records already flushed are committed. Design your pipeline for idempotent re-runs.</p>
</blockquote>
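<p>One way to design for idempotent re-runs is a simple chunk checkpoint: remember the index of the last chunk the server acknowledged, and skip everything up to it on restart. A minimal sketch of the pattern (our own illustration, not an ArcadeDB API):</p>

```python
# Checkpointed chunking: because the batch API commits chunk by chunk, a
# re-run after a failure should skip chunks that were already committed.
# The "checkpoint" here is just the index of the last acknowledged chunk.
from typing import Iterator, List

def chunked(records: List[dict], size: int) -> Iterator[List[dict]]:
    for i in range(0, len(records), size):
        yield records[i:i + size]

def resume_import(records: List[dict], chunk_size: int,
                  checkpoint: int, send) -> int:
    """Send every chunk after `checkpoint`; return the new checkpoint."""
    for idx, chunk in enumerate(chunked(records, chunk_size)):
        if idx <= checkpoint:
            continue  # already committed in a previous run
        send(chunk)
        checkpoint = idx  # persist this durably in a real pipeline
    return checkpoint

sent = []
records = [{"tempId": f"p{i}"} for i in range(10)]
# First run died after chunk 0 (checkpoint = 0); the re-run sends chunks 1-4.
resume_import(records, 2, checkpoint=0, send=sent.append)
print(len(sent))  # 4 chunks re-sent, none duplicated
```

<p>In a real pipeline the checkpoint would be persisted (a file, a table, an object-store key) after each acknowledged chunk, so a crashed importer can be restarted blindly.</p>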

<h2 id="get-started">Get Started</h2>

<p>GraphBatch is available starting from <strong>ArcadeDB v26.3.2</strong>. Check out the <a href="https://docs.arcadedb.com">documentation</a> for API details and usage examples.</p>

<p><strong>Download ArcadeDB v26.3.2</strong>: <a href="https://github.com/ArcadeData/arcadedb/releases">GitHub Releases</a></p>

<p>If you have questions or feedback, join us on <a href="https://discord.gg/arcadedb">Discord</a> or open an issue on <a href="https://github.com/ArcadeData/arcadedb/issues">GitHub</a>.</p>]]></content><author><name>Luca Garulli</name></author><category term="Graph Database" /><category term="Performance" /><category term="Import" /><category term="Benchmark" /><summary type="html"><![CDATA[ArcadeDB's new GraphBatch delivers up to 8.39x faster graph ingestion than the standard API and 3x faster than the previous GraphImporter, reaching over 1.2 million edges per second at medium scale. Now accessible from any language via the new HTTP batch endpoint (JSONL/CSV) and a streaming gRPC API with backpressure support.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://arcadedb.com/assets/images/arcadedb-graphbatch.jpg" /><media:content medium="image" url="https://arcadedb.com/assets/images/arcadedb-graphbatch.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Cognee + ArcadeDB: AI Memory Meets Multi-Model</title><link href="https://arcadedb.com/blog/cognee-arcadedb-ai-memory-meets-multi-model/" rel="alternate" type="text/html" title="Cognee + ArcadeDB: AI Memory Meets Multi-Model" /><published>2026-03-30T00:00:00+00:00</published><updated>2026-03-30T00:00:00+00:00</updated><id>https://arcadedb.com/blog/cognee-arcadedb-ai-memory-meets-multi-model</id><content type="html" xml:base="https://arcadedb.com/blog/cognee-arcadedb-ai-memory-meets-multi-model/"><![CDATA[<p>AI agents need memory. Not just a conversation buffer that disappears after each session — real, persistent memory that learns from every interaction, connects facts across documents, and retrieves exactly the right context when the agent needs it.</p>

<p>That’s what <a href="https://github.com/topoteretes/cognee">Cognee</a> does. It’s an open-source AI memory engine with 14,600+ GitHub stars, $7.5M in seed funding, and 70+ companies using it in production. Cognee ingests data in any format, builds a knowledge graph using cognitive science approaches, and gives AI agents the ability to search across both vector embeddings and graph relationships.</p>

<p>ArcadeDB is now available as a graph database backend for Cognee — and its multi-model architecture makes it uniquely suited for the job.</p>

<hr />

<h2 id="what-cognee-does">What Cognee Does</h2>

<p>Cognee’s API is intentionally minimal. Three functions cover the entire pipeline:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">cognee</span>

<span class="k">await</span> <span class="n">cognee</span><span class="p">.</span><span class="nf">add</span><span class="p">(</span><span class="sh">"</span><span class="s">your data here</span><span class="sh">"</span><span class="p">)</span>   <span class="c1"># Ingest documents, text, or URLs
</span><span class="k">await</span> <span class="n">cognee</span><span class="p">.</span><span class="nf">cognify</span><span class="p">()</span>                <span class="c1"># Build the knowledge graph
</span><span class="k">await</span> <span class="n">cognee</span><span class="p">.</span><span class="nf">search</span><span class="p">(</span><span class="sh">"</span><span class="s">your query</span><span class="sh">"</span><span class="p">)</span>     <span class="c1"># Search across graph + vectors
</span></code></pre></div></div>

<p>Under the hood, Cognee extracts entities and relationships from your data, builds a knowledge graph, generates vector embeddings, and stores everything for retrieval. When an agent searches, Cognee combines graph traversal with vector similarity to return contextually rich results — not just the closest embedding match, but the connected facts around it.</p>
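<p>The combination is easy to picture with a toy model: rank entities by vector similarity to the query, then expand each hit with its graph neighbors so the result carries connected context. This sketch is illustrative only, not Cognee’s actual retrieval code:</p>

```python
# Toy graph-plus-vector retrieval: cosine-rank entities, then attach each
# hit's one-hop neighborhood. Entity names and vectors are made up.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

embeddings = {"ArcadeDB": [1.0, 0.1], "Jepsen": [0.1, 1.0], "Apache-2.0": [0.9, 0.3]}
edges = {"ArcadeDB": ["Apache-2.0"], "Apache-2.0": [], "Jepsen": []}

def search(query_vec, top_k=1):
    ranked = sorted(embeddings, key=lambda n: cosine(query_vec, embeddings[n]),
                    reverse=True)[:top_k]
    # expand each hit with its direct graph neighbors
    return [(node, edges[node]) for node in ranked]

print(search([1.0, 0.2]))  # [('ArcadeDB', ['Apache-2.0'])]
```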

<p>This architecture requires two database backends: a <strong>graph database</strong> for entities and relationships, and a <strong>vector store</strong> for embeddings. Most Cognee deployments use separate databases for each — for example, <a href="https://arcadedb.com/neo4j.html">Neo4j</a> for graphs and Qdrant for vectors.</p>

<p>ArcadeDB handles both in a single engine.</p>

<hr />

<h2 id="why-arcadedb">Why ArcadeDB</h2>

<p>ArcadeDB is a <a href="https://docs.arcadedb.com/arcadedb/concepts/multi-model.html">multi-model database</a> that natively supports graphs, documents, key/value, <a href="https://docs.arcadedb.com/arcadedb/concepts/timeseries.html">time series</a>, <a href="https://docs.arcadedb.com/arcadedb/how-to/data-modeling/full-text-index.html">full-text search</a>, and <a href="https://docs.arcadedb.com/arcadedb/concepts/vector-search.html#vector-search-concepts">vector embeddings</a>. For Cognee, this means:</p>

<p><strong>One database instead of two (or three).</strong> ArcadeDB stores your <a href="https://arcadedb.com/knowledge-graphs.html">knowledge graph</a> <em>and</em> your <a href="https://docs.arcadedb.com/arcadedb/tutorials/vector-search-tutorial.html">vector embeddings</a> in the same engine. No need to synchronize data between a graph database and a separate vector store. No additional infrastructure to deploy, monitor, and maintain.</p>

<p><strong>Native graph performance.</strong> ArcadeDB isn’t a graph layer on top of a relational engine. It uses a native graph storage model with direct record links — no index lookups for traversals. On the <a href="/blog/graph-olap-engine-the-fastest-graph-analytics-with-zero-compromises/">LDBC Graphalytics benchmark</a>, ArcadeDB is up to 9x faster than KuzuDB (Cognee’s previous default) on algorithms like PageRank and BFS.</p>

<p><img src="/assets/images/arcadedb-vs-kuzu-benchmark.svg" alt="ArcadeDB vs KuzuDB — LDBC Graphalytics Benchmark" /></p>

<p>ArcadeDB is faster on every LDBC Graphalytics algorithm and up to 25x faster on LSQB subgraph pattern matching queries. Full benchmark results are <a href="https://github.com/ArcadeData/ldbc_graphalytics_platforms_arcadedb">available on GitHub</a>.</p>

<p><strong><a href="https://arcadedb.com/blog/native-opencypher/">OpenCypher</a> compatibility.</strong> ArcadeDB passes 97.8% of the official Cypher Technology Compatibility Kit. The Cognee adapter uses standard <a href="https://docs.arcadedb.com/arcadedb/reference/cypher/chapter.html">Cypher queries</a> over the <a href="https://docs.arcadedb.com/arcadedb/how-to/connectivity/bolt.html">Bolt protocol</a> — the same protocol and query language used by Neo4j. No proprietary APIs.</p>

<p><strong>Apache 2.0, forever.</strong> ArcadeDB is fully open source under the Apache 2.0 license, with a <a href="/blog/open-source-forever-why-arcadedb-will-never-change-its-license/">public commitment to never change it</a>. After <a href="https://arcadedb.com/blog/from-kuzudb-to-arcadedb-migration-guide/">KuzuDB’s acquisition</a> by Apple and subsequent archival, licensing stability matters more than ever.</p>

<hr />

<h2 id="setting-up-arcadedb-with-cognee">Setting Up ArcadeDB with Cognee</h2>

<h3 id="1-start-arcadedb-with-bolt-enabled">1. Start ArcadeDB with Bolt enabled</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>docker run <span class="nt">-d</span> <span class="nt">--name</span> arcadedb <span class="nt">-p</span> 2480:2480 <span class="nt">-p</span> 7687:7687 <span class="se">\</span>
  <span class="nt">-e</span> <span class="nv">JAVA_OPTS</span><span class="o">=</span><span class="s2">"-Darcadedb.server.rootPassword=arcadedb </span><span class="se">\</span><span class="s2">
  -Darcadedb.server.defaultDatabases=cognee[root]{} </span><span class="se">\</span><span class="s2">
  -Darcadedb.server.plugins=Bolt:com.arcadedb.bolt.BoltProtocolPlugin"</span> <span class="se">\</span>
  arcadedata/arcadedb:latest
</code></pre></div></div>

<p>This starts ArcadeDB with the Bolt protocol on port 7687 and automatically creates a <code class="language-plaintext highlighter-rouge">cognee</code> database.</p>
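<p>Since the container takes a moment to boot, it can help to wait until the Bolt port actually accepts connections before pointing Cognee at it. A generic TCP readiness probe (plain sockets, not an ArcadeDB client):</p>

```python
# Poll a TCP port until it accepts a connection or the timeout expires.
import socket
import time

def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True  # something is listening
        except OSError:
            time.sleep(0.5)  # not up yet; retry
    return False

# With the container from above, this returns True once Bolt is listening.
print(wait_for_port("localhost", 7687, timeout=2.0))
```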

<h3 id="2-install-the-cognee-arcadedb-adapter">2. Install the Cognee ArcadeDB adapter</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pip <span class="nb">install </span>cognee cognee-community-graph-adapter-arcadedb
</code></pre></div></div>

<h3 id="3-configure-cognee-to-use-arcadedb">3. Configure Cognee to use ArcadeDB</h3>

<p>Set your environment variables:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">GRAPH_DATABASE_PROVIDER</span><span class="o">=</span><span class="s2">"arcadedb"</span>
<span class="nv">GRAPH_DATABASE_URL</span><span class="o">=</span><span class="s2">"bolt://localhost:7687"</span>
<span class="nv">GRAPH_DATABASE_USERNAME</span><span class="o">=</span><span class="s2">"root"</span>
<span class="nv">GRAPH_DATABASE_PASSWORD</span><span class="o">=</span><span class="s2">"arcadedb"</span>
</code></pre></div></div>

<p>Or configure programmatically:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">cognee</span>

<span class="n">cognee</span><span class="p">.</span><span class="n">config</span><span class="p">.</span><span class="nf">set_graph_database_provider</span><span class="p">(</span><span class="sh">"</span><span class="s">arcadedb</span><span class="sh">"</span><span class="p">)</span>
<span class="n">cognee</span><span class="p">.</span><span class="n">config</span><span class="p">.</span><span class="nf">set_graph_db_config</span><span class="p">({</span>
    <span class="sh">"</span><span class="s">graph_database_url</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">bolt://localhost:7687</span><span class="sh">"</span><span class="p">,</span>
    <span class="sh">"</span><span class="s">graph_database_username</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">root</span><span class="sh">"</span><span class="p">,</span>
    <span class="sh">"</span><span class="s">graph_database_password</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">arcadedb</span><span class="sh">"</span><span class="p">,</span>
<span class="p">})</span>
</code></pre></div></div>

<p>That’s it. From this point, every <code class="language-plaintext highlighter-rouge">cognee.add()</code>, <code class="language-plaintext highlighter-rouge">cognee.cognify()</code>, and <code class="language-plaintext highlighter-rouge">cognee.search()</code> call uses ArcadeDB as the graph backend.</p>

<h3 id="4-build-and-query-a-knowledge-graph">4. Build and query a knowledge graph</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">cognee</span>

<span class="c1"># Ingest some data
</span><span class="k">await</span> <span class="n">cognee</span><span class="p">.</span><span class="nf">add</span><span class="p">(</span><span class="sh">"</span><span class="s">ArcadeDB is a multi-model database that supports graph, </span><span class="sh">"</span>
                 <span class="sh">"</span><span class="s">document, key/value, time series, and vector data models. </span><span class="sh">"</span>
                 <span class="sh">"</span><span class="s">It is open source under the Apache 2.0 license.</span><span class="sh">"</span><span class="p">)</span>

<span class="c1"># Build the knowledge graph
</span><span class="k">await</span> <span class="n">cognee</span><span class="p">.</span><span class="nf">cognify</span><span class="p">()</span>

<span class="c1"># Search with combined graph + vector retrieval
</span><span class="n">results</span> <span class="o">=</span> <span class="k">await</span> <span class="n">cognee</span><span class="p">.</span><span class="nf">search</span><span class="p">(</span><span class="sh">"</span><span class="s">What data models does ArcadeDB support?</span><span class="sh">"</span><span class="p">)</span>
<span class="k">for</span> <span class="n">result</span> <span class="ow">in</span> <span class="n">results</span><span class="p">:</span>
    <span class="nf">print</span><span class="p">(</span><span class="n">result</span><span class="p">)</span>
</code></pre></div></div>

<p>Cognee extracts entities (ArcadeDB, Apache 2.0, graph, document, etc.), builds relationships between them, generates embeddings, and stores everything in ArcadeDB. When you search, Cognee traverses the graph <em>and</em> runs vector similarity — returning results that understand both semantic meaning and structural relationships.</p>

<hr />

<h2 id="the-multi-model-advantage">The Multi-Model Advantage</h2>

<p>Most AI memory systems treat graphs and vectors as separate concerns with separate databases. This creates real problems:</p>

<ul>
  <li><strong>Data synchronization.</strong> Entities in the graph must stay in sync with their vector representations. Two databases means two sources of truth.</li>
  <li><strong>Operational complexity.</strong> Two databases to deploy, scale, back up, and monitor. Two sets of connection pools, credentials, and failure modes.</li>
  <li><strong>Query-time overhead.</strong> A search that needs both graph context and vector similarity requires two round-trips to two different systems.</li>
</ul>

<p>ArcadeDB eliminates this split. A single node in ArcadeDB can be a graph vertex with edges to other entities <em>and</em> carry a vector embedding for similarity search <em>and</em> store document properties — all accessible from a single query. This is what multi-model means in practice: not just supporting multiple APIs, but storing and querying multiple data representations in a single, consistent engine.</p>

<p>For Cognee’s architecture specifically, this means the knowledge graph and the vector index live in the same database, on the same data. No synchronization layer. No eventual consistency between two systems. One transactional engine.</p>
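<p>The idea is easy to see in miniature: if every record carries document properties, graph edges, and an embedding, one query can filter, rank, and traverse in a single pass. A toy illustration in plain Python (not ArcadeDB’s storage format; the RIDs and data are made up):</p>

```python
# One record, three models: document properties, graph edges, and a vector
# embedding on the same entity, queried together in a single pass.
entities = {
    "#1:0": {"props": {"name": "ArcadeDB", "license": "Apache-2.0"},
             "edges": ["#1:1"], "embedding": [0.9, 0.1]},
    "#1:1": {"props": {"name": "Cognee", "license": "Apache-2.0"},
             "edges": [], "embedding": [0.2, 0.8]},
}

def query(license_filter, query_vec):
    def score(e):  # dot product as a stand-in for vector similarity
        return sum(a * b for a, b in zip(e["embedding"], query_vec))
    hits = [(rid, e) for rid, e in entities.items()
            if e["props"]["license"] == license_filter]   # document filter
    hits.sort(key=lambda kv: score(kv[1]), reverse=True)  # vector ranking
    # one pass returns the document fields, the ranking, and the neighbors
    return [(e["props"]["name"], e["edges"]) for rid, e in hits]

print(query("Apache-2.0", [1.0, 0.0]))  # [('ArcadeDB', ['#1:1']), ('Cognee', [])]
```

<p>With two separate databases, the filter, the ranking, and the traversal above would each touch a different system, with a synchronization layer in between.</p>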

<hr />

<h2 id="whats-next">What’s Next</h2>

<p>The ArcadeDB adapter for Cognee is available today as a <a href="https://github.com/topoteretes/cognee-community">community package</a>. We’re working with the Cognee team to:</p>

<ul>
  <li>Expand the integration to cover ArcadeDB’s vector search capabilities directly within the Cognee pipeline</li>
  <li>Optimize graph construction queries for ArcadeDB’s native traversal performance</li>
  <li>Make ArcadeDB a first-class backend option in Cognee’s documentation and getting started guides</li>
</ul>

<p>If you’re building AI agents that need structured, persistent memory — or if you’re looking for a single database to replace a graph DB + vector store combination for <a href="https://arcadedb.com/graph-rag.html">GraphRAG</a> — give ArcadeDB + Cognee a try.</p>

<p><strong>Get started:</strong></p>
<ul>
  <li><a href="https://docs.arcadedb.com">ArcadeDB documentation</a></li>
  <li><a href="https://docs.cognee.ai">Cognee documentation</a></li>
  <li><a href="https://github.com/topoteretes/cognee-community">ArcadeDB adapter source code</a></li>
  <li><a href="https://hub.docker.com/r/arcadedata/arcadedb">ArcadeDB Docker Hub</a></li>
</ul>]]></content><author><name>Luca Garulli</name></author><category term="Cognee" /><category term="AI" /><category term="Memory Engine" /><category term="Graph Database" /><category term="Multi-Model" /><category term="Knowledge Graph" /><category term="RAG" /><category term="Integration" /><category term="Python" /><summary type="html"><![CDATA[How Cognee's AI memory engine and ArcadeDB's multi-model database work together to give AI agents persistent, structured memory — with a single backend for graphs, documents, and vectors.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://arcadedb.com/assets/images/cognee-arcadedb.jpg" /><media:content medium="image" url="https://arcadedb.com/assets/images/cognee-arcadedb.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Declarative Graph Importer: Import StackOverflow into a Graph with a Single JSON File</title><link href="https://arcadedb.com/blog/declarative-graph-importer/" rel="alternate" type="text/html" title="Declarative Graph Importer: Import StackOverflow into a Graph with a Single JSON File" /><published>2026-03-28T00:00:00+00:00</published><updated>2026-03-28T00:00:00+00:00</updated><id>https://arcadedb.com/blog/declarative-graph-importer</id><content type="html" xml:base="https://arcadedb.com/blog/declarative-graph-importer/"><![CDATA[<p>Importing a real-world dataset into a graph database usually means writing a custom ETL script: parse the files, resolve foreign keys, batch your transactions, handle edge cases. It works, but it’s tedious, error-prone, and you end up throwing away the script once the import is done.</p>

<p><a href="https://arcadedb.com/blog/arcadedb-26-3-2/">ArcadeDB v26.3.2</a> introduces the <strong>GraphImporter</strong> — a declarative tool that turns CSV, XML, and JSONL files into a fully connected graph using nothing but a JSON configuration file. No code, no custom scripts. Under the hood it uses the <a href="https://arcadedb.com/blog/graphbatch-up-to-8x-faster-graph-ingestion/">GraphBatch engine</a> for maximum throughput.</p>

<p>Let’s see how it works by importing a real dataset: the <strong>StackOverflow data dump</strong>.</p>

<h2 id="the-stackoverflow-graph-model">The StackOverflow Graph Model</h2>

<p>The <a href="https://archive.org/details/stackexchange">StackOverflow data dump</a> is a classic dataset for benchmarking and graph analysis. It ships as a set of XML files, each representing a table in the original relational schema. Here’s the graph model we’ll build:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>            ASKED                    TAGGED_WITH
  User ──────────────&gt; Question ──────────────&gt; Tag
    │                   │    ^
     │ ANSWERED          │    │ HAS_ANSWER
    v                   │    │
  Answer &lt;──────────────┘    │
    ^                        │
    │ ACCEPTED_ANSWER        │
    └────────────────────────┘

  User ──WROTE_COMMENT──&gt; Comment ──COMMENTED_ON──&gt; Question/Answer
  User ──EARNED──&gt; Badge
  Question ──LINKED_TO──&gt; Question
</code></pre></div></div>

<p>Six vertex types, nine edge types, all derived from six XML files. Let’s see how to express this as a single JSON configuration.</p>

<h2 id="the-import-configuration">The Import Configuration</h2>

<p>Here’s the complete JSON file that defines the entire import:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"vertices"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
    </span><span class="p">{</span><span class="w">
      </span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Tag"</span><span class="p">,</span><span class="w"> </span><span class="nl">"file"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Tags.xml"</span><span class="p">,</span><span class="w"> </span><span class="nl">"id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Id"</span><span class="p">,</span><span class="w"> </span><span class="nl">"nameId"</span><span class="p">:</span><span class="w"> </span><span class="s2">"TagName"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"properties"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nl">"Id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"int:Id"</span><span class="p">,</span><span class="w"> </span><span class="nl">"TagName"</span><span class="p">:</span><span class="w"> </span><span class="s2">"TagName"</span><span class="p">,</span><span class="w"> </span><span class="nl">"Count"</span><span class="p">:</span><span class="w"> </span><span class="s2">"int:Count"</span><span class="w"> </span><span class="p">}</span><span class="w">
    </span><span class="p">},</span><span class="w">
    </span><span class="p">{</span><span class="w">
      </span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"User"</span><span class="p">,</span><span class="w"> </span><span class="nl">"file"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Users.xml"</span><span class="p">,</span><span class="w"> </span><span class="nl">"id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Id"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"properties"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="nl">"Id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"int:Id"</span><span class="p">,</span><span class="w"> </span><span class="nl">"DisplayName"</span><span class="p">:</span><span class="w"> </span><span class="s2">"DisplayName"</span><span class="p">,</span><span class="w"> </span><span class="nl">"Reputation"</span><span class="p">:</span><span class="w"> </span><span class="s2">"int:Reputation"</span><span class="p">,</span><span class="w">
        </span><span class="nl">"CreationDate"</span><span class="p">:</span><span class="w"> </span><span class="s2">"CreationDate"</span><span class="p">,</span><span class="w"> </span><span class="nl">"Views"</span><span class="p">:</span><span class="w"> </span><span class="s2">"int:Views"</span><span class="p">,</span><span class="w">
        </span><span class="nl">"UpVotes"</span><span class="p">:</span><span class="w"> </span><span class="s2">"int:UpVotes"</span><span class="p">,</span><span class="w"> </span><span class="nl">"DownVotes"</span><span class="p">:</span><span class="w"> </span><span class="s2">"int:DownVotes"</span><span class="w">
      </span><span class="p">}</span><span class="w">
    </span><span class="p">},</span><span class="w">
    </span><span class="p">{</span><span class="w">
      </span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Question"</span><span class="p">,</span><span class="w"> </span><span class="nl">"file"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Posts.xml"</span><span class="p">,</span><span class="w"> </span><span class="nl">"id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Id"</span><span class="p">,</span><span class="w"> </span><span class="nl">"filter"</span><span class="p">:</span><span class="w"> </span><span class="s2">"PostTypeId=1"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"properties"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="nl">"Id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"int:Id"</span><span class="p">,</span><span class="w"> </span><span class="nl">"Title"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Title"</span><span class="p">,</span><span class="w"> </span><span class="nl">"Body"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Body"</span><span class="p">,</span><span class="w"> </span><span class="nl">"Score"</span><span class="p">:</span><span class="w"> </span><span class="s2">"int:Score"</span><span class="p">,</span><span class="w">
        </span><span class="nl">"ViewCount"</span><span class="p">:</span><span class="w"> </span><span class="s2">"int:ViewCount"</span><span class="p">,</span><span class="w"> </span><span class="nl">"CreationDate"</span><span class="p">:</span><span class="w"> </span><span class="s2">"CreationDate"</span><span class="p">,</span><span class="w">
        </span><span class="nl">"AnswerCount"</span><span class="p">:</span><span class="w"> </span><span class="s2">"int:AnswerCount"</span><span class="p">,</span><span class="w"> </span><span class="nl">"CommentCount"</span><span class="p">:</span><span class="w"> </span><span class="s2">"int:CommentCount"</span><span class="p">,</span><span class="w"> </span><span class="nl">"Tags"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Tags"</span><span class="w">
      </span><span class="p">},</span><span class="w">
      </span><span class="nl">"edges"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
        </span><span class="p">{</span><span class="w"> </span><span class="nl">"attribute"</span><span class="p">:</span><span class="w"> </span><span class="s2">"OwnerUserId"</span><span class="p">,</span><span class="w"> </span><span class="nl">"edge"</span><span class="p">:</span><span class="w"> </span><span class="s2">"ASKED"</span><span class="p">,</span><span class="w"> </span><span class="nl">"target"</span><span class="p">:</span><span class="w"> </span><span class="s2">"User"</span><span class="p">,</span><span class="w"> </span><span class="nl">"direction"</span><span class="p">:</span><span class="w"> </span><span class="s2">"in"</span><span class="w"> </span><span class="p">},</span><span class="w">
        </span><span class="p">{</span><span class="w"> </span><span class="nl">"attribute"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Tags"</span><span class="p">,</span><span class="w"> </span><span class="nl">"edge"</span><span class="p">:</span><span class="w"> </span><span class="s2">"TAGGED_WITH"</span><span class="p">,</span><span class="w"> </span><span class="nl">"target"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Tag"</span><span class="p">,</span><span class="w"> </span><span class="nl">"split"</span><span class="p">:</span><span class="w"> </span><span class="s2">"|"</span><span class="w"> </span><span class="p">}</span><span class="w">
      </span><span class="p">]</span><span class="w">
    </span><span class="p">},</span><span class="w">
    </span><span class="p">{</span><span class="w">
      </span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Answer"</span><span class="p">,</span><span class="w"> </span><span class="nl">"file"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Posts.xml"</span><span class="p">,</span><span class="w"> </span><span class="nl">"id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Id"</span><span class="p">,</span><span class="w"> </span><span class="nl">"filter"</span><span class="p">:</span><span class="w"> </span><span class="s2">"PostTypeId=2"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"properties"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="nl">"Id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"int:Id"</span><span class="p">,</span><span class="w"> </span><span class="nl">"Body"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Body"</span><span class="p">,</span><span class="w"> </span><span class="nl">"Score"</span><span class="p">:</span><span class="w"> </span><span class="s2">"int:Score"</span><span class="p">,</span><span class="w">
        </span><span class="nl">"CreationDate"</span><span class="p">:</span><span class="w"> </span><span class="s2">"CreationDate"</span><span class="p">,</span><span class="w"> </span><span class="nl">"CommentCount"</span><span class="p">:</span><span class="w"> </span><span class="s2">"int:CommentCount"</span><span class="w">
      </span><span class="p">},</span><span class="w">
      </span><span class="nl">"edges"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
        </span><span class="p">{</span><span class="w"> </span><span class="nl">"attribute"</span><span class="p">:</span><span class="w"> </span><span class="s2">"OwnerUserId"</span><span class="p">,</span><span class="w"> </span><span class="nl">"edge"</span><span class="p">:</span><span class="w"> </span><span class="s2">"ANSWERED"</span><span class="p">,</span><span class="w"> </span><span class="nl">"target"</span><span class="p">:</span><span class="w"> </span><span class="s2">"User"</span><span class="p">,</span><span class="w"> </span><span class="nl">"direction"</span><span class="p">:</span><span class="w"> </span><span class="s2">"in"</span><span class="w"> </span><span class="p">},</span><span class="w">
        </span><span class="p">{</span><span class="w"> </span><span class="nl">"attribute"</span><span class="p">:</span><span class="w"> </span><span class="s2">"ParentId"</span><span class="p">,</span><span class="w"> </span><span class="nl">"edge"</span><span class="p">:</span><span class="w"> </span><span class="s2">"HAS_ANSWER"</span><span class="p">,</span><span class="w"> </span><span class="nl">"target"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Question"</span><span class="p">,</span><span class="w"> </span><span class="nl">"direction"</span><span class="p">:</span><span class="w"> </span><span class="s2">"in"</span><span class="w"> </span><span class="p">}</span><span class="w">
      </span><span class="p">]</span><span class="w">
    </span><span class="p">},</span><span class="w">
    </span><span class="p">{</span><span class="w">
      </span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Comment"</span><span class="p">,</span><span class="w"> </span><span class="nl">"file"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Comments.xml"</span><span class="p">,</span><span class="w"> </span><span class="nl">"id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Id"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"properties"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="nl">"Id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"int:Id"</span><span class="p">,</span><span class="w"> </span><span class="nl">"Score"</span><span class="p">:</span><span class="w"> </span><span class="s2">"int:Score"</span><span class="p">,</span><span class="w"> </span><span class="nl">"CreationDate"</span><span class="p">:</span><span class="w"> </span><span class="s2">"CreationDate"</span><span class="p">,</span><span class="w"> </span><span class="nl">"Text"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Text"</span><span class="w">
      </span><span class="p">},</span><span class="w">
      </span><span class="nl">"edges"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
        </span><span class="p">{</span><span class="w"> </span><span class="nl">"attribute"</span><span class="p">:</span><span class="w"> </span><span class="s2">"PostId"</span><span class="p">,</span><span class="w"> </span><span class="nl">"edge"</span><span class="p">:</span><span class="w"> </span><span class="s2">"COMMENTED_ON"</span><span class="p">,</span><span class="w"> </span><span class="nl">"target"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Question"</span><span class="w"> </span><span class="p">},</span><span class="w">
        </span><span class="p">{</span><span class="w"> </span><span class="nl">"attribute"</span><span class="p">:</span><span class="w"> </span><span class="s2">"PostId"</span><span class="p">,</span><span class="w"> </span><span class="nl">"edge"</span><span class="p">:</span><span class="w"> </span><span class="s2">"COMMENTED_ON_ANSWER"</span><span class="p">,</span><span class="w"> </span><span class="nl">"target"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Answer"</span><span class="w"> </span><span class="p">},</span><span class="w">
        </span><span class="p">{</span><span class="w"> </span><span class="nl">"attribute"</span><span class="p">:</span><span class="w"> </span><span class="s2">"UserId"</span><span class="p">,</span><span class="w"> </span><span class="nl">"edge"</span><span class="p">:</span><span class="w"> </span><span class="s2">"WROTE_COMMENT"</span><span class="p">,</span><span class="w"> </span><span class="nl">"target"</span><span class="p">:</span><span class="w"> </span><span class="s2">"User"</span><span class="p">,</span><span class="w"> </span><span class="nl">"direction"</span><span class="p">:</span><span class="w"> </span><span class="s2">"in"</span><span class="w"> </span><span class="p">}</span><span class="w">
      </span><span class="p">]</span><span class="w">
    </span><span class="p">},</span><span class="w">
    </span><span class="p">{</span><span class="w">
      </span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Badge"</span><span class="p">,</span><span class="w"> </span><span class="nl">"file"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Badges.xml"</span><span class="p">,</span><span class="w"> </span><span class="nl">"id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Id"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"properties"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="nl">"Id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"int:Id"</span><span class="p">,</span><span class="w"> </span><span class="nl">"Name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Name"</span><span class="p">,</span><span class="w"> </span><span class="nl">"Date"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Date"</span><span class="p">,</span><span class="w">
        </span><span class="nl">"BadgeClass"</span><span class="p">:</span><span class="w"> </span><span class="s2">"int:Class"</span><span class="p">,</span><span class="w"> </span><span class="nl">"TagBased"</span><span class="p">:</span><span class="w"> </span><span class="s2">"bool:TagBased"</span><span class="w">
      </span><span class="p">},</span><span class="w">
      </span><span class="nl">"edges"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
        </span><span class="p">{</span><span class="w"> </span><span class="nl">"attribute"</span><span class="p">:</span><span class="w"> </span><span class="s2">"UserId"</span><span class="p">,</span><span class="w"> </span><span class="nl">"edge"</span><span class="p">:</span><span class="w"> </span><span class="s2">"EARNED"</span><span class="p">,</span><span class="w"> </span><span class="nl">"target"</span><span class="p">:</span><span class="w"> </span><span class="s2">"User"</span><span class="p">,</span><span class="w"> </span><span class="nl">"direction"</span><span class="p">:</span><span class="w"> </span><span class="s2">"in"</span><span class="w"> </span><span class="p">}</span><span class="w">
      </span><span class="p">]</span><span class="w">
    </span><span class="p">}</span><span class="w">
  </span><span class="p">],</span><span class="w">

  </span><span class="nl">"edgeSources"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
    </span><span class="p">{</span><span class="w">
      </span><span class="nl">"edge"</span><span class="p">:</span><span class="w"> </span><span class="s2">"ACCEPTED_ANSWER"</span><span class="p">,</span><span class="w"> </span><span class="nl">"file"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Posts.xml"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"from"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Id:Question"</span><span class="p">,</span><span class="w"> </span><span class="nl">"to"</span><span class="p">:</span><span class="w"> </span><span class="s2">"AcceptedAnswerId:Answer"</span><span class="w">
    </span><span class="p">},</span><span class="w">
    </span><span class="p">{</span><span class="w">
      </span><span class="nl">"edge"</span><span class="p">:</span><span class="w"> </span><span class="s2">"LINKED_TO"</span><span class="p">,</span><span class="w"> </span><span class="nl">"file"</span><span class="p">:</span><span class="w"> </span><span class="s2">"PostLinks.xml"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"from"</span><span class="p">:</span><span class="w"> </span><span class="s2">"PostId:Question"</span><span class="p">,</span><span class="w"> </span><span class="nl">"to"</span><span class="p">:</span><span class="w"> </span><span class="s2">"RelatedPostId:Question"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"properties"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nl">"LinkType"</span><span class="p">:</span><span class="w"> </span><span class="s2">"int:LinkTypeId"</span><span class="w"> </span><span class="p">}</span><span class="w">
    </span><span class="p">}</span><span class="w">
  </span><span class="p">],</span><span class="w">

  </span><span class="nl">"postImportCommands"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
    </span><span class="p">{</span><span class="w">
      </span><span class="nl">"language"</span><span class="p">:</span><span class="w"> </span><span class="s2">"sql"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"command"</span><span class="p">:</span><span class="w"> </span><span class="s2">"CREATE GRAPH ANALYTICAL VIEW IF NOT EXISTS stackoverflow PROPERTIES (`!Body`, `!Text`) UPDATE MODE SYNCHRONOUS"</span><span class="w">
    </span><span class="p">}</span><span class="w">
  </span><span class="p">]</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>That’s it. Six vertex types, ten edge types, and a post-import Graph Analytical View — all in one file. Let’s break down the key patterns.</p>

<h2 id="key-patterns-explained">Key Patterns Explained</h2>

<h3 id="splitting-one-file-into-multiple-vertex-types">Splitting One File into Multiple Vertex Types</h3>

<p>StackOverflow stores both questions and answers in the same <code class="language-plaintext highlighter-rouge">Posts.xml</code> file, distinguished by <code class="language-plaintext highlighter-rouge">PostTypeId</code>. The <code class="language-plaintext highlighter-rouge">filter</code> option lets you import them as separate vertex types:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w"> </span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Question"</span><span class="p">,</span><span class="w"> </span><span class="nl">"file"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Posts.xml"</span><span class="p">,</span><span class="w"> </span><span class="nl">"filter"</span><span class="p">:</span><span class="w"> </span><span class="s2">"PostTypeId=1"</span><span class="p">,</span><span class="w"> </span><span class="err">...</span><span class="w"> </span><span class="p">}</span><span class="w">
</span><span class="p">{</span><span class="w"> </span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Answer"</span><span class="p">,</span><span class="w">   </span><span class="nl">"file"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Posts.xml"</span><span class="p">,</span><span class="w"> </span><span class="nl">"filter"</span><span class="p">:</span><span class="w"> </span><span class="s2">"PostTypeId=2"</span><span class="p">,</span><span class="w"> </span><span class="err">...</span><span class="w"> </span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>The importer reads <code class="language-plaintext highlighter-rouge">Posts.xml</code> once per definition, but only creates vertices for rows matching the filter. This is a common pattern when a single source table contains multiple logical entity types.</p>
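Conceptually, the filter is a per-row equality check on one attribute. Here is a minimal, hypothetical sketch (the class and method names are illustrative, not the importer's actual code) of matching an <code class="language-plaintext highlighter-rouge">attribute=value</code> filter against a parsed row:

```java
import java.util.Map;

// Illustrative sketch only: a row passes the filter when the named
// attribute equals the expected value, e.g. "PostTypeId=1".
public class RowFilterSketch {
    public static boolean matches(Map<String, String> row, String filter) {
        int eq = filter.indexOf('=');
        String attribute = filter.substring(0, eq);
        String expected = filter.substring(eq + 1);
        return expected.equals(row.get(attribute));
    }
}
```

A row with <code class="language-plaintext highlighter-rouge">PostTypeId=1</code> would become a Question vertex and be skipped by the Answer definition, and vice versa.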

<h3 id="foreign-key-resolution">Foreign Key Resolution</h3>

<p>Most edges are derived from foreign key attributes in the source data. The importer needs to know two things: which attribute holds the foreign key, and which vertex type it references.</p>

<p><strong>Incoming edges</strong> — the foreign key is in <em>this</em> vertex’s source, pointing to the target, but the edge should run the other way:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w"> </span><span class="nl">"attribute"</span><span class="p">:</span><span class="w"> </span><span class="s2">"ParentId"</span><span class="p">,</span><span class="w"> </span><span class="nl">"edge"</span><span class="p">:</span><span class="w"> </span><span class="s2">"HAS_ANSWER"</span><span class="p">,</span><span class="w"> </span><span class="nl">"target"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Question"</span><span class="p">,</span><span class="w"> </span><span class="nl">"direction"</span><span class="p">:</span><span class="w"> </span><span class="s2">"in"</span><span class="w"> </span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>This means: “read <code class="language-plaintext highlighter-rouge">ParentId</code> from each Answer row, find the Question with that ID, and create a <code class="language-plaintext highlighter-rouge">HAS_ANSWER</code> edge from the Question to this Answer”. The <code class="language-plaintext highlighter-rouge">"direction": "in"</code> flips the edge so the Question is the source (the question <em>has</em> an answer, not the other way around).</p>

<p><strong>Default direction is <code class="language-plaintext highlighter-rouge">"out"</code></strong> — the current vertex is the edge source:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w"> </span><span class="nl">"attribute"</span><span class="p">:</span><span class="w"> </span><span class="s2">"PostId"</span><span class="p">,</span><span class="w"> </span><span class="nl">"edge"</span><span class="p">:</span><span class="w"> </span><span class="s2">"COMMENTED_ON"</span><span class="p">,</span><span class="w"> </span><span class="nl">"target"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Question"</span><span class="w"> </span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>This creates an edge from Comment to Question.</p>
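The direction rule fits in a few lines. This is an illustrative sketch, not ArcadeDB code: given the current vertex, the foreign-key target, and the <code class="language-plaintext highlighter-rouge">direction</code> flag, it picks which side becomes the edge source:

```java
// Illustrative sketch of how a foreign-key edge definition's "direction"
// field determines the edge's source and target vertices.
public class EdgeDirectionSketch {
    public record Endpoints(String source, String target) {}

    public static Endpoints resolve(String currentVertex, String fkTarget, String direction) {
        // "out" (the default): edge goes from the current vertex to the FK target.
        // "in": the edge is flipped so the FK target becomes the source.
        return "in".equals(direction)
            ? new Endpoints(fkTarget, currentVertex)
            : new Endpoints(currentVertex, fkTarget);
    }
}
```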

<h3 id="split-field-edges-multi-value-attributes">Split-Field Edges (Multi-Value Attributes)</h3>

<p>StackOverflow stores tags as a single delimited string like <code class="language-plaintext highlighter-rouge">&lt;java&gt;&lt;python&gt;&lt;sql&gt;</code>. The <code class="language-plaintext highlighter-rouge">split</code> option expands this into multiple edges:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w"> </span><span class="nl">"attribute"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Tags"</span><span class="p">,</span><span class="w"> </span><span class="nl">"edge"</span><span class="p">:</span><span class="w"> </span><span class="s2">"TAGGED_WITH"</span><span class="p">,</span><span class="w"> </span><span class="nl">"target"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Tag"</span><span class="p">,</span><span class="w"> </span><span class="nl">"split"</span><span class="p">:</span><span class="w"> </span><span class="s2">"|"</span><span class="w"> </span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>For a question tagged <code class="language-plaintext highlighter-rouge">|java|python|sql|</code>, this creates three <code class="language-plaintext highlighter-rouge">TAGGED_WITH</code> edges — one to each Tag vertex. The split values are resolved using the target’s <code class="language-plaintext highlighter-rouge">nameId</code> attribute (in this case, <code class="language-plaintext highlighter-rouge">TagName</code>), not the integer <code class="language-plaintext highlighter-rouge">id</code>.</p>
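The expansion behaves like an ordinary delimiter split that drops empty tokens (the leading and trailing <code class="language-plaintext highlighter-rouge">|</code> would otherwise produce them). A small sketch, assuming that simplified behavior:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Illustrative sketch: expand a delimited attribute value into one
// edge-target token per non-empty segment.
public class SplitEdgeSketch {
    public static List<String> splitTargets(String value, String delimiter) {
        if (value == null || value.isEmpty()) return List.of();
        return Arrays.stream(value.split(Pattern.quote(delimiter)))
            .filter(token -> !token.isEmpty())   // skip empties from leading/trailing delimiters
            .collect(Collectors.toList());
    }
}
```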

<h3 id="edge-only-sources">Edge-Only Sources</h3>

<p>Some relationships live in their own source file rather than as foreign keys in a vertex file. The <code class="language-plaintext highlighter-rouge">edgeSources</code> section handles these:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"edge"</span><span class="p">:</span><span class="w"> </span><span class="s2">"LINKED_TO"</span><span class="p">,</span><span class="w"> </span><span class="nl">"file"</span><span class="p">:</span><span class="w"> </span><span class="s2">"PostLinks.xml"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"from"</span><span class="p">:</span><span class="w"> </span><span class="s2">"PostId:Question"</span><span class="p">,</span><span class="w"> </span><span class="nl">"to"</span><span class="p">:</span><span class="w"> </span><span class="s2">"RelatedPostId:Question"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"properties"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nl">"LinkType"</span><span class="p">:</span><span class="w"> </span><span class="s2">"int:LinkTypeId"</span><span class="w"> </span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>The compact <code class="language-plaintext highlighter-rouge">"attribute:vertexType"</code> syntax tells the importer which attribute to read and which vertex type to resolve against. Both endpoints must already exist (vertex sources are processed first).</p>
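Parsing that compact syntax amounts to splitting on the first colon. A hypothetical sketch of such a parser (not ArcadeDB's internal class):

```java
// Illustrative parser for the compact "attribute:vertexType" endpoint
// syntax used in edgeSources entries, e.g. "PostId:Question".
public class EndpointSpec {
    public final String attribute;
    public final String vertexType;

    public EndpointSpec(String spec) {
        int colon = spec.indexOf(':');
        if (colon < 0)
            throw new IllegalArgumentException("Expected attribute:vertexType, got " + spec);
        this.attribute = spec.substring(0, colon);
        this.vertexType = spec.substring(colon + 1);
    }
}
```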

<h3 id="property-type-mapping">Property Type Mapping</h3>

<p>Properties are strings by default. Prefix the source attribute with a type hint for automatic conversion:</p>

<table>
  <thead>
    <tr>
      <th>Syntax</th>
      <th>Type</th>
      <th>Example</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">"DisplayName"</code></td>
      <td>String</td>
      <td><code class="language-plaintext highlighter-rouge">"name": "DisplayName"</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">"int:Score"</code></td>
      <td>Integer</td>
      <td><code class="language-plaintext highlighter-rouge">"score": "int:Score"</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">"bool:TagBased"</code></td>
      <td>Boolean</td>
      <td><code class="language-plaintext highlighter-rouge">"tagBased": "bool:TagBased"</code></td>
    </tr>
  </tbody>
</table>
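Interpreting a hint amounts to checking for a known prefix and stripping it; anything unprefixed stays a string. A minimal, illustrative sketch covering only the three hints in the table above:

```java
// Illustrative sketch of the "type:attribute" property hint syntax.
// Values without a recognized prefix are treated as plain strings.
public class PropertyHint {
    public final String type;      // "string", "int", or "bool"
    public final String attribute; // source attribute name

    public PropertyHint(String spec) {
        if (spec.startsWith("int:"))       { type = "int";    attribute = spec.substring(4); }
        else if (spec.startsWith("bool:")) { type = "bool";   attribute = spec.substring(5); }
        else                               { type = "string"; attribute = spec; }
    }
}
```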

<h3 id="post-import-commands">Post-Import Commands</h3>

<p>The <code class="language-plaintext highlighter-rouge">postImportCommands</code> array runs SQL (or any supported language) after the import completes. In this example, we create a <a href="https://arcadedb.com/blog/graph-olap-engine-the-fastest-graph-analytics-with-zero-compromises/">Graph Analytical View</a> that pre-computes the graph structure for fast OLAP queries, excluding large text properties (<code class="language-plaintext highlighter-rouge">Body</code>, <code class="language-plaintext highlighter-rouge">Text</code>) to keep the view compact:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"language"</span><span class="p">:</span><span class="w"> </span><span class="s2">"sql"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"command"</span><span class="p">:</span><span class="w"> </span><span class="s2">"CREATE GRAPH ANALYTICAL VIEW IF NOT EXISTS stackoverflow PROPERTIES (`!Body`, `!Text`) UPDATE MODE SYNCHRONOUS"</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<h2 id="running-the-import">Running the Import</h2>

<h3 id="from-the-command-line">From the Command Line</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>java com.arcadedb.integration.importer.graph.GraphImporter <span class="se">\</span>
    stackoverflow-import.json <span class="se">\</span>
    /path/to/database <span class="se">\</span>
    /path/to/stackoverflow-data
</code></pre></div></div>

<p>The importer auto-creates the schema (vertex and edge types) from the JSON config, runs the two-pass import, and executes post-import commands. File paths in the JSON are resolved relative to the data directory (third argument).</p>

<h3 id="from-java">From Java</h3>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nc">Database</span> <span class="n">database</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">DatabaseFactory</span><span class="o">(</span><span class="s">"/path/to/database"</span><span class="o">).</span><span class="na">create</span><span class="o">();</span>

<span class="nc">String</span> <span class="n">json</span> <span class="o">=</span> <span class="nc">Files</span><span class="o">.</span><span class="na">readString</span><span class="o">(</span><span class="nc">Path</span><span class="o">.</span><span class="na">of</span><span class="o">(</span><span class="s">"stackoverflow-import.json"</span><span class="o">));</span>
<span class="nc">GraphImporter</span><span class="o">.</span><span class="na">createSchemaFromConfig</span><span class="o">(</span><span class="n">database</span><span class="o">,</span> <span class="k">new</span> <span class="nc">JSONObject</span><span class="o">(</span><span class="n">json</span><span class="o">));</span>

<span class="k">try</span> <span class="o">(</span><span class="nc">GraphImporter</span> <span class="n">importer</span> <span class="o">=</span> <span class="nc">GraphImporter</span><span class="o">.</span><span class="na">fromJSON</span><span class="o">(</span><span class="n">database</span><span class="o">,</span> <span class="n">json</span><span class="o">,</span> <span class="s">"/path/to/data"</span><span class="o">))</span> <span class="o">{</span>
    <span class="n">importer</span><span class="o">.</span><span class="na">run</span><span class="o">();</span>
    <span class="nc">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">printf</span><span class="o">(</span><span class="s">"Vertices: %,d  Edges: %,d%n"</span><span class="o">,</span>
        <span class="n">importer</span><span class="o">.</span><span class="na">getVertexCount</span><span class="o">(),</span> <span class="n">importer</span><span class="o">.</span><span class="na">getEdgeCount</span><span class="o">());</span>
<span class="o">}</span>

<span class="nc">GraphImporter</span><span class="o">.</span><span class="na">executePostImportCommands</span><span class="o">(</span><span class="n">database</span><span class="o">,</span> <span class="k">new</span> <span class="nc">JSONObject</span><span class="o">(</span><span class="n">json</span><span class="o">));</span>
</code></pre></div></div>

<h3 id="programmatic-builder-api">Programmatic Builder API</h3>

<p>If you prefer code over JSON, the same import can be expressed with the builder:</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nc">GraphImporter</span><span class="o">.</span><span class="na">builder</span><span class="o">(</span><span class="n">database</span><span class="o">)</span>
    <span class="o">.</span><span class="na">vertex</span><span class="o">(</span><span class="s">"Tag"</span><span class="o">,</span> <span class="nc">XmlRowSource</span><span class="o">.</span><span class="na">from</span><span class="o">(</span><span class="n">dataDir</span><span class="o">,</span> <span class="s">"Tags.xml"</span><span class="o">),</span> <span class="n">v</span> <span class="o">-&gt;</span> <span class="o">{</span>
        <span class="n">v</span><span class="o">.</span><span class="na">id</span><span class="o">(</span><span class="s">"Id"</span><span class="o">);</span>
        <span class="n">v</span><span class="o">.</span><span class="na">idByName</span><span class="o">(</span><span class="s">"TagName"</span><span class="o">);</span>
        <span class="n">v</span><span class="o">.</span><span class="na">property</span><span class="o">(</span><span class="s">"TagName"</span><span class="o">,</span> <span class="s">"TagName"</span><span class="o">);</span>
        <span class="n">v</span><span class="o">.</span><span class="na">intProperty</span><span class="o">(</span><span class="s">"Count"</span><span class="o">,</span> <span class="s">"Count"</span><span class="o">);</span>
    <span class="o">})</span>
    <span class="o">.</span><span class="na">vertex</span><span class="o">(</span><span class="s">"User"</span><span class="o">,</span> <span class="nc">XmlRowSource</span><span class="o">.</span><span class="na">from</span><span class="o">(</span><span class="n">dataDir</span><span class="o">,</span> <span class="s">"Users.xml"</span><span class="o">),</span> <span class="n">v</span> <span class="o">-&gt;</span> <span class="o">{</span>
        <span class="n">v</span><span class="o">.</span><span class="na">id</span><span class="o">(</span><span class="s">"Id"</span><span class="o">);</span>
        <span class="n">v</span><span class="o">.</span><span class="na">property</span><span class="o">(</span><span class="s">"DisplayName"</span><span class="o">,</span> <span class="s">"DisplayName"</span><span class="o">);</span>
        <span class="n">v</span><span class="o">.</span><span class="na">intProperty</span><span class="o">(</span><span class="s">"Reputation"</span><span class="o">,</span> <span class="s">"Reputation"</span><span class="o">);</span>
    <span class="o">})</span>
    <span class="o">.</span><span class="na">vertex</span><span class="o">(</span><span class="s">"Question"</span><span class="o">,</span> <span class="nc">XmlRowSource</span><span class="o">.</span><span class="na">from</span><span class="o">(</span><span class="n">dataDir</span><span class="o">,</span> <span class="s">"Posts.xml"</span><span class="o">),</span> <span class="n">v</span> <span class="o">-&gt;</span> <span class="o">{</span>
        <span class="n">v</span><span class="o">.</span><span class="na">filter</span><span class="o">(</span><span class="s">"PostTypeId"</span><span class="o">,</span> <span class="s">"1"</span><span class="o">);</span>
        <span class="n">v</span><span class="o">.</span><span class="na">id</span><span class="o">(</span><span class="s">"Id"</span><span class="o">);</span>
        <span class="n">v</span><span class="o">.</span><span class="na">property</span><span class="o">(</span><span class="s">"Title"</span><span class="o">,</span> <span class="s">"Title"</span><span class="o">);</span>
        <span class="n">v</span><span class="o">.</span><span class="na">intProperty</span><span class="o">(</span><span class="s">"Score"</span><span class="o">,</span> <span class="s">"Score"</span><span class="o">);</span>
        <span class="n">v</span><span class="o">.</span><span class="na">edgeIn</span><span class="o">(</span><span class="s">"OwnerUserId"</span><span class="o">,</span> <span class="s">"ASKED"</span><span class="o">,</span> <span class="s">"User"</span><span class="o">);</span>
        <span class="n">v</span><span class="o">.</span><span class="na">splitEdge</span><span class="o">(</span><span class="s">"Tags"</span><span class="o">,</span> <span class="s">"TAGGED_WITH"</span><span class="o">,</span> <span class="s">"Tag"</span><span class="o">,</span> <span class="s">"|"</span><span class="o">);</span>
    <span class="o">})</span>
    <span class="c1">// ... remaining vertex and edge sources</span>
    <span class="o">.</span><span class="na">build</span><span class="o">()</span>
    <span class="o">.</span><span class="na">run</span><span class="o">();</span>
</code></pre></div></div>

<h2 id="how-it-works-under-the-hood">How It Works Under the Hood</h2>

<p>The GraphImporter uses a <strong>two-pass, CSR-first</strong> (Compressed Sparse Row) architecture:</p>

<p><strong>Pass 1 — Vertices and topology collection.</strong> Each data source is read once. Vertices are created with full properties and flushed to disk immediately. Foreign key values are collected as compressed primitive arrays (int arrays for IDs, bucket/position pairs for RIDs) — no objects, no boxing, minimal GC pressure.</p>

<p><strong>Pass 2 — Edge creation.</strong> The collected topology is fed into <a href="https://arcadedb.com/blog/graphbatch-up-to-8x-faster-graph-ingestion/">GraphBatch</a>, which creates all edges with bidirectional traversal support. Each edge type is processed as a single batch for maximum sequential I/O.</p>

<p>This design means vertex data doesn’t stay in memory — only the graph topology does. For a dataset with 8 million vertices and 15 million edges, the in-memory topology is roughly <strong>300 MB</strong>.</p>
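<p>To make the Pass 1 idea concrete — foreign keys accumulated in growable primitive buffers instead of boxed objects — here is a minimal sketch of the technique (an illustration only, not ArcadeDB’s internal classes):</p>

```java
// Sketch: topology collected as flat int[] buffers with amortized doubling.
// 15M edge endpoints cost ~120 MB of contiguous memory and near-zero GC work,
// versus hundreds of bytes per boxed (Integer, Integer) pair.
final class IntTopology {
    private int[] src = new int[16];
    private int[] dst = new int[16];
    private int size = 0;

    void add(int sourceId, int targetId) {
        if (size == src.length) {                     // grow both buffers together
            src = java.util.Arrays.copyOf(src, size * 2);
            dst = java.util.Arrays.copyOf(dst, size * 2);
        }
        src[size] = sourceId;
        dst[size] = targetId;
        size++;
    }

    int size() { return size; }
    int source(int i) { return src[i]; }
    int target(int i) { return dst[i]; }
}
```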

<h2 id="supported-data-sources">Supported Data Sources</h2>

<table>
  <thead>
    <tr>
      <th>Format</th>
      <th>Auto-detected</th>
      <th>Notes</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>CSV</td>
      <td><code class="language-plaintext highlighter-rouge">.csv</code></td>
      <td>Configurable delimiter (<code class="language-plaintext highlighter-rouge">"delimiter": ","</code>) and skip lines (<code class="language-plaintext highlighter-rouge">"skipLines": 1</code>)</td>
    </tr>
    <tr>
      <td>JSONL</td>
      <td><code class="language-plaintext highlighter-rouge">.jsonl</code>, <code class="language-plaintext highlighter-rouge">.ndjson</code></td>
      <td>One JSON object per line</td>
    </tr>
    <tr>
      <td>XML</td>
      <td><code class="language-plaintext highlighter-rouge">.xml</code></td>
      <td>Attribute-based by default (StackOverflow-style <code class="language-plaintext highlighter-rouge">&lt;row .../&gt;</code>). Set <code class="language-plaintext highlighter-rouge">"element": "book"</code> for child-element parsing</td>
    </tr>
  </tbody>
</table>

<p>All sources are streamed — the importer never loads an entire file into memory.</p>

<h2 id="configuration-reference">Configuration Reference</h2>

<h3 id="vertex-source">Vertex Source</h3>

<table>
  <thead>
    <tr>
      <th>Field</th>
      <th>Required</th>
      <th>Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">type</code></td>
      <td>Yes</td>
      <td>ArcadeDB vertex type name (auto-created if missing)</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">file</code></td>
      <td>Yes</td>
      <td>Source file path, relative to the data directory</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">id</code></td>
      <td>No</td>
      <td>Integer primary key attribute for edge resolution</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">nameId</code></td>
      <td>No</td>
      <td>String-based secondary key (for split-field edge resolution)</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">filter</code></td>
      <td>No</td>
      <td>Row filter: <code class="language-plaintext highlighter-rouge">"attribute=value"</code> — only matching rows are imported</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">properties</code></td>
      <td>No</td>
      <td>Map of <code class="language-plaintext highlighter-rouge">"dbPropertyName": "SourceAttr"</code> (or <code class="language-plaintext highlighter-rouge">"int:Attr"</code>, <code class="language-plaintext highlighter-rouge">"bool:Attr"</code>)</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">edges</code></td>
      <td>No</td>
      <td>Array of edge definitions derived from foreign keys in this source</td>
    </tr>
  </tbody>
</table>

<h3 id="edge-definition-inside-a-vertex-source">Edge Definition (inside a vertex source)</h3>

<table>
  <thead>
    <tr>
      <th>Field</th>
      <th>Required</th>
      <th>Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">attribute</code></td>
      <td>Yes</td>
      <td>Source attribute containing the foreign key value</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">edge</code></td>
      <td>Yes</td>
      <td>ArcadeDB edge type name (auto-created if missing)</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">target</code></td>
      <td>Yes</td>
      <td>Target vertex type the foreign key references</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">direction</code></td>
      <td>No</td>
      <td><code class="language-plaintext highlighter-rouge">"out"</code> (default) or <code class="language-plaintext highlighter-rouge">"in"</code> — controls edge direction</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">split</code></td>
      <td>No</td>
      <td>Delimiter for multi-value fields (creates one edge per value)</td>
    </tr>
  </tbody>
</table>

<h3 id="edge-only-source">Edge-Only Source</h3>

<table>
  <thead>
    <tr>
      <th>Field</th>
      <th>Required</th>
      <th>Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">edge</code></td>
      <td>Yes</td>
      <td>ArcadeDB edge type name</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">file</code></td>
      <td>Yes</td>
      <td>Source file path</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">from</code></td>
      <td>Yes</td>
      <td><code class="language-plaintext highlighter-rouge">"attribute:vertexType"</code> — source vertex reference</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">to</code></td>
      <td>Yes</td>
      <td><code class="language-plaintext highlighter-rouge">"attribute:vertexType"</code> — target vertex reference</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">properties</code></td>
      <td>No</td>
      <td>Map of <code class="language-plaintext highlighter-rouge">"dbPropertyName": "int:SourceAttr"</code></td>
    </tr>
  </tbody>
</table>
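<p>Tying the edge-only fields together, a source that turns a votes file into edges could look like this (the type and attribute names below are illustrative, not part of the StackOverflow configuration shown earlier):</p>

```json
{
  "edge": "VOTED_ON",
  "file": "Votes.xml",
  "from": "UserId:User",
  "to": "PostId:Question",
  "properties": { "voteType": "int:VoteTypeId" }
}
```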

<h2 id="get-started">Get Started</h2>

<p>The GraphImporter is available starting from <strong><a href="https://arcadedb.com/blog/arcadedb-26-3-2/">ArcadeDB v26.3.2</a></strong>. Download the <a href="https://archive.org/details/stackexchange">StackOverflow data dump</a>, grab the JSON config above, and you’ll have a fully connected graph in minutes.</p>

<p><strong>Download ArcadeDB v26.3.2</strong>: <a href="https://github.com/ArcadeData/arcadedb/releases">GitHub Releases</a></p>

<p>If you have questions or feedback, join us on <a href="https://discord.gg/arcadedb">Discord</a> or open an issue on <a href="https://github.com/ArcadeData/arcadedb/issues">GitHub</a>.</p>]]></content><author><name>Luca Garulli</name></author><category term="Graph Database" /><category term="Import" /><category term="ETL" /><category term="StackOverflow" /><summary type="html"><![CDATA[ArcadeDB's new declarative GraphImporter turns CSV, XML, and JSONL files into a fully connected graph database with a single JSON configuration. Built on the high-performance GraphBatch engine, it handles millions of vertices and edges with minimal memory usage. Walk through a complete StackOverflow data dump import as a practical example.]]></summary></entry><entry><title type="html">Graph OLAP Engine: The Fastest Graph Analytics with Zero Compromises</title><link href="https://arcadedb.com/blog/graph-olap-engine-the-fastest-graph-analytics-with-zero-compromises/" rel="alternate" type="text/html" title="Graph OLAP Engine: The Fastest Graph Analytics with Zero Compromises" /><published>2026-03-25T00:00:00+00:00</published><updated>2026-03-25T00:00:00+00:00</updated><id>https://arcadedb.com/blog/graph-olap-engine-the-fastest-graph-analytics-with-zero-compromises</id><content type="html" xml:base="https://arcadedb.com/blog/graph-olap-engine-the-fastest-graph-analytics-with-zero-compromises/"><![CDATA[<p>ArcadeDB has always been the fastest OLTP graph database. But we asked ourselves: what if we could make analytical queries — PageRank, connected components, multi-hop traversals — <strong>up to 462x faster</strong>, without giving up a single byte of transactional performance?</p>

<p>With <a href="https://arcadedb.com/blog/arcadedb-26-3-2/">ArcadeDB v26.3.2</a>, we’re introducing the <strong>Graph OLAP Engine</strong> — and the answer is yes. Zero compromises.</p>

<h2 id="the-problem-with-oltp-for-analytics">The Problem with OLTP for Analytics</h2>

<p>ArcadeDB’s OLTP engine is built for speed: point lookups, single-record mutations, <a href="https://docs.arcadedb.com/arcadedb/concepts/transactions.html">ACID transactions</a> — it handles all of this faster than any other graph database. But analytical workloads are a different beast. When you run PageRank or community detection, you’re accessing <strong>millions of edges in tight loops</strong>. The row-oriented, pointer-chasing nature of OLTP storage hits three walls:</p>

<ul>
  <li><strong>Cache misses</strong>: every edge lookup follows a RID pointer to a different page in memory</li>
  <li><strong>Object overhead</strong>: every vertex and edge is a Java object with ~48 bytes of overhead</li>
  <li><strong>GC pressure</strong>: traversals create millions of short-lived objects that hammer the garbage collector</li>
</ul>

<p>These are fundamental limitations of any OLTP storage engine, not just ours. The traditional answer has been to export your data to a separate analytics system. We think that’s unacceptable.</p>

<h2 id="graph-analytical-views-olap-that-lives-inside-your-database">Graph Analytical Views: OLAP That Lives Inside Your Database</h2>

<p>The Graph OLAP Engine introduces <strong>Graph Analytical Views (GAV)</strong> — a read-optimized, columnar representation of your graph that lives alongside your live OLTP data. You create a view, and the engine keeps it synchronized with every write.</p>

<p>Here’s how simple it is:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">CREATE</span> <span class="n">GRAPH</span> <span class="n">ANALYTICAL</span> <span class="k">VIEW</span> <span class="n">social</span>
  <span class="n">VERTEX</span> <span class="n">TYPES</span> <span class="p">(</span><span class="n">Person</span><span class="p">,</span> <span class="n">Company</span><span class="p">)</span>
  <span class="n">EDGE</span> <span class="n">TYPES</span> <span class="p">(</span><span class="n">FOLLOWS</span><span class="p">,</span> <span class="n">WORKS_AT</span><span class="p">)</span>
  <span class="n">PROPERTIES</span> <span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="n">age</span><span class="p">,</span> <span class="n">status</span><span class="p">)</span>
  <span class="k">UPDATE</span> <span class="k">MODE</span> <span class="n">SYNCHRONOUS</span>
</code></pre></div></div>

<p>Or if you want a view over your entire graph:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">CREATE</span> <span class="n">GRAPH</span> <span class="n">ANALYTICAL</span> <span class="k">VIEW</span> <span class="n">fullGraph</span>
</code></pre></div></div>

<p>That’s it. From this moment on, your <a href="https://docs.arcadedb.com/arcadedb/reference/cypher/chapter.html">Cypher</a> and <a href="https://docs.arcadedb.com/arcadedb/reference/sql/chapter.html">SQL</a> queries are automatically accelerated by the OLAP engine. <strong>No query changes needed</strong> — the optimizer detects the GAV and transparently substitutes OLTP traversal operators with CSR-based ones.</p>

<h2 id="how-it-works-under-the-hood">How It Works Under the Hood</h2>

<h3 id="compressed-sparse-row-csr-encoding">Compressed Sparse Row (CSR) Encoding</h3>

<p>Instead of pointer-chasing through pages, the OLAP engine encodes your entire graph topology as flat <code class="language-plaintext highlighter-rouge">int[]</code> arrays using CSR encoding:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Forward CSR (outgoing edges):
  offsets:   [0, 3, 5, 8, ...]     ← one entry per vertex + sentinel
  neighbors: [1, 5, 7, 2, 6, ...]  ← dense neighbor IDs, contiguous per source

  Neighbors of vertex v = neighbors[offsets[v] .. offsets[v+1])
  Out-degree of vertex v = offsets[v+1] - offsets[v]   ← O(1)
</code></pre></div></div>

<p>Both forward (OUT) and backward (IN) CSR indexes are maintained for bidirectional traversal. This layout is <strong>cache-line friendly</strong> — sequential memory access means the CPU prefetcher does the work for you. Zero object allocation during traversal. SIMD-friendly vectorized operations.</p>
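<p>In code, the read path over those two arrays is just a pair of index lookups — a minimal sketch of the layout above, not ArcadeDB’s implementation:</p>

```java
// Minimal CSR read path: out-degree is two array reads, and a neighbor scan
// is one contiguous slice — exactly what the CPU prefetcher likes.
final class Csr {
    final int[] offsets;   // length = vertexCount + 1 (sentinel at the end)
    final int[] neighbors; // dense neighbor IDs, contiguous per source vertex

    Csr(int[] offsets, int[] neighbors) {
        this.offsets = offsets;
        this.neighbors = neighbors;
    }

    int outDegree(int v) {                 // O(1)
        return offsets[v + 1] - offsets[v];
    }

    long sumNeighborIds(int v) {           // sequential scan over the slice
        long sum = 0;
        for (int i = offsets[v]; i < offsets[v + 1]; i++)
            sum += neighbors[i];
        return sum;
    }
}
```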

<h3 id="columnar-property-storage">Columnar Property Storage</h3>

<p>Properties are stored as typed flat arrays — <code class="language-plaintext highlighter-rouge">int[]</code>, <code class="language-plaintext highlighter-rouge">long[]</code>, <code class="language-plaintext highlighter-rouge">double[]</code>, or dictionary-encoded <code class="language-plaintext highlighter-rouge">int[]</code> for strings. Each column has a compact null bitmap using just 1 bit per vertex. Dictionary encoding achieves near-100% compression for low-cardinality fields like status, category, or tag.</p>
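<p>A dictionary-encoded string column with a 1-bit null bitmap can be sketched in a few lines (illustrative names — the engine’s actual column classes may differ):</p>

```java
// Sketch: each distinct string gets an int code; values become a flat int[],
// and nulls cost one bit per vertex in a long[] bitmap.
final class StringColumn {
    private final java.util.Map<String, Integer> dict = new java.util.HashMap<>();
    private final int[] codes;     // one code per vertex
    private final long[] nulls;    // 1 bit per vertex

    StringColumn(int vertexCount) {
        codes = new int[vertexCount];
        nulls = new long[(vertexCount + 63) / 64];
    }

    void set(int v, String value) {
        if (value == null) {
            nulls[v >> 6] |= 1L << (v & 63);
            return;
        }
        codes[v] = dict.computeIfAbsent(value, k -> dict.size());
    }

    boolean isNull(int v) { return (nulls[v >> 6] & (1L << (v & 63))) != 0; }
    int code(int v)       { return codes[v]; }
    int dictionarySize()  { return dict.size(); } // low cardinality → tiny dictionary
}
```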

<h3 id="three-synchronization-modes">Three Synchronization Modes</h3>

<p>The key design decision was making the OLAP engine <strong>coexist</strong> with OLTP, not replace it. You choose how writes propagate:</p>

<table>
  <thead>
    <tr>
      <th>Mode</th>
      <th>Behavior</th>
      <th>Staleness</th>
      <th>Best For</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>SYNCHRONOUS</strong></td>
      <td>Applies overlay on each commit</td>
      <td>Zero — writes reflected immediately</td>
      <td>Real-time analytics</td>
    </tr>
    <tr>
      <td><strong>ASYNCHRONOUS</strong></td>
      <td>Background rebuild on commit</td>
      <td>Brief window during rebuild</td>
      <td>Large graphs, eventual consistency</td>
    </tr>
    <tr>
      <td><strong>OFF</strong></td>
      <td>Manual rebuild only</td>
      <td>Until you rebuild</td>
      <td>Batch analytics, static snapshots</td>
    </tr>
  </tbody>
</table>

<p>In <strong>SYNCHRONOUS</strong> mode, the engine captures transaction deltas and merges them into an immutable overlay on top of the base CSR. Readers always see a consistent snapshot via volatile reference swap. When the overlay grows past a configurable threshold (default: 10,000 edges), a background compaction rebuilds the full CSR — transparently, with no downtime.</p>
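<p>The snapshot-swap pattern itself is simple: readers dereference one <code>volatile</code> field that always points at an immutable state, and each commit publishes a new state, folding the overlay into the base once it crosses the threshold. A toy model of the idea (field names and the <code>Snapshot</code> shape are illustrative, and it assumes commits are already serialized by the transaction layer):</p>

```java
// Toy model of the volatile-reference snapshot swap. Readers never block and
// never see a half-applied overlay; the writer replaces the whole snapshot.
final class GavSnapshotHolder {
    record Snapshot(int baseEdges, int overlayEdges) {} // stand-in for CSR + delta

    private static final int COMPACTION_THRESHOLD = 10_000;
    private volatile Snapshot current = new Snapshot(0, 0);

    Snapshot read() { return current; }   // always a consistent, immutable view

    void commit(int newEdges) {           // single writer assumed
        Snapshot s = current;
        int overlay = s.overlayEdges() + newEdges;
        current = (overlay >= COMPACTION_THRESHOLD)
            ? new Snapshot(s.baseEdges() + overlay, 0)  // compaction: fold into base
            : new Snapshot(s.baseEdges(), overlay);
    }
}
```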

<h2 id="the-benchmarks">The Benchmarks</h2>

<h3 id="internal-benchmark-oltp-vs-olap">Internal Benchmark: OLTP vs OLAP</h3>

<p>Graph: <strong>500K vertices, ~8M edges</strong></p>

<table>
  <thead>
    <tr>
      <th>Benchmark</th>
      <th>ArcadeDB OLTP</th>
      <th>ArcadeDB OLAP</th>
      <th>Speedup</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>1-hop count</td>
      <td>3.0 µs</td>
      <td>0.8 µs</td>
      <td><strong>3.8x</strong></td>
    </tr>
    <tr>
      <td>1-hop IDs</td>
      <td>5.0 µs</td>
      <td>1.0 µs</td>
      <td><strong>5.1x</strong></td>
    </tr>
    <tr>
      <td>2-hop</td>
      <td>31.3 µs</td>
      <td>3.3 µs</td>
      <td><strong>9.4x</strong></td>
    </tr>
    <tr>
      <td>3-hop</td>
      <td>418.3 µs</td>
      <td>35.1 µs</td>
      <td><strong>11.9x</strong></td>
    </tr>
    <tr>
      <td>4-hop</td>
      <td>6,089 µs</td>
      <td>390.1 µs</td>
      <td><strong>15.6x</strong></td>
    </tr>
    <tr>
      <td>5-hop</td>
      <td>92,497 µs</td>
      <td>2,765 µs</td>
      <td><strong>33.5x</strong></td>
    </tr>
    <tr>
      <td>Shortest Path</td>
      <td>165 ms/pair</td>
      <td>3.1 ms/pair</td>
      <td><strong>54.0x</strong></td>
    </tr>
    <tr>
      <td>PageRank (20 iter)</td>
      <td>54,094 ms</td>
      <td>117 ms</td>
      <td><strong>462.3x</strong></td>
    </tr>
    <tr>
      <td>Connected Components</td>
      <td>2,238 ms</td>
      <td>60 ms</td>
      <td><strong>37.3x</strong></td>
    </tr>
    <tr>
      <td>Label Propagation</td>
      <td>33,450 ms</td>
      <td>142 ms</td>
      <td><strong>235.6x</strong></td>
    </tr>
  </tbody>
</table>

<p><em>Benchmarked on a MacBook Pro M5 Pro (2026), 48 GB RAM, 1 TB disk.</em></p>

<p>The OLAP engine dominates across the board, especially on full-graph algorithms. PageRank goes from nearly a minute to <strong>117 milliseconds</strong> — a <strong>462.3x speedup</strong>. Connected Components is <strong>37.3x faster</strong>, and Label Propagation <strong>235.6x faster</strong>. The deeper the traversal, the bigger the advantage.</p>

<h3 id="ldbc-graphalytics-arcadedb-vs-the-competition">LDBC Graphalytics: ArcadeDB vs the Competition</h3>

<p>We ran the standard LDBC Graphalytics benchmark framework against a field of graph and analytical databases. ArcadeDB appears twice: in embedded mode (in-process Java) and as a Docker container — the same conditions under which Neo4j, Memgraph, FalkorDB, and HugeGraph ran. The results speak for themselves:</p>


<table>
  <thead>
    <tr>
      <th>Algorithm</th>
      <th>ArcadeDB Embedded</th>
      <th>ArcadeDB Docker</th>
      <th>Neo4j 2026</th>
      <th>Kuzu</th>
      <th>DuckPGQ</th>
      <th>ArangoDB</th>
      <th>FalkorDB</th>
      <th>HugeGraph</th>
      <th>Winner</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Load</strong></td>
      <td>55.34s</td>
      <td>35.30s</td>
      <td>653.41s</td>
      <td>3.56s</td>
      <td>0.35s</td>
      <td>352.47s</td>
      <td>1568.36s</td>
      <td>15.27s</td>
      <td>DuckPGQ</td>
    </tr>
    <tr>
      <td><strong>PageRank</strong></td>
      <td><strong>0.10s</strong></td>
      <td>0.40s</td>
      <td>3.50s</td>
      <td>0.97s</td>
      <td>0.83s</td>
      <td>47.58s</td>
      <td>0.70s</td>
      <td>1.23s</td>
      <td><strong>ArcadeDB</strong></td>
    </tr>
    <tr>
      <td><strong>WCC</strong></td>
      <td><strong>0.08s</strong></td>
      <td>0.17s</td>
      <td>0.32s</td>
      <td>0.10s</td>
      <td>1.95s</td>
      <td>25.83s</td>
      <td>0.58s</td>
      <td>2.00s</td>
      <td><strong>ArcadeDB</strong></td>
    </tr>
    <tr>
      <td><strong>BFS</strong></td>
      <td>0.09s</td>
      <td>0.30s</td>
      <td>0.58s</td>
      <td>0.29s</td>
      <td>N/A</td>
      <td>N/A</td>
      <td><strong>0.05s</strong></td>
      <td>0.17s</td>
      <td>FalkorDB</td>
    </tr>
    <tr>
      <td><strong>LCC</strong></td>
      <td><strong>2.35s</strong></td>
      <td>2.51s</td>
      <td>15.47s</td>
      <td>N/A</td>
      <td>10.79s</td>
      <td>N/A</td>
      <td>N/A</td>
      <td>122.44s</td>
      <td><strong>ArcadeDB</strong></td>
    </tr>
    <tr>
      <td><strong>SSSP</strong></td>
      <td>0.92s</td>
      <td><strong>0.41s</strong></td>
      <td>N/A</td>
      <td>N/A</td>
      <td>N/A</td>
      <td>33.36s</td>
      <td>N/A</td>
      <td>N/A</td>
      <td><strong>ArcadeDB</strong></td>
    </tr>
    <tr>
      <td><strong>CDLP</strong></td>
      <td><strong>1.11s</strong></td>
      <td>1.35s</td>
      <td>N/A</td>
      <td>N/A</td>
      <td>N/A</td>
      <td>126.63s</td>
      <td>1.58s</td>
      <td>23.88s</td>
      <td><strong>ArcadeDB</strong></td>
    </tr>
  </tbody>
</table>

<p>ArcadeDB leads on <strong>5 out of 6 algorithms</strong> in embedded mode; the only exception is BFS, where FalkorDB edges ahead (0.05s vs 0.09s). Even running as a Docker container — with network serialization overhead, HTTP API, and Docker Desktop’s VM layer — ArcadeDB still beats the rest of the field on four of those five (only Kuzu’s 0.10s WCC slips past the 0.17s container time). The Docker results are measured warm (JIT-compiled), matching how production servers run. Results are fully reproducible — <a href="https://github.com/ArcadeData/ldbc_graphalytics_platforms_arcadedb?tab=readme-ov-file#native-comparison-load-once-run-all-algorithms">see the benchmark project on GitHub</a>.</p>

<p>Memgraph crashed on connected components. Investigating further, we found <a href="https://github.com/memgraph/memgraph/issues?q=is%3Aissue%20state%3Aopen%20crash">50 open issues on their GitHub</a> reporting random crashes triggered even by simple queries, some of which have remained unaddressed for over 3 years.</p>

<h3 id="lsqb-subgraph-pattern-matching">LSQB: Subgraph Pattern Matching</h3>

<p>We also ran the <a href="https://github.com/ldbc/lsqb">LSQB (Labelled Subgraph Query Benchmark)</a> — a microbenchmark from the LDBC council that focuses on <strong>subgraph pattern matching</strong>: counting how many times a given labelled graph pattern appears in a dataset. It tests multi-way joins, anti-patterns (NOT EXISTS), and complex multi-hop chains using 9 Cypher queries on the LDBC SNB social network (SF1: ~3.9M vertices, ~17.9M edges).</p>

<p>We compared embedded ArcadeDB against 7 other configurations — graph databases (Kuzu, Neo4j, Memgraph, Dgraph), relational engines (DuckDB, PostgreSQL), and ArcadeDB itself as a Docker container. All Docker-based systems run under the same conditions (Docker Desktop for macOS):</p>

<table>
  <thead>
    <tr>
      <th>Query</th>
      <th>Pattern</th>
      <th>Expected Count</th>
      <th>ArcadeDB Embedded</th>
      <th>ArcadeDB Docker</th>
      <th>DuckDB</th>
      <th>Kuzu</th>
      <th>Neo4j</th>
      <th>PostgreSQL</th>
      <th>Memgraph</th>
      <th>Dgraph</th>
      <th>Winner</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Q1</strong></td>
      <td>8-hop chain</td>
      <td>221,636,419</td>
      <td>0.23s</td>
      <td>0.25s</td>
      <td><strong>0.15s</strong></td>
      <td>5.83s</td>
      <td>8.25s</td>
      <td>6.56s</td>
      <td>60.45s</td>
      <td>2.52s</td>
      <td>DuckDB</td>
    </tr>
    <tr>
      <td><strong>Q2</strong></td>
      <td>Diamond</td>
      <td>1,085,627</td>
      <td>0.18s</td>
      <td>0.19s</td>
      <td><strong>0.02s</strong></td>
      <td>0.14s</td>
      <td>2.06s</td>
      <td>0.34s</td>
      <td>timeout</td>
      <td>N/A</td>
      <td>DuckDB</td>
    </tr>
    <tr>
      <td><strong>Q3</strong></td>
      <td>Triangle</td>
      <td>753,570</td>
      <td>0.10s</td>
      <td>0.13s</td>
      <td><strong>0.05s</strong></td>
      <td>2.44s</td>
      <td>14.31s</td>
      <td>2.12s</td>
      <td>timeout</td>
      <td>N/A</td>
      <td>DuckDB</td>
    </tr>
    <tr>
      <td><strong>Q4</strong></td>
      <td>Star join</td>
      <td>14,836,038</td>
      <td>0.03s</td>
      <td><strong>0.03s</strong></td>
      <td>0.08s</td>
      <td>N/A</td>
      <td>7.82s</td>
      <td>6.86s</td>
      <td>4.50s</td>
      <td>8.13s</td>
      <td>ArcadeDB</td>
    </tr>
    <tr>
      <td><strong>Q5</strong></td>
      <td>Fork</td>
      <td>13,824,510</td>
      <td>0.29s</td>
      <td>0.23s</td>
      <td><strong>0.04s</strong></td>
      <td>N/A</td>
      <td>6.72s</td>
      <td>0.69s</td>
      <td>3.86s</td>
      <td>N/A</td>
      <td>DuckDB</td>
    </tr>
    <tr>
      <td><strong>Q6</strong></td>
      <td>2-hop traversal</td>
      <td>1,668,134,320</td>
      <td><strong>0.11s</strong></td>
      <td>0.11s</td>
      <td>2.18s</td>
      <td>1.41s</td>
      <td>52.06s</td>
      <td>17.72s</td>
      <td>148.14s</td>
      <td>N/A</td>
      <td>ArcadeDB</td>
    </tr>
    <tr>
      <td><strong>Q7</strong></td>
      <td>Star (optional)</td>
      <td>26,190,133</td>
      <td>0.09s</td>
      <td><strong>0.02s</strong></td>
      <td>0.08s</td>
      <td>N/A</td>
      <td>10.45s</td>
      <td>11.22s</td>
      <td>5.59s</td>
      <td>5.97s</td>
      <td>ArcadeDB</td>
    </tr>
    <tr>
      <td><strong>Q8</strong></td>
      <td>Anti-pattern</td>
      <td>6,907,213</td>
      <td>0.19s</td>
      <td>0.19s</td>
      <td><strong>0.07s</strong></td>
      <td>N/A</td>
      <td>12.91s</td>
      <td>1.31s</td>
      <td>3.37s</td>
      <td>N/A</td>
      <td>DuckDB</td>
    </tr>
    <tr>
      <td><strong>Q9</strong></td>
      <td>Anti-pattern + traversal</td>
      <td>1,596,153,418</td>
      <td>1.18s</td>
      <td><strong>1.06s</strong></td>
      <td>7.77s</td>
      <td>6.15s</td>
      <td>59.09s</td>
      <td>22.25s</td>
      <td>timeout</td>
      <td>N/A</td>
      <td>ArcadeDB</td>
    </tr>
  </tbody>
</table>

<p>All 9 queries produce correct results matching the <a href="https://github.com/ldbc/lsqb/blob/main/expected-output/expected-output.csv">official LSQB expected output</a>. Kuzu skips Q4/Q5/Q7/Q8 (no <code class="language-plaintext highlighter-rouge">:Message</code> supertype support). Memgraph times out on Q2/Q3/Q9 (600s limit).</p>

<p><strong>ArcadeDB is the fastest system on 4 out of 9 queries</strong> (Q4, Q6, Q7, Q9). Q4 and Q7 are star-shaped joins centered on a Message node — with the GAV’s CSR acceleration, ArcadeDB completes these in 10–30ms, <strong>3–8x faster than DuckDB</strong> and <strong>261–1045x faster than Neo4j</strong>. Q6 and Q9 are multi-hop path traversals where ArcadeDB is <strong>7–20x faster than DuckDB</strong>, <strong>55–473x faster than Neo4j</strong>, and <strong>21–161x faster than PostgreSQL</strong>. Q6 in particular showcases the edge-scan algebraic optimization: ArcadeDB computes the 1.67-billion-row count in just 110ms — <strong>20x faster than DuckDB</strong>.</p>

<p><strong>Where DuckDB wins:</strong> The remaining queries (Q1, Q2, Q3, Q5, Q8) are join-intensive patterns — long chains, diamonds, forks, and anti-patterns — where DuckDB’s columnar vectorized execution excels. However, the gap has narrowed significantly: Q8 is now only <strong>2.7x slower</strong> than DuckDB (down from 7.7x), thanks to the edge-scan anti-join optimization.</p>

<p><strong>ArcadeDB Docker vs other Docker systems (apples-to-apples):</strong> Even with HTTP + network + Docker VM overhead, ArcadeDB Docker is <strong>10–1045x faster than Neo4j</strong>, <strong>2–24x faster than PostgreSQL</strong>, and <strong>5–559x faster than Memgraph</strong> on the queries Memgraph completes. The Docker overhead is minimal (0.01–1.08s) because the GAV/CSR does the heavy lifting, not the transport.</p>

<p><strong>The other graph databases:</strong> Neo4j is 10–1045x slower than ArcadeDB on every query. Memgraph times out on 3 queries and is 5–559x slower on the ones it completes. Kuzu can’t run 4 of 9 queries due to missing type hierarchy support, and is 2–28x slower than ArcadeDB on most of the rest (though Kuzu edges ahead on Q2 at 0.14s vs 0.18s). PostgreSQL is faster than all other graph databases but still 2–161x slower than ArcadeDB.</p>

<p><strong>Dgraph</strong> v25.3.0 has no native pattern matching language (DQL is a hierarchical traversal language, not Cypher/SQL). Through creative use of DQL value-variable propagation and <code class="language-plaintext highlighter-rouge">math()</code>, we managed to express 3 of 9 queries — but Dgraph is <strong>11x slower</strong> than ArcadeDB on Q1 (2.52s vs 0.23s), <strong>271x slower</strong> on Q4 (8.13s vs 0.03s), and <strong>299x slower</strong> on Q7 (5.97s vs 0.02s). The remaining 6 queries are fundamentally impossible in DQL (no JOINs, no self-joins, no anti-joins).</p>

<p><strong>SurrealDB</strong> v2.6.4 fares even worse: we wrote queries for 5 of 9 patterns using nested subqueries with <code class="language-plaintext highlighter-rouge">$parent</code> dereferencing, but <strong>every single one times out</strong> at 120 seconds. The O(n*m) nested loop execution without index acceleration is simply too slow for 3.9M vertices. The remaining 4 queries cannot be expressed in SurrealQL at all (no table aliases for self-joins).</p>

<p><strong>The takeaway:</strong> ArcadeDB’s CSR engine beats every other graph database on <strong>every single query except Q2 where Kuzu is marginally faster</strong> — both embedded and as a Docker container — and beats DuckDB on the graph-shaped queries. Databases that claim “graph capabilities” (Dgraph, SurrealDB) can barely express these patterns, let alone execute them competitively. And unlike DuckDB, ArcadeDB gives you ACID transactions, persistence, concurrent access, and a full graph query language on top.</p>

<h2 id="memory-compact-by-design">Memory: Compact by Design</h2>

<p>You might expect an OLAP layer to be a memory hog. The opposite is true. For the 500K vertex / 8M edge benchmark graph:</p>

<ul>
  <li><strong>GAV (OLAP)</strong>: 138.4 MB</li>
  <li><strong>OLTP estimate</strong>: ~1.2 GB</li>
  <li><strong>Compression ratio</strong>: <strong>9.0x more compact</strong></li>
</ul>

<p>The CSR encoding uses ~8 bytes per edge, node ID mapping takes ~8 bytes per vertex, and columnar properties use 4–8 bytes per vertex per column. String properties are dictionary-encoded, and null bitmaps cost just 1 bit per vertex per column.</p>
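<p>A quick back-of-envelope check against the measured number, using only the per-element costs above (rough arithmetic — dictionaries, overlay buffers, and alignment are ignored):</p>

```java
// Topology cost for the 500K-vertex / 8M-edge benchmark graph:
// ~8 B per edge (both CSR directions combined) + ~8 B per vertex for ID mapping.
long vertices = 500_000, edges = 8_000_000;
long topologyMb = (edges * 8 + vertices * 8) / (1024 * 1024);
System.out.println(topologyMb + " MB");  // ~64 MB of the measured 138.4 MB; property columns make up the rest
```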

<p>In practice, enabling Graph OLAP adds a <strong>fraction</strong> of the memory your OLTP data already uses.</p>

<h2 id="70-built-in-graph-algorithms--all-olap-optimized">70 Built-in Graph Algorithms — All OLAP-Optimized</h2>

<p>All 70 graph algorithms ship fully optimized for the Graph OLAP Engine, operating directly on CSR arrays with zero GC pressure and multi-threaded execution. When a GAV is available, every algorithm automatically uses the OLAP path — no configuration needed.</p>
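<p>To make the “flat arrays, zero allocation” point concrete, here is a toy PageRank kernel over the CSR layout shown earlier — a sketch of the execution style, not the engine’s actual implementation (which adds multi-threading and dangling-node handling):</p>

```java
// Minimal PageRank over CSR arrays: all state lives in flat double[] buffers
// and the inner loop allocates nothing. Sketch only; dangling-node mass is dropped.
final class PageRankCsr {
    static double[] pageRank(int[] offsets, int[] neighbors, int iterations, double damping) {
        int n = offsets.length - 1;
        double[] rank = new double[n], next = new double[n];
        java.util.Arrays.fill(rank, 1.0 / n);
        for (int iter = 0; iter < iterations; iter++) {
            java.util.Arrays.fill(next, (1.0 - damping) / n);
            for (int v = 0; v < n; v++) {
                int degree = offsets[v + 1] - offsets[v];
                if (degree == 0) continue;
                double share = damping * rank[v] / degree;   // scatter along out-edges
                for (int i = offsets[v]; i < offsets[v + 1]; i++)
                    next[neighbors[i]] += share;
            }
            double[] tmp = rank; rank = next; next = tmp;    // swap buffers: no allocation
        }
        return rank;
    }
}
```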

<table>
  <thead>
    <tr>
      <th>Category</th>
      <th>Algorithms</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Path Finding</td>
      <td>Shortest Path, A*, Bellman-Ford, Dijkstra, Dijkstra Single Source, Duan SSSP, All Pairs Shortest Path (APSP), All Simple Paths, K Shortest Paths, Longest Path DAG, Steiner Tree</td>
    </tr>
    <tr>
      <td>Traversal</td>
      <td>BFS, DFS</td>
    </tr>
    <tr>
      <td>Centrality</td>
      <td>Degree, Closeness, Betweenness, Eigenvector, Harmonic, Eccentricity, HITS, Katz, ArticleRank</td>
    </tr>
    <tr>
      <td>Ranking</td>
      <td>PageRank, Personalized PageRank, VoteRank</td>
    </tr>
    <tr>
      <td>Community Detection</td>
      <td>Label Propagation, Louvain, Leiden, Strongly Connected Components (SCC), Weakly Connected Components (WCC), Biconnected Components, SLPA</td>
    </tr>
    <tr>
      <td>Link Prediction</td>
      <td>Adamic-Adar, Common Neighbors, Jaccard Similarity, Preferential Attachment, Resource Allocation, SimRank</td>
    </tr>
    <tr>
      <td>Clustering &amp; Partitioning</td>
      <td>Local Clustering Coefficient, Triangle Count, Hierarchical Clustering, K-Means, Graph Coloring, Bipartite Check, Bipartite Matching</td>
    </tr>
    <tr>
      <td>Subgraph Analysis</td>
      <td>Clique, K-Core, K-Truss, Densest Subgraph, Articulation Points, Bridges</td>
    </tr>
    <tr>
      <td>Spanning Trees</td>
      <td>Minimum Spanning Tree (MST), Min Spanning Arborescence</td>
    </tr>
    <tr>
      <td>Network Flow</td>
      <td>Max Flow, Max K-Cut</td>
    </tr>
    <tr>
      <td>Graph Metrics</td>
      <td>Assortativity, Conductance, Modularity Score, Rich Club, Graph Summary, Total Neighbors, Same Community</td>
    </tr>
    <tr>
      <td>ML &amp; Embeddings</td>
      <td>Random Walk, Node2Vec, FastRP, GraphSAGE, HashGNN, Influence Maximization</td>
    </tr>
    <tr>
      <td>Other</td>
      <td>Cycle Detection, Topological Sort</td>
    </tr>
  </tbody>
</table>

<h2 id="getting-started">Getting Started</h2>

<h3 id="sql">SQL</h3>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- Create a view over your social graph</span>
<span class="k">CREATE</span> <span class="n">GRAPH</span> <span class="n">ANALYTICAL</span> <span class="k">VIEW</span> <span class="n">social</span>
  <span class="n">VERTEX</span> <span class="n">TYPES</span> <span class="p">(</span><span class="n">Person</span><span class="p">,</span> <span class="n">Company</span><span class="p">)</span>
  <span class="n">EDGE</span> <span class="n">TYPES</span> <span class="p">(</span><span class="n">FOLLOWS</span><span class="p">,</span> <span class="n">WORKS_AT</span><span class="p">)</span>
  <span class="n">PROPERTIES</span> <span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="n">age</span><span class="p">,</span> <span class="n">status</span><span class="p">)</span>
  <span class="k">UPDATE</span> <span class="k">MODE</span> <span class="n">SYNCHRONOUS</span>

<span class="c1">-- That's it. Your Cypher queries are now accelerated automatically.</span>
<span class="c1">-- You can also use edge properties for weighted algorithms:</span>
<span class="k">CREATE</span> <span class="n">GRAPH</span> <span class="n">ANALYTICAL</span> <span class="k">VIEW</span> <span class="n">weighted</span>
  <span class="n">VERTEX</span> <span class="n">TYPES</span> <span class="p">(</span><span class="n">City</span><span class="p">)</span>
  <span class="n">EDGE</span> <span class="n">TYPES</span> <span class="p">(</span><span class="n">ROAD</span><span class="p">)</span>
  <span class="n">EDGE</span> <span class="n">PROPERTIES</span> <span class="p">(</span><span class="n">distance</span><span class="p">,</span> <span class="n">toll</span><span class="p">)</span>
  <span class="k">UPDATE</span> <span class="k">MODE</span> <span class="n">SYNCHRONOUS</span>
</code></pre></div></div>

<h3 id="java-api">Java API</h3>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nc">GraphAnalyticalView</span> <span class="n">gav</span> <span class="o">=</span> <span class="nc">GraphAnalyticalView</span><span class="o">.</span><span class="na">builder</span><span class="o">(</span><span class="n">database</span><span class="o">)</span>
    <span class="o">.</span><span class="na">withName</span><span class="o">(</span><span class="s">"social"</span><span class="o">)</span>
    <span class="o">.</span><span class="na">withVertexTypes</span><span class="o">(</span><span class="s">"Person"</span><span class="o">,</span> <span class="s">"Company"</span><span class="o">)</span>
    <span class="o">.</span><span class="na">withEdgeTypes</span><span class="o">(</span><span class="s">"FOLLOWS"</span><span class="o">,</span> <span class="s">"WORKS_AT"</span><span class="o">)</span>
    <span class="o">.</span><span class="na">withProperties</span><span class="o">(</span><span class="s">"name"</span><span class="o">,</span> <span class="s">"age"</span><span class="o">,</span> <span class="s">"status"</span><span class="o">)</span>
    <span class="o">.</span><span class="na">withUpdateMode</span><span class="o">(</span><span class="nc">UpdateMode</span><span class="o">.</span><span class="na">SYNCHRONOUS</span><span class="o">)</span>
    <span class="o">.</span><span class="na">build</span><span class="o">();</span>

<span class="c1">// Run algorithms directly on the OLAP engine</span>
<span class="nc">GraphAlgorithms</span> <span class="n">algos</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">GraphAlgorithms</span><span class="o">();</span>
<span class="kt">double</span><span class="o">[]</span> <span class="n">ranks</span> <span class="o">=</span> <span class="n">algos</span><span class="o">.</span><span class="na">pageRank</span><span class="o">(</span><span class="n">gav</span><span class="o">,</span> <span class="mi">20</span><span class="o">,</span> <span class="mf">0.85</span><span class="o">);</span>
<span class="kt">int</span><span class="o">[]</span> <span class="n">components</span> <span class="o">=</span> <span class="n">algos</span><span class="o">.</span><span class="na">connectedComponents</span><span class="o">(</span><span class="n">gav</span><span class="o">);</span>
</code></pre></div></div>

<h3 id="managing-views">Managing Views</h3>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- Change update mode on the fly</span>
<span class="k">ALTER</span> <span class="n">GRAPH</span> <span class="n">ANALYTICAL</span> <span class="k">VIEW</span> <span class="n">social</span> <span class="k">UPDATE</span> <span class="k">MODE</span> <span class="n">ASYNCHRONOUS</span>

<span class="c1">-- Force a rebuild</span>
<span class="n">REBUILD</span> <span class="n">GRAPH</span> <span class="n">ANALYTICAL</span> <span class="k">VIEW</span> <span class="n">social</span>

<span class="c1">-- Drop when no longer needed</span>
<span class="k">DROP</span> <span class="n">GRAPH</span> <span class="n">ANALYTICAL</span> <span class="k">VIEW</span> <span class="n">social</span>
</code></pre></div></div>

<p>Named views are persisted in the schema and automatically restored on database restart.</p>

<h2 id="the-zero-compromise-philosophy">The Zero-Compromise Philosophy</h2>

<p>Most databases force you to choose: fast transactions <strong>or</strong> fast analytics. Export your data to a separate system, maintain two clusters, deal with synchronization lag.</p>

<p>ArcadeDB’s Graph OLAP Engine rejects that tradeoff entirely:</p>

<ul>
  <li><strong>Your OLTP engine stays exactly as it is</strong> — same speed, same ACID guarantees, same API</li>
  <li><strong>Turn on a GAV</strong>, and analytical queries get up to 462x faster automatically</li>
  <li><strong>Synchronization is configurable</strong> — real-time, async, or manual, your choice</li>
  <li><strong>Memory overhead is minimal</strong> — the OLAP representation is 9.0x more compact than OLTP</li>
  <li><strong>No query changes</strong> — the optimizer handles everything transparently</li>
</ul>

<p>We were already the fastest OLTP graph database. Now we’re the fastest OLAP graph database too.</p>

<hr />

<p>The Graph OLAP Engine is available from <strong><a href="https://arcadedb.com/blog/arcadedb-26-3-2/">ArcadeDB v26.3.2</a></strong>. For detailed documentation, visit <a href="https://docs.arcadedb.com/arcadedb/how-to/data-modeling/graph-olap.html">docs.arcadedb.com</a>.</p>

<table>
  <tbody>
    <tr>
      <td><strong>Try ArcadeDB</strong>: <a href="https://github.com/ArcadeData/arcadedb">GitHub</a></td>
      <td><a href="https://hub.docker.com/r/arcadedata/arcadedb">Docker Hub</a></td>
      <td><a href="https://docs.arcadedb.com">Documentation</a></td>
    </tr>
  </tbody>
</table>]]></content><author><name></name></author><category term="Graph Database" /><category term="OLAP" /><category term="Performance" /><category term="Benchmarks" /><category term="ArcadeDB" /><summary type="html"><![CDATA[ArcadeDB v26.3.2 introduces the Graph OLAP Engine with Compressed Sparse Row encoding, delivering up to 462x speedups on analytical workloads while keeping full OLTP performance. Benchmarks show ArcadeDB is now the fastest graph database for both OLTP and OLAP.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://arcadedb.com/assets/images/arcadedb-graph-olap.jpg" /><media:content medium="image" url="https://arcadedb.com/assets/images/arcadedb-graph-olap.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">ArcadeDB 26.3.2: Graph OLAP Engine, Batch Ingestion &amp;amp; More</title><link href="https://arcadedb.com/blog/arcadedb-26-3-2/" rel="alternate" type="text/html" title="ArcadeDB 26.3.2: Graph OLAP Engine, Batch Ingestion &amp;amp; More" /><published>2026-03-24T00:00:00+00:00</published><updated>2026-03-24T00:00:00+00:00</updated><id>https://arcadedb.com/blog/arcadedb-26-3-2</id><content type="html" xml:base="https://arcadedb.com/blog/arcadedb-26-3-2/"><![CDATA[<p>We’re excited to announce the release of ArcadeDB 26.3.2, a performance-focused release with <strong>100+ commits</strong> and <strong>21 closed issues</strong>. This release introduces a powerful Graph Analytical View for OLAP workloads, ultra-fast bulk edge creation, new batch APIs, and much more.</p>

<h2 id="major-new-features">Major New Features</h2>

<h3 id="graph-analytical-view-gav"><a href="https://arcadedb.com/blog/graph-olap-engine-the-fastest-graph-analytics-with-zero-compromises/">Graph Analytical View (GAV)</a></h3>

<p>A CSR (Compressed Sparse Row) based OLAP acceleration layer for dramatically faster graph analytics. Automatically used by <a href="https://docs.arcadedb.com/arcadedb/reference/sql/chapter.html">SQL</a> and <a href="https://docs.arcadedb.com/arcadedb/reference/cypher/chapter.html">OpenCypher</a> query planners. Includes build-probe hash join, count push-down, sorted neighbor lists for merge-join, anti-join with binary search, and automatic async rebuild on mutations.</p>
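<p>As an illustration of one of these techniques (a sketch, not the engine’s actual code): once neighbor lists are kept sorted, an anti-join such as “candidates that are <em>not</em> neighbors of a vertex” reduces to one binary search per candidate instead of a hash lookup or a scan:</p>

```java
import java.util.Arrays;

public class SortedAntiJoin {
    // Returns the elements of 'candidates' that are NOT present in the
    // sorted array 'sortedNeighbors': an anti-join via binary search.
    public static int[] antiJoin(int[] candidates, int[] sortedNeighbors) {
        return Arrays.stream(candidates)
                     .filter(c -> Arrays.binarySearch(sortedNeighbors, c) < 0)
                     .toArray();
    }
}
```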

<h3 id="high-performance-bulk-edge-creation-graphbatch"><a href="https://arcadedb.com/blog/graphbatch-up-to-8x-faster-graph-ingestion/">High-Performance Bulk Edge Creation (GraphBatch)</a></h3>

<p>Ultra-fast bulk edge creation with parallel writes, parallel sorting, and a new <code class="language-plaintext highlighter-rouge">database.batch()</code> API for high-throughput graph ingestion scenarios.</p>
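<p>A sketch of why parallel sorting matters for bulk edge loading (illustrative only; this is not the GraphBatch implementation): packing each (source, target) pair into one <code class="language-plaintext highlighter-rouge">long</code> lets a primitive parallel sort group all edges of the same source vertex together, turning random writes into sequential ones:</p>

```java
import java.util.Arrays;

public class EdgeSortSketch {
    // Packs (source, target) into one long so a primitive parallel sort
    // clusters edges by source vertex. Assumes non-negative vertex IDs.
    public static long[] sortEdges(int[] sources, int[] targets) {
        long[] packed = new long[sources.length];
        for (int i = 0; i < packed.length; i++)
            packed[i] = ((long) sources[i] << 32) | (targets[i] & 0xFFFFFFFFL);
        Arrays.parallelSort(packed);
        return packed;
    }

    public static int source(long edge) { return (int) (edge >>> 32); }
    public static int target(long edge) { return (int) edge; }
}
```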

<h3 id="http-batch-endpoint">HTTP Batch Endpoint</h3>

<p>New <code class="language-plaintext highlighter-rouge">/batch</code> HTTP endpoint for executing multiple operations in a single request, reducing round-trip overhead for bulk workloads.</p>

<h3 id="grpc-graphbatchload-rpc">gRPC GraphBatchLoad RPC</h3>

<p>Client-streaming RPC for high-throughput bulk graph loading via gRPC, complementing the HTTP batch endpoint.</p>

<h3 id="graphql-introspection"><a href="https://docs.arcadedb.com/arcadedb/reference/graphql/graphql.html">GraphQL</a> Introspection</h3>

<p>Full GraphQL introspection support, enabling tools and IDEs to automatically discover schema, queries, and mutations.</p>

<h3 id="mcp-stdio-transport"><a href="https://arcadedb.com/blog/arcadedb-mcp-server-connect-your-llm-to-your-database/">MCP Stdio Transport</a></h3>

<p>Direct integration with AI assistants via the stdio protocol, alongside the existing SSE transport. This release also exposes profiler/server settings through MCP tools and fixes MCP authentication with API tokens.</p>

<h3 id="auto-tune-maxpageram">Auto-tune maxPageRAM</h3>

<p>Automatic detection of container memory limits at startup for better Docker and Kubernetes defaults out of the box.</p>
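<p>For context on how container memory limits are typically discovered on Linux (an illustrative sketch, not ArcadeDB’s implementation): cgroup v2 exposes the limit in <code class="language-plaintext highlighter-rouge">/sys/fs/cgroup/memory.max</code>, which contains either a byte count or the literal <code class="language-plaintext highlighter-rouge">max</code> when unbounded:</p>

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.OptionalLong;

public class CgroupMemoryLimit {
    // Parses a cgroup v2 memory.max value: a byte count, or "max" when unlimited.
    public static OptionalLong parseLimit(String raw) {
        String s = raw.trim();
        if (s.isEmpty() || s.equals("max")) return OptionalLong.empty();
        try {
            return OptionalLong.of(Long.parseLong(s));
        } catch (NumberFormatException e) {
            return OptionalLong.empty();
        }
    }

    // Reads the limit from the standard cgroup v2 location, if present.
    public static OptionalLong readLimit() {
        Path p = Path.of("/sys/fs/cgroup/memory.max");
        try {
            return Files.exists(p) ? parseLimit(Files.readString(p)) : OptionalLong.empty();
        } catch (IOException e) {
            return OptionalLong.empty();
        }
    }
}
```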

<h2 id="bug-fixes--improvements">Bug Fixes &amp; Improvements</h2>

<h3 id="sql-engine">SQL Engine</h3>

<ul>
  <li>Fixed <code class="language-plaintext highlighter-rouge">count(*)</code> on empty types</li>
  <li>Fixed <code class="language-plaintext highlighter-rouge">CONTAINSANY</code> regression</li>
  <li>Fixed <code class="language-plaintext highlighter-rouge">$current</code> null handling</li>
  <li>Additional SQL engine stability fixes</li>
</ul>

<h3 id="opencypher">OpenCypher</h3>

<ul>
  <li>Fixed UNWIND/WHERE push-down</li>
  <li>Fixed edge creation issues</li>
</ul>

<h3 id="wire-protocols">Wire Protocols</h3>

<ul>
  <li>Fixed Bolt parameterized queries</li>
  <li>Fixed gRPC language parameter</li>
  <li>Fixed Gremlin defaults</li>
</ul>

<h3 id="core-engine">Core Engine</h3>

<ul>
  <li>Fixed core concurrency issues</li>
  <li>Fixed vector index performance regressions</li>
  <li>Fixed HTTP 413 handling</li>
  <li>Fixed PageRank direction bug</li>
  <li>Addressed Java 25 warnings</li>
</ul>

<h3 id="performance-improvements">Performance Improvements</h3>

<ul>
  <li>OLTP graph traversal optimizations</li>
  <li>Hash join improvements</li>
  <li>Count push-down optimization</li>
  <li>Single-pass BFS count propagation on CSR</li>
  <li>Optimized edge insertion</li>
</ul>

<h3 id="upgraded-dependencies">Upgraded Dependencies</h3>

<ul>
  <li>Gremlin 3.7.5 → 3.8.0</li>
  <li>Neo4j Java Driver 5.28.10 → 6.0.3</li>
  <li>Groovy 4.0.28 → 5.0.4</li>
  <li>JVector 4.0.0-rc.7 → rc.8</li>
  <li>30+ minor dependency updates</li>
</ul>

<h2 id="getting-started-with-2632">Getting Started with 26.3.2</h2>

<h3 id="docker">Docker</h3>

<p>Pull the latest image from Docker Hub:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>docker pull arcadedata/arcadedb:26.3.2
</code></pre></div></div>

<p>Visit our <a href="https://hub.docker.com/r/arcadedata/arcadedb">Docker Hub repository</a> for more information.</p>

<h3 id="maven">Maven</h3>

<p>Add ArcadeDB to your Java projects:</p>

<div class="language-xml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt">&lt;dependency&gt;</span>
    <span class="nt">&lt;groupId&gt;</span>com.arcadedb<span class="nt">&lt;/groupId&gt;</span>
    <span class="nt">&lt;artifactId&gt;</span>arcadedb-engine<span class="nt">&lt;/artifactId&gt;</span>
    <span class="nt">&lt;version&gt;</span>26.3.2<span class="nt">&lt;/version&gt;</span>
<span class="nt">&lt;/dependency&gt;</span>
</code></pre></div></div>

<p>All artifacts are available on <a href="https://repo.maven.apache.org/maven2/com/arcadedb/">Maven Central</a>.</p>

<h3 id="documentation">Documentation</h3>

<p>For detailed information on features and usage, refer to our <a href="https://docs.arcadedb.com/">comprehensive documentation</a>.</p>

<h2 id="compatibility-note">Compatibility Note</h2>

<p>This release maintains 100% compatibility with previous database formats, meaning no export/import is required when upgrading. However, we always recommend creating a database backup before upgrading to a new version.</p>

<hr />

<p>This release focuses on making ArcadeDB faster than ever for graph analytics and bulk data ingestion workloads.</p>

<p><strong>Download ArcadeDB 26.3.2 now</strong>: <a href="https://github.com/ArcadeData/arcadedb/releases/tag/26.3.2">GitHub Releases</a></p>

<p>For detailed documentation and getting started guides, visit <a href="https://docs.arcadedb.com">docs.arcadedb.com</a>.</p>]]></content><author><name></name></author><category term="Multi-Model" /><category term="NoSQL" /><category term="Graph Database" /><category term="Release" /><summary type="html"><![CDATA[ArcadeDB 26.3.2 is a performance-focused release with 100+ commits and 21 closed issues, featuring a Graph Analytical View (OLAP), high-performance bulk edge creation, HTTP batch endpoint, gRPC batch loading, GraphQL introspection, and MCP stdio transport.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://arcadedb.com/assets/images/arcadedb-26.3.2.jpg" /><media:content medium="image" url="https://arcadedb.com/assets/images/arcadedb-26.3.2.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Neo4j Alternatives in 2026: A Fair Look at the Open-Source Options</title><link href="https://arcadedb.com/blog/neo4j-alternatives-in-2026-a-fair-look-at-the-open-source-options/" rel="alternate" type="text/html" title="Neo4j Alternatives in 2026: A Fair Look at the Open-Source Options" /><published>2026-03-13T00:00:00+00:00</published><updated>2026-03-13T00:00:00+00:00</updated><id>https://arcadedb.com/blog/neo4j-alternatives-in-2026-a-fair-look-at-the-open-source-options</id><content type="html" xml:base="https://arcadedb.com/blog/neo4j-alternatives-in-2026-a-fair-look-at-the-open-source-options/"><![CDATA[<p><strong>The best open-source Neo4j alternatives in 2026 are <a href="https://arcadedb.com">ArcadeDB</a>, <a href="https://memgraph.com">Memgraph</a>, <a href="https://falkordb.com">FalkorDB</a>, <a href="https://arangodb.com">ArangoDB</a>, <a href="https://hugegraph.apache.org">HugeGraph</a>, and <a href="https://github.com/LadybugDB/ladybug">LadybugDB</a>.</strong> Below we compare each graph database on licensing, performance benchmarks, multi-model support, and AI readiness — with honest pros 
and cons for every option.</p>

<p>If you’re searching for a Neo4j alternative — or a full Neo4j replacement — you’ve probably noticed a pattern: every graph database comparison article is written by a vendor, and — surprise — that vendor always comes out on top.</p>

<p>We’re not going to pretend we’re different. This article is published on the ArcadeDB blog, and yes, we think ArcadeDB is the best graph database in 2026 for most use cases. But here’s what we <em>will</em> do differently: we’ll only compare databases that are <strong>actually available in 2026</strong>, we’ll be transparent about licensing (because “open-source” doesn’t mean what some vendors want you to think it means), and we’ll acknowledge where each product genuinely shines.</p>

<p>No defunct products. No proprietary databases pretending to be open-source. Just a fair graph database comparison of what’s actually out there.</p>

<hr />

<h2 id="what-makes-a-real-neo4j-alternative">What Makes a Real Neo4j Alternative?</h2>

<p>Before diving in, let’s define the criteria. A credible Neo4j alternative in 2026 should be:</p>

<ul>
  <li><strong>Actively maintained</strong> — regular releases, responsive community, not abandoned or in corporate limbo</li>
  <li><strong>Genuinely open-source or source-available</strong> — with an honest description of what the license actually allows</li>
  <li><strong>Graph-native</strong> — not a relational database with a graph extension bolted on</li>
  <li><strong>Production-ready</strong> — ACID transactions, persistence, security, and scalability</li>
  <li><strong>Standards-compatible</strong> — supporting established query languages (SQL, Cypher, Gremlin, or GraphQL)</li>
</ul>

<p>We’ll also evaluate each database on <strong>multi-model capabilities</strong>, <strong>AI/agent readiness</strong> (MCP, vector search, embeddings), and <strong>total cost of ownership</strong> — because the sticker price is only part of the story.</p>

<hr />

<h2 id="1-arcadedb">1. ArcadeDB</h2>

<p><strong>License:</strong> Apache 2.0 (fully open-source, no restrictions, no data caps, forever)<br />
<strong>Query Languages:</strong> SQL, Cypher, Gremlin, GraphQL, MQL (MongoDB-compatible)<br />
<strong>Data Models:</strong> Graph, Document, Key-Value, Time-Series, Vector, Search<br />
<strong>Written in:</strong> Java<br />
<strong>Persistence:</strong> Disk-based with in-memory caching</p>

<h3 id="why-it-stands-out">Why It Stands Out</h3>

<p>ArcadeDB is the only graph database that supports SQL, Cypher, Gremlin, GraphQL, and the MongoDB query API under a single Apache 2.0 license. With <strong>five query languages</strong> and <strong>six data models</strong> in one engine, you can query the same data as a graph with Cypher, as documents with SQL, and through a MongoDB-compatible API — all without data duplication or ETL pipelines.</p>

<p><strong>Licensing clarity.</strong> ArcadeDB uses Apache 2.0 — the most permissive license in the open-source world. No BSL, no SSPL, no “community edition” with artificial caps. You can use it commercially, embed it, modify it, and distribute it without paying a cent or asking permission. We’ve publicly committed to <a href="/blog/open-source-forever-why-arcadedb-will-never-change-its-license/">never changing this license</a>.</p>

<p><strong>AI and agent readiness.</strong> ArcadeDB ships with a <a href="/blog/arcadedb-mcp-server-connect-your-llm-to-your-database/">built-in MCP server</a> that lets LLMs and AI agents query your database directly using the Model Context Protocol. It also supports <a href="https://docs.arcadedb.com/arcadedb/concepts/vector-search.html#vector-search-concepts">vector search</a> natively — no plugins, no separate infrastructure.</p>

<p><strong>Performance.</strong> ArcadeDB can ingest over 2 million records per second on commodity hardware and handle complex multi-hop graph traversals efficiently thanks to its native graph engine. With the new <a href="https://arcadedb.com/blog/graph-olap-engine-the-fastest-graph-analytics-with-zero-compromises/">Graph OLAP Engine</a>, ArcadeDB leads on every algorithm in the LDBC Graphalytics benchmark — both in embedded mode (in-process Java) and as a Docker container (same conditions as Neo4j, Memgraph, FalkorDB, and HugeGraph):</p>

<table>
  <thead>
    <tr>
      <th>Algorithm</th>
      <th>ArcadeDB</th>
      <th>ArcadeDB Docker</th>
      <th>Neo4j 2026</th>
      <th>Kuzu</th>
      <th>DuckPGQ</th>
      <th>Memgraph</th>
      <th>ArangoDB</th>
      <th>FalkorDB</th>
      <th>HugeGraph</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>PageRank</strong></td>
      <td><strong>0.48s</strong></td>
      <td>0.83s</td>
      <td>11.15s</td>
      <td>4.30s</td>
      <td>6.14s</td>
      <td>16.90s</td>
      <td>157.01s</td>
      <td>1.67s</td>
      <td>4.01s</td>
    </tr>
    <tr>
      <td><strong>WCC</strong></td>
      <td>0.30s</td>
      <td><strong>0.22s</strong></td>
      <td>0.75s</td>
      <td>0.43s</td>
      <td>13.93s</td>
      <td>crash</td>
      <td>78.03s</td>
      <td>0.85s</td>
      <td>6.71s</td>
    </tr>
    <tr>
      <td><strong>BFS</strong></td>
      <td>0.13s</td>
      <td><strong>0.07s</strong></td>
      <td>1.91s</td>
      <td>0.86s</td>
      <td>2,754s</td>
      <td>11.72s</td>
      <td>511.55s</td>
      <td>0.20s</td>
      <td>0.54s</td>
    </tr>
    <tr>
      <td><strong>LCC</strong></td>
      <td><strong>27.41s</strong></td>
      <td>34.98s</td>
      <td>45.78s</td>
      <td>N/A</td>
      <td>38.59s</td>
      <td>N/A</td>
      <td>N/A</td>
      <td>N/A</td>
      <td>272.04s</td>
    </tr>
    <tr>
      <td><strong>SSSP</strong></td>
      <td>3.53s</td>
      <td><strong>0.97s</strong></td>
      <td>N/A</td>
      <td>N/A</td>
      <td>N/A</td>
      <td>N/A</td>
      <td>301.93s</td>
      <td>N/A</td>
      <td>N/A</td>
    </tr>
    <tr>
      <td><strong>CDLP</strong></td>
      <td>3.67s</td>
      <td><strong>3.35s</strong></td>
      <td>6.43s</td>
      <td>N/A</td>
      <td>N/A</td>
      <td>N/A</td>
      <td>407.41s</td>
      <td>5.38s</td>
      <td>62.70s</td>
    </tr>
  </tbody>
</table>

<p>Even running as a Docker container — with network serialization overhead, HTTP API, and Docker Desktop’s VM layer — ArcadeDB is still the fastest on every algorithm. The Docker results are measured warm (JIT-compiled), matching how production servers run. Results are fully reproducible — <a href="https://github.com/ArcadeData/ldbc_graphalytics_platforms_arcadedb?tab=readme-ov-file#native-comparison-load-once-run-all-algorithms">see the benchmark project on GitHub</a>.</p>

<style>
  .bench-wrap {
    --arcade: #00C9A7;
    --arcade-docker: #00A88D;
    --arcade-glow: rgba(0, 201, 167, 0.25);
    --neo4j: #4C8BF5;
    --kuzu: #F5A623;
    --duckpgq: #E85D75;
    --memgraph: #9B6DFF;
    --arango: #8B9DAF;
    --falkor: #E04E39;
    --hugegraph: #2DB7A3;
    --bg: #0B0F1A;
    --surface: #121829;
    --border: #1E2640;
    --text: #C9D1E0;
    --text-muted: #5E6B82;

    font-family: 'Outfit', sans-serif;
    background: var(--bg);
    color: var(--text);
    max-width: 860px;
    margin: 2rem auto;
    padding: 2.5rem;
    border-radius: 16px;
    border: 1px solid var(--border);
    position: relative;
    overflow: hidden;
  }

  .bench-wrap::before {
    content: '';
    position: absolute;
    top: -120px;
    left: -60px;
    width: 320px;
    height: 320px;
    background: radial-gradient(circle, var(--arcade-glow) 0%, transparent 70%);
    pointer-events: none;
    z-index: 0;
  }

  .bench-header {
    position: relative;
    z-index: 1;
    margin-bottom: 2rem;
  }

  .bench-header h2 {
    font-weight: 900;
    font-size: 1.75rem;
    letter-spacing: -0.03em;
    margin: 0 0 0.25rem;
    color: #fff;
  }

  .bench-header h2 span {
    color: var(--arcade);
  }

  .bench-header p {
    font-size: 0.85rem;
    color: var(--text-muted);
    margin: 0;
    font-weight: 300;
  }

  .bench-chart {
    position: relative;
    z-index: 1;
    display: flex;
    flex-direction: column;
    gap: 14px;
  }

  .bench-row {
    display: grid;
    grid-template-columns: 120px 1fr 72px;
    align-items: center;
    gap: 16px;
  }

  .bench-label {
    font-family: 'JetBrains Mono', monospace;
    font-size: 0.82rem;
    font-weight: 600;
    text-align: right;
    white-space: nowrap;
  }

  .bench-bar-track {
    height: 38px;
    background: var(--surface);
    border-radius: 6px;
    position: relative;
    overflow: hidden;
    border: 1px solid var(--border);
  }

  .bench-bar {
    height: 100%;
    border-radius: 5px;
    position: relative;
    min-width: 4px;
    transform-origin: left center;
    animation: bar-grow 1.2s cubic-bezier(0.22, 1, 0.36, 1) forwards;
    transform: scaleX(0);
  }

  .bench-bar.winner {
    box-shadow: 0 0 20px var(--arcade-glow), 0 0 6px var(--arcade-glow);
  }

  .bench-bar::after {
    content: '';
    position: absolute;
    inset: 0;
    background: linear-gradient(180deg, rgba(255,255,255,0.12) 0%, transparent 60%);
    border-radius: 5px;
  }

  .bench-value {
    font-family: 'JetBrains Mono', monospace;
    font-size: 0.85rem;
    font-weight: 700;
    text-align: left;
    white-space: nowrap;
  }

  .bench-value .unit {
    font-weight: 400;
    color: var(--text-muted);
    font-size: 0.75rem;
    margin-left: 1px;
  }

  .bench-scale {
    position: relative;
    z-index: 1;
    display: grid;
    grid-template-columns: 120px 1fr 72px;
    gap: 16px;
    margin-top: 8px;
    padding-top: 8px;
    border-top: 1px solid var(--border);
  }

  .bench-scale-ticks {
    display: flex;
    justify-content: space-between;
    padding: 0 2px;
  }

  .bench-scale-ticks span {
    font-family: 'JetBrains Mono', monospace;
    font-size: 0.65rem;
    color: var(--text-muted);
  }

  .bench-footer {
    position: relative;
    z-index: 1;
    margin-top: 1.5rem;
    padding-top: 1rem;
    border-top: 1px solid var(--border);
    display: flex;
    flex-wrap: wrap;
    gap: 16px;
    align-items: center;
  }

  .bench-footer span {
    font-size: 0.72rem;
    color: var(--text-muted);
    font-weight: 300;
  }

  .bench-footer .algo-tag {
    font-family: 'JetBrains Mono', monospace;
    font-size: 0.68rem;
    padding: 3px 8px;
    background: var(--surface);
    border: 1px solid var(--border);
    border-radius: 4px;
    color: var(--text);
    font-weight: 600;
  }

  @keyframes bar-grow {
    to { transform: scaleX(1); }
  }

  .bench-row:nth-child(1) .bench-bar { animation-delay: 0.1s; }
  .bench-row:nth-child(2) .bench-bar { animation-delay: 0.2s; }
  .bench-row:nth-child(3) .bench-bar { animation-delay: 0.3s; }
  .bench-row:nth-child(4) .bench-bar { animation-delay: 0.4s; }
  .bench-row:nth-child(5) .bench-bar { animation-delay: 0.5s; }
  .bench-row:nth-child(6) .bench-bar { animation-delay: 0.6s; }
  .bench-row:nth-child(7) .bench-bar { animation-delay: 0.7s; }
  .bench-row:nth-child(8) .bench-bar { animation-delay: 0.8s; }
  .bench-row:nth-child(9) .bench-bar { animation-delay: 0.9s; }
</style>

<div class="bench-wrap">
  <div class="bench-header">
    <h2><span>PageRank</span> Benchmark</h2>
    <p>Execution time in seconds — lower is better</p>
  </div>

  <div class="bench-chart">
    <div class="bench-row">
      <div class="bench-label" style="color: var(--arcade)">ArcadeDB</div>
      <div class="bench-bar-track">
        <div class="bench-bar winner" style="width: 3%; background: var(--arcade);"></div>
      </div>
      <div class="bench-value" style="color: var(--arcade)">0.48<span class="unit">s</span></div>
    </div>

    <div class="bench-row">
      <div class="bench-label" style="color: var(--arcade-docker)">ArcadeDB <small>(Docker)</small></div>
      <div class="bench-bar-track">
        <div class="bench-bar" style="width: 3.5%; background: var(--arcade-docker);"></div>
      </div>
      <div class="bench-value" style="color: var(--arcade-docker)">0.83<span class="unit">s</span></div>
    </div>

    <div class="bench-row">
      <div class="bench-label" style="color: var(--falkor)">FalkorDB</div>
      <div class="bench-bar-track">
        <div class="bench-bar" style="width: 5.5%; background: var(--falkor);"></div>
      </div>
      <div class="bench-value" style="color: var(--falkor)">1.67<span class="unit">s</span></div>
    </div>

    <div class="bench-row">
      <div class="bench-label" style="color: var(--hugegraph)">HugeGraph</div>
      <div class="bench-bar-track">
        <div class="bench-bar" style="width: 7.5%; background: var(--hugegraph);"></div>
      </div>
      <div class="bench-value" style="color: var(--hugegraph)">4.01<span class="unit">s</span></div>
    </div>

    <div class="bench-row">
      <div class="bench-label" style="color: var(--kuzu)">Kuzu</div>
      <div class="bench-bar-track">
        <div class="bench-bar" style="width: 8%; background: var(--kuzu);"></div>
      </div>
      <div class="bench-value" style="color: var(--kuzu)">4.30<span class="unit">s</span></div>
    </div>

    <div class="bench-row">
      <div class="bench-label" style="color: var(--duckpgq)">DuckPGQ</div>
      <div class="bench-bar-track">
        <div class="bench-bar" style="width: 11.5%; background: var(--duckpgq);"></div>
      </div>
      <div class="bench-value" style="color: var(--duckpgq)">6.14<span class="unit">s</span></div>
    </div>

    <div class="bench-row">
      <div class="bench-label" style="color: var(--neo4j)">Neo4j 2026</div>
      <div class="bench-bar-track">
        <div class="bench-bar" style="width: 21%; background: var(--neo4j);"></div>
      </div>
      <div class="bench-value" style="color: var(--neo4j)">11.15<span class="unit">s</span></div>
    </div>

    <div class="bench-row">
      <div class="bench-label" style="color: var(--memgraph)">Memgraph</div>
      <div class="bench-bar-track">
        <div class="bench-bar" style="width: 32%; background: var(--memgraph);"></div>
      </div>
      <div class="bench-value" style="color: var(--memgraph)">16.90<span class="unit">s</span></div>
    </div>

    <div class="bench-row">
      <div class="bench-label" style="color: var(--arango)">ArangoDB</div>
      <div class="bench-bar-track">
        <div class="bench-bar" style="width: 100%; background: var(--arango);"></div>
      </div>
      <div class="bench-value" style="color: var(--arango)">157.01<span class="unit">s</span></div>
    </div>
  </div>

  <div class="bench-scale">
    <div></div>
    <div class="bench-scale-ticks">
      <span>0s</span>
      <span>40s</span>
      <span>80s</span>
      <span>120s</span>
      <span>157s</span>
    </div>
    <div></div>
  </div>

  <div class="bench-footer">
    <span class="algo-tag">PageRank</span>
    <span>Graph database benchmark — time to compute PageRank on the same dataset (Docker results measured warm)</span>
  </div>
</div>

<h3 id="neo4j-vs-arcadedb">Neo4j vs ArcadeDB</h3>

<p>The core difference comes down to philosophy: Neo4j is a graph-only database with proprietary enterprise features, while ArcadeDB is a multi-model engine that treats graph as one of six natively supported data models. For teams evaluating a Neo4j migration, ArcadeDB offers the smoothest path — your Cypher queries work as-is (97.8% TCK compliance), the <a href="https://docs.arcadedb.com/arcadedb/how-to/connectivity/bolt.html">BOLT protocol</a> is supported, and there’s a <a href="https://docs.arcadedb.com/arcadedb/how-to/migration/neo4j-importer.html">built-in Neo4j importer</a>. You gain multi-model capabilities without giving up graph performance.</p>
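<p>To make the migration path concrete, here is a minimal, untested Python sketch that reuses the official <code>neo4j</code> driver against ArcadeDB’s BOLT listener. The URI, port, database name, and credentials are placeholder assumptions for a local setup; check the BOLT plugin documentation linked above for the actual configuration.</p>

```python
# Untested sketch: the same Neo4j Python driver an existing application
# already uses, pointed at a local ArcadeDB instance with the BOLT
# plugin enabled. The URI, port (7687 is the driver's conventional
# default), database name, and credentials are placeholder assumptions.
def count_people(uri="bolt://localhost:7687",
                 auth=("root", "playwithdata"),
                 database="mygraph"):
    from neo4j import GraphDatabase  # pip install neo4j

    with GraphDatabase.driver(uri, auth=auth) as driver:
        with driver.session(database=database) as session:
            # An unmodified Cypher query, exactly as it ran on Neo4j.
            record = session.run("MATCH (p:Person) RETURN count(p) AS n").single()
            return record["n"]
```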

<h3 id="where-its-not-the-best-fit">Where It’s Not the Best Fit</h3>

<ul>
  <li><strong>Pure in-memory streaming workloads</strong> — if your use case is 100% real-time stream processing with no persistence needs, an in-memory engine like Memgraph may have lower latency for that specific pattern.</li>
  <li><strong>Cypher-only teams</strong> — while ArcadeDB supports Cypher (via the OpenCypher standard), teams deeply invested in Neo4j’s APOC library or GDS plugin will need to adapt some procedures.</li>
  <li><strong>Embedded analytics</strong> — if you need an embeddable, in-process analytical engine (like DuckDB but for graphs), KuzuDB’s architecture was purpose-built for that niche.</li>
</ul>

<hr />

<h2 id="2-arangodb">2. ArangoDB</h2>

<p><strong>License:</strong> BSL 1.1 (Community Edition); Proprietary (Enterprise Edition)
<strong>Query Language:</strong> AQL (ArangoDB Query Language) — proprietary
<strong>Data Models:</strong> Graph, Document, Key-Value
<strong>Written in:</strong> C++
<strong>Persistence:</strong> Disk-based (RocksDB)</p>

<h3 id="the-good">The Good</h3>

<p>ArangoDB was one of the first multi-model databases, and it does document + graph + key-value well. The SmartGraphs feature (Enterprise only) enables efficient distributed graph traversals, and the Foxx microservices framework lets you run JavaScript inside the database.</p>

<h3 id="the-problems">The Problems</h3>

<p><strong>The license changed.</strong> ArangoDB moved from Apache 2.0 to the Business Source License (BSL 1.1) starting with version 3.12 in 2024. The Community Edition now has a <strong>100GB dataset size limit</strong> and restricts commercial redistribution. If your data grows beyond 100GB, you either pay for Enterprise or you’re stuck. This is exactly the kind of bait-and-switch that erodes trust — years of community contributions under Apache 2.0, now locked behind a commercial license.</p>

<p><strong>Proprietary query language.</strong> AQL is ArangoDB’s own language. It doesn’t follow SQL, Cypher, Gremlin, or any established standard. This creates vendor lock-in: your queries, your team’s knowledge, and your application logic are all tied to a language that only one database speaks. If you ever need to migrate, you’re rewriting everything.</p>

<p><strong>No standard graph query support.</strong> ArangoDB doesn’t support Cypher, Gremlin, or GraphQL. For teams coming from Neo4j, this means a complete rewrite of all graph queries into AQL — which defeats the purpose of seeking a “Neo4j alternative.”</p>

<p><strong>The company rebranded to arango.ai</strong> — signaling a pivot toward AI/ML positioning, though the core database technology hasn’t fundamentally changed.</p>

<table>
  <thead>
    <tr>
      <th>Feature</th>
      <th>ArcadeDB</th>
      <th>ArangoDB Community</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>License</td>
      <td>Apache 2.0</td>
      <td>BSL 1.1</td>
    </tr>
    <tr>
      <td>Data size limit</td>
      <td>None</td>
      <td>100GB</td>
    </tr>
    <tr>
      <td>Cypher support</td>
      <td>Yes</td>
      <td>No</td>
    </tr>
    <tr>
      <td>SQL support</td>
      <td>Yes</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Gremlin support</td>
      <td>Yes</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Vector search</td>
      <td>Built-in</td>
      <td>Plugin (Enterprise)</td>
    </tr>
    <tr>
      <td>MCP server</td>
      <td>Built-in</td>
      <td>No</td>
    </tr>
  </tbody>
</table>

<hr />

<h2 id="3-kuzudb--ladybugdb">3. KuzuDB / LadybugDB</h2>

<p><strong>License:</strong> MIT (KuzuDB, archived); MIT (LadybugDB, community fork)
<strong>Query Language:</strong> Cypher
<strong>Data Model:</strong> Graph (property graph)
<strong>Written in:</strong> C++
<strong>Persistence:</strong> Disk-based, columnar</p>

<h3 id="what-happened">What Happened</h3>

<p>KuzuDB was a promising embedded graph database — think “DuckDB for graphs.” It was MIT-licensed, blazing fast for analytical queries, and backed by solid academic research from the University of Waterloo. Then, in October 2025, the repository was archived on GitHub following its acquisition by Apple, and active development stopped.</p>

<p>The community responded with forks, the most notable being <strong>LadybugDB</strong>. But community forks of abandoned projects face harsh realities: no funding, no core team continuity, and an uncertain roadmap.</p>

<h3 id="strengths-when-it-was-active">Strengths (When It Was Active)</h3>

<ul>
  <li>Excellent OLAP performance for analytical graph queries</li>
  <li>Embeddable (in-process, no server required)</li>
  <li>Clean Cypher implementation</li>
  <li>MIT license (the fork preserves this)</li>
</ul>

<h3 id="limitations">Limitations</h3>

<p><strong>OLAP only.</strong> KuzuDB was designed for analytical workloads — batch processing, multi-hop aggregations, graph analytics. It was never built for OLTP transactional workloads: concurrent writes, real-time updates, or serving live application traffic. If your use case involves any of those, KuzuDB was never the right tool.</p>

<p><strong>Abandoned.</strong> The original project is archived. LadybugDB and other forks are community efforts without corporate backing, commercial support, or guaranteed longevity. Building production infrastructure on an abandoned project fork is a risk most teams can’t justify.</p>

<p><strong>No multi-model support.</strong> Graph only. No documents, no key-value, no vector search, no time-series. If your data doesn’t fit neatly into a property graph, you need a second database.</p>

<p><strong>No server mode.</strong> KuzuDB was embedded — great for single-user analytics, problematic for multi-user applications, microservices, or any architecture that needs a shared database server.</p>

<p>If you’re currently on KuzuDB and looking to migrate, we wrote a detailed <a href="/blog/from-kuzudb-to-arcadedb-migration-guide/">migration guide</a>.</p>

<hr />

<h2 id="4-memgraph">4. Memgraph</h2>

<p><strong>License:</strong> BSL 1.1 (source-available, NOT open-source by OSI definition)
<strong>Query Language:</strong> Cypher
<strong>Data Model:</strong> Graph (property graph)
<strong>Written in:</strong> C++
<strong>Persistence:</strong> In-memory with optional snapshots (WAL)</p>

<h3 id="the-good-1">The Good</h3>

<p>Memgraph is genuinely fast for real-time graph workloads. Its in-memory architecture and C++ implementation deliver low-latency query execution, and the Cypher + Bolt protocol compatibility makes it a relatively smooth migration path from Neo4j. The MAGE library provides useful graph algorithms, and the streaming integrations (Kafka, Pulsar) are well-implemented.</p>

<p>NASA’s switch from Neo4j to Memgraph is a legitimate validation of its capabilities for specific high-performance use cases.</p>

<h3 id="the-problems-1">The Problems</h3>

<p><strong>It’s not open-source.</strong> Despite marketing itself as “open-source” on its website, GitHub README, and pricing pages, Memgraph uses the Business Source License 1.1. The BSL explicitly restricts commercial use — you cannot offer Memgraph as a service or build competing products. The <a href="https://opensource.org/licenses">Open Source Initiative does not recognize BSL as open-source</a>. When a vendor calls BSL “open-source,” they’re either confused about licensing or deliberately misleading users. <a href="https://isitreallyfoss.com/projects/memgraph/">Independent analysis confirms</a> neither the Community nor Enterprise edition qualifies as FOSS.</p>

<p><strong>Expensive commercial licensing.</strong> Memgraph’s commercial pricing starts at approximately <strong>$25,000/year for 16GB of RAM</strong>. For comparison, you can run ArcadeDB on a 128GB server with terabytes of persistent storage for the cost of the hardware alone — because the software is free.</p>

<p><strong>In-memory means expensive.</strong> An in-memory database requires all your data to fit in RAM. RAM is 30-50x more expensive per GB than SSD storage. For a 500GB dataset, you’re looking at server costs that dwarf any software license. Memgraph offers WAL and snapshots for durability, but recovery after a crash means reloading the entire dataset into memory — which can take significant time for large graphs.</p>
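<p>To put rough numbers on that, here is a back-of-the-envelope calculation. The per-GB monthly prices are illustrative assumptions (chosen to line up with the storage-cost row in the table below), not vendor quotes.</p>

```python
# Illustrative cloud prices (assumptions, not vendor quotes):
RAM_PER_GB_MONTH = 5.00   # ~$/GB-month for provisioned server RAM
SSD_PER_GB_MONTH = 0.10   # ~$/GB-month for SSD block storage

DATASET_GB = 500

ram_cost = DATASET_GB * RAM_PER_GB_MONTH  # in-memory engine: data must fit in RAM
ssd_cost = DATASET_GB * SSD_PER_GB_MONTH  # disk-based engine: data lives on SSD

print(f"500 GB in RAM: ${ram_cost:,.0f}/month")    # $2,500/month
print(f"500 GB on SSD: ${ssd_cost:,.0f}/month")    # $50/month
print(f"RAM premium:   {ram_cost / ssd_cost:.0f}x")  # 50x
```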

<p><strong>Stability issues.</strong> Memgraph suffers from serious stability problems — the WCC benchmark crashed the server every time we ran it. Investigating further, we found <a href="https://github.com/memgraph/memgraph/issues?q=is%3Aissue%20state%3Aopen%20crash">47 open issues on their GitHub</a> reporting random crashes triggered even by simple queries, some of which have remained unaddressed for over 3 years.</p>

<p><strong>Self-published benchmarks.</strong> Memgraph claims to be a faster alternative to Neo4j, but our LDBC Graphalytics results tell a different story: Memgraph was significantly slower than Neo4j across the board. Its claim of being “8x faster than Neo4j in reads and 50x faster in writes” rests on self-published benchmarks that, due to BSL restrictions, competing vendors cannot reproduce or independently verify. Take vendor benchmarks — including ours — with appropriate skepticism.</p>

<p><strong>Graph only.</strong> No document model, no key-value, no vector search, no time-series. If you need anything beyond a property graph, you need additional infrastructure.</p>

<table>
  <thead>
    <tr>
      <th>Feature</th>
      <th>ArcadeDB</th>
      <th>Memgraph</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>License</td>
      <td>Apache 2.0</td>
      <td>BSL 1.1</td>
    </tr>
    <tr>
      <td>OSI-approved open-source</td>
      <td>Yes</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Persistence</td>
      <td>Native disk-based</td>
      <td>In-memory (snapshots)</td>
    </tr>
    <tr>
      <td>Storage cost (500GB data)</td>
      <td>~$50/mo (SSD)</td>
      <td>~$2,500/mo (RAM)</td>
    </tr>
    <tr>
      <td>Multi-model</td>
      <td>6 data models</td>
      <td>Graph only</td>
    </tr>
    <tr>
      <td>Query languages</td>
      <td>5 (SQL, Cypher, Gremlin, GraphQL, MQL)</td>
      <td>1 (Cypher)</td>
    </tr>
    <tr>
      <td>Vector search</td>
      <td>Built-in</td>
      <td>No</td>
    </tr>
    <tr>
      <td>MCP server</td>
      <td>Built-in</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Commercial license cost</td>
      <td>$0</td>
      <td>~$25,000/year</td>
    </tr>
  </tbody>
</table>

<hr />

<h2 id="5-falkordb">5. FalkorDB</h2>

<p><strong>License:</strong> Source-available (Server Side Public License / Redis-style)
<strong>Query Language:</strong> Cypher (OpenCypher with extensions)
<strong>Data Model:</strong> Graph (property graph)
<strong>Written in:</strong> C (with GraphBLAS)
<strong>Persistence:</strong> Disk-based (Redis-compatible)</p>

<h3 id="the-good-2">The Good</h3>

<p>FalkorDB has a genuinely unique architecture: it uses <strong>GraphBLAS</strong> — sparse matrix algebra with hardware-accelerated SIMD instructions — to execute graph operations. This is an innovative approach that delivers strong performance for specific query patterns, particularly subgraph matching and pattern recognition.</p>

<p>FalkorDB is also positioning aggressively for AI use cases with its GraphRAG SDK, which can automatically generate graph ontologies from unstructured data — a compelling feature for teams building knowledge graphs from documents.</p>

<p>The project emerged as a fork of RedisGraph after Redis Inc. discontinued it in 2023, and the team has done solid work extending it into a standalone product.</p>

<h3 id="limitations-1">Limitations</h3>

<p><strong>Redis heritage.</strong> FalkorDB’s architecture still carries Redis DNA — the single-threaded execution model, the Redis serialization protocol, and the in-memory-first design philosophy. The team has added persistence and is working on multi-threading, but those architectural foundations still constrain write-heavy, highly concurrent workloads.</p>

<p><strong>Source-available, not open-source.</strong> Like Memgraph, FalkorDB uses a source-available license (SSPL-adjacent) that restricts how you can deploy and distribute the software commercially. It’s more permissive than BSL for self-hosted use, but it’s not Apache 2.0 or MIT.</p>

<p><strong>Graph only (with vector search).</strong> FalkorDB added HNSW-based vector indexing in v4.0, supporting cosine and Euclidean similarity search on embeddings. However, it remains graph-only for other data models — no document storage, key-value, or time-series support without separate infrastructure.</p>

<p><strong>Limited graph algorithm coverage.</strong> FalkorDB has no built-in LCC (local clustering coefficient) algorithm and no full SSSP (single-source shortest path) implementation. Its <code class="language-plaintext highlighter-rouge">algo.SSpaths</code> procedure is pair-oriented, not a full single-source Dijkstra — which limits its usefulness for classic graph analytics workloads.</p>

<p><strong>Ecosystem maturity.</strong> FalkorDB is relatively young as a standalone product (forked in 2023). The tooling, documentation, and third-party ecosystem are still developing. Enterprise features like fine-grained access control and audit logging are less mature than longer-established alternatives.</p>

<hr />

<h2 id="6-hugegraph">6. HugeGraph</h2>

<p><strong>License:</strong> Apache 2.0
<strong>Query Language:</strong> Gremlin, RESTful API
<strong>Data Model:</strong> Graph (property graph)
<strong>Written in:</strong> Java (server), Go (Vermeer OLAP engine)
<strong>Persistence:</strong> Disk-based (pluggable backends: RocksDB, HBase, Cassandra, etc.)</p>

<h3 id="the-good-3">The Good</h3>

<p>HugeGraph is an Apache Software Foundation top-level project, which gives it genuine open-source governance — not a single-vendor project that can change its license overnight. It supports pluggable storage backends (RocksDB, HBase, Cassandra, MySQL, ScyllaDB) and can scale horizontally for very large graphs.</p>

<p>The <strong>Vermeer</strong> OLAP engine is a separate Go-based compute engine designed for graph analytics at scale. It loads data into memory and runs algorithms like PageRank, WCC, BFS, LCC, and Label Propagation (CDLP) via a REST API. This separation of OLTP (HugeGraph Server) and OLAP (Vermeer) is architecturally clean.</p>

<p>HugeGraph also benefits from strong adoption in China, particularly at Baidu (where it originated) and other major tech companies.</p>

<h3 id="the-problems-2">The Problems</h3>

<p><strong>Performance lags behind.</strong> In our LDBC Graphalytics benchmark, HugeGraph/Vermeer is significantly slower than ArcadeDB on every algorithm. Even comparing Docker-to-Docker (apples-to-apples): PageRank 4.8x slower, WCC 30x slower, BFS 7.7x slower, LCC 7.8x slower, and CDLP 18.7x slower. The LCC result (272s vs ArcadeDB Docker’s 35s) and CDLP (63s vs 3.4s) are particularly weak.</p>

<p><strong>No weighted SSSP.</strong> Vermeer’s built-in SSSP algorithm only computes unweighted shortest paths (hop count). There is no weighted Dijkstra variant, which limits its usefulness for real-world graph analytics where edge weights represent distances, costs, or latencies.</p>
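<p>The distinction matters because, on a weighted graph, the minimum-weight path and the minimum-hop path can disagree. A self-contained Python sketch of the two notions (toy data, not Vermeer’s API):</p>

```python
import heapq
from collections import deque

# Toy directed graph: weights might be latencies, costs, or distances.
graph = {
    "A": [("B", 1.0), ("C", 10.0)],
    "B": [("C", 1.0)],
    "C": [],
}

def dijkstra(src):
    """Weighted SSSP: minimizes total edge weight (what Vermeer lacks)."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def bfs_hops(src):
    """Unweighted SSSP: minimizes hop count, ignoring weights."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v, _ in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

print(dijkstra("A"))  # {'A': 0.0, 'B': 1.0, 'C': 2.0} -- A->B->C beats the 10.0 edge
print(bfs_hops("A"))  # {'A': 0, 'B': 1, 'C': 1} -- hop count prefers A->C directly
```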

<p><strong>No Cypher support.</strong> HugeGraph uses Gremlin and its own REST API — no Cypher, no SQL, no GraphQL. For teams migrating from Neo4j, this means a complete query rewrite. Gremlin is a recognized standard (Apache TinkerPop), but it’s verbose compared to Cypher for pattern matching queries.</p>

<p><strong>Complex deployment.</strong> Running HugeGraph requires multiple components: HugeGraph Server for OLTP, Vermeer master + worker containers for OLAP, and a storage backend. The setup is significantly more complex than single-binary alternatives like ArcadeDB.</p>

<p><strong>Documentation and ecosystem.</strong> Much of HugeGraph’s documentation and community discussion is in Chinese. English documentation exists but is less comprehensive. The third-party ecosystem (drivers, integrations, tutorials) is smaller than Neo4j, ArcadeDB, or Memgraph.</p>

<table>
  <thead>
    <tr>
      <th>Feature</th>
      <th>ArcadeDB</th>
      <th>HugeGraph</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>License</td>
      <td>Apache 2.0</td>
      <td>Apache 2.0</td>
    </tr>
    <tr>
      <td>Query languages</td>
      <td>5 (SQL, Cypher, Gremlin, GraphQL, MQL)</td>
      <td>2 (Gremlin, REST API)</td>
    </tr>
    <tr>
      <td>Cypher support</td>
      <td>Yes</td>
      <td>No</td>
    </tr>
    <tr>
      <td>SQL support</td>
      <td>Yes</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Data models</td>
      <td>6</td>
      <td>1 (Graph)</td>
    </tr>
    <tr>
      <td>Vector search</td>
      <td>Built-in</td>
      <td>No</td>
    </tr>
    <tr>
      <td>MCP server</td>
      <td>Built-in</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Weighted SSSP</td>
      <td>Yes</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Deployment</td>
      <td>Single binary</td>
      <td>Multiple components</td>
    </tr>
  </tbody>
</table>

<hr />

<h2 id="the-comparison-at-a-glance">The Comparison at a Glance</h2>

<table>
  <thead>
    <tr>
      <th> </th>
      <th>ArcadeDB</th>
      <th>ArangoDB</th>
      <th>KuzuDB / LadybugDB</th>
      <th>Memgraph</th>
      <th>FalkorDB</th>
      <th>HugeGraph</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>License</strong></td>
      <td>Apache 2.0</td>
      <td>BSL 1.1</td>
      <td>MIT (archived)</td>
      <td>BSL 1.1</td>
      <td>Source-available</td>
      <td>Apache 2.0</td>
    </tr>
    <tr>
      <td><strong>OSI open-source</strong></td>
      <td>Yes</td>
      <td>No</td>
      <td>Yes (but abandoned)</td>
      <td>No</td>
      <td>No</td>
      <td>Yes</td>
    </tr>
    <tr>
      <td><strong>Data models</strong></td>
      <td>6</td>
      <td>3</td>
      <td>1</td>
      <td>1</td>
      <td>1</td>
      <td>1</td>
    </tr>
    <tr>
      <td><strong>Query languages</strong></td>
      <td>5</td>
      <td>1 (proprietary)</td>
      <td>1</td>
      <td>1</td>
      <td>1</td>
      <td>2 (Gremlin, REST)</td>
    </tr>
    <tr>
      <td><strong>Cypher support</strong></td>
      <td>Yes</td>
      <td>No</td>
      <td>Yes</td>
      <td>Yes</td>
      <td>Yes</td>
      <td>No</td>
    </tr>
    <tr>
      <td><strong>SQL support</strong></td>
      <td>Yes</td>
      <td>No</td>
      <td>No</td>
      <td>No</td>
      <td>No</td>
      <td>No</td>
    </tr>
    <tr>
      <td><strong>Gremlin support</strong></td>
      <td>Yes</td>
      <td>No</td>
      <td>No</td>
      <td>No</td>
      <td>No</td>
      <td>Yes</td>
    </tr>
    <tr>
      <td><strong>Persistence</strong></td>
      <td>Native disk</td>
      <td>Native disk</td>
      <td>Native disk</td>
      <td>In-memory</td>
      <td>Disk (Redis-based)</td>
      <td>Disk (pluggable)</td>
    </tr>
    <tr>
      <td><strong>Vector search</strong></td>
      <td>Built-in</td>
      <td>Enterprise only</td>
      <td>No</td>
      <td>No</td>
      <td>Built-in (HNSW)</td>
      <td>No</td>
    </tr>
    <tr>
      <td><strong>MCP server</strong></td>
      <td>Built-in</td>
      <td>No</td>
      <td>No</td>
      <td>No</td>
      <td>No</td>
      <td>No</td>
    </tr>
    <tr>
      <td><strong>Data size limit</strong></td>
      <td>None</td>
      <td>100GB (Community)</td>
      <td>None</td>
      <td>RAM-bound</td>
      <td>RAM-bound</td>
      <td>None</td>
    </tr>
    <tr>
      <td><strong>Commercial cost</strong></td>
      <td>$0</td>
      <td>Enterprise pricing</td>
      <td>N/A</td>
      <td>~$25K/year</td>
      <td>Enterprise pricing</td>
      <td>$0</td>
    </tr>
    <tr>
      <td><strong>Active development</strong></td>
      <td>Yes</td>
      <td>Yes</td>
      <td>No (archived)</td>
      <td>Yes</td>
      <td>Yes</td>
      <td>Yes</td>
    </tr>
  </tbody>
</table>

<hr />

<h2 id="why-licensing-matters-more-than-you-think">Why Licensing Matters More Than You Think</h2>

<p>Over the past decade, we’ve watched one database after another change its license after years of building a community on open-source promises:</p>

<ul>
  <li><strong>MongoDB</strong> moved from AGPL to SSPL in 2018</li>
  <li><strong>Elasticsearch</strong> moved from Apache 2.0 to SSPL in 2021</li>
  <li><strong>Redis</strong> moved from BSD to dual-license (RSALv2 + SSPL) in 2024</li>
  <li><strong>ArangoDB</strong> moved from Apache 2.0 to BSL 1.1 in 2024</li>
</ul>

<p>Every one of these changes was presented as “necessary for sustainability.” Every one of them restricted what users could do with software they’d invested in.</p>

<p>When you choose a database, you’re not just choosing today’s features — you’re betting on tomorrow’s licensing terms. A database under Apache 2.0 can never pull the rug out from under you. A database under BSL can change the conversion terms, extend the restriction period, or tighten the usage limitations at any time.</p>

<p>ArcadeDB is Apache 2.0 today, and we’ve made a <a href="/blog/open-source-forever-why-arcadedb-will-never-change-its-license/">public, permanent commitment</a> to keep it that way. That’s not a marketing claim — it’s a structural guarantee.</p>

<hr />

<h2 id="so-which-one-should-you-choose">So Which One Should You Choose?</h2>

<p><strong>Choose ArcadeDB if</strong> you want a multi-model database that handles graphs, documents, key-value, time-series, and vector data in one engine — with genuine open-source licensing, no data caps, and built-in AI/MCP integration.</p>

<p><strong>Choose ArangoDB if</strong> you’re already invested in AQL and your dataset is under 100GB — but have a migration plan ready for when you hit that ceiling or the license terms change again.</p>

<p><strong>Choose KuzuDB/LadybugDB if</strong> you need an embedded analytical graph engine for batch processing and you’re comfortable maintaining a dependency on an archived/forked project.</p>

<p><strong>Choose Memgraph if</strong> you need the absolute lowest latency for in-memory real-time graph queries, your dataset fits in RAM, and you have the budget for commercial licensing.</p>

<p><strong>Choose FalkorDB if</strong> you’re building AI/GraphRAG applications and want tight integration with LLM workflows, and the source-available license works for your deployment model.</p>

<p><strong>Choose HugeGraph if</strong> you need an Apache-licensed graph database with pluggable storage backends for very large-scale deployments, your team is comfortable with Gremlin, and you don’t need Cypher compatibility or weighted shortest-path algorithms.</p>

<hr />

<h2 id="key-takeaways">Key Takeaways</h2>

<ul>
  <li><strong>ArcadeDB</strong> is the only Neo4j alternative that combines six data models, five query languages, and a genuine Apache 2.0 license with no data caps or commercial restrictions.</li>
  <li><strong>ArangoDB</strong> changed its license from Apache 2.0 to BSL 1.1 in 2024 and capped its free tier at 100GB — a significant shift for existing users.</li>
  <li><strong>KuzuDB</strong> was acquired by Apple and archived in October 2025. Community forks exist but carry abandonment risk.</li>
  <li><strong>Memgraph</strong> offers fast in-memory graph queries but uses BSL 1.1 (not open source), costs ~$25,000/year commercially, and has documented stability issues.</li>
  <li><strong>FalkorDB</strong> brings innovative GraphBLAS architecture, HNSW vector search, and strong GraphRAG positioning, but uses a source-available license and lacks multi-model support beyond graph + vectors.</li>
  <li><strong>HugeGraph</strong> is an Apache-licensed project with pluggable storage backends and a separate Vermeer OLAP engine, but trails ArcadeDB Docker by 5-30x on graph algorithm performance (apples-to-apples Docker comparison), lacks Cypher support, and requires a complex multi-component deployment.</li>
  <li>In 2026, only <strong>ArcadeDB</strong>, <strong>HugeGraph</strong>, and the archived <strong>KuzuDB/LadybugDB</strong> use OSI-approved open-source licenses among the graph databases compared here.</li>
</ul>

<hr />

<h2 id="frequently-asked-questions">Frequently Asked Questions</h2>

<h3 id="what-is-the-best-open-source-alternative-to-neo4j">What is the best open-source alternative to Neo4j?</h3>

<p>ArcadeDB is the most versatile open-source Neo4j alternative in 2026. It supports Cypher, SQL, Gremlin, GraphQL, and MQL under a permissive Apache 2.0 license — with no data caps, no commercial restrictions, and six data models in a single engine.</p>
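<p>All five of those languages go through the same HTTP command endpoint. As a rough illustration, the Python sketch below only builds the request bodies; the endpoint path follows ArcadeDB’s HTTP API documentation, and the database name and queries are hypothetical.</p>

```python
import json

# Hypothetical database name; the endpoint path follows ArcadeDB's HTTP
# API docs (POST /api/v1/command/{database}) -- verify against the
# current documentation before relying on it.
DB = "mygraph"
ENDPOINT = f"/api/v1/command/{DB}"

def command_payload(language, command):
    """Build the JSON body for a query in any supported language."""
    return json.dumps({"language": language, "command": command})

# The same logical question, phrased in two of the supported languages:
cypher_body = command_payload(
    "cypher", "MATCH (p:Person) WHERE p.age > 30 RETURN p.name")
sql_body = command_payload(
    "sql", "SELECT name FROM Person WHERE age > 30")

print(ENDPOINT)
print(cypher_body)
print(sql_body)
```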

<h3 id="is-arcadedb-compatible-with-neo4j-cypher-queries">Is ArcadeDB compatible with Neo4j Cypher queries?</h3>

<p>Yes. ArcadeDB includes a <a href="https://arcadedb.com/blog/native-opencypher/">native OpenCypher engine</a> that passes 97.8% of the official Cypher Technology Compatibility Kit (TCK). Most Cypher queries run as-is without rewriting. ArcadeDB also supports the <a href="https://docs.arcadedb.com/arcadedb/how-to/connectivity/bolt.html">BOLT protocol</a> for Neo4j driver compatibility.</p>

<h3 id="what-happened-to-kuzudb">What happened to KuzuDB?</h3>

<p>KuzuDB was acquired by Apple in October 2025, and its GitHub repository was archived. Active development stopped. Community forks like LadybugDB exist but lack corporate backing or guaranteed longevity.</p>

<h3 id="which-graph-databases-are-truly-open-source-in-2026">Which graph databases are truly open source in 2026?</h3>

<p>Among the databases in this comparison, ArcadeDB (Apache 2.0), HugeGraph (Apache 2.0), and the archived KuzuDB/LadybugDB (MIT) are the only ones using OSI-approved open-source licenses. Memgraph and ArangoDB use the Business Source License (BSL 1.1), while FalkorDB uses a source-available license; the Open Source Initiative recognizes none of these as open source.</p>

<h3 id="how-do-i-migrate-from-neo4j-to-arcadedb">How do I migrate from Neo4j to ArcadeDB?</h3>

<p>ArcadeDB provides a built-in Neo4j importer that reads Neo4j export files directly. Export your Neo4j database, run the importer, and your Cypher queries work as-is. See the full <a href="https://docs.arcadedb.com/arcadedb/how-to/migration/neo4j-importer.html">migration documentation</a>.</p>

<h3 id="what-is-the-best-graph-database-for-ai-and-llm-agents-in-2026">What is the best graph database for AI and LLM agents in 2026?</h3>

<p>ArcadeDB is the best graph database for AI integration in 2026. It includes a built-in <a href="/blog/arcadedb-mcp-server-connect-your-llm-to-your-database/">MCP (Model Context Protocol) server</a> for direct LLM-to-database communication and native vector search for embeddings — all under Apache 2.0 with no plugins or enterprise paywalls required.</p>

<h3 id="is-memgraph-really-open-source">Is Memgraph really open source?</h3>

<p>No. Memgraph uses the Business Source License 1.1 (BSL), which restricts commercial use. The <a href="https://opensource.org/licenses">Open Source Initiative does not recognize BSL as an open-source license</a>. Despite Memgraph’s marketing, neither its Community nor Enterprise edition qualifies as free and open-source software (FOSS). <a href="https://isitreallyfoss.com/projects/memgraph/">Independent analysis confirms this</a>.</p>

<hr />

<h2 id="getting-started-with-arcadedb">Getting Started with ArcadeDB</h2>

<p>Ready to try the most versatile Neo4j alternative?</p>

<p><strong>Docker (fastest way):</strong></p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>docker run <span class="nt">--rm</span> <span class="nt">-it</span> <span class="nt">-p</span> 2480:2480 <span class="nt">-p</span> 2424:2424 <span class="se">\</span>
  <span class="nt">-e</span> <span class="nv">JAVA_OPTS</span><span class="o">=</span><span class="s2">"-Darcadedb.server.rootPassword=playwithdata"</span> <span class="se">\</span>
  arcadedata/arcadedb:latest
</code></pre></div></div>

<p><strong>Then open</strong> <a href="http://localhost:2480">http://localhost:2480</a> and start querying with SQL, Cypher, Gremlin, or GraphQL.</p>

<ul>
  <li><a href="https://docs.arcadedb.com">Documentation</a></li>
  <li><a href="https://github.com/ArcadeData/arcadedb">GitHub</a></li>
  <li><a href="https://discord.gg/arcadedb">Discord Community</a></li>
</ul>

<hr />

<p><em>Have questions about migrating from Neo4j? Join our <a href="https://discord.gg/arcadedb">Discord</a> — we’re happy to help.</em></p>

<p><em>Last updated: March 2026. This graph database comparison is reviewed and updated regularly to reflect licensing changes, new releases, and market developments.</em></p>]]></content><author><name>Luca Garulli</name></author><category term="Graph Database" /><category term="Open Source" /><category term="Neo4j" /><category term="Comparison" /><category term="Multi-Model" /><category term="Neo4j Alternative" /><category term="Neo4j Replacement" /><category term="Graph Database Comparison" /><category term="Best Graph Database 2026" /><summary type="html"><![CDATA[The best open-source Neo4j alternatives in 2026 compared: ArcadeDB, ArangoDB, KuzuDB, Memgraph, FalkorDB, and HugeGraph — covering licensing, performance, multi-model support, and AI readiness.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://arcadedb.com/assets/images/neo4j-alternatives-2026.png" /><media:content medium="image" url="https://arcadedb.com/assets/images/neo4j-alternatives-2026.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry></feed>