Embedded Graph Database in Java

The Embedded Database Revolution

The database world is shifting. DuckDB showed that an in-process analytical database can handle many real-world workloads that once called for a distributed cluster. SQLite is widely regarded as the most deployed database engine in the world. Developers are realizing that the fastest network call is the one you never make.

Kuzu brought this revolution to graph databases — and was so successful that Apple acquired it in October 2025, archiving its open-source repository. That left a gap: where do you go for an embeddable, open-source graph database with an active community?

ArcadeDB has supported embedded mode since day one. But unlike Kuzu (graph-only) or DuckDB (analytics-only), ArcadeDB gives you the complete multi-model stack in-process: graphs, documents, key-value, vectors, and time-series, plus full-text search. One dependency, one JVM, five data models.

And it's not going anywhere. ArcadeDB is Apache 2.0 forever — no acquisition will archive your database.

Why Embedded Is Faster

Zero serialization: No marshalling objects to JSON/binary and back
Zero network: No TCP round-trips, no connection pooling, no timeouts
Direct memory: Access records via direct JVM memory references
No context switching: Database engine runs in your application threads
2M+ inserts/sec: On standard hardware, not exotic clusters
O(1) traversal: Graph hops via direct pointers, not index lookups
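
To make the last point concrete, here is a minimal plain-Java sketch of the difference between following a direct reference and resolving a neighbor through an index. The `Node` class and the helper methods are hypothetical illustrations, not ArcadeDB's internal storage structures.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: not ArcadeDB's storage format.
final class AdjacencySketch {

  // A vertex that keeps both direct references and plain ids for its neighbors.
  static final class Node {
    final String name;
    final List<Node> out = new ArrayList<>();      // direct pointers
    final List<String> outIds = new ArrayList<>(); // ids that need an index lookup

    Node(String name) { this.name = name; }
  }

  // Records the edge in both representations so the two hops can be compared.
  static void connect(Node from, Node to) {
    from.out.add(to);
    from.outIds.add(to.name);
  }

  // Hop via direct references: no lookup structure is consulted.
  static List<String> neighborsByPointer(Node start) {
    List<String> names = new ArrayList<>();
    for (Node n : start.out) names.add(n.name);
    return names;
  }

  // Hop via an index: every neighbor costs one extra lookup by id.
  static List<String> neighborsByIndex(Node start, Map<String, Node> index) {
    List<String> names = new ArrayList<>();
    for (String id : start.outIds) names.add(index.get(id).name);
    return names;
  }

  public static void main(String[] args) {
    Node alice = new Node("Alice");
    Node bob = new Node("Bob");
    Node carol = new Node("Carol");
    connect(alice, bob);
    connect(alice, carol);

    Map<String, Node> index = new HashMap<>();
    for (Node n : List.of(alice, bob, carol)) index.put(n.name, n);

    System.out.println(neighborsByPointer(alice));
    System.out.println(neighborsByIndex(alice, index));
  }
}
```

Both methods return the same neighbors. The difference is the per-hop lookup, whose cost depends on the index structure and size, while the pointer hop does not consult an index at all.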

Client/Server vs. Embedded

Eliminate the Network, Keep the Power

In client/server mode, every database operation requires serializing your request, sending it over TCP/IP, deserializing on the server, executing, serializing the result, sending it back, and deserializing again. For a simple vertex lookup, that's 6 steps of overhead before you see your data.

In embedded mode, it's a direct Java method call. The ArcadeDB engine runs in the same JVM as your application. Your objects are the database's objects. There is no network, no serialization, no connection pool, no timeout configuration. Just direct, in-process access to a full-featured multi-model database.
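
The contrast between the two paths can be sketched in plain Java. In this illustration Java serialization merely stands in for a wire protocol, and `viaWire` and `inProcess` are hypothetical helpers, not ArcadeDB calls. The wire path always yields a copy, while the in-process path hands back the same object.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;

// Hypothetical illustration only; Java serialization stands in for a wire protocol.
final class RoundTripSketch {

  // Client/server path: serialize, "send" the bytes, deserialize into a copy.
  static HashMap<String, Object> viaWire(HashMap<String, Object> record) {
    try {
      ByteArrayOutputStream wire = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(wire)) {
        out.writeObject(record);
      }
      try (ObjectInputStream in =
               new ObjectInputStream(new ByteArrayInputStream(wire.toByteArray()))) {
        @SuppressWarnings("unchecked")
        HashMap<String, Object> copy = (HashMap<String, Object>) in.readObject();
        return copy;
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  // Embedded path: a plain method call hands back the very same object.
  static HashMap<String, Object> inProcess(HashMap<String, Object> record) {
    return record;
  }

  public static void main(String[] args) {
    HashMap<String, Object> person = new HashMap<>();
    person.put("name", "Alice");
    person.put("age", 32);
    System.out.println(viaWire(person) == person);   // false: a deserialized copy
    System.out.println(inProcess(person) == person); // true: same reference
  }
}
```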

This isn't a simplified "lite" mode. Every feature of ArcadeDB is available embedded:

Graph traversal with SQL, Cypher, and Gremlin
Vector search with JVector (DiskANN + HNSW)
Full-text search with fuzzy matching
Time-series with time_bucket(), rate(), moving_avg()
Document and key-value storage
ACID transactions
Automatic crash recovery via Write-Ahead Logging (WAL)

Up and Running in 60 Seconds

Add one Maven dependency. Create a database. Start querying. No server to install, no Docker container to manage, no ports to configure, no connection strings to debug.

ArcadeDB works with any JVM language: Java, Kotlin, Scala, Groovy, Clojure, or anything that runs on the Java Virtual Machine. The API is clean, fluent, and designed for modern Java patterns including try-with-resources and lambda transactions.

The database files live in a directory you specify. No background daemon. No configuration files. No ports. When your application starts, the database opens. When your application stops, the database closes. If the JVM crashes, WAL recovery handles the rest.

Maven Dependency

<dependency>
  <groupId>com.arcadedb</groupId>
  <artifactId>arcadedb-engine</artifactId>
  <version>26.2.1</version>
</dependency>

That's it. One dependency. ~15 MB. No native libraries, no JNI, pure Java.

Create Database & Graph

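// Imports from the arcadedb-engine artifact
import com.arcadedb.database.Database;
import com.arcadedb.database.DatabaseFactory;
import com.arcadedb.graph.Vertex;

// Create a new database (use open() for one that already exists)
try (Database db = new DatabaseFactory(
    "/data/mydb").create()) {

  // The transaction lambda takes no arguments
  db.transaction(() -> {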
    // Define schema
    db.getSchema()
      .createVertexType("Person");
    db.getSchema()
      .createEdgeType("Knows");

    // Create vertices
    Vertex alice = db.newVertex("Person")
      .set("name", "Alice")
      .set("age", 32)
      .save();

    Vertex bob = db.newVertex("Person")
      .set("name", "Bob")
      .set("age", 28)
      .save();

    // Create edge
    alice.newEdge("Knows", bob, true)
      .set("since", 2020)
      .save();
  });
}

The Java API

ArcadeDB's Java API is designed around two principles: safety and simplicity. Databases implement AutoCloseable, so try-with-resources handles cleanup. Transactions are lambda-scoped, so you can't forget to commit or roll back.

The fluent builder pattern means creating vertices, edges, and documents reads like natural language. No ORMs, no annotation processing, no code generation — just direct, type-safe access to your data.

DatabaseFactory: Create or open databases by path
db.transaction(): Lambda-scoped ACID transactions with automatic commit/rollback
db.newVertex() / newDocument(): Fluent builders for all record types
db.query(): Execute SQL, Cypher, or Gremlin and iterate results
vertex.newEdge(): Create relationships directly from vertex references
Nested transactions: Supported for complex workflows
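
The lambda-scoped commit/rollback idea can be sketched in plain Java. `MiniStore` and `TxBody` below are hypothetical names for a toy in-memory store. They show the pattern only and are not ArcadeDB's implementation, which also has to deal with durability and concurrency.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of lambda-scoped commit/rollback; not ArcadeDB code.
final class TxScopeSketch {

  // The unit of work passed to transaction(); throwing forces a rollback.
  interface TxBody {
    void run(Map<String, Integer> working) throws Exception;
  }

  static final class MiniStore {
    private Map<String, Integer> committed = new HashMap<>();

    // Runs the body against a private copy and publishes it only on success.
    void transaction(TxBody body) {
      Map<String, Integer> working = new HashMap<>(committed); // begin
      try {
        body.run(working);
        committed = working;                                   // commit
      } catch (Exception e) {
        // rollback: the working copy is simply discarded
        throw new IllegalStateException("transaction rolled back", e);
      }
    }

    Integer get(String key) {
      return committed.get(key);
    }
  }

  public static void main(String[] args) {
    MiniStore store = new MiniStore();
    store.transaction(w -> w.put("alice", 100));
    try {
      store.transaction(w -> {
        w.put("alice", 0);
        throw new Exception("boom");
      });
    } catch (IllegalStateException expected) {
      // the failed transaction left no trace
    }
    System.out.println(store.get("alice")); // prints 100
  }
}
```

Because the caller never sees begin, commit, or rollback as separate calls, there is no code path on which one of them can be forgotten.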

Every Query Language, In-Process

Embedded mode doesn't limit your query options. ArcadeDB supports SQL, Cypher (latest OpenCypher 25 grammar), Gremlin, GraphQL, and the MongoDB query language — all executing in-process without network overhead.

This matters because different problems suit different query languages. Use Cypher for graph pattern matching, SQL for aggregations and joins, Gremlin for imperative traversals, and the native Java API for maximum performance. Switch between them freely within the same application.

For Cypher specifically, ArcadeDB's native engine implements the latest OpenCypher 25 specification (Cypher 25 grammar) and passes 97.8% of the official Technology Compatibility Kit (TCK). If you're migrating from Neo4j or looking for a Kuzu replacement, most existing Cypher queries should run unchanged; the remaining TCK gap means a few edge cases may need adjusting.

Query Languages in Embedded Mode

import com.arcadedb.query.sql.executor.Result;
import com.arcadedb.query.sql.executor.ResultSet;

// SQL with a positional parameter
ResultSet sqlResults = db.query("sql",
  "SELECT FROM Person WHERE age > ?",
  25);

// Cypher (OpenCypher) with a named parameter
ResultSet cypherResults = db.query("cypher",
  "MATCH (p:Person)-[:Knows]->(f) "
  + "WHERE p.name = $name "
  + "RETURN f.name, f.age",
  "name", "Alice");

// Gremlin
ResultSet gremlinResults = db.query("gremlin",
  "g.V().hasLabel('Person')"
  + ".has('name','Alice')"
  + ".out('Knows').values('name')");

// Iterate results (here, the SQL result set)
while (sqlResults.hasNext()) {
  Result row = sqlResults.next();
  String name = row.getProperty("name");
}

Vector Index in Embedded Java

// Create vector index via Java API
new LSMVectorIndexBuilder(db,
    "Document",
    new String[]{"embedding"})
  .withDimensions(1536)
  .withSimilarity(
    VectorSimilarityFunction.COSINE)
  .create();

// Or via SQL
db.command("sql",
  "CREATE VECTOR INDEX ON Document"
  + "(embedding) LSM TYPE COSINE");

// Query: vector + graph in one call
ResultSet rs = db.query("sql",
  "SELECT content, "
  + "embedding.cosineDistance(?) "
  + "AS score, "
  + "out('MENTIONS').name AS entities "
  + "FROM Document "
  + "WHERE embedding.cosineDistance(?) "
  + "< 0.4 "
  + "ORDER BY score LIMIT 10",
  queryVector, queryVector);
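
As a reference point for the `< 0.4` threshold above, the following plain-Java sketch computes cosine distance under the common definition of 1 minus cosine similarity. Whether ArcadeDB's `cosineDistance` uses exactly this definition is an assumption here, so verify it before tuning thresholds.

```java
// Illustrative only: assumes cosine distance = 1 - cosine similarity.
final class CosineSketch {

  static double cosineDistance(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("dimension mismatch");
    }
    double dot = 0;
    double normA = 0;
    double normB = 0;
    for (int i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      normA += a[i] * a[i];
      normB += b[i] * b[i];
    }
    if (normA == 0 || normB == 0) {
      throw new IllegalArgumentException("zero vector has no direction");
    }
    return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
  }

  public static void main(String[] args) {
    float[] query = {1f, 0f};
    System.out.println(cosineDistance(query, new float[] {2f, 0f})); // same direction
    System.out.println(cosineDistance(query, new float[] {0f, 3f})); // orthogonal
  }
}
```

Under this definition, vectors pointing the same way score 0 and orthogonal vectors score 1, so a cutoff of 0.4 keeps only fairly close matches.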

JVector In-Process: AI Without Infrastructure

Building AI applications with RAG, semantic search, or recommendation engines? ArcadeDB's embedded mode runs JVector — the same vector search engine that powers DataStax Astra DB — directly inside your application.

No separate Pinecone subscription. No Weaviate container. No HTTP calls to a vector service. Your embeddings are indexed and searchable in the same JVM that generates them.

DiskANN + HNSW hybrid: Best-in-class search quality at any scale
SIMD acceleration: Leverages CPU vector instructions for similarity computation
Product Quantization: Handle billion-scale embeddings on a single machine
Graph + Vector in one query: Combine vector similarity with graph traversal — in-process, sub-millisecond

This is the ideal architecture for embedded AI: your LLM generates embeddings, your application stores and queries them via ArcadeDB, and graph context enriches the results — all in a single process with no external dependencies.

Embedded + HA: Best of Both Worlds

"But I need high availability" is the most common objection to embedded databases. ArcadeDB has a unique answer: the Embedded Server.

You can start an ArcadeDB Server instance inside your application. This gives you the performance of embedded mode (direct in-process access) while also exposing network protocols for replication and remote access. Your primary application gets zero-latency access. Replica nodes get real-time replication via the Raft consensus algorithm.

Primary node: Your application with embedded ArcadeDB gets direct in-process access at full speed
Replica nodes: Additional ArcadeDB servers replicate data in real-time via Raft consensus
Automatic failover: If the primary crashes, a replica is elected as the new leader
Remote access: The embedded server exposes HTTP, Postgres wire protocol, Redis, and MongoDB APIs for other clients
Same codebase: No separate "enterprise edition" — HA is built into the open-source Apache 2.0 release

Embedded Server + HA Replication

Start an Embedded Server

// Add the server dependency
// arcadedb-server (in addition to
// arcadedb-engine)

// Start embedded server
ArcadeDBServer server = new ArcadeDBServer();
server.start();

// Get direct in-process access
Database db = server.getDatabase("mydb");

// Use it exactly like pure embedded
db.transaction(() -> {
  db.query("cypher",
    "MATCH (p:Person) RETURN p");
});

// Meanwhile, remote clients connect
// via HTTP, Postgres, Redis, or
// MongoDB wire protocols.
// HA replicas sync via Raft.

Your application gets direct in-process speed. External clients, replicas, and monitoring tools connect over the network. Best of both worlds.

The Embedded Server Pattern

The Embedded Server is ArcadeDB's unique architecture that no other embedded database offers. Here's how it works:

Your application starts ArcadeDB Server in-process — the server runs inside your JVM, not as a separate process
You access data directly via the Java API — zero network overhead, just like pure embedded mode
The server exposes network protocols — HTTP/JSON, Postgres wire protocol, Redis protocol, MongoDB protocol
Replicas connect over the network — Raft-based consensus keeps replicas synchronized in real-time
If your node fails, a replica takes over — automatic leader election ensures continuous availability

This means you can start with pure embedded mode during development and prototyping, then add HA replicas for production — without changing a single line of application code. The database API is identical.
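
When deciding how many replicas to add, it helps to remember that Raft elects leaders and commits entries by majority. The sketch below is only the majority arithmetic from the Raft algorithm. It says nothing about how ArcadeDB exposes quorum settings, which this page does not describe.

```java
// Illustrative only: majority arithmetic from Raft, not ArcadeDB configuration.
final class RaftQuorumSketch {

  // Smallest number of nodes that forms a majority of the cluster.
  static int quorum(int clusterSize) {
    if (clusterSize < 1) {
      throw new IllegalArgumentException("cluster size must be positive");
    }
    return clusterSize / 2 + 1;
  }

  // How many nodes can fail while a majority is still reachable.
  static int tolerableFailures(int clusterSize) {
    return clusterSize - quorum(clusterSize);
  }

  public static void main(String[] args) {
    for (int n = 1; n <= 5; n++) {
      System.out.println(n + " nodes: quorum=" + quorum(n)
          + ", tolerates=" + tolerableFailures(n));
    }
  }
}
```

A two-node cluster tolerates no failures under majority rules, which is why three nodes is the usual minimum for automatic failover.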

Data Safety: WAL + MVCC

"What happens when the JVM crashes?" is the first question engineers ask about embedded databases. ArcadeDB's answer: the same thing that happens with any production database — Write-Ahead Logging (WAL) ensures durability.

Every mutation is written to a persistent journal before being applied to the data files. If the JVM crashes mid-transaction, ArcadeDB replays the WAL on the next startup, rolling back uncommitted transactions and completing committed ones. Your data integrity is guaranteed.
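
The replay rule described above can be sketched in a few lines of plain Java. The `Entry` class and the journal layout are hypothetical and far simpler than a real WAL; the point is only that writes from transactions without a commit marker are ignored on recovery.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of WAL replay; not ArcadeDB's journal format.
final class WalReplaySketch {

  // One journal entry: a write (key/value) or a commit marker (key == null).
  static final class Entry {
    final long txId;
    final String key;
    final String value;

    private Entry(long txId, String key, String value) {
      this.txId = txId;
      this.key = key;
      this.value = value;
    }

    static Entry write(long txId, String key, String value) {
      return new Entry(txId, key, value);
    }

    static Entry commit(long txId) {
      return new Entry(txId, null, null);
    }

    boolean isCommit() {
      return key == null;
    }
  }

  // Rebuilds state from the journal, applying only writes of committed transactions.
  static Map<String, String> replay(List<Entry> journal) {
    Set<Long> committed = new HashSet<>();
    for (Entry e : journal) {
      if (e.isCommit()) committed.add(e.txId);
    }
    Map<String, String> state = new HashMap<>();
    for (Entry e : journal) {
      if (!e.isCommit() && committed.contains(e.txId)) {
        state.put(e.key, e.value);
      }
    }
    return state;
  }

  public static void main(String[] args) {
    List<Entry> journal = new ArrayList<>();
    journal.add(Entry.write(1, "alice", "900"));
    journal.add(Entry.write(1, "bob", "1100"));
    journal.add(Entry.commit(1));
    journal.add(Entry.write(2, "alice", "0")); // crash before tx 2 commits
    System.out.println(replay(journal));       // tx 2's write is not applied
  }
}
```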

Write-Ahead Logging: All changes journaled before commit — crash recovery is automatic
MVCC: Multi-Version Concurrency Control for optimistic transactions without read locks
Isolation levels: READ_COMMITTED (default) or REPEATABLE_READ
Nested transactions: Support for complex, multi-step workflows
SIGTERM handling: Graceful shutdown automatically closes all open databases
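
For readers new to optimistic concurrency, here is a minimal plain-Java sketch of a version-checked commit with a retry loop, the general idea behind the MVCC point above. `Versioned`, `tryCommit`, and `deposit` are hypothetical names, and the sketch makes no claim about how ArcadeDB detects or reports conflicts.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration of optimistic, version-checked updates; not ArcadeDB code.
final class OptimisticSketch {

  // An immutable snapshot of a record: its value plus the version it was written at.
  static final class Versioned {
    final int version;
    final long balance;

    Versioned(int version, long balance) {
      this.version = version;
      this.balance = balance;
    }
  }

  private final AtomicReference<Versioned> current =
      new AtomicReference<>(new Versioned(0, 0));

  // Readers take a snapshot without acquiring a lock.
  Versioned read() {
    return current.get();
  }

  // Commit succeeds only if nobody replaced the snapshot that was read.
  boolean tryCommit(Versioned readSnapshot, long newBalance) {
    Versioned next = new Versioned(readSnapshot.version + 1, newBalance);
    return current.compareAndSet(readSnapshot, next);
  }

  // Retry loop: re-read and re-apply the change after a conflict.
  void deposit(long amount, int maxRetries) {
    for (int attempt = 0; attempt <= maxRetries; attempt++) {
      Versioned snapshot = read();
      if (tryCommit(snapshot, snapshot.balance + amount)) return;
    }
    throw new IllegalStateException("too many concurrent modifications");
  }

  public static void main(String[] args) {
    OptimisticSketch account = new OptimisticSketch();
    Versioned stale = account.read();
    account.deposit(100, 3);                           // another writer commits first
    System.out.println(account.tryCommit(stale, 50));  // prints false: stale snapshot
    System.out.println(account.read().balance);        // prints 100
  }
}
```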

Transaction Safety

// Transactions are lambda-scoped:
// auto-commit on success,
// auto-rollback on exception
db.transaction(() -> {
  // All operations are ACID
  Vertex v = db.newVertex("Account")
    .set("balance", 1000.00)
    .save();

  // If this throws, everything
  // above is rolled back
  transferFunds(v, target, amount);
});
// If JVM crashes here, WAL
// ensures either both ops
// committed or neither did

// Nested transactions
db.transaction(() -> {
  doPartOne(db);
  db.transaction(() -> {
    doPartTwo(db);
    // Inner can roll back
    // without affecting outer
  });
});

When Embedded Mode Shines

AI / RAG Applications

Embed the vector store + knowledge graph directly in your AI application. No external database calls. Sub-millisecond retrieval for Graph RAG, semantic search, and recommendation engines.

IoT & Edge Computing

Run a full multi-model database on edge devices, gateways, and industrial controllers. Process sensor data with time-series, model device relationships as graphs, and sync to the cloud when connected.

Desktop & CLI Applications

Ship a rich data layer with your desktop or CLI tool. No installer, no service, no database administration. The database starts and stops with your application. Great for developer tools, IDEs, and data analysis apps.

Microservices with Local State

Each microservice gets its own embedded database for local state management. No shared database bottleneck. Event sourcing, CQRS, and saga patterns become natural with embedded graph + document + time-series.

Testing & CI/CD

No Testcontainers, no Docker, no database setup in CI. Create an in-memory or file-based database per test, run your assertions, tear it down. Tests are fast, isolated, and reproducible.

Commercial Software (OEM)

Embed ArcadeDB in your commercial product. Apache 2.0 means no license fees, no per-node pricing, no revenue sharing. Ship a full graph + vector + document database as part of your product.

Embedded Database Comparison

Feature	SQLite	DuckDB	Kuzu*	ArcadeDB
Graph model	✗	✗	✓	✓
Document model	JSON1	✗	✗	✓
Vector search	Extension	Extension	✓	✓
Time-series	✗	✗	✗	✓
Full-text search	FTS5	Extension	✓	✓
Cypher support	✗	✗	✓	✓
HA replication	LiteFS	✗	✗	✓ (Raft)
ACID transactions	✓	✓	✓	✓
License	Public Domain	MIT	MIT*	Apache 2.0

* Kuzu was acquired by Apple in October 2025. The open-source repository has been archived and the website taken down. Existing releases remain usable but no new development is planned.

The Only Embeddable Multi-Model Database

SQLite is the gold standard for embedded relational databases. DuckDB brought embedded analytics. Kuzu proved the concept for embedded graphs. But none of them offer a complete multi-model stack in a single embeddable library.

ArcadeDB is the only embeddable database that gives you:

5 data models in one library: Graph, document, key-value, vector, time-series
5 query languages: SQL, Cypher, Gremlin, GraphQL, MongoDB query language
Built-in HA: Raft replication for production deployments — no external tooling
Crash recovery: WAL-based recovery, not just in-memory
Production-proven: Same codebase powers both embedded and server deployments

And with Kuzu now archived, ArcadeDB is the only actively maintained open-source option for an embeddable graph database with Cypher support.

Apache 2.0 — Forever

When you embed a database in your application, you depend on it completely. You need to trust that it won't be acquired and archived, that its license won't change, and that your production deployments won't be held hostage. ArcadeDB is Apache 2.0 forever. Embed it in commercial products, fork it if you want, deploy it anywhere. The license will never be used against you.
