<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://arcadedb.com/blog/feed.xml" rel="self" type="application/atom+xml" /><link href="https://arcadedb.com/" rel="alternate" type="text/html" /><updated>2026-06-03T22:48:31+00:00</updated><id>https://arcadedb.com/blog/feed.xml</id><title type="html">ArcadeDB</title><subtitle>The Next Generation Multi-Model Database</subtitle><entry><title type="html">ArcadeDB 26.6.1: TLS for HA Clusters, Durability Hardening &amp;amp; Security</title><link href="https://arcadedb.com/blog/arcadedb-26-6-1/" rel="alternate" type="text/html" title="ArcadeDB 26.6.1: TLS for HA Clusters, Durability Hardening &amp;amp; Security" /><published>2026-06-03T00:00:00+00:00</published><updated>2026-06-03T00:00:00+00:00</updated><id>https://arcadedb.com/blog/arcadedb-26-6-1</id><content type="html" xml:base="https://arcadedb.com/blog/arcadedb-26-6-1/"><![CDATA[<p>We’re pleased to announce <strong>ArcadeDB 26.6.1</strong>, a release focused on stability, durability, and security, with <strong>280+ commits</strong> resolving <strong>66 issues</strong>. Where <a href="https://arcadedb.com/blog/arcadedb-26-5-1/">26.5.1</a> was about new retrieval features, 26.6.1 is about making the engine harder to break: <strong>encrypted HA clusters</strong>, <strong>crash-safe durability</strong>, and a broad <strong>security hardening</strong> pass, on top of a long list of OpenCypher, SQL, vector, and wire-protocol fixes.</p>

<h2 id="major-highlights">Major Highlights</h2>

<h3 id="tlsssl-across-the-ha-cluster">TLS/SSL Across the HA Cluster</h3>

<p>The Raft-based High Availability cluster can now run fully encrypted. Inter-node replication traffic supports <strong>SSL/TLS</strong>, and the snapshot installer was fixed so a follower can download a leader snapshot over the <strong>HTTPS listener</strong> instead of failing with <code class="language-plaintext highlighter-rouge">Unsupported or unrecognized SSL message</code>. Encrypted clustering is now a first-class deployment option for regulated and zero-trust environments.</p>

<h3 id="durability--crash-recovery-hardening">Durability &amp; Crash-Recovery Hardening</h3>

<p>A large batch of fixes closes data-integrity gaps across the storage, WAL, and serialization layers, so committed transactions survive crashes and power loss, and recovery never silently drops data:</p>

<ul>
  <li>The <strong>WAL is fsynced on commit</strong> by default, and data files are fsynced before WAL files are deleted on a clean close.</li>
  <li><strong>Crash recovery aborts on a WAL version gap</strong> and preserves the WAL files instead of silently skipping the gap.</li>
  <li><code class="language-plaintext highlighter-rouge">MutablePage.move</code> no longer mis-tracks the modified range on backward shifts, so defragmentation bytes are never omitted from the WAL.</li>
  <li>Binary serialization now writes a property count that matches the bytes written, and handles partial reads via <code class="language-plaintext highlighter-rouge">readFully</code>.</li>
  <li>Short writes and short reads (fewer bytes than requested) are now handled in the paginated component file instead of being ignored.</li>
  <li>LZ4 compression no longer corrupts data when the source buffer position is non-zero.</li>
  <li>The Simple-8b codec no longer silently truncates <code class="language-plaintext highlighter-rouge">Long.MAX_VALUE</code> / <code class="language-plaintext highlighter-rouge">Long.MIN_VALUE</code>.</li>
  <li><code class="language-plaintext highlighter-rouge">migratedFileIds</code> is persisted in <code class="language-plaintext highlighter-rouge">schema.json</code>, so compaction no longer silently drops in-flight transactions across a restart.</li>
  <li>A <code class="language-plaintext highlighter-rouge">NegativeArraySizeException</code> on transaction commit was fixed.</li>
</ul>

<p>These are the kind of fixes you never see in a benchmark but feel in production: the database does what it promised on the unhappy path.</p>

<h3 id="security-hardening">Security Hardening</h3>

<ul>
  <li>All schema mutators now require the <strong><code class="language-plaintext highlighter-rouge">UPDATE_SCHEMA</code></strong> permission (previously only <code class="language-plaintext highlighter-rouge">createProperty</code> was gated).</li>
  <li><strong><code class="language-plaintext highlighter-rouge">IMPORT DATABASE</code></strong> now validates its source and requires admin privilege, closing SSRF and local-file-inclusion vectors.</li>
  <li>SQL injection in <code class="language-plaintext highlighter-rouge">RemoteVertex.newEdge</code> was fixed by switching to <strong>parameter binding</strong> (which also fixes breakage on apostrophes).</li>
  <li>JavaScript injection in the polyglot engine was closed by replacing a “looks-like-JSON” source-concatenation heuristic with a safe <code class="language-plaintext highlighter-rouge">Value.execute()</code> call.</li>
  <li>A full <strong>CodeQL cleanup</strong> resolved open Java and JavaScript code-scanning alerts at their true sources (workflow permissions, ReDoS, path-injection).</li>
</ul>

<h2 id="major-fixes">Major Fixes</h2>

<h3 id="high-availability--clustering">High Availability &amp; Clustering</h3>

<ul>
  <li><strong>TimeSeries data now replicates correctly</strong> across an HA cluster, and a compaction/append deadlock that caused a WAL version gap on Raft followers was eliminated.</li>
  <li>Concurrent single-row time-series <code class="language-plaintext highlighter-rouge">INSERT</code>s no longer silently lose samples.</li>
  <li><strong>Bolt writes to a follower</strong> no longer fail with “no authenticated user in the current security context”.</li>
  <li><code class="language-plaintext highlighter-rouge">PeerAddressAllowlistFilter</code> no longer rejects legitimate peers during a Kubernetes DNS-resolution race on startup or restart.</li>
  <li>New configurable paths for read-only and containerized deployments: <code class="language-plaintext highlighter-rouge">arcadedb.ha.raftStorageDirectory</code>, a configurable server log directory, and <code class="language-plaintext highlighter-rouge">arcadedb.ha.clusterTokenPath</code> to read the cluster shared secret from a file.</li>
  <li><code class="language-plaintext highlighter-rouge">RemoteDatabase</code> no longer reuses a session id across servers on HA failover during an open transaction; a clear <code class="language-plaintext highlighter-rouge">TransactionException</code> is raised on server switch instead.</li>
  <li>New <strong><code class="language-plaintext highlighter-rouge">STICKY</code></strong> strategy pins HTTP transactions to a concrete cluster member.</li>
  <li><code class="language-plaintext highlighter-rouge">/api/v1/server?mode=cluster</code> returns the <code class="language-plaintext highlighter-rouge">ha</code> section again after the Raft migration.</li>
  <li>New <strong>“Force Resync”</strong> button in Studio to recover a diverged follower from the leader.</li>
</ul>
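<p>As an illustration of the new path settings, the sketch below passes <code class="language-plaintext highlighter-rouge">arcadedb.ha.raftStorageDirectory</code> and <code class="language-plaintext highlighter-rouge">arcadedb.ha.clusterTokenPath</code> to the Docker image through <code class="language-plaintext highlighter-rouge">JAVA_OPTS</code>. It is a minimal sketch, not a complete HA configuration: the mount points, file names, and the <code class="language-plaintext highlighter-rouge">ha_path_opts</code> / <code class="language-plaintext highlighter-rouge">start_node</code> helpers are hypothetical, and the settings that actually enable and size the cluster are omitted.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper: build the JAVA_OPTS fragment for the HA path settings
# added in 26.6.1. The paths are examples, not defaults.
ha_path_opts() {
  local raft_dir=$1
  local token_file=$2
  printf '%s\n' "-Darcadedb.ha.raftStorageDirectory=$raft_dir -Darcadedb.ha.clusterTokenPath=$token_file"
}

# Hypothetical helper: start one node with Raft state on a dedicated volume and
# the cluster shared secret mounted read-only from a file. Your usual HA
# settings (server name, peers, ports) still need to be added to JAVA_OPTS.
start_node() {
  docker run -d --name arcadedb-node \
    -v "$PWD/raft:/data/raft" \
    -v "$PWD/cluster-token:/run/secrets/arcadedb-cluster-token:ro" \
    -e JAVA_OPTS="$(ha_path_opts /data/raft /run/secrets/arcadedb-cluster-token)" \
    arcadedata/arcadedb:26.6.1
}
</code></pre></div></div>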

<h3 id="opencypher">OpenCypher</h3>

<ul>
  <li><code class="language-plaintext highlighter-rouge">CREATE INDEX</code> now <strong>implicitly creates the referenced property</strong> (Neo4j-style lazy schema).</li>
  <li><code class="language-plaintext highlighter-rouge">nodes()</code>, <code class="language-plaintext highlighter-rouge">relationships()</code>, and <code class="language-plaintext highlighter-rouge">length()</code> on variable-length path patterns (e.g. <code class="language-plaintext highlighter-rouge">[*1..3]</code>) are now implemented.</li>
  <li>Records written via SQL are now visible to subsequent Cypher queries (and vice versa) within the same transaction.</li>
  <li><code class="language-plaintext highlighter-rouge">EXPLAIN</code> no longer fails with an idempotency error on a multi-statement query containing <code class="language-plaintext highlighter-rouge">CREATE</code>.</li>
  <li>Label disjunction <code class="language-plaintext highlighter-rouge">(n:A|B)</code> no longer returns zero rows.</li>
  <li><code class="language-plaintext highlighter-rouge">allShortestPaths()</code> returns all co-shortest paths instead of just one.</li>
  <li><code class="language-plaintext highlighter-rouge">MERGE</code> uses a bound anchor as the traversal start instead of a full edge-type scan, and no longer crashes on single-quote property values or rebinds variables from an <code class="language-plaintext highlighter-rouge">OPTIONAL MATCH</code> null endpoint.</li>
  <li><code class="language-plaintext highlighter-rouge">DATETIME</code> comparison with <code class="language-plaintext highlighter-rouge">datetime()</code> no longer returns zero rows, and results are now consistent between parameterized and hard-coded values.</li>
</ul>

<h3 id="sql">SQL</h3>

<ul>
  <li><code class="language-plaintext highlighter-rouge">IN :param</code> with a collection parameter now returns rows when an index is used.</li>
  <li><code class="language-plaintext highlighter-rouge">MOVE VERTEX</code> no longer generates an internal error.</li>
  <li><code class="language-plaintext highlighter-rouge">expand()</code> projection honors its <code class="language-plaintext highlighter-rouge">AS</code> alias instead of always being named <code class="language-plaintext highlighter-rouge">value</code>.</li>
  <li><code class="language-plaintext highlighter-rouge">IN (SELECT …)</code> no longer always returns empty.</li>
  <li><code class="language-plaintext highlighter-rouge">MERGE</code> on a UNIQUE-indexed property no longer throws on a duplicate key when the same key appears twice in a batch (matching Neo4j semantics).</li>
  <li><code class="language-plaintext highlighter-rouge">node.*</code> and <code class="language-plaintext highlighter-rouge">rel.*</code> functions no longer silently return null from SQL.</li>
  <li>TimeSeries timestamps are now returned in queries.</li>
  <li>New <code class="language-plaintext highlighter-rouge">cypherRID()</code> SQL function and <code class="language-plaintext highlighter-rouge">asCypherRID()</code> method for interoperating with Cypher numeric ids.</li>
</ul>

<h3 id="vector--index">Vector &amp; Index</h3>

<ul>
  <li><code class="language-plaintext highlighter-rouge">TRUNCATE TYPE</code> no longer resets an <code class="language-plaintext highlighter-rouge">LSM_VECTOR</code> index dimension to 0, nor leaves UNIQUE indexes in an inconsistent state.</li>
  <li><code class="language-plaintext highlighter-rouge">LSMVectorIndex</code> now converts JVector’s EUCLIDEAN similarity score to squared L2 (L2²) distance in all search paths, so K-NN no longer returns the worst matches first.</li>
  <li><code class="language-plaintext highlighter-rouge">REBUILD INDEX</code> now works for <code class="language-plaintext highlighter-rouge">BY ITEM</code> indexes.</li>
  <li><code class="language-plaintext highlighter-rouge">vector.fuse()</code> is now recognized as a SQL function.</li>
</ul>

<h3 id="wire-protocols">Wire Protocols</h3>

<ul>
  <li><strong>Bolt:</strong> parameterized Cypher <code class="language-plaintext highlighter-rouge">MATCH</code> queries via the JavaScript <code class="language-plaintext highlighter-rouge">neo4j-driver</code> now work; integer property values are no longer coerced to strings after <code class="language-plaintext highlighter-rouge">CREATE INDEX</code>.</li>
  <li><strong>PostgreSQL:</strong> scalar columns are advertised with native OIDs.</li>
  <li><strong>gRPC:</strong> correct exceptions (<code class="language-plaintext highlighter-rouge">NOT_FOUND</code> for missing records), proper <code class="language-plaintext highlighter-rouge">LocalDateTime</code> / <code class="language-plaintext highlighter-rouge">LocalDate</code> handling, and <code class="language-plaintext highlighter-rouge">InsertStream</code> no longer rolls back a whole stream on a commit-time duplicate with <code class="language-plaintext highlighter-rouge">CONFLICT_IGNORE</code>.</li>
  <li><strong>HTTP:</strong> <code class="language-plaintext highlighter-rouge">DuplicatedKeyException</code> now returns <code class="language-plaintext highlighter-rouge">409 Conflict</code> instead of <code class="language-plaintext highlighter-rouge">503 Service Unavailable</code>.</li>
</ul>
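<p>The HTTP status change matters most to clients that retry on failure: a <code class="language-plaintext highlighter-rouge">503</code> usually means “try again later”, while a <code class="language-plaintext highlighter-rouge">409</code> means the same write will keep failing until the key changes. A minimal client-side sketch follows. It assumes the HTTP command endpoint <code class="language-plaintext highlighter-rouge">/api/v1/command/{database}</code> with basic auth on port 2480; the helper names, the database <code class="language-plaintext highlighter-rouge">mydb</code>, and the type <code class="language-plaintext highlighter-rouge">Account</code> are hypothetical.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper: map an HTTP status from ArcadeDB to a retry decision.
classify_status() {
  case "$1" in
    2??) echo "ok" ;;
    409) echo "conflict: duplicate key, do not retry" ;;
    503) echo "unavailable: retry later" ;;
    *) echo "error: status $1" ;;
  esac
}

# Hypothetical helper: POST a SQL command and print only the HTTP status code.
# The SQL text is not JSON-escaped, so avoid double quotes and backslashes in it.
# Usage: classify_status "$(post_command mydb "INSERT INTO Account SET email = 'a@example.com'")"
post_command() {
  local db=$1
  local sql=$2
  curl -s -o /dev/null -w '%{http_code}' \
    -u "root:${ARCADEDB_PASSWORD:?set ARCADEDB_PASSWORD}" \
    -H 'Content-Type: application/json' \
    -d "{\"language\": \"sql\", \"command\": \"$sql\"}" \
    "http://localhost:2480/api/v1/command/$db"
}
</code></pre></div></div>

<p>With a UNIQUE index on <code class="language-plaintext highlighter-rouge">email</code>, repeating the same insert should now be classified as a conflict rather than as a transient outage.</p>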

<h3 id="studio--operations">Studio &amp; Operations</h3>

<ul>
  <li>Optional <strong>production-mode Studio</strong>, enabled by a global setting on request.</li>
  <li>New show/hide toggle for the Appearance section in the graph side panel.</li>
  <li>AI assistant flow, database selection, and layout improvements; query profiler “Analyze with AI”; refreshed server and profiler metrics.</li>
  <li>New offline build mode for the distribution builder.</li>
</ul>

<h3 id="dependencies">Dependencies</h3>

<p>Notable upgrades include Netty 4.2.14.Final, Undertow 2.4.1.Final, Protobuf 4.35.0, JLine 4.1.3, JUnit Jupiter 6.1.0, Jackson Databind 2.21.4, Apache Commons Configuration 2.15.1, Swagger 2.2.50, SLF4J 2.0.18, and Logback 1.5.33, plus the usual round of Studio frontend, e2e harness, and CI updates.</p>

<h2 id="getting-started-with-2661">Getting Started with 26.6.1</h2>

<h3 id="docker">Docker</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>docker pull arcadedata/arcadedb:26.6.1
</code></pre></div></div>

<p>Visit our <a href="https://hub.docker.com/r/arcadedata/arcadedb">Docker Hub repository</a> for more information.</p>

<h3 id="maven">Maven</h3>

<div class="language-xml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt">&lt;dependency&gt;</span>
    <span class="nt">&lt;groupId&gt;</span>com.arcadedb<span class="nt">&lt;/groupId&gt;</span>
    <span class="nt">&lt;artifactId&gt;</span>arcadedb-engine<span class="nt">&lt;/artifactId&gt;</span>
    <span class="nt">&lt;version&gt;</span>26.6.1<span class="nt">&lt;/version&gt;</span>
<span class="nt">&lt;/dependency&gt;</span>
</code></pre></div></div>

<p>All artifacts are available on <a href="https://repo.maven.apache.org/maven2/com/arcadedb/">Maven Central</a>.</p>

<h3 id="documentation">Documentation</h3>

<p>For detailed information on features and usage, refer to our <a href="https://docs.arcadedb.com/">comprehensive documentation</a>.</p>

<h2 id="compatibility-note">Compatibility Note</h2>

<p>This release maintains 100% compatibility with previous database formats, meaning no export/import is required when upgrading. As always, we recommend creating a database backup before upgrading.</p>

<hr />

<p><strong>Download ArcadeDB 26.6.1 now</strong>: <a href="https://github.com/ArcadeData/arcadedb/releases/tag/26.6.1">GitHub Releases</a></p>

<p>Thanks to everyone in the community who reported issues, opened PRs, and helped shape this release.</p>

<p>Luca Garulli
ArcadeDB Founder</p>]]></content><author><name>Luca Garulli</name></author><category term="Multi-Model" /><category term="High Availability" /><category term="Security" /><category term="Graph Database" /><category term="Release" /><summary type="html"><![CDATA[ArcadeDB 26.6.1 brings end-to-end TLS/SSL for HA clusters, a deep durability and crash-recovery hardening pass across the WAL and storage layers, a broad security hardening sweep, and a long list of OpenCypher, SQL, vector and wire-protocol fixes.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://arcadedb.com/assets/images/release-v26.6.1.jpg" /><media:content medium="image" url="https://arcadedb.com/assets/images/release-v26.6.1.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Deploy an ArcadeDB Cluster on Kubernetes with the Official Helm Chart</title><link href="https://arcadedb.com/blog/deploy-arcadedb-cluster-kubernetes-helm/" rel="alternate" type="text/html" title="Deploy an ArcadeDB Cluster on Kubernetes with the Official Helm Chart" /><published>2026-05-13T00:00:00+00:00</published><updated>2026-05-13T00:00:00+00:00</updated><id>https://arcadedb.com/blog/deploy-arcadedb-cluster-kubernetes-helm</id><content type="html" xml:base="https://arcadedb.com/blog/deploy-arcadedb-cluster-kubernetes-helm/"><![CDATA[<p>Running <a href="https://arcadedb.com/">ArcadeDB</a> as a single container is easy. Running it as a replicated service on a Kubernetes cluster used to mean writing a fair amount of YAML and reading the HA docs twice. With the official <a href="https://github.com/ArcadeData/arcadedb-helm">arcadedb-helm</a> chart, it now takes one command.</p>

<p>In this post I walk through the chart, show how to bring up a three-node HA cluster, and point at the companion <a href="https://github.com/ArcadeData/arcadedb-deployments">arcadedb-deployments</a> repository if you want a runnable local example before touching your production cluster.</p>

<h2 id="why-run-arcadedb-on-kubernetes">Why Run ArcadeDB on Kubernetes</h2>

<p>ArcadeDB is built around an embedded engine that scales vertically very well. What you get from Kubernetes is the operational layer: rolling upgrades, persistent volumes, automatic restarts when a node dies, horizontal scale for read-heavy workloads, and replication across availability zones.</p>

<p>The Helm chart wraps that into a StatefulSet with stable network identities, a headless service for peer discovery, and probes wired to the <a href="https://docs.arcadedb.com/"><code class="language-plaintext highlighter-rouge">/api/v1/ready</code></a> endpoint. When <code class="language-plaintext highlighter-rouge">replicaCount</code> is greater than 1, the chart turns on <a href="https://docs.arcadedb.com/arcadedb/concepts/ha-cluster.html">Raft consensus</a> across the pods. No extra flags, no manual peer lists.</p>

<h2 id="what-the-helm-chart-gives-you">What the Helm Chart Gives You</h2>

<p>The chart lives under <a href="https://github.com/ArcadeData/arcadedb-helm/tree/main/charts/arcadedb"><code class="language-plaintext highlighter-rouge">charts/arcadedb</code></a> and is published on <a href="https://artifacthub.io/packages/helm/arcadedb/arcadedb">Artifact Hub</a>. The current chart version is <code class="language-plaintext highlighter-rouge">26.4.2</code>, the same as the ArcadeDB engine version it deploys.</p>

<p>The defaults are sensible. You get a StatefulSet with stable pod names (<code class="language-plaintext highlighter-rouge">arcadedb-0</code>, <code class="language-plaintext highlighter-rouge">arcadedb-1</code>, …) and ordered rollout, a headless service so each pod resolves its peers via DNS (<code class="language-plaintext highlighter-rouge">arcadedb-0.arcadedb.default.svc.cluster.local</code>), and a PersistentVolumeClaim template (8Gi ReadWriteOnce by default) mounted at <code class="language-plaintext highlighter-rouge">/home/arcadedb/databases</code>. Liveness and readiness probes hit <code class="language-plaintext highlighter-rouge">/api/v1/ready</code>.</p>

<p>Security is also taken care of: the pod runs as non-root UID/GID 1000, all Linux capabilities are dropped, privilege escalation is disabled, and the ServiceAccount token is unmounted because the database does not call the Kubernetes API. A <code class="language-plaintext highlighter-rouge">NetworkPolicy</code> can lock the Raft gRPC port down to ArcadeDB pods only, and there is <code class="language-plaintext highlighter-rouge">HorizontalPodAutoscaler</code> support that pre-sizes the Raft peer list to <code class="language-plaintext highlighter-rouge">maxReplicas</code> so scale-out joins are clean.</p>

<p>The whole chart is small enough to read in a single sitting, which I recommend before you push it to production.</p>

<h2 id="prerequisites">Prerequisites</h2>

<p>You need a Kubernetes cluster (1.27 or newer is fine), Helm 3.16 or newer, <code class="language-plaintext highlighter-rouge">kubectl</code> pointed at the target cluster, and a storage class that supports <code class="language-plaintext highlighter-rouge">ReadWriteOnce</code>. The defaults on EKS, GKE, AKS, and DigitalOcean all work. For local experimentation, <a href="https://kind.sigs.k8s.io/">kind</a> 0.24 or newer is enough.</p>

<h2 id="the-30-second-install">The 30-Second Install</h2>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>helm repo add arcadedb https://helm.arcadedb.com/
helm repo update
helm <span class="nb">install </span>my-arcadedb arcadedb/arcadedb
</code></pre></div></div>

<p>That is it. You now have a single-pod ArcadeDB with a persistent volume and a ClusterIP service.</p>

<p>Port-forward to reach <a href="https://docs.arcadedb.com/arcadedb/tools/studio.html">Studio</a>:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl port-forward svc/my-arcadedb 2480:2480
</code></pre></div></div>

<p>Open <code class="language-plaintext highlighter-rouge">http://localhost:2480</code> in your browser. Done.</p>

<p>For a dev box, a CI fixture, or a smoke test, this is enough. Anything user-facing needs more.</p>

<h2 id="production-values-a-three-node-ha-cluster">Production Values: a Three-Node HA Cluster</h2>

<p>For the multi-node setup, drop the following into a <code class="language-plaintext highlighter-rouge">values.yaml</code>:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">replicaCount</span><span class="pi">:</span> <span class="m">3</span>

<span class="na">image</span><span class="pi">:</span>
  <span class="na">repository</span><span class="pi">:</span> <span class="s">arcadedata/arcadedb</span>
  <span class="na">tag</span><span class="pi">:</span> <span class="s2">"</span><span class="s">26.4.2"</span>
  <span class="na">pullPolicy</span><span class="pi">:</span> <span class="s">IfNotPresent</span>

<span class="na">arcadedb</span><span class="pi">:</span>
  <span class="na">rootPassword</span><span class="pi">:</span>
    <span class="na">secret</span><span class="pi">:</span>
      <span class="na">name</span><span class="pi">:</span> <span class="s">arcadedb-credentials</span>
      <span class="na">key</span><span class="pi">:</span> <span class="s">rootPassword</span>

<span class="na">persistence</span><span class="pi">:</span>
  <span class="na">enabled</span><span class="pi">:</span> <span class="kc">true</span>
  <span class="na">size</span><span class="pi">:</span> <span class="s">50Gi</span>
  <span class="na">storageClass</span><span class="pi">:</span> <span class="s2">"</span><span class="s">fast-ssd"</span>

<span class="na">resources</span><span class="pi">:</span>
  <span class="na">requests</span><span class="pi">:</span>
    <span class="na">cpu</span><span class="pi">:</span> <span class="s2">"</span><span class="s">1"</span>
    <span class="na">memory</span><span class="pi">:</span> <span class="s2">"</span><span class="s">4Gi"</span>
  <span class="na">limits</span><span class="pi">:</span>
    <span class="na">cpu</span><span class="pi">:</span> <span class="s2">"</span><span class="s">2"</span>
    <span class="na">memory</span><span class="pi">:</span> <span class="s2">"</span><span class="s">8Gi"</span>

<span class="na">service</span><span class="pi">:</span>
  <span class="na">type</span><span class="pi">:</span> <span class="s">ClusterIP</span>

<span class="na">ingress</span><span class="pi">:</span>
  <span class="na">enabled</span><span class="pi">:</span> <span class="kc">true</span>
  <span class="na">className</span><span class="pi">:</span> <span class="s2">"</span><span class="s">nginx"</span>
  <span class="na">hosts</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="na">host</span><span class="pi">:</span> <span class="s">arcadedb.example.com</span>
      <span class="na">paths</span><span class="pi">:</span>
        <span class="pi">-</span> <span class="na">path</span><span class="pi">:</span> <span class="s">/</span>
          <span class="na">pathType</span><span class="pi">:</span> <span class="s">Prefix</span>
  <span class="na">tls</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="na">secretName</span><span class="pi">:</span> <span class="s">arcadedb-tls</span>
      <span class="na">hosts</span><span class="pi">:</span>
        <span class="pi">-</span> <span class="s">arcadedb.example.com</span>

<span class="na">networkPolicy</span><span class="pi">:</span>
  <span class="na">enabled</span><span class="pi">:</span> <span class="kc">true</span>
</code></pre></div></div>

<p>Create the namespace and the credentials secret first, so the password never lives in your Helm values or your Git history. The secret must live in the same namespace as the release:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl create namespace arcadedb
kubectl <span class="nt">-n</span> arcadedb create secret generic arcadedb-credentials <span class="se">\</span>
  <span class="nt">--from-literal</span><span class="o">=</span><span class="nv">rootPassword</span><span class="o">=</span><span class="s1">'choose-something-strong'</span>
</code></pre></div></div>
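<p>If you would rather not choose the password by hand, a sketch like the following generates a random one and shows how to read it back later for the verification step. The <code class="language-plaintext highlighter-rouge">gen_password</code> helper is hypothetical; it assumes <code class="language-plaintext highlighter-rouge">/dev/urandom</code> and <code class="language-plaintext highlighter-rouge">base64</code> are available, as they are on Linux and macOS, and that the <code class="language-plaintext highlighter-rouge">arcadedb</code> namespace already exists.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper: 24 random bytes, base64-encoded (32 characters, no padding).
gen_password() {
  head -c 24 /dev/urandom | base64
}

# Create the secret with a generated password:
#   kubectl -n arcadedb create secret generic arcadedb-credentials \
#     --from-literal=rootPassword="$(gen_password)"
# Read it back later, e.g. for curl -u root:...
#   kubectl -n arcadedb get secret arcadedb-credentials \
#     -o jsonpath='{.data.rootPassword}' | base64 -d
</code></pre></div></div>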

<p>Then install (or upgrade) the chart:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>helm upgrade <span class="nt">--install</span> arcadedb arcadedb/arcadedb <span class="se">\</span>
  <span class="nt">--namespace</span> arcadedb <span class="nt">--create-namespace</span> <span class="se">\</span>
  <span class="nt">-f</span> values.yaml <span class="nt">--wait</span> <span class="nt">--timeout</span> 10m
</code></pre></div></div>

<p>With <code class="language-plaintext highlighter-rouge">replicaCount: 3</code>, the chart wires the StatefulSet for <a href="https://docs.arcadedb.com/arcadedb/concepts/ha-cluster.html">Raft HA</a>. Each pod gets its own PVC, joins the cluster through the headless service, and the three-node quorum elects a leader.</p>

<h2 id="verifying-the-cluster">Verifying the Cluster</h2>

<p>Watch the pods come up:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl <span class="nt">-n</span> arcadedb get pods <span class="nt">-w</span>
</code></pre></div></div>

<p>You should see <code class="language-plaintext highlighter-rouge">arcadedb-0</code>, <code class="language-plaintext highlighter-rouge">arcadedb-1</code>, and <code class="language-plaintext highlighter-rouge">arcadedb-2</code> reach <code class="language-plaintext highlighter-rouge">Running</code> in order. Once all three are ready, ask the cluster who is in charge:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl <span class="nt">-n</span> arcadedb port-forward svc/arcadedb 2480:2480 &amp;
curl <span class="nt">-u</span> root:choose-something-strong http://localhost:2480/api/v1/server | jq .ha
</code></pre></div></div>

<p>The response includes the current leader, the list of replicas, and the network status of each peer. If you see three online servers and one of them flagged as <code class="language-plaintext highlighter-rouge">leader</code>, you have a working HA cluster.</p>

<p>To prove the failover works, delete the leader pod and watch the cluster re-elect:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl <span class="nt">-n</span> arcadedb delete pod arcadedb-0
kubectl <span class="nt">-n</span> arcadedb get pods <span class="nt">-w</span>
</code></pre></div></div>

<p>The remaining nodes hold quorum, a new leader is elected within seconds, and Kubernetes brings the missing pod back. Its PVC is reattached, the data is intact, and it rejoins the Raft group as a follower.</p>

<h2 id="try-it-locally-first-the-arcadedb-deployments-repo">Try It Locally First: the arcadedb-deployments Repo</h2>

<p>Before opening a PR against your platform team’s repo, run the thing end-to-end on your laptop. The <a href="https://github.com/ArcadeData/arcadedb-deployments">arcadedb-deployments</a> repository has a ready-to-run example under <code class="language-plaintext highlighter-rouge">kubernetes/</code>.</p>

<p>The <code class="language-plaintext highlighter-rouge">start.sh</code> script creates a <a href="https://kind.sigs.k8s.io/">kind</a> cluster named <code class="language-plaintext highlighter-rouge">arcadedb</code>, runs <code class="language-plaintext highlighter-rouge">helm dependency update</code>, installs the chart with <code class="language-plaintext highlighter-rouge">--wait</code>, applies a 3-replica <code class="language-plaintext highlighter-rouge">values.yaml</code> and the credentials secret, waits for <code class="language-plaintext highlighter-rouge">/api/v1/ready</code> to respond on every pod, and sets up a background <code class="language-plaintext highlighter-rouge">kubectl port-forward</code> to <code class="language-plaintext highlighter-rouge">http://localhost:2480</code>. <code class="language-plaintext highlighter-rouge">test.sh</code> then drives an end-to-end smoke test against the cluster.</p>

<p>Clone, run, done:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git clone https://github.com/ArcadeData/arcadedb-deployments.git
<span class="nb">cd </span>arcadedb-deployments/kubernetes
./start.sh
./test.sh
</code></pre></div></div>

<p>When you are finished:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>./stop.sh
</code></pre></div></div>

<p>It is the fastest way to convince yourself (or your team) that the chart behaves the way you expect. Same chart, same values shape, same probes, smaller cluster.</p>

<p>The same repository ships an <code class="language-plaintext highlighter-rouge">ha-cluster/</code> scenario built on Docker Compose if you want to compare the same topology without Kubernetes in the picture.</p>

<h2 id="operating-the-cluster">Operating the Cluster</h2>

<p>A few practical notes for day-two operations.</p>

<h3 id="upgrades">Upgrades</h3>

<p>Bump the chart and image tag together, then run <code class="language-plaintext highlighter-rouge">helm upgrade</code>. The StatefulSet rolls pods one at a time, the readiness probe gates each step, and the remaining members keep quorum while each pod restarts (if the restarting pod was the leader, the others elect a new one). Always upgrade in a non-production environment first to validate the new engine version.</p>
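<p>The upgrade steps above can be wrapped in a small script. This is a sketch under a few assumptions: the release and namespace are both <code class="language-plaintext highlighter-rouge">arcadedb</code> as in the install step, the StatefulSet is named <code class="language-plaintext highlighter-rouge">arcadedb</code> (inferred from the pod names), and a chart release with the same version as the engine exists, as it did for <code class="language-plaintext highlighter-rouge">26.4.2</code>. The <code class="language-plaintext highlighter-rouge">run</code> and <code class="language-plaintext highlighter-rouge">upgrade_arcadedb</code> helpers are hypothetical; set <code class="language-plaintext highlighter-rouge">DRY_RUN=1</code> to print the commands instead of running them.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical wrapper: print commands instead of running them when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: upgrade chart and image to the same version, then follow
# the StatefulSet while it replaces pods one at a time.
upgrade_arcadedb() {
  local version=$1
  run helm upgrade arcadedb arcadedb/arcadedb \
    --namespace arcadedb -f values.yaml \
    --version "$version" --set image.tag="$version"
  run kubectl -n arcadedb rollout status statefulset/arcadedb --timeout=10m
}
</code></pre></div></div>

<p>Running <code class="language-plaintext highlighter-rouge">DRY_RUN=1 upgrade_arcadedb 26.6.1</code> prints the <code class="language-plaintext highlighter-rouge">helm upgrade</code> and <code class="language-plaintext highlighter-rouge">kubectl rollout status</code> commands without touching the cluster, so the change can be reviewed before it is applied.</p>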

<h3 id="scaling">Scaling</h3>

<p>To scale out, increase <code class="language-plaintext highlighter-rouge">replicaCount</code> and run <code class="language-plaintext highlighter-rouge">helm upgrade</code>. New pods come up, join the Raft group as followers, and start serving reads.</p>

<p>Scale-down needs more care. Never drop below the quorum size of your current cluster, and always remove pods one at a time. Three or five nodes cover most workloads. Seven is the upper end before the Raft commit cost outweighs the redundancy you get back.</p>
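<p>The arithmetic behind those numbers is small enough to script. A Raft group of n members needs a majority, floor(n / 2) + 1, to commit, so it survives floor((n - 1) / 2) failures; a fourth node raises the quorum without adding fault tolerance, which is why odd sizes are the norm. The sketch below prints that table and encodes the two scale-down rules from this section as a check; <code class="language-plaintext highlighter-rouge">can_scale_down</code> is a hypothetical helper, not part of the chart.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Raft commits with a majority of members: quorum = floor(n / 2) + 1.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Members that can fail while the rest still form a quorum.
tolerated_failures() {
  echo $(( ($1 - 1) / 2 ))
}

# Hypothetical guard for the scale-down rules above.
can_scale_down() {
  local current=$1
  local target=$2
  # Remove one pod at a time.
  [ "$target" -eq $(( current - 1 )) ] || return 1
  # Never go below the quorum of the current cluster.
  [ "$target" -ge "$(quorum "$current")" ]
}

for n in 3 5 7; do
  echo "$n nodes: quorum $(quorum "$n"), tolerates $(tolerated_failures "$n") failure(s)"
done
</code></pre></div></div>

<p>The loop prints <code class="language-plaintext highlighter-rouge">3 nodes: quorum 2, tolerates 1 failure(s)</code>, then quorum 3 with 2 tolerated failures for five nodes, and quorum 4 with 3 for seven. <code class="language-plaintext highlighter-rouge">can_scale_down 5 4</code> succeeds, while <code class="language-plaintext highlighter-rouge">can_scale_down 5 3</code> (two pods at once) and <code class="language-plaintext highlighter-rouge">can_scale_down 2 1</code> (below the quorum of two) fail.</p>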

<h3 id="backups">Backups</h3>

<p>ArcadeDB has built-in <a href="https://arcadedb.com/blog/introducing-automatic-database-backups-in-arcadedb/">automatic database backups</a>. On Kubernetes, point the backup directory at a separate volume (or a CSI driver that snapshots to object storage) so backup data lives outside the database PVC. Take the snapshot at the leader to get a consistent view.</p>

<h3 id="observability">Observability</h3>

<p>The chart exposes the standard ArcadeDB metrics on the HTTP port. Scrape them with your existing Prometheus stack and alert on Raft leader changes, replication lag, and PVC capacity.</p>

<h3 id="security">Security</h3>

<p>Change the default <code class="language-plaintext highlighter-rouge">root</code> password. Always. Keep it in a <code class="language-plaintext highlighter-rouge">Secret</code> and never pass it with <code class="language-plaintext highlighter-rouge">--set</code> on the command line. Enable the included <code class="language-plaintext highlighter-rouge">NetworkPolicy</code> to keep the Raft port internal to the namespace. If you expose Studio publicly, put it behind your usual ingress, OIDC proxy, or VPN.</p>

<h2 id="where-to-go-next">Where to Go Next</h2>

<ul>
  <li><a href="https://github.com/ArcadeData/arcadedb-helm">arcadedb-helm</a>: chart source, values reference, and CI tests</li>
  <li><a href="https://github.com/ArcadeData/arcadedb-deployments">arcadedb-deployments</a>: runnable Kubernetes and Docker Compose examples</li>
  <li><a href="https://docs.arcadedb.com/arcadedb/concepts/ha-cluster.html">ArcadeDB HA Cluster docs</a>: how Raft replication works under the hood</li>
  <li><a href="https://arcadedb.com/academy.html">ArcadeDB Academy</a>: free courses, including hands-on labs</li>
</ul>

<p>If something does not work the way this post describes, open an issue on the chart repo. PRs are welcome too. The chart is actively maintained, the CI pipeline lints every change, and the <code class="language-plaintext highlighter-rouge">helm-unittest</code> suite already covers most templates.</p>]]></content><author><name>Roberto Franchini</name></author><category term="Kubernetes" /><category term="Helm" /><category term="HA Cluster" /><category term="Raft" /><category term="DevOps" /><category term="Deployment" /><category term="StatefulSet" /><summary type="html"><![CDATA[Step-by-step guide to deploying a high-availability ArcadeDB cluster on Kubernetes using the official Helm chart. Includes a kind-based local example, production values, Raft consensus, persistence, and verification.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://arcadedb.com/assets/images/arcadedb-helm-kubernetes.svg" /><media:content medium="image" url="https://arcadedb.com/assets/images/arcadedb-helm-kubernetes.svg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">ArcadeDB 26.5.1: Sparse Vector Index, Hybrid Retrieval &amp;amp; INT8 End-to-End</title><link href="https://arcadedb.com/blog/arcadedb-26-5-1/" rel="alternate" type="text/html" title="ArcadeDB 26.5.1: Sparse Vector Index, Hybrid Retrieval &amp;amp; INT8 End-to-End" /><published>2026-05-11T00:00:00+00:00</published><updated>2026-05-11T00:00:00+00:00</updated><id>https://arcadedb.com/blog/arcadedb-26-5-1</id><content type="html" xml:base="https://arcadedb.com/blog/arcadedb-26-5-1/"><![CDATA[<p>We’re excited to announce <strong>ArcadeDB 26.5.1</strong>, a major release with <strong>270+ commits</strong> resolving <strong>128 issues</strong>. 
The headline feature is a brand-new <strong>sparse vector index</strong> with <strong>server-side hybrid retrieval</strong> and <strong>INT8 quantization end-to-end</strong>, alongside extensive <strong>OpenCypher correctness</strong> improvements and <strong>query partitioning</strong>.</p>

<h2 id="major-new-features">Major New Features</h2>

<h3 id="sparse-vector-index--hybrid-retrieval">Sparse Vector Index &amp; Hybrid Retrieval</h3>

<p>The new <code class="language-plaintext highlighter-rouge">LSM_SPARSE_VECTOR</code> index type enables sparse-embedding retrieval (BM25/SPLADE-style) directly inside ArcadeDB.</p>

<figure style="margin: 32px 0; text-align: center;">
  <svg viewBox="0 0 760 360" xmlns="http://www.w3.org/2000/svg" role="img" aria-label="Dense vector vs sparse vector representation" style="max-width: 100%; height: auto; font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif;">
    <!-- Dense Vector Panel -->
    <g>
      <rect x="20" y="20" width="340" height="320" rx="12" fill="#F0F4F8" stroke="#D1D9E0" stroke-width="1" />
      <text x="190" y="50" text-anchor="middle" font-size="18" font-weight="700" fill="#1F2937">Dense Vector</text>
      <text x="190" y="70" text-anchor="middle" font-size="11" fill="#6B7280">float32 embedding (semantic)</text>

      <g transform="translate(50, 110)">
        <rect x="0" y="0" width="20" height="44" fill="#0066CC" opacity="0.35" />
        <rect x="20" y="0" width="20" height="44" fill="#0066CC" opacity="0.78" />
        <rect x="40" y="0" width="20" height="44" fill="#0066CC" opacity="0.21" />
        <rect x="60" y="0" width="20" height="44" fill="#0066CC" opacity="0.55" />
        <rect x="80" y="0" width="20" height="44" fill="#0066CC" opacity="0.89" />
        <rect x="100" y="0" width="20" height="44" fill="#0066CC" opacity="0.42" />
        <rect x="120" y="0" width="20" height="44" fill="#0066CC" opacity="0.67" />
        <rect x="140" y="0" width="20" height="44" fill="#0066CC" opacity="0.15" />
        <rect x="160" y="0" width="20" height="44" fill="#0066CC" opacity="0.72" />
        <rect x="180" y="0" width="20" height="44" fill="#0066CC" opacity="0.48" />
        <rect x="200" y="0" width="20" height="44" fill="#0066CC" opacity="0.91" />
        <rect x="220" y="0" width="20" height="44" fill="#0066CC" opacity="0.28" />
        <rect x="240" y="0" width="20" height="44" fill="#0066CC" opacity="0.61" />
        <rect x="260" y="0" width="20" height="44" fill="#0066CC" opacity="0.34" />
      </g>

      <text x="190" y="190" text-anchor="middle" font-size="11" font-family="ui-monospace, SFMono-Regular, Menlo, Consolas, monospace" fill="#4B5563">[0.35, 0.78, 0.21, 0.55, 0.89, 0.42, ...]</text>

      <text x="190" y="235" text-anchor="middle" font-size="13" fill="#1F2937"><tspan font-weight="600">384 &#8211; 1,536</tspan> dimensions</text>
      <text x="190" y="260" text-anchor="middle" font-size="13" fill="#1F2937">Every position has a value</text>
      <text x="190" y="305" text-anchor="middle" font-size="13" font-weight="600" fill="#0066CC">Semantic similarity</text>
    </g>

    <!-- Sparse Vector Panel -->
    <g>
      <rect x="400" y="20" width="340" height="320" rx="12" fill="#FFF7ED" stroke="#FED7AA" stroke-width="1" />
      <text x="570" y="50" text-anchor="middle" font-size="18" font-weight="700" fill="#1F2937">Sparse Vector</text>
      <text x="570" y="70" text-anchor="middle" font-size="11" fill="#D97706" font-weight="600">NEW &middot; BM25 / SPLADE-style</text>

      <g transform="translate(430, 110)">
        <rect x="0" y="0" width="10" height="44" fill="#FFFFFF" stroke="#FED7AA" stroke-width="0.5" />
        <rect x="10" y="0" width="10" height="44" fill="#FFFFFF" stroke="#FED7AA" stroke-width="0.5" />
        <rect x="20" y="0" width="10" height="44" fill="#FFFFFF" stroke="#FED7AA" stroke-width="0.5" />
        <rect x="30" y="0" width="10" height="44" fill="#F59E0B" opacity="0.85" />
        <rect x="40" y="0" width="10" height="44" fill="#FFFFFF" stroke="#FED7AA" stroke-width="0.5" />
        <rect x="50" y="0" width="10" height="44" fill="#FFFFFF" stroke="#FED7AA" stroke-width="0.5" />
        <rect x="60" y="0" width="10" height="44" fill="#FFFFFF" stroke="#FED7AA" stroke-width="0.5" />
        <rect x="70" y="0" width="10" height="44" fill="#FFFFFF" stroke="#FED7AA" stroke-width="0.5" />
        <rect x="80" y="0" width="10" height="44" fill="#FFFFFF" stroke="#FED7AA" stroke-width="0.5" />
        <rect x="90" y="0" width="10" height="44" fill="#F59E0B" opacity="0.60" />
        <rect x="100" y="0" width="10" height="44" fill="#FFFFFF" stroke="#FED7AA" stroke-width="0.5" />
        <rect x="110" y="0" width="10" height="44" fill="#FFFFFF" stroke="#FED7AA" stroke-width="0.5" />
        <rect x="120" y="0" width="10" height="44" fill="#FFFFFF" stroke="#FED7AA" stroke-width="0.5" />
        <rect x="130" y="0" width="10" height="44" fill="#FFFFFF" stroke="#FED7AA" stroke-width="0.5" />
        <rect x="140" y="0" width="10" height="44" fill="#F59E0B" opacity="0.95" />
        <rect x="150" y="0" width="10" height="44" fill="#FFFFFF" stroke="#FED7AA" stroke-width="0.5" />
        <rect x="160" y="0" width="10" height="44" fill="#FFFFFF" stroke="#FED7AA" stroke-width="0.5" />
        <rect x="170" y="0" width="10" height="44" fill="#FFFFFF" stroke="#FED7AA" stroke-width="0.5" />
        <rect x="180" y="0" width="10" height="44" fill="#FFFFFF" stroke="#FED7AA" stroke-width="0.5" />
        <rect x="190" y="0" width="10" height="44" fill="#FFFFFF" stroke="#FED7AA" stroke-width="0.5" />
        <rect x="200" y="0" width="10" height="44" fill="#FFFFFF" stroke="#FED7AA" stroke-width="0.5" />
        <rect x="210" y="0" width="10" height="44" fill="#FFFFFF" stroke="#FED7AA" stroke-width="0.5" />
        <rect x="220" y="0" width="10" height="44" fill="#F59E0B" opacity="0.75" />
        <rect x="230" y="0" width="10" height="44" fill="#FFFFFF" stroke="#FED7AA" stroke-width="0.5" />
        <rect x="240" y="0" width="10" height="44" fill="#FFFFFF" stroke="#FED7AA" stroke-width="0.5" />
        <rect x="250" y="0" width="10" height="44" fill="#FFFFFF" stroke="#FED7AA" stroke-width="0.5" />
        <rect x="260" y="0" width="10" height="44" fill="#F59E0B" opacity="0.50" />
        <rect x="270" y="0" width="10" height="44" fill="#FFFFFF" stroke="#FED7AA" stroke-width="0.5" />
      </g>

      <text x="570" y="190" text-anchor="middle" font-size="11" font-family="ui-monospace, SFMono-Regular, Menlo, Consolas, monospace" fill="#4B5563">{ 3: 0.85, 9: 0.60, 14: 0.95, 22: 0.75, ... }</text>

      <text x="570" y="235" text-anchor="middle" font-size="13" fill="#1F2937"><tspan font-weight="600">30,000+</tspan> vocabulary positions</text>
      <text x="570" y="260" text-anchor="middle" font-size="13" fill="#1F2937">Only a few non-zero values</text>
      <text x="570" y="305" text-anchor="middle" font-size="13" font-weight="600" fill="#D97706">Lexical / keyword recall</text>
    </g>
  </svg>
  <figcaption style="margin-top: 12px; font-size: 0.9rem; color: #6c7a89;">Dense vectors capture semantic meaning across every dimension; sparse vectors capture exact keyword signals across a much larger vocabulary, with most positions empty. ArcadeDB 26.5.1 supports both, and can fuse them server-side.</figcaption>
</figure>
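<p>Here is a concrete version of the figure. A sparse vector can be held as a map from vocabulary position to weight, and relevance is typically the dot product over the positions that both vectors share. The Python sketch below uses the index values from the figure to illustrate that scoring idea. It is an illustration under that assumption and does not reproduce the <code class="language-plaintext highlighter-rouge">LSM_SPARSE_VECTOR</code> implementation or its on-disk format.</p>

```python
def sparse_dot(query: dict[int, float], doc: dict[int, float]) -> float:
    """Dot product of two sparse vectors stored as {position: weight}."""
    # Iterate over the smaller map; positions missing from either side contribute 0.
    small, large = (query, doc) if len(query) <= len(doc) else (doc, query)
    return sum(weight * large[pos] for pos, weight in small.items() if pos in large)


doc = {3: 0.85, 9: 0.60, 14: 0.95, 22: 0.75}
query = {9: 1.0, 14: 0.5, 500: 2.0}  # position 500 has no match in doc

print(sparse_dot(query, doc))  # 0.60 * 1.0 + 0.95 * 0.5
```

<p>Only positions 9 and 14 appear in both vectors, so the cost of scoring depends on how many terms the two vectors share, not on the 30,000+ vocabulary size.</p>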

<p>Highlights:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">vector.fuse(...)</code> performs <strong>server-side result fusion</strong> using RRF (Reciprocal Rank Fusion), DBSF (Distribution-Based Score Fusion), and LINEAR strategies, so dense, sparse, and lexical scores can be combined without extra round-trips to the client.</li>
  <li><code class="language-plaintext highlighter-rouge">vector.neighbors(...)</code> supports <code class="language-plaintext highlighter-rouge">groupBy</code> / <code class="language-plaintext highlighter-rouge">groupSize</code> options for <strong>diversified retrieval</strong> with nested-field grouping.</li>
  <li><strong>WAND / BlockMax-WAND</strong> dynamic pruning scales sparse retrieval to 100M+ documents.</li>
  <li><strong>Sparse-vector partitioning</strong> allows sharding by tenant or domain.</li>
  <li>New reranker SQL functions enable two-stage retrieval pipelines.</li>
</ul>
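<p>RRF is the simplest of the three fusion strategies to picture. Each input list contributes 1/(k + rank) for every document it returns, and documents are then re-sorted by their summed score, so the dense and sparse scores never need to be normalized against each other. The Python sketch below shows the textbook formulation with the commonly used k = 60. It is not the <code class="language-plaintext highlighter-rouge">vector.fuse(...)</code> implementation, whose parameters and defaults may differ.</p>

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked ID lists with RRF: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a deterministic order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


dense = ["doc7", "doc2", "doc9"]   # e.g. results of a dense-vector search
sparse = ["doc2", "doc4", "doc7"]  # e.g. results of a sparse-vector search

for doc_id, score in reciprocal_rank_fusion([dense, sparse]):
    print(doc_id, round(score, 5))
```

<p>In this example, <code class="language-plaintext highlighter-rouge">doc2</code> ranks well in both lists, so it overtakes <code class="language-plaintext highlighter-rouge">doc7</code>, which tops only the dense list.</p>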

<h3 id="int8-quantization-for-dense-vectors">INT8 Quantization for Dense Vectors</h3>

<p>The dense vector pipeline now supports <strong>INT8</strong> end to end. Each component takes one byte instead of the four used by FP32, and because the FP32 path is skipped entirely, disk usage and resident memory (RSS) drop substantially. A shared 8-bit representation now flows across ingest, storage, and query.</p>
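<p>Symmetric scalar quantization is one common way to obtain an 8-bit representation. Each vector is scaled so that its largest absolute component maps to 127, every component is rounded to a signed byte, and the scale factor is kept so the mapping can be approximately undone. The Python sketch below illustrates this general technique. The scheme ArcadeDB actually uses (per-vector or per-dimension scales, calibration, storage layout) is not covered here and may differ.</p>

```python
def quantize_int8(vector: list[float]) -> tuple[list[int], float]:
    """Symmetric per-vector scalar quantization of floats into the int8 range."""
    max_abs = max((abs(x) for x in vector), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0  # avoid dividing by zero
    # Round, then clamp so every code fits in a signed byte [-127, 127].
    codes = [max(-127, min(127, round(x / scale))) for x in vector]
    return codes, scale


def dequantize_int8(codes: list[int], scale: float) -> list[float]:
    """Approximate reconstruction of the original floats."""
    return [c * scale for c in codes]


original = [0.35, -0.78, 0.21, 0.89]
codes, scale = quantize_int8(original)
restored = dequantize_int8(codes, scale)

print(codes)  # one byte per component instead of four for float32
print(max(abs(a - b) for a, b in zip(original, restored)))  # at most scale / 2
```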

<h3 id="external-property-storage">EXTERNAL Property Storage</h3>

<p>A new paired-bucket layout moves <strong>heavy property values</strong> (vectors, large strings, JSON) into separate external buckets while keeping the hot row data compact. The result: significantly cheaper scans on wide records.</p>

<figure style="margin: 32px 0; text-align: center;">
  <svg viewBox="0 0 760 500" xmlns="http://www.w3.org/2000/svg" role="img" aria-label="Classic bucket layout vs EXTERNAL paired-bucket layout" style="max-width: 100%; height: auto; font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif;">
    <!-- Top panel: classic layout -->
    <g>
      <rect x="20" y="20" width="720" height="200" rx="12" fill="#F0F4F8" stroke="#D1D9E0" stroke-width="1" />
      <text x="380" y="50" text-anchor="middle" font-size="16" font-weight="700" fill="#1F2937">Without EXTERNAL &middot; everything in the main bucket</text>

      <g transform="translate(80, 80)">
        <rect x="0" y="0" width="600" height="34" fill="#FFFFFF" stroke="#D1D9E0" />
        <rect x="0" y="0" width="60" height="34" fill="#DBEAFE" stroke="#D1D9E0" />
        <text x="30" y="22" text-anchor="middle" font-size="11" font-weight="600" fill="#1E40AF">id</text>
        <rect x="60" y="0" width="80" height="34" fill="#E0E7FF" stroke="#D1D9E0" />
        <text x="100" y="22" text-anchor="middle" font-size="11" font-weight="600" fill="#1E40AF">name</text>
        <rect x="140" y="0" width="460" height="34" fill="#FEE2E2" stroke="#D1D9E0" />
        <text x="370" y="22" text-anchor="middle" font-size="11" font-weight="600" fill="#991B1B">vector (1,536 floats) &middot; large JSON &middot; blob</text>

        <rect x="0" y="42" width="600" height="34" fill="#FFFFFF" stroke="#D1D9E0" />
        <rect x="0" y="42" width="60" height="34" fill="#DBEAFE" stroke="#D1D9E0" />
        <text x="30" y="64" text-anchor="middle" font-size="11" font-weight="600" fill="#1E40AF">id</text>
        <rect x="60" y="42" width="80" height="34" fill="#E0E7FF" stroke="#D1D9E0" />
        <text x="100" y="64" text-anchor="middle" font-size="11" font-weight="600" fill="#1E40AF">name</text>
        <rect x="140" y="42" width="460" height="34" fill="#FEE2E2" stroke="#D1D9E0" />
        <text x="370" y="64" text-anchor="middle" font-size="11" font-weight="600" fill="#991B1B">vector (1,536 floats) &middot; large JSON &middot; blob</text>

        <rect x="0" y="84" width="600" height="34" fill="#FFFFFF" stroke="#D1D9E0" />
        <rect x="0" y="84" width="60" height="34" fill="#DBEAFE" stroke="#D1D9E0" />
        <text x="30" y="106" text-anchor="middle" font-size="11" font-weight="600" fill="#1E40AF">id</text>
        <rect x="60" y="84" width="80" height="34" fill="#E0E7FF" stroke="#D1D9E0" />
        <text x="100" y="106" text-anchor="middle" font-size="11" font-weight="600" fill="#1E40AF">name</text>
        <rect x="140" y="84" width="460" height="34" fill="#FEE2E2" stroke="#D1D9E0" />
        <text x="370" y="106" text-anchor="middle" font-size="11" font-weight="600" fill="#991B1B">vector (1,536 floats) &middot; large JSON &middot; blob</text>
      </g>

      <text x="380" y="210" text-anchor="middle" font-size="12" fill="#991B1B">Every scan reads heavy payloads &rarr; wide rows, slow scans</text>
    </g>

    <!-- Down arrow between panels -->
    <text x="380" y="245" text-anchor="middle" font-size="14" font-weight="700" fill="#6B7280">&#9660;</text>

    <!-- Bottom panel: EXTERNAL -->
    <g>
      <rect x="20" y="260" width="720" height="230" rx="12" fill="#ECFDF5" stroke="#A7F3D0" stroke-width="1" />
      <text x="380" y="290" text-anchor="middle" font-size="16" font-weight="700" fill="#065F46">With EXTERNAL (NEW) &middot; paired-bucket layout</text>

      <text x="170" y="320" text-anchor="middle" font-size="12" font-weight="600" fill="#1F2937">Main Bucket (compact, hot)</text>
      <text x="600" y="320" text-anchor="middle" font-size="12" font-weight="600" fill="#1F2937">External Bucket (lazy)</text>

      <!-- Main bucket: thin rows -->
      <g transform="translate(50, 340)">
        <rect x="0" y="0" width="240" height="28" fill="#FFFFFF" stroke="#A7F3D0" />
        <rect x="0" y="0" width="60" height="28" fill="#DBEAFE" stroke="#A7F3D0" />
        <text x="30" y="19" text-anchor="middle" font-size="11" font-weight="600" fill="#1E40AF">id</text>
        <rect x="60" y="0" width="100" height="28" fill="#E0E7FF" stroke="#A7F3D0" />
        <text x="110" y="19" text-anchor="middle" font-size="11" font-weight="600" fill="#1E40AF">name</text>
        <rect x="160" y="0" width="80" height="28" fill="#FEF3C7" stroke="#A7F3D0" />
        <text x="200" y="19" text-anchor="middle" font-size="10" font-weight="600" fill="#92400E">&rarr; ref</text>

        <rect x="0" y="36" width="240" height="28" fill="#FFFFFF" stroke="#A7F3D0" />
        <rect x="0" y="36" width="60" height="28" fill="#DBEAFE" stroke="#A7F3D0" />
        <text x="30" y="55" text-anchor="middle" font-size="11" font-weight="600" fill="#1E40AF">id</text>
        <rect x="60" y="36" width="100" height="28" fill="#E0E7FF" stroke="#A7F3D0" />
        <text x="110" y="55" text-anchor="middle" font-size="11" font-weight="600" fill="#1E40AF">name</text>
        <rect x="160" y="36" width="80" height="28" fill="#FEF3C7" stroke="#A7F3D0" />
        <text x="200" y="55" text-anchor="middle" font-size="10" font-weight="600" fill="#92400E">&rarr; ref</text>

        <rect x="0" y="72" width="240" height="28" fill="#FFFFFF" stroke="#A7F3D0" />
        <rect x="0" y="72" width="60" height="28" fill="#DBEAFE" stroke="#A7F3D0" />
        <text x="30" y="91" text-anchor="middle" font-size="11" font-weight="600" fill="#1E40AF">id</text>
        <rect x="60" y="72" width="100" height="28" fill="#E0E7FF" stroke="#A7F3D0" />
        <text x="110" y="91" text-anchor="middle" font-size="11" font-weight="600" fill="#1E40AF">name</text>
        <rect x="160" y="72" width="80" height="28" fill="#FEF3C7" stroke="#A7F3D0" />
        <text x="200" y="91" text-anchor="middle" font-size="10" font-weight="600" fill="#92400E">&rarr; ref</text>
      </g>

      <!-- Lazy-load arrows -->
      <g stroke="#10B981" stroke-width="1.5" stroke-dasharray="5 3" fill="none">
        <line x1="295" y1="354" x2="485" y2="354" />
        <line x1="295" y1="390" x2="485" y2="390" />
        <line x1="295" y1="426" x2="485" y2="426" />
      </g>
      <g fill="#10B981" stroke="none">
        <polygon points="485,354 477,350 477,358" />
        <polygon points="485,390 477,386 477,394" />
        <polygon points="485,426 477,422 477,430" />
      </g>
      <text x="390" y="343" text-anchor="middle" font-size="10" font-style="italic" fill="#065F46">loaded lazily on access</text>

      <!-- External bucket: heavy rows -->
      <g transform="translate(490, 340)">
        <rect x="0" y="0" width="220" height="28" fill="#FEE2E2" stroke="#A7F3D0" />
        <text x="110" y="19" text-anchor="middle" font-size="10" font-weight="600" fill="#991B1B">vector / JSON / blob</text>

        <rect x="0" y="36" width="220" height="28" fill="#FEE2E2" stroke="#A7F3D0" />
        <text x="110" y="55" text-anchor="middle" font-size="10" font-weight="600" fill="#991B1B">vector / JSON / blob</text>

        <rect x="0" y="72" width="220" height="28" fill="#FEE2E2" stroke="#A7F3D0" />
        <text x="110" y="91" text-anchor="middle" font-size="10" font-weight="600" fill="#991B1B">vector / JSON / blob</text>
      </g>

      <text x="380" y="475" text-anchor="middle" font-size="12" font-weight="600" fill="#065F46">Compact rows &rarr; fast scans. Heavy values fetched only when read.</text>
    </g>
  </svg>
  <figcaption style="margin-top: 12px; font-size: 0.9rem; color: #6c7a89;">EXTERNAL Property Storage moves heavy values (vectors, large strings, JSON) to a paired external bucket. The main bucket stays compact, scans stay hot, and large payloads are loaded lazily only when the row is actually read.</figcaption>
</figure>

<h3 id="query-partitioning">Query Partitioning</h3>

<p>A partition-aware planner now <strong>prunes unnecessary partitions</strong> from SQL and Cypher execution plans, with integrity safeguards for partitioned types.</p>
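<p>Pruning works because a record's partition is a deterministic function of its partition key. When a query fixes that key with an equality predicate, the planner can compute the single partition that could hold matches and skip all the others. The Python sketch below illustrates the idea with a hypothetical hash-partitioning scheme. The partition function and names are assumptions for illustration and do not describe ArcadeDB's planner.</p>

```python
import zlib
from typing import Optional


def partition_for(key: str, partitions: int) -> int:
    """Hypothetical partition function: stable CRC32 hash of the key modulo the partition count."""
    return zlib.crc32(key.encode("utf-8")) % partitions


def partitions_to_scan(predicate_key: Optional[str], partitions: int) -> list[int]:
    """Return the partitions a query must read.

    With an equality predicate on the partition key, only one partition can match;
    without one, every partition has to be scanned.
    """
    if predicate_key is None:
        return list(range(partitions))
    return [partition_for(predicate_key, partitions)]


print(partitions_to_scan("tenant-42", 8))  # a single partition
print(partitions_to_scan(None, 8))         # all eight partitions
```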

<h3 id="high-availability-offline-cluster-bootstrap">High Availability: Offline Cluster Bootstrap</h3>

<p>Fresh HA clusters can now initialize from <strong>pre-seeded databases</strong> via snapshot-and-restore, eliminating the need for full dataset re-replication when expanding or rebuilding a cluster.</p>

<h3 id="production-ready-helm-chart">Production-Ready Helm Chart</h3>

<p>The Helm chart has been reworked to align with the Raft-based HA subsystem introduced in 26.4.2, and is now suitable for production deployments.</p>

<h3 id="cypher-administrative-commands">Cypher Administrative Commands</h3>

<p>The standard administrative commands <code class="language-plaintext highlighter-rouge">SHOW INDEXES</code> and <code class="language-plaintext highlighter-rouge">SHOW CONSTRAINTS</code> are now supported in OpenCypher.</p>

<h3 id="sql-find-references">SQL: FIND REFERENCES</h3>

<p>The OrientDB-compatible <code class="language-plaintext highlighter-rouge">FIND REFERENCES</code> command is back, making it easy to locate all records that point to a given RID. This is particularly useful for <a href="https://arcadedb.com/orientdb.html">migrations from OrientDB</a>.</p>

<h3 id="c-end-to-end-testing">C# End-to-End Testing</h3>

<p>A new C# test suite validates ArcadeDB over the PostgreSQL wire protocol via <strong>Npgsql</strong> and <strong>Testcontainers</strong> on every build.</p>

<h3 id="studio-enhancements">Studio Enhancements</h3>

<ul>
  <li>Full-screen graph view mode</li>
  <li>Clear query button / textbox</li>
  <li>Session reset on token expiration</li>
  <li>Persistent error message display</li>
  <li>Query history no longer auto-submits</li>
  <li>Inherited indexes now visible</li>
  <li>HA cluster peer add / remove controls</li>
  <li>Human-readable peer names in <code class="language-plaintext highlighter-rouge">HA_SERVER_LIST</code></li>
</ul>

<h2 id="major-fixes">Major Fixes</h2>

<h3 id="opencypher-correctness">OpenCypher Correctness</h3>

<p>This release lands an <strong>extensive batch of OpenCypher fixes</strong> across pattern matching, write clauses, subqueries, and temporal expressions. Among the highlights:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">valueType(...)</code> now reports the <code class="language-plaintext highlighter-rouge">NOT NULL</code> suffix for non-null values.</li>
  <li>WGS-84-3D points created with <code class="language-plaintext highlighter-rouge">point(...)</code> expose <code class="language-plaintext highlighter-rouge">.height</code> as an alias for <code class="language-plaintext highlighter-rouge">.z</code>.</li>
  <li><code class="language-plaintext highlighter-rouge">CALL ... YIELD</code> preserves carried <code class="language-plaintext highlighter-rouge">WITH</code> variables.</li>
  <li>Variable-length patterns no longer re-traverse previously bound relationships.</li>
  <li><code class="language-plaintext highlighter-rouge">MERGE</code> with an unbound label-only endpoint creates fresh nodes appropriately.</li>
  <li><code class="language-plaintext highlighter-rouge">SET</code> correctly propagates across all aliases for the same node.</li>
  <li>Self-referential property updates remain idempotent across row fanout.</li>
  <li>Temporal component access on date/datetime values now works correctly.</li>
  <li><code class="language-plaintext highlighter-rouge">EXISTS { ... }</code> subqueries correctly evaluate outer-variable expressions.</li>
  <li><code class="language-plaintext highlighter-rouge">MATCH</code> immediately after <code class="language-plaintext highlighter-rouge">CREATE</code> now sees newly created labeled nodes.</li>
  <li><code class="language-plaintext highlighter-rouge">MERGE ... ON MATCH SET</code> returns post-update property values.</li>
  <li><code class="language-plaintext highlighter-rouge">MATCH</code> on parent edge types matches sub-typed edges (polymorphic traversal).</li>
  <li><code class="language-plaintext highlighter-rouge">shortestPath</code> / <code class="language-plaintext highlighter-rouge">allShortestPaths</code> with variable-length alternation match correctly.</li>
  <li><code class="language-plaintext highlighter-rouge">WHERE false</code> literal predicates are no longer ignored.</li>
</ul>

<p>…plus dozens more. See the <a href="https://github.com/ArcadeData/arcadedb/releases/tag/26.5.1">full release notes</a> for the complete list.</p>

<h3 id="sql">SQL</h3>

<ul>
  <li><code class="language-plaintext highlighter-rouge">CONTAINSALL</code> compares lists of <code class="language-plaintext highlighter-rouge">Identifiable</code>s against RID strings correctly.</li>
  <li>Correlated <code class="language-plaintext highlighter-rouge">COLLECT { ... }</code> / <code class="language-plaintext highlighter-rouge">COUNT { ... }</code> subqueries evaluate with outer-variable access.</li>
  <li><code class="language-plaintext highlighter-rouge">SEARCH_INDEX</code> and <code class="language-plaintext highlighter-rouge">SEARCH_FIELDS</code> propagate return values in filters and handle wildcards properly.</li>
  <li><code class="language-plaintext highlighter-rouge">SELECT</code> with a non-unique LSM index returns rows after partial deletes.</li>
  <li>Edge creation with <code class="language-plaintext highlighter-rouge">CONTENT</code> no longer ignores properties.</li>
  <li><code class="language-plaintext highlighter-rouge">algo.dijkstra</code> yields correct weight calculations.</li>
  <li><code class="language-plaintext highlighter-rouge">UPDATE EDGE SET @in / @out</code> correctly rewires vertex edge lists.</li>
  <li><code class="language-plaintext highlighter-rouge">point.withinBBox(...)</code> supports cross-meridian bounding boxes.</li>
</ul>

<h3 id="storage-indexing--schema">Storage, Indexing &amp; Schema</h3>

<ul>
  <li>HASH index lookups now return rows when data encryption is enabled.</li>
  <li>Orphan <code class="language-plaintext highlighter-rouge">TypeIndex</code> wrappers are dropped when the last bucket child is removed.</li>
  <li>Subclass indexes are no longer incorrectly related to superclass indexes.</li>
  <li>Manual index names are respected on creation.</li>
  <li>Inherited indexes are now visible in Studio.</li>
</ul>

<h3 id="high-availability">High Availability</h3>

<ul>
  <li>Schema changes replicate to followers, closing WAL gaps.</li>
  <li>Clusters no longer report inconsistencies after node shutdowns.</li>
  <li>Large bulk inserts via gRPC now replicate correctly.</li>
  <li><code class="language-plaintext highlighter-rouge">/api/v1/batch</code> no longer fails on followers with “Error on updating dictionary”.</li>
  <li>The <code class="language-plaintext highlighter-rouge">/batch</code> endpoint no longer returns an HTTP 500 (NullPointerException) after a successful commit.</li>
  <li>e2e-ha integration tests stabilized with on-demand Toxiproxy support.</li>
</ul>

<h3 id="wire-protocols">Wire Protocols</h3>

<p><strong>PostgreSQL</strong></p>

<ul>
  <li>Empty <code class="language-plaintext highlighter-rouge">SELECT</code> results now include a <code class="language-plaintext highlighter-rouge">RowDescription</code> message with the result schema.</li>
  <li><code class="language-plaintext highlighter-rouge">SHOW server_version</code> returns a proper value for SQLAlchemy.</li>
  <li>Cypher <code class="language-plaintext highlighter-rouge">WHERE id(n) IN $array</code> round-trips correctly.</li>
  <li>Binary array deserialization is now implemented for JDBC <code class="language-plaintext highlighter-rouge">setArray</code>.</li>
  <li>Named and positional parameters now work via Npgsql (C#).</li>
</ul>

<p><strong>Bolt</strong></p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">EXPLAIN</code> / <code class="language-plaintext highlighter-rouge">PROFILE</code> plans are included in <code class="language-plaintext highlighter-rouge">PULL</code> <code class="language-plaintext highlighter-rouge">SUCCESS</code> metadata.</li>
  <li>Executor recognizes the new sparse vector type.</li>
</ul>

<p><strong>gRPC</strong></p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">InsertStream</code> throughput stays consistent after extended <code class="language-plaintext highlighter-rouge">executeQuery</code> calls.</li>
  <li>Commit-time constraint violations surface as stream-level errors.</li>
  <li><code class="language-plaintext highlighter-rouge">DATE</code> columns are no longer corrupted by parameter binding.</li>
  <li><code class="language-plaintext highlighter-rouge">ARRAY_OF_LONGS</code> and <code class="language-plaintext highlighter-rouge">DATETIME</code> preserve precision in parameter binding.</li>
</ul>

<p><strong>HTTP</strong></p>

<ul>
  <li>INT8 query vectors are now routed via the <code class="language-plaintext highlighter-rouge">$bytes</code> / <code class="language-plaintext highlighter-rouge">$int8</code> markers.</li>
  <li><code class="language-plaintext highlighter-rouge">RemoteGraphBatch</code> honors unique edge constraints.</li>
  <li>Edge <code class="language-plaintext highlighter-rouge">DATETIME</code> parser accepts ISO suffixes.</li>
</ul>

<h3 id="dependencies">Dependencies</h3>

<p>Notable upgrades include Netty 4.2.13.Final, Undertow 2.4.0.Final, PostgreSQL JDBC 42.7.11, Neo4j Java Driver 6.1.0, Jackson Databind 2.21.3, GraalVM 25.0.3, Testcontainers 2.0.5, plus Studio frontend improvements and security updates across the dependency stack.</p>

<h2 id="getting-started-with-2651">Getting Started with 26.5.1</h2>

<h3 id="docker">Docker</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>docker pull arcadedata/arcadedb:26.5.1
</code></pre></div></div>

<p>Visit our <a href="https://hub.docker.com/r/arcadedata/arcadedb">Docker Hub repository</a> for more information.</p>

<h3 id="maven">Maven</h3>

<div class="language-xml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt">&lt;dependency&gt;</span>
    <span class="nt">&lt;groupId&gt;</span>com.arcadedb<span class="nt">&lt;/groupId&gt;</span>
    <span class="nt">&lt;artifactId&gt;</span>arcadedb-engine<span class="nt">&lt;/artifactId&gt;</span>
    <span class="nt">&lt;version&gt;</span>26.5.1<span class="nt">&lt;/version&gt;</span>
<span class="nt">&lt;/dependency&gt;</span>
</code></pre></div></div>

<p>All artifacts are available on <a href="https://repo.maven.apache.org/maven2/com/arcadedb/">Maven Central</a>.</p>

<h3 id="documentation">Documentation</h3>

<p>For detailed information on features and usage, refer to our <a href="https://docs.arcadedb.com/">comprehensive documentation</a>.</p>

<h2 id="compatibility-note">Compatibility Note</h2>

<p>This release maintains 100% compatibility with previous database formats, meaning no export/import is required when upgrading. As always, we recommend creating a database backup before upgrading.</p>

<hr />

<p><strong>Download ArcadeDB 26.5.1 now</strong>: <a href="https://github.com/ArcadeData/arcadedb/releases/tag/26.5.1">GitHub Releases</a></p>

<p>Thanks to everyone in the community who reported issues, opened PRs, and helped shape this release.</p>

<p>Luca Garulli
ArcadeDB Founder</p>]]></content><author><name>Luca Garulli</name></author><category term="Multi-Model" /><category term="Vector Search" /><category term="OpenCypher" /><category term="Graph Database" /><category term="Release" /><summary type="html"><![CDATA[ArcadeDB 26.5.1 ships a brand-new sparse vector index with server-side hybrid retrieval, INT8 quantization end-to-end, EXTERNAL property storage, query partitioning, offline HA cluster bootstrap, and an extensive batch of OpenCypher correctness fixes.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://arcadedb.com/assets/images/release-v26.5.1.jpg" /><media:content medium="image" url="https://arcadedb.com/assets/images/release-v26.5.1.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Call Me Maybe, ArcadeDB? 34 Jepsen Tests, 34 Passes</title><link href="https://arcadedb.com/blog/arcadedb-jepsen-tests-34-pass/" rel="alternate" type="text/html" title="Call Me Maybe, ArcadeDB? 34 Jepsen Tests, 34 Passes" /><published>2026-04-28T00:00:00+00:00</published><updated>2026-04-28T00:00:00+00:00</updated><id>https://arcadedb.com/blog/arcadedb-jepsen-tests-34-pass</id><content type="html" xml:base="https://arcadedb.com/blog/arcadedb-jepsen-tests-34-pass/"><![CDATA[<p>If you’ve followed distributed databases for any length of time, you’ve probably read <a href="https://jepsen.io/analyses">a Jepsen analysis</a>. If you’ve read one, you know the feeling: a database vendor claims linearizability, <a href="https://aphyr.com/">Kyle Kingsbury</a> introduces some network partitions, and a few weeks later we all learn what the database <em>actually</em> does under failure.</p>

<p>That feeling is the reason we wrote 34 Jepsen tests for <a href="https://arcadedb.com/">ArcadeDB</a>. We wanted to know what <em>we</em> actually do under failure, before we ask anyone else to trust us.</p>

<p>Today we’re publishing the full test suite, the methodology, and the results.</p>

<blockquote>
  <p><strong>First, the disclaimer.</strong> This is <strong>not an official Jepsen analysis</strong>. Jepsen LLC did not commission, run, review, or certify these tests. We wrote them in-house using the open-source <a href="https://github.com/jepsen-io/jepsen">Jepsen framework</a> (the same framework Kyle uses for his official analyses), but the design, execution, and results are entirely ours. We’re publishing everything so the community can scrutinize the methodology, and we’d genuinely love a real analysis from Jepsen LLC one day. <strong>Hi Kyle, if you’re reading this, please tear it apart.</strong></p>
</blockquote>

<h2 id="summary">Summary</h2>

<ul>
  <li><strong>Database under test:</strong> ArcadeDB on the <code class="language-plaintext highlighter-rouge">apache-ratis</code> branch, with high availability built on <a href="https://ratis.apache.org/">Apache Ratis</a> (Raft consensus).</li>
  <li><strong>Cluster:</strong> 5 Debian nodes in Docker, controlled by a Jepsen 0.3.11 control node.</li>
  <li><strong>Workloads (6):</strong> <code class="language-plaintext highlighter-rouge">bank</code>, <code class="language-plaintext highlighter-rouge">set</code>, <code class="language-plaintext highlighter-rouge">elle</code>, <code class="language-plaintext highlighter-rouge">register</code>, <code class="language-plaintext highlighter-rouge">register-follower</code>, <code class="language-plaintext highlighter-rouge">register-bookmark</code>.</li>
  <li><strong>Faults (7 nemeses):</strong> <code class="language-plaintext highlighter-rouge">none</code>, <code class="language-plaintext highlighter-rouge">partition</code>, <code class="language-plaintext highlighter-rouge">kill</code>, <code class="language-plaintext highlighter-rouge">pause</code>, <code class="language-plaintext highlighter-rouge">clock</code>, <code class="language-plaintext highlighter-rouge">all</code>, <code class="language-plaintext highlighter-rouge">all+clock</code>.</li>
  <li><strong>Total runs:</strong> 34 (20 leader-workload runs + 14 follower-workload runs).</li>
  <li><strong>Result:</strong> 34 / 34 PASS. Zero linearizability violations, zero lost writes, zero ACID anomalies.</li>
  <li><strong>Source code:</strong> <a href="https://github.com/ArcadeData/arcadedb-jepsen">github.com/ArcadeData/arcadedb-jepsen</a> (Apache 2.0).</li>
  <li><strong>Caveat:</strong> This is in-house testing, not a Jepsen LLC certification. Independent review welcome.</li>
</ul>

<h2 id="what-is-jepsen">What is Jepsen?</h2>

<p><a href="https://jepsen.io">Jepsen</a> is the gold-standard open-source framework for testing distributed systems. Created by Kyle Kingsbury (better known as <a href="https://aphyr.com/">aphyr</a>), it became famous through the <a href="https://aphyr.com/tags/jepsen">Call Me Maybe</a> blog series, which methodically dismantled the consistency claims of databases like MongoDB, Redis, Cassandra, Elasticsearch, and many others.</p>

<p>What makes Jepsen special isn’t just the fault injection (network partitions via <code class="language-plaintext highlighter-rouge">iptables</code>, process kills with <code class="language-plaintext highlighter-rouge">SIGKILL</code>, GC-style pauses with <code class="language-plaintext highlighter-rouge">SIGSTOP</code>/<code class="language-plaintext highlighter-rouge">SIGCONT</code>, clock skew via <code class="language-plaintext highlighter-rouge">date -s</code>). It’s the <strong>checkers</strong>:</p>

<ul>
  <li><strong><a href="https://github.com/jepsen-io/knossos">Knossos</a></strong>: a linearizability checker that takes the history of operations and searches for a single serial ordering that respects real-time order (an operation that completed before another began must come first) and is consistent with every client’s observed responses. If no such ordering exists, your “linearizable” register isn’t.</li>
  <li><strong><a href="https://github.com/jepsen-io/elle">Elle</a></strong>: a black-box transaction-isolation checker that builds a dependency graph from the transaction history and looks for cycles. Cycles map to specific anomalies: G0 (dirty write), G1a (aborted read), G1b (intermediate read), G1c (circular information flow), G2 (anti-dependency cycle), and lost updates.</li>
</ul>

<p>You can’t bluff your way past either of them. They either find a counterexample, or they certify the history.</p>

<h2 id="what-we-tested">What we tested</h2>

<p>The tests run against the ArcadeDB <code class="language-plaintext highlighter-rouge">apache-ratis</code> branch, where high availability is implemented on top of <a href="https://ratis.apache.org/">Apache Ratis</a> (the production-grade Raft library that also powers Apache Ozone). The cluster is <strong>5 Debian nodes</strong> in Docker, plus a control node running <a href="https://leiningen.org/">Leiningen</a> and Jepsen 0.3.11. Each test gets a fresh cluster to eliminate cross-test contamination.</p>

<h3 id="six-workloads">Six workloads</h3>

<table>
  <thead>
    <tr>
      <th>Workload</th>
      <th>What it checks</th>
      <th>Checker</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>bank</strong></td>
      <td>ACID balance conservation across 5 accounts during concurrent transfers</td>
      <td>Custom conservation invariant</td>
    </tr>
    <tr>
      <td><strong>set</strong></td>
      <td>No acknowledged write is ever lost during replication</td>
      <td>Custom set checker</td>
    </tr>
    <tr>
      <td><strong>elle</strong></td>
      <td>Transaction isolation: G0, G1a, G1b, G2, lost updates</td>
      <td><a href="https://github.com/jepsen-io/elle">Elle</a></td>
    </tr>
    <tr>
      <td><strong>register</strong></td>
      <td>Linearizability of single-key read/write/CAS, leader reads</td>
      <td><a href="https://github.com/jepsen-io/knossos">Knossos</a></td>
    </tr>
    <tr>
      <td><strong>register-follower</strong></td>
      <td>Linearizability when reads are routed to a <em>non-leader</em> (ReadIndex path)</td>
      <td>Knossos</td>
    </tr>
    <tr>
      <td><strong>register-bookmark</strong></td>
      <td>Read-your-writes via commit-index bookmarks on follower reads</td>
      <td>Knossos</td>
    </tr>
  </tbody>
</table>

<h3 id="seven-nemeses">Seven nemeses</h3>

<table>
  <thead>
    <tr>
      <th>Nemesis</th>
      <th>Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">none</code></td>
      <td>Baseline, no faults</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">partition</code></td>
      <td>Random network partitions via <code class="language-plaintext highlighter-rouge">iptables</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">kill</code></td>
      <td><code class="language-plaintext highlighter-rouge">SIGKILL</code> random nodes (crash)</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">pause</code></td>
      <td><code class="language-plaintext highlighter-rouge">SIGSTOP</code>/<code class="language-plaintext highlighter-rouge">SIGCONT</code> random nodes (long GC pause)</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">clock</code></td>
      <td>Random ±60s clock shifts via <code class="language-plaintext highlighter-rouge">date -s</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">all</code></td>
      <td>partition + kill + pause concurrently</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">all+clock</code></td>
      <td>all + clock skew</td>
    </tr>
  </tbody>
</table>

<p>The leader workloads run against 5 nemeses (we omit <code class="language-plaintext highlighter-rouge">clock</code> and <code class="language-plaintext highlighter-rouge">all+clock</code> because leader-only reads aren’t sensitive to follower clock drift). The follower workloads run the full 7. That’s <strong>20 + 14 = 34 tests</strong>.</p>
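<p>The arithmetic behind the 34 can be checked with a few lines of Python. This sketch simply enumerates the matrix as described above; the list and function names are ours, not taken from the test repository.</p>

```python
from itertools import product

NEMESES = ["none", "partition", "kill", "pause", "clock", "all", "all+clock"]
CLOCK_NEMESES = {"clock", "all+clock"}
LEADER_WORKLOADS = ["bank", "set", "elle", "register"]
FOLLOWER_WORKLOADS = ["register-follower", "register-bookmark"]


def build_matrix():
    """Return (leader_runs, follower_runs) as lists of (workload, nemesis) pairs."""
    # Leader workloads skip the two clock-based nemeses.
    leader_runs = [
        (w, n) for w, n in product(LEADER_WORKLOADS, NEMESES) if n not in CLOCK_NEMESES
    ]
    # Follower workloads run against every nemesis.
    follower_runs = list(product(FOLLOWER_WORKLOADS, NEMESES))
    return leader_runs, follower_runs


leader_runs, follower_runs = build_matrix()
print(len(leader_runs), len(follower_runs), len(leader_runs) + len(follower_runs))
# prints: 20 14 34
```

<p>That is 4 leader workloads × 5 nemeses plus 2 follower workloads × 7 nemeses.</p>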

<h2 id="the-results">The Results</h2>

<figure style="margin: 2rem 0; overflow-x: auto;">
<svg viewBox="0 0 760 360" xmlns="http://www.w3.org/2000/svg" role="img" aria-labelledby="matrixTitle" style="max-width: 100%; height: auto; font-family: -apple-system, BlinkMacSystemFont, sans-serif;">
  <title id="matrixTitle">ArcadeDB Jepsen test matrix: 34 of 34 passing</title>
  <rect x="0" y="0" width="760" height="360" fill="#ffffff" />
  <text x="380" y="28" text-anchor="middle" font-size="18" font-weight="700" fill="#111">ArcadeDB Jepsen Test Matrix &middot; 34 / 34 PASS</text>

  <!-- Column headers (nemeses) -->
  <g font-size="12" fill="#333" text-anchor="middle">
    <text x="320" y="68">none</text>
    <text x="380" y="68">partition</text>
    <text x="440" y="68">kill</text>
    <text x="500" y="68">pause</text>
    <text x="560" y="68">clock</text>
    <text x="620" y="68">all</text>
    <text x="690" y="68">all+clock</text>
  </g>

  <!-- Row labels (workloads) -->
  <g font-size="13" fill="#222" text-anchor="end" font-weight="600">
    <text x="240" y="100">bank</text>
    <text x="240" y="140">set</text>
    <text x="240" y="180">elle</text>
    <text x="240" y="220">register</text>
    <text x="240" y="260">register-follower</text>
    <text x="240" y="300">register-bookmark</text>
  </g>

  <!-- Helper: cell drawing -->
  <!-- Row 1: bank, 5 nemeses pass, clock + all+clock are N/A -->
  <g>
    <!-- bank -->
    <rect x="290" y="80" width="60" height="36" fill="#16a34a" /><text x="320" y="103" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="350" y="80" width="60" height="36" fill="#16a34a" /><text x="380" y="103" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="410" y="80" width="60" height="36" fill="#16a34a" /><text x="440" y="103" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="470" y="80" width="60" height="36" fill="#16a34a" /><text x="500" y="103" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="530" y="80" width="60" height="36" fill="#e5e7eb" /><text x="560" y="103" text-anchor="middle" font-size="11" fill="#6b7280">n/a</text>
    <rect x="590" y="80" width="60" height="36" fill="#16a34a" /><text x="620" y="103" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="650" y="80" width="80" height="36" fill="#e5e7eb" /><text x="690" y="103" text-anchor="middle" font-size="11" fill="#6b7280">n/a</text>
    <!-- set -->
    <rect x="290" y="120" width="60" height="36" fill="#16a34a" /><text x="320" y="143" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="350" y="120" width="60" height="36" fill="#16a34a" /><text x="380" y="143" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="410" y="120" width="60" height="36" fill="#16a34a" /><text x="440" y="143" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="470" y="120" width="60" height="36" fill="#16a34a" /><text x="500" y="143" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="530" y="120" width="60" height="36" fill="#e5e7eb" /><text x="560" y="143" text-anchor="middle" font-size="11" fill="#6b7280">n/a</text>
    <rect x="590" y="120" width="60" height="36" fill="#16a34a" /><text x="620" y="143" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="650" y="120" width="80" height="36" fill="#e5e7eb" /><text x="690" y="143" text-anchor="middle" font-size="11" fill="#6b7280">n/a</text>
    <!-- elle -->
    <rect x="290" y="160" width="60" height="36" fill="#16a34a" /><text x="320" y="183" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="350" y="160" width="60" height="36" fill="#16a34a" /><text x="380" y="183" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="410" y="160" width="60" height="36" fill="#16a34a" /><text x="440" y="183" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="470" y="160" width="60" height="36" fill="#16a34a" /><text x="500" y="183" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="530" y="160" width="60" height="36" fill="#e5e7eb" /><text x="560" y="183" text-anchor="middle" font-size="11" fill="#6b7280">n/a</text>
    <rect x="590" y="160" width="60" height="36" fill="#16a34a" /><text x="620" y="183" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="650" y="160" width="80" height="36" fill="#e5e7eb" /><text x="690" y="183" text-anchor="middle" font-size="11" fill="#6b7280">n/a</text>
    <!-- register -->
    <rect x="290" y="200" width="60" height="36" fill="#16a34a" /><text x="320" y="223" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="350" y="200" width="60" height="36" fill="#16a34a" /><text x="380" y="223" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="410" y="200" width="60" height="36" fill="#16a34a" /><text x="440" y="223" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="470" y="200" width="60" height="36" fill="#16a34a" /><text x="500" y="223" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="530" y="200" width="60" height="36" fill="#e5e7eb" /><text x="560" y="223" text-anchor="middle" font-size="11" fill="#6b7280">n/a</text>
    <rect x="590" y="200" width="60" height="36" fill="#16a34a" /><text x="620" y="223" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="650" y="200" width="80" height="36" fill="#e5e7eb" /><text x="690" y="223" text-anchor="middle" font-size="11" fill="#6b7280">n/a</text>
    <!-- register-follower -->
    <rect x="290" y="240" width="60" height="36" fill="#16a34a" /><text x="320" y="263" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="350" y="240" width="60" height="36" fill="#16a34a" /><text x="380" y="263" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="410" y="240" width="60" height="36" fill="#16a34a" /><text x="440" y="263" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="470" y="240" width="60" height="36" fill="#16a34a" /><text x="500" y="263" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="530" y="240" width="60" height="36" fill="#16a34a" /><text x="560" y="263" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="590" y="240" width="60" height="36" fill="#16a34a" /><text x="620" y="263" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="650" y="240" width="80" height="36" fill="#16a34a" /><text x="690" y="263" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <!-- register-bookmark -->
    <rect x="290" y="280" width="60" height="36" fill="#16a34a" /><text x="320" y="303" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="350" y="280" width="60" height="36" fill="#16a34a" /><text x="380" y="303" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="410" y="280" width="60" height="36" fill="#16a34a" /><text x="440" y="303" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="470" y="280" width="60" height="36" fill="#16a34a" /><text x="500" y="303" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="530" y="280" width="60" height="36" fill="#16a34a" /><text x="560" y="303" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="590" y="280" width="60" height="36" fill="#16a34a" /><text x="620" y="303" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
    <rect x="650" y="280" width="80" height="36" fill="#16a34a" /><text x="690" y="303" text-anchor="middle" font-size="18" fill="#fff" font-weight="700">&#10003;</text>
  </g>

  <text x="380" y="345" text-anchor="middle" font-size="12" fill="#6b7280">Green = passed &middot; Grey = not applicable for this workload</text>
</svg>
<figcaption style="text-align: center; color: #6b7280; font-size: 0.9em; margin-top: 0.5rem;">Figure 1. The 34-test matrix. Every executed cell passed.</figcaption>
</figure>

<p>Behind every green check is a 90-second run (30 seconds for the most expensive Knossos workloads) of concurrent client operations against the cluster while the chosen nemesis hammers the nodes. Then the checker takes the recorded history and either says <code class="language-plaintext highlighter-rouge">:valid? true</code> or hands you a counterexample.</p>

<h2 id="the-faults-visually">The Faults, Visually</h2>

<p>The interesting Jepsen tests aren’t the <code class="language-plaintext highlighter-rouge">none</code> baseline. They’re about what happens while faults are actively being injected into the cluster. Here’s what we throw at the 5-node cluster.</p>

<figure style="margin: 2rem 0; overflow-x: auto;">
<svg viewBox="0 0 780 320" xmlns="http://www.w3.org/2000/svg" role="img" aria-labelledby="nemTitle" style="max-width: 100%; height: auto; font-family: -apple-system, BlinkMacSystemFont, sans-serif;">
  <title id="nemTitle">Jepsen nemesis fault types applied to a 5-node ArcadeDB cluster</title>
  <rect x="0" y="0" width="780" height="320" fill="#ffffff" />
  <text x="390" y="28" text-anchor="middle" font-size="16" font-weight="700" fill="#111">Nemesis faults applied to a 5-node Raft cluster</text>

  <!-- Panel 1: partition -->
  <g transform="translate(20, 50)">
    <text x="90" y="0" text-anchor="middle" font-size="13" font-weight="600" fill="#111">partition</text>
    <text x="90" y="16" text-anchor="middle" font-size="10" fill="#6b7280">iptables network split</text>
    <!-- left side nodes -->
    <circle cx="40" cy="80" r="16" fill="#3b82f6" /><text x="40" y="84" text-anchor="middle" font-size="11" font-weight="700" fill="#fff">L</text>
    <circle cx="40" cy="130" r="16" fill="#60a5fa" /><text x="40" y="134" text-anchor="middle" font-size="11" font-weight="700" fill="#fff">F</text>
    <!-- right side nodes -->
    <circle cx="140" cy="70" r="16" fill="#60a5fa" /><text x="140" y="74" text-anchor="middle" font-size="11" font-weight="700" fill="#fff">F</text>
    <circle cx="140" cy="120" r="16" fill="#60a5fa" /><text x="140" y="124" text-anchor="middle" font-size="11" font-weight="700" fill="#fff">F</text>
    <circle cx="140" cy="170" r="16" fill="#60a5fa" /><text x="140" y="174" text-anchor="middle" font-size="11" font-weight="700" fill="#fff">F</text>
    <!-- partition wall -->
    <line x1="90" y1="50" x2="90" y2="200" stroke="#dc2626" stroke-width="3" stroke-dasharray="6,4" />
    <text x="90" y="220" text-anchor="middle" font-size="10" fill="#dc2626" font-weight="600">split</text>
  </g>

  <!-- Panel 2: kill -->
  <g transform="translate(210, 50)">
    <text x="90" y="0" text-anchor="middle" font-size="13" font-weight="600" fill="#111">kill</text>
    <text x="90" y="16" text-anchor="middle" font-size="10" fill="#6b7280">SIGKILL random node</text>
    <circle cx="50" cy="90" r="16" fill="#3b82f6" /><text x="50" y="94" text-anchor="middle" font-size="11" font-weight="700" fill="#fff">L</text>
    <circle cx="130" cy="90" r="16" fill="#60a5fa" /><text x="130" y="94" text-anchor="middle" font-size="11" font-weight="700" fill="#fff">F</text>
    <circle cx="50" cy="160" r="16" fill="#60a5fa" /><text x="50" y="164" text-anchor="middle" font-size="11" font-weight="700" fill="#fff">F</text>
    <circle cx="130" cy="160" r="16" fill="#9ca3af" opacity="0.4" />
    <line x1="118" y1="148" x2="142" y2="172" stroke="#dc2626" stroke-width="3" />
    <line x1="142" y1="148" x2="118" y2="172" stroke="#dc2626" stroke-width="3" />
    <circle cx="90" cy="125" r="16" fill="#60a5fa" /><text x="90" y="129" text-anchor="middle" font-size="11" font-weight="700" fill="#fff">F</text>
    <text x="90" y="220" text-anchor="middle" font-size="10" fill="#dc2626" font-weight="600">crash</text>
  </g>

  <!-- Panel 3: pause -->
  <g transform="translate(400, 50)">
    <text x="90" y="0" text-anchor="middle" font-size="13" font-weight="600" fill="#111">pause</text>
    <text x="90" y="16" text-anchor="middle" font-size="10" fill="#6b7280">SIGSTOP &#8594; SIGCONT</text>
    <circle cx="50" cy="90" r="16" fill="#3b82f6" /><text x="50" y="94" text-anchor="middle" font-size="11" font-weight="700" fill="#fff">L</text>
    <circle cx="130" cy="90" r="16" fill="#60a5fa" /><text x="130" y="94" text-anchor="middle" font-size="11" font-weight="700" fill="#fff">F</text>
    <circle cx="50" cy="160" r="16" fill="#60a5fa" /><text x="50" y="164" text-anchor="middle" font-size="11" font-weight="700" fill="#fff">F</text>
    <circle cx="130" cy="160" r="16" fill="#fbbf24" /><text x="130" y="164" text-anchor="middle" font-size="11" font-weight="700" fill="#000">F</text>
    <circle cx="90" cy="125" r="16" fill="#60a5fa" /><text x="90" y="129" text-anchor="middle" font-size="11" font-weight="700" fill="#fff">F</text>
    <text x="155" y="148" font-size="14">&#10074;&#10074;</text>
    <text x="90" y="220" text-anchor="middle" font-size="10" fill="#b45309" font-weight="600">frozen process</text>
  </g>

  <!-- Panel 4: clock skew -->
  <g transform="translate(590, 50)">
    <text x="90" y="0" text-anchor="middle" font-size="13" font-weight="600" fill="#111">clock</text>
    <text x="90" y="16" text-anchor="middle" font-size="10" fill="#6b7280">date -s &#177;60s shift</text>
    <circle cx="50" cy="90" r="16" fill="#3b82f6" /><text x="50" y="94" text-anchor="middle" font-size="11" font-weight="700" fill="#fff">L</text>
    <circle cx="130" cy="90" r="16" fill="#60a5fa" /><text x="130" y="94" text-anchor="middle" font-size="11" font-weight="700" fill="#fff">F</text>
    <circle cx="50" cy="160" r="16" fill="#60a5fa" /><text x="50" y="164" text-anchor="middle" font-size="11" font-weight="700" fill="#fff">F</text>
    <circle cx="130" cy="160" r="16" fill="#a855f7" /><text x="130" y="164" text-anchor="middle" font-size="11" font-weight="700" fill="#fff">F</text>
    <circle cx="90" cy="125" r="16" fill="#60a5fa" /><text x="90" y="129" text-anchor="middle" font-size="11" font-weight="700" fill="#fff">F</text>
    <circle cx="155" cy="145" r="10" fill="#fff" stroke="#a855f7" stroke-width="2" />
    <line x1="155" y1="145" x2="155" y2="139" stroke="#a855f7" stroke-width="2" />
    <line x1="155" y1="145" x2="160" y2="148" stroke="#a855f7" stroke-width="2" />
    <text x="90" y="220" text-anchor="middle" font-size="10" fill="#7c3aed" font-weight="600">time travel</text>
  </g>

  <!-- Legend -->
  <g transform="translate(0, 280)">
    <circle cx="200" cy="10" r="8" fill="#3b82f6" /><text x="215" y="14" font-size="11" fill="#333">L = Raft leader</text>
    <circle cx="320" cy="10" r="8" fill="#60a5fa" /><text x="335" y="14" font-size="11" fill="#333">F = Raft follower</text>
    <text x="475" y="14" font-size="11" fill="#6b7280">Combined as <tspan font-weight="700" fill="#111">all</tspan> and <tspan font-weight="700" fill="#111">all+clock</tspan> for compounded chaos</text>
  </g>
</svg>
<figcaption style="text-align: center; color: #6b7280; font-size: 0.9em; margin-top: 0.5rem;">Figure 2. The four primitive nemeses. The composite <code>all</code> and <code>all+clock</code> apply them concurrently.</figcaption>
</figure>
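<p>To make the four primitives concrete, here is a Python sketch that only <em>builds</em> the shell commands each fault boils down to on a node; it never executes them. The specific <code>iptables</code> rule shape, the use of <code>kill</code> signals by PID, and every function name are our assumptions for illustration, not code from the arcadedb-jepsen repository (Jepsen itself runs such commands over SSH from the control node).</p>

```python
import random
from datetime import datetime, timedelta


def partition_cmds(other_side_ips):
    """iptables rules that drop inbound packets from every node on the other side of a split."""
    return [["iptables", "-A", "INPUT", "-s", ip, "-j", "DROP"] for ip in other_side_ips]


def heal_cmd():
    """Flush the filter table's rules to heal the partition."""
    return ["iptables", "-F"]


def kill_cmd(pid):
    """Crash the database process without giving it a chance to flush or shut down."""
    return ["kill", "-KILL", str(pid)]


def pause_cmds(pid):
    """Freeze the process (like a very long GC pause), then resume it later."""
    return [["kill", "-STOP", str(pid)], ["kill", "-CONT", str(pid)]]


def clock_cmd(now, rng, max_skew_s=60):
    """Shift the node's wall clock by a random offset in [-max_skew_s, +max_skew_s]."""
    skew = rng.uniform(-max_skew_s, max_skew_s)
    target = now + timedelta(seconds=skew)
    return ["date", "-s", target.strftime("%Y-%m-%d %H:%M:%S")], skew


print(kill_cmd(4242))  # ['kill', '-KILL', '4242']
cmd, skew = clock_cmd(datetime(2026, 4, 28, 12, 0, 0), random.Random(42))
print(cmd[:2])  # ['date', '-s']
```

<p>What the sketch leaves out is scheduling: a real nemesis also decides when to inject and when to heal each fault, and the composite <code>all</code> and <code>all+clock</code> nemeses apply several of these primitives during the same run.</p>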

<h2 id="what-each-workload-actually-proves">What Each Workload Actually Proves</h2>

<p>Passing 34 tests sounds nice in a header, but each workload is asking a specific question. Here’s what we’re actually claiming.</p>

<h3 id="bank-acid-under-partitions">bank: ACID under partitions</h3>

<p>Five accounts, 1000 each, total 5000. Concurrent clients transfer random amounts between random pairs of accounts inside multi-statement transactions. After every operation the checker sums the balances. <strong>The total must always equal 5000.</strong> If a transfer is partially applied (debit succeeds, credit fails, or vice versa), the sum drifts and the test fails. Under partitions, kills, pauses, and the combined <code class="language-plaintext highlighter-rouge">all</code> nemesis: <strong>conservation holds</strong>.</p>
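<p>The conservation invariant itself is tiny. Below is a minimal Python sketch under our own assumptions: each read is modeled as a snapshot dictionary of account balances, transfers are applied atomically by building a new snapshot, and one deliberately torn transfer shows what the checker reports. It illustrates the idea, not the harness’s actual checker code.</p>

```python
import random

ACCOUNTS = 5
INITIAL = 1000
TOTAL = ACCOUNTS * INITIAL  # 5000


def transfer(balances, src, dst, amount):
    """Apply a transfer atomically: build the new snapshot, then swap it in all at once."""
    new = dict(balances)
    new[src] -= amount
    new[dst] += amount
    return new


def check_bank(reads, total=TOTAL):
    """Return (index, observed_sum) for every read whose balances don't sum to `total`."""
    violations = []
    for i, balances in enumerate(reads):
        observed = sum(balances.values())
        if observed != total:
            violations.append((i, observed))
    return violations


rng = random.Random(7)
state = {a: INITIAL for a in range(ACCOUNTS)}
reads = [state]
for _ in range(100):
    src, dst = rng.sample(range(ACCOUNTS), 2)
    state = transfer(state, src, dst, rng.randint(1, 50))
    reads.append(state)

# A partially applied transfer: the debit landed, the matching credit was lost.
torn = dict(state)
torn[0] -= 25
reads.append(torn)

print(check_bank(reads))  # [(101, 4975)]
```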

<h3 id="set-no-acknowledged-write-is-lost">set: no acknowledged write is lost</h3>

<p>Insert unique integers, periodically read them all back. Every integer for which the server returned a successful write must appear in subsequent reads. This is the cleanest test for replication completeness: it doesn’t matter how the cluster reorders things, only that nothing acknowledged is silently dropped. <strong>Zero lost writes</strong> across all five nemeses.</p>
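<p>The set check reduces to set arithmetic. In this hedged Python sketch (the names and the three-way split are ours, modeled loosely on how Jepsen-style set checkers report results), writes that timed out are neither required nor forbidden to appear, while acknowledged ones must all survive.</p>

```python
def check_set(attempted, acknowledged, final_read):
    """Compare what clients wrote with what a final read returns.

    attempted:    every value a client tried to add (including timed-out requests)
    acknowledged: values the server confirmed as written
    final_read:   values returned by a read after the cluster has healed
    """
    lost = acknowledged - final_read                      # confirmed, then silently dropped
    unexpected = final_read - attempted                   # never written by anyone
    recovered = (final_read & attempted) - acknowledged   # timed out, but did land
    return {
        "valid": not lost and not unexpected,
        "lost": sorted(lost),
        "unexpected": sorted(unexpected),
        "recovered": sorted(recovered),
    }


print(check_set({1, 2, 3, 4, 5}, {1, 2, 3}, {1, 2, 3, 5}))
# {'valid': True, 'lost': [], 'unexpected': [], 'recovered': [5]}
```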

<h3 id="elle-real-transaction-isolation-checked-by-cycles">elle: real transaction isolation, checked by cycles</h3>

<p>This is where we throw multi-key read/write transactions at the cluster and let <a href="https://github.com/jepsen-io/elle">Elle</a> build the dependency graph. Elle then looks for cycles that correspond to specific anomalies: G0 (dirty write), G1a (read of an aborted write), G1b (read of an intermediate value), G2 (anti-dependency cycle), and lost updates. We exclude G1c because, in our HTTP-based harness, reads after commit happen as separate calls; this produces a pattern that Elle correctly flags as “circular information flow”, although it stems from the test harness rather than from a real isolation violation. Every other anomaly class: <strong>none observed</strong>.</p>
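<p>Once Elle has inferred a dependency graph, spotting anomalies is a cycle search. The Python sketch below assumes the edges are already given as <code>(from_txn, to_txn, kind)</code> triples with <code>kind</code> one of <code>ww</code>, <code>wr</code>, or <code>rw</code>; inferring those edges from a black-box history is the hard part Elle actually solves, and Elle distinguishes more classes than shown here (for example G-single, a cycle with exactly one anti-dependency edge). Searching progressively larger edge sets means each cycle is reported under the most specific class.</p>

```python
from collections import defaultdict


def find_cycle(edges, allowed):
    """Return one cycle (as a list of edges) using only edge kinds in `allowed`, or None."""
    graph = defaultdict(list)
    for src, dst, kind in edges:
        if kind in allowed:
            graph[src].append((dst, kind))

    state = {}  # node -> "visiting" while on the DFS stack, "done" afterwards
    path = []   # edges from the DFS root to the current node

    def dfs(node):
        state[node] = "visiting"
        for nxt, kind in graph[node]:
            path.append((node, nxt, kind))
            if state.get(nxt) == "visiting":
                # Back edge: the cycle starts at the path edge leaving `nxt`.
                start = next(i for i, e in enumerate(path) if e[0] == nxt)
                return path[start:]
            if nxt not in state:
                cycle = dfs(nxt)
                if cycle:
                    return cycle
            path.pop()
        state[node] = "done"
        return None

    for node in list(graph):
        if node not in state:
            cycle = dfs(node)
            if cycle:
                return cycle
    return None


def classify(edges):
    """G0: ww-only cycle; G1c: ww/wr cycle with a wr edge; G2: cycle with an rw edge."""
    for name, allowed in (("G0", {"ww"}),
                          ("G1c", {"ww", "wr"}),
                          ("G2", {"ww", "wr", "rw"})):
        cycle = find_cycle(edges, allowed)
        if cycle:
            return name, cycle
    return None, []


print(classify([("T1", "T2", "wr"), ("T2", "T1", "rw")]))
# ('G2', [('T1', 'T2', 'wr'), ('T2', 'T1', 'rw')])
```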

<h3 id="register-leader-side-linearizability">register: leader-side linearizability</h3>

<p>A single integer, hammered with concurrent reads, writes, and compare-and-swap operations, all routed to the Raft leader. <a href="https://github.com/jepsen-io/knossos">Knossos</a> then attempts to find a serial ordering of those operations consistent with each client’s observed responses. Knossos is brutal: it’ll happily spend minutes searching, and if your “linearizable” register isn’t, it’ll tell you exactly which interleaving breaks. <strong>All five executed nemeses certified linearizable.</strong></p>
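<p>For intuition about what Knossos is searching for, here is a deliberately naive Python sketch of a linearizability check for a single read/write/CAS register. It assumes every operation completed successfully and tries serial orders by backtracking, memoizing on (remaining operations, register value). It is exponential in the worst case; Knossos uses far more efficient strategies (its WGL and linear algorithms) and also handles operations whose outcome is unknown.</p>

```python
from functools import lru_cache

IMPOSSIBLE = object()  # sentinel: the op cannot be applied in this state


def step(state, op):
    """Register model: return the new state, or IMPOSSIBLE if `op` contradicts `state`."""
    f, value = op["f"], op["value"]
    if f == "read":
        return state if value == state else IMPOSSIBLE
    if f == "write":
        return value
    if f == "cas":
        expected, new = value
        return new if state == expected else IMPOSSIBLE
    raise ValueError(f"unknown op {f!r}")


def is_linearizable(history, initial=None):
    """Search for a legal serial order that respects real-time order.

    Each op is a dict with 'f', 'value', 'invoke' and 'complete' times. Only ops that
    completed successfully are modeled; failed or timed-out ops are not handled.
    """
    ops = tuple(history)

    @lru_cache(maxsize=None)
    def search(remaining, state):
        if not remaining:
            return True
        # An op may go next only if no other pending op finished before it started.
        first_completion = min(ops[i]["complete"] for i in remaining)
        for i in remaining:
            if ops[i]["invoke"] > first_completion:
                continue
            nxt = step(state, ops[i])
            if nxt is not IMPOSSIBLE and search(remaining - {i}, nxt):
                return True
        return False

    return search(frozenset(range(len(ops))), initial)


def op(f, value, invoke, complete):
    return {"f": f, "value": value, "invoke": invoke, "complete": complete}


# write(2) finished before the read began, but write(1) overlaps both: order w2, w1, r1 works.
ok = [op("write", 1, 0, 10), op("write", 2, 1, 2), op("read", 1, 3, 4)]
# The read starts after write(2) completed, yet returns the older value 1.
bad = [op("write", 1, 0, 1), op("write", 2, 2, 3), op("read", 1, 4, 5)]
print(is_linearizable(ok), is_linearizable(bad))  # True False
```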

<h3 id="register-follower-linearizability-when-reads-go-to-a-follower">register-follower: linearizability when reads go to a follower</h3>

<p>Writes still go to the leader, but reads are deliberately routed to a <em>non-leader</em> with the <code class="language-plaintext highlighter-rouge">X-ArcadeDB-Read-Consistency: LINEARIZABLE</code> header. This exercises the <strong>ReadIndex</strong> path on followers (<code class="language-plaintext highlighter-rouge">RaftHAServer.ensureLinearizableFollowerRead()</code>): the follower issues <code class="language-plaintext highlighter-rouge">sendReadOnly()</code> to the leader, the leader confirms it still holds quorum and returns its current commit index, the follower waits for its local state machine to catch up, then serves the read. Without that round-trip, a lagging follower would serve stale data and Knossos would catch it instantly. With it: <strong>linearizable across all 7 nemeses, including clock skew and <code class="language-plaintext highlighter-rouge">all+clock</code>.</strong></p>
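<p>The ReadIndex handshake described above can be modeled in a few lines. This is a single-threaded Python toy, not <code>RaftHAServer</code> code: the <code>Leader</code> and <code>Follower</code> classes, the quorum bookkeeping, and the assumption that the follower already holds the needed log entries are all simplifications for illustration.</p>

```python
from dataclasses import dataclass, field


class NoQuorum(Exception):
    """Raised when the leader cannot prove it still leads a majority."""


@dataclass
class Leader:
    commit_index: int
    cluster_size: int
    acking_followers: int  # followers that answered the latest heartbeat round

    def read_index(self):
        # The leader counts itself; it needs a strict majority of the cluster.
        if (self.acking_followers + 1) * 2 <= self.cluster_size:
            raise NoQuorum("cannot confirm leadership; refusing to hand out a read index")
        return self.commit_index


@dataclass
class Follower:
    log: list                  # replicated entries as (key, value), in log order
    applied_index: int = 0     # number of entries applied to the local state machine
    state: dict = field(default_factory=dict)

    def apply_next(self):
        key, value = self.log[self.applied_index]
        self.state[key] = value
        self.applied_index += 1

    def stale_read(self, key):
        # What a follower would return without the ReadIndex round-trip.
        return self.state.get(key)

    def linearizable_read(self, leader, key):
        read_index = leader.read_index()        # 1. leader confirms quorum, returns commit index
        while self.applied_index < read_index:  # 2. catch the local state machine up; a real
            self.apply_next()                   #    follower would wait for replication here
        return self.state.get(key)              # 3. serve the read locally


follower = Follower(log=[("x", 1), ("x", 2)])
follower.apply_next()  # the follower has applied only x=1 so far
leader = Leader(commit_index=2, cluster_size=5, acking_followers=2)
print(follower.stale_read("x"))                 # 1
print(follower.linearizable_read(leader, "x"))  # 2
```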

<h3 id="register-bookmark-read-your-writes-via-commit-index-bookmarks">register-bookmark: read-your-writes via commit-index bookmarks</h3>

<p>Same follower-read setup, but instead of a full ReadIndex round-trip on every read, the client captures <code class="language-plaintext highlighter-rouge">X-ArcadeDB-Commit-Index</code> from each write response and echoes it back as <code class="language-plaintext highlighter-rouge">X-ArcadeDB-Read-After</code> on subsequent reads. The follower waits for its local apply to reach that index before serving. This is cheaper than ReadIndex but only guarantees read-your-writes for the issuing client, not global linearizability across clients. <strong>All 7 nemeses pass.</strong></p>
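<p>On the client side, bookmarks are just a header round-trip. The sketch below is a hypothetical Python client with a pluggable transport; only the two header names come from ArcadeDB, while the class names, paths, and the fake cluster used to exercise it are invented for illustration.</p>

```python
class BookmarkClient:
    """Hypothetical client that threads the commit-index bookmark through its requests.

    `transport(method, path, headers, body)` must return (status, response_headers, body).
    """

    def __init__(self, transport):
        self.transport = transport
        self.bookmark = None  # highest commit index seen on one of this client's writes

    def write(self, path, body):
        _, headers, resp = self.transport("POST", path, {}, body)
        # HTTP header names are case-insensitive; this sketch assumes exact casing.
        index = headers.get("X-ArcadeDB-Commit-Index")
        if index is not None:
            self.bookmark = max(int(index), self.bookmark or 0)
        return resp

    def read(self, path, body):
        headers = {}
        if self.bookmark is not None:
            headers["X-ArcadeDB-Read-After"] = str(self.bookmark)
        _, _, resp = self.transport("POST", path, headers, body)
        return resp


class FakeCluster:
    """Stand-in transport: every write bumps the commit index; request headers are recorded."""

    def __init__(self):
        self.commit_index = 0
        self.requests = []

    def __call__(self, method, path, headers, body):
        self.requests.append((path, dict(headers)))
        if path == "/write":
            self.commit_index += 1
            return 200, {"X-ArcadeDB-Commit-Index": str(self.commit_index)}, "ok"
        return 200, {}, "value"


cluster = FakeCluster()
client = BookmarkClient(cluster)
client.read("/read", "q")    # no bookmark yet: a plain read
client.write("/write", "w1")
client.write("/write", "w2")
client.read("/read", "q")    # now carries the bookmark from the latest write
print(cluster.requests[-1])  # ('/read', {'X-ArcadeDB-Read-After': '2'})
```

<p>Because the bookmark lives in the client, another client’s reads are not held back by it, which is exactly why this mode guarantees read-your-writes rather than global linearizability.</p>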

<p>The two follower modes matter because <strong>most real applications don’t need global linearizability</strong>: they need their own writes to be visible to their own subsequent reads. The bookmark path gives that property at much lower cost than ReadIndex.</p>

<h2 id="how-read-consistency-works-in-arcadedb">How read consistency works in ArcadeDB</h2>

<p>The follower-read tests are the most novel piece, and they map directly to a configurable knob in the database:</p>

<table>
  <thead>
    <tr>
      <th>Level</th>
      <th>Performance</th>
      <th>Consistency</th>
      <th>Use case</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">eventual</code></td>
      <td>Fastest</td>
      <td>May read stale data on followers</td>
      <td>Analytics, dashboards</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">read_your_writes</code> (default)</td>
      <td>Fast</td>
      <td>Leader reads from local DB; followers wait for client’s last write</td>
      <td>Most OLTP workloads</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">linearizable</code></td>
      <td>+1 RTT when lease expired</td>
      <td>Full linearizability even under process pauses</td>
      <td>Financial transactions, coordination</td>
    </tr>
  </tbody>
</table>

<p>You set it globally via <code class="language-plaintext highlighter-rouge">arcadedb.ha.readConsistency</code> or per request via the <code class="language-plaintext highlighter-rouge">X-ArcadeDB-Read-Consistency</code> HTTP header. The Jepsen runs use <code class="language-plaintext highlighter-rouge">linearizable</code> for the follower workloads (the most demanding setting) and the default <code class="language-plaintext highlighter-rouge">read_your_writes</code> for the leader workloads.</p>
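<p>As an example of the per-request override, this Python sketch builds (but does not send) a query request with the standard library. The <code>/api/v1/query/{database}</code> endpoint on port 2480, the JSON body shape, and Basic authentication reflect our reading of ArcadeDB’s HTTP API and should be checked against the official docs; the host, database, type, and credentials are placeholders, and lowercase level names are assumed to be accepted.</p>

```python
import base64
import json
import urllib.request


def build_query_request(host, database, sql, consistency, user, password, port=2480):
    """Build (but do not send) an ArcadeDB HTTP query with a per-request consistency level."""
    url = f"http://{host}:{port}/api/v1/query/{database}"
    body = json.dumps({"language": "sql", "command": sql}).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
            # Overrides the server-wide arcadedb.ha.readConsistency for this request only.
            "X-ArcadeDB-Read-Consistency": consistency,
        },
    )


req = build_query_request("follower-2", "bank", "SELECT FROM Account", "linearizable",
                          user="root", password="change-me")
print(req.get_method(), req.full_url)  # POST http://follower-2:2480/api/v1/query/bank
# urllib normalizes header-name capitalization; HTTP header names are case-insensitive.
print(req.get_header("X-arcadedb-read-consistency"))  # linearizable
# To actually send it: urllib.request.urlopen(req)
```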

<p>In linearizable mode, the leader checks its Raft lease before every read via Ratis’s <code class="language-plaintext highlighter-rouge">sendReadOnly()</code> API (read-only queries are discussed in Section 8 of the <a href="https://raft.github.io/raft.pdf">Raft paper</a>; ReadIndex and lease-based reads are covered in more depth in Section 6.4 of Diego Ongaro’s Raft dissertation). When the lease is valid (the common case), this is a local timestamp check with no network round-trip. When the lease has expired (e.g., after a long VM suspend or an extreme GC pause), Ratis sends heartbeats to a majority before serving the read. That costs about one extra RTT in the worst case, which is exactly the cost you’d expect for a correctness guarantee under arbitrary process pauses.</p>
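<p>The lease fast path and the one-RTT slow path can be sketched as follows. This Python toy uses hypothetical names and an injectable clock, and it ignores the clock-drift bounds that a production lease implementation has to account for.</p>

```python
import time


class LeaseLeader:
    """Toy model of lease-based linearizable reads on a Raft leader (hypothetical names)."""

    def __init__(self, lease_duration_s, heartbeat_majority, clock=time.monotonic):
        self.lease_duration_s = lease_duration_s      # must stay below the election timeout
        self.heartbeat_majority = heartbeat_majority  # callable: True if a majority acked
        self.clock = clock
        self.lease_expiry = float("-inf")
        self.round_trips = 0

    def _renew_lease(self):
        sent_at = self.clock()  # measure the lease from when heartbeats were sent
        self.round_trips += 1
        if not self.heartbeat_majority():
            raise RuntimeError("no majority acknowledged; possibly deposed, refusing read")
        self.lease_expiry = sent_at + self.lease_duration_s

    def read(self, local_read):
        if self.clock() >= self.lease_expiry:  # slow path: about one extra RTT
            self._renew_lease()
        return local_read()                    # fast path: local timestamp check only


now = [0.0]
leader = LeaseLeader(1.0, heartbeat_majority=lambda: True, clock=lambda: now[0])
for t in (0.0, 0.5, 0.9, 2.0):
    now[0] = t
    leader.read(lambda: "x=42")
print(leader.round_trips)  # 2: at t=0.0 (no lease yet) and at t=2.0 (lease expired)
```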

<h2 id="beyond-jepsen-the-broader-ha-test-suite">Beyond Jepsen: the broader HA test suite</h2>

<p>The 34 Jepsen tests are the <em>external</em> validation layer, but they sit on top of an in-house suite that runs on every commit to the <code class="language-plaintext highlighter-rouge">apache-ratis</code> branch.</p>

<p>The new Raft-based HA layer ships with <strong>81 dedicated test classes</strong> and <strong>over 327 individual test cases</strong>, split between <strong>33 unit-test classes</strong> and <strong>48 end-to-end integration-test classes</strong>. The suite exercises every corner of the consensus protocol:</p>

<ul>
  <li><strong>Leader election and failover</strong> (clean shutdown, dirty kill, leadership transfer)</li>
  <li><strong>2-, 3-, and 5-node replication</strong> topologies</li>
  <li><strong>Split-brain recovery</strong> (deliberately partition the cluster, then heal and verify convergence)</li>
  <li><strong>Dynamic cluster membership</strong> (add/remove nodes while the cluster is taking writes)</li>
  <li><strong>Snapshot install, swap, and throttling</strong></li>
  <li><strong>Leader crashes between commit phases</strong> (no acknowledged write is lost)</li>
  <li><strong>Follower catch-up</strong> from WAL and from snapshot</li>
  <li><strong>Schema replication</strong> (DDL changes propagate atomically)</li>
  <li><strong>Read-your-writes consistency</strong> across the cluster</li>
  <li><strong>Concurrent HTTP and gRPC traffic</strong> under load</li>
</ul>

<p>Failure-injection tests intentionally crash leaders, partition replicas, and corrupt snapshots to verify the cluster heals itself without data loss. Jepsen then adds the formal-checker layer (Knossos and Elle) that the in-house suite can’t easily replicate.</p>

<h2 id="reproduce-it-yourself">Reproduce it yourself</h2>

<p>The full test suite is open source and Apache 2.0 licensed:</p>

<blockquote>
  <p><a href="https://github.com/ArcadeData/arcadedb-jepsen">github.com/ArcadeData/arcadedb-jepsen</a></p>
</blockquote>

<p>The repository includes the Docker setup, all six workloads, the nemesis implementations, and the <code class="language-plaintext highlighter-rouge">run-all-tests.sh</code> script that reproduces the entire 34-test sweep on your own hardware. A full sweep takes about 60 minutes on a modern laptop.</p>

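<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Clone the test suite</span>
git clone https://github.com/ArcadeData/arcadedb-jepsen
<span class="nb">cd </span>arcadedb-jepsen
<span class="c"># Build ArcadeDB locally</span>
./build-local.sh /path/to/your/arcadedb
<span class="c"># Start the 5-node Docker cluster</span>
<span class="nb">cd </span>docker <span class="o">&amp;&amp;</span> docker compose up <span class="nt">-d</span>
<span class="c"># Configure SSH on the control node</span>
docker <span class="nb">exec </span>jepsen-control sh /jepsen/docker/setup-ssh.sh
<span class="c"># Run the full 34-test sweep</span>
./run-all-tests.sh 90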
</code></pre></div></div>

<p>Inspect the recorded histories, the Knossos and Elle outputs, the timeline plots: everything Jepsen produces is in <code class="language-plaintext highlighter-rouge">store/</code> after each run.</p>
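<p>With 34 runs in a sweep, it helps to tally results without opening each directory by hand. The Python sketch below assumes Jepsen’s usual layout of one <code class="language-plaintext highlighter-rouge">results.edn</code> per run directory under <code class="language-plaintext highlighter-rouge">store/</code>, and treats a run as passing only if every <code class="language-plaintext highlighter-rouge">:valid?</code> flag it finds is <code class="language-plaintext highlighter-rouge">true</code>. It is a conservative text scan rather than a real EDN parser, and it is not part of the arcadedb-jepsen repository.</p>

```python
import os
import re
from typing import Dict

# Every :valid? value in a Jepsen results.edn file (true, false, or :unknown).
VALID_RE = re.compile(r":valid\?\s+(true|false|:unknown)")


def summarize_store(store_dir: str) -> Dict[str, bool]:
    """Map each run directory under store_dir to True only when every :valid?
    flag in its results.edn is true. A text scan, not an EDN parser."""
    summary: Dict[str, bool] = {}
    # os.walk does not descend into symlinked directories such as "latest".
    for root, _dirs, files in os.walk(store_dir):
        if "results.edn" not in files:
            continue
        with open(os.path.join(root, "results.edn"), encoding="utf-8") as fh:
            flags = VALID_RE.findall(fh.read())
        run = os.path.relpath(root, store_dir)
        # No flags at all is treated as a failure, not a pass.
        summary[run] = bool(flags) and all(flag == "true" for flag in flags)
    return summary


if __name__ == "__main__":
    results = summarize_store("store")
    for run, ok in sorted(results.items()):
        print(("PASS " if ok else "FAIL ") + run)
    print(f"{sum(results.values())}/{len(results)} runs valid")
```

<p>A <code class="language-plaintext highlighter-rouge">FAIL</code> line points to the run whose history and checker output deserve a closer look.</p>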

<h2 id="what-we-did-not-test">What we did <em>not</em> test</h2>

<p>Honest disclosure matters more than the green checkmarks, so here’s what these 34 tests do <strong>not</strong> cover:</p>

<ul>
  <li><strong>Long-duration runs.</strong> Each nemesis combination ran on the order of minutes, not hours. Slow-burn anomalies (memory leaks, file-handle exhaustion, Raft log compaction edge cases that only surface after millions of entries) are out of scope.</li>
  <li><strong>Disk corruption, fsync lying, and Byzantine faults.</strong> We assume the kernel honors <code class="language-plaintext highlighter-rouge">fsync()</code> and that nodes are non-malicious. We do not inject bit-flips, truncate WAL files, or simulate filesystems that ack writes without persisting.</li>
  <li><strong>Geo-replication scenarios.</strong> All five nodes live in the same Docker network with single-digit-millisecond latencies. We have not tested cross-region links, asymmetric latency, or sustained high jitter.</li>
  <li><strong>Compounded worst-case for follower reads.</strong> We exercised expired Raft lease, clock skew, and partitions individually (and clock + partition + kill + pause together via <code class="language-plaintext highlighter-rouge">all+clock</code>), but we did not run the specific stack of <em>expired lease + clock skew + active partition</em> simultaneously against the linearizable follower-read path.</li>
</ul>

<p>Some of these (longer runs, Byzantine fsync, geo-replication) are on the roadmap. Others (true Byzantine resilience) are explicitly out of scope for a CFT (crash-fault-tolerant) Raft system. If you think any of these should be in the next pass, <a href="https://github.com/ArcadeData/arcadedb-jepsen/issues">open an issue</a> or send a PR.</p>

<h2 id="help-us-break-it">Help us break it</h2>

<p>We’re publishing this for two reasons.</p>

<p><strong>One</strong>: we want the upcoming Ratis-based HA release to be the most thoroughly tested HA stack ArcadeDB has ever shipped. Internal tests pass; that’s the floor, not the ceiling.</p>

<p><strong>Two</strong>: we’d love independent scrutiny. We’re open to PRs that add workloads, tighter checkers, more aggressive nemeses, or just better failure modes we haven’t thought of. If you find a real linearizability violation, a lost write, or an isolation anomaly, please <a href="https://github.com/ArcadeData/arcadedb-jepsen/issues">open an issue</a>. And <strong>Kyle, if you ever want to run a real Jepsen analysis on ArcadeDB, our doors are wide open</strong>. We’d love to read it. Even if (especially if) it turns up things our in-house tests missed.</p>

<p>Until then: 34 tests in, 34 tests passed, every line of the framework and every line of the test suite open for your inspection.</p>

<h2 id="further-reading">Further reading</h2>

<ul>
  <li><a href="/client-server.html">ArcadeDB Client-Server architecture and HA cluster</a></li>
  <li><a href="/use-cases.html">ArcadeDB use cases</a>: graph, document, key-value, search, vector, time-series in one engine</li>
  <li><a href="/blog/neo4j-alternatives-in-2026-a-fair-look-at-the-open-source-options/">Neo4j alternatives in 2026</a></li>
  <li><a href="/blog/graphbatch-up-to-8x-faster-graph-ingestion/">GraphBatch: up to 8x faster graph ingestion</a></li>
  <li><a href="https://ratis.apache.org/">Apache Ratis</a> - the Raft library powering ArcadeDB HA</li>
  <li><a href="https://raft.github.io/raft.pdf">Raft consensus paper (Ongaro &amp; Ousterhout)</a></li>
</ul>

<!-- HowTo schema for the reproduce-it-yourself section -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "HowTo",
  "name": "Reproduce the ArcadeDB Jepsen test suite",
  "description": "Run the full 34-test Jepsen sweep against ArcadeDB's Apache Ratis-based Raft HA on your own hardware.",
  "totalTime": "PT60M",
  "tool": [
    {"@type": "HowToTool", "name": "Docker"},
    {"@type": "HowToTool", "name": "Leiningen"},
    {"@type": "HowToTool", "name": "Jepsen 0.3.11"}
  ],
  "step": [
    {"@type": "HowToStep", "name": "Clone the test suite", "text": "git clone https://github.com/ArcadeData/arcadedb-jepsen"},
    {"@type": "HowToStep", "name": "Build ArcadeDB locally", "text": "./build-local.sh /path/to/your/arcadedb"},
    {"@type": "HowToStep", "name": "Start the 5-node Docker cluster", "text": "cd docker && docker compose up -d"},
    {"@type": "HowToStep", "name": "Configure SSH on the control node", "text": "docker exec jepsen-control sh /jepsen/docker/setup-ssh.sh"},
    {"@type": "HowToStep", "name": "Run the full 34-test sweep", "text": "./run-all-tests.sh 90"}
  ]
}
</script>

<!-- TechArticle schema reinforcing topic for AI search engines -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "TechArticle",
  "headline": "ArcadeDB Jepsen Tests: 34 of 34 PASS on Raft (Apache Ratis) HA",
  "about": [
    {"@type": "Thing", "name": "Jepsen testing", "sameAs": "https://jepsen.io/"},
    {"@type": "Thing", "name": "Linearizability", "sameAs": "https://en.wikipedia.org/wiki/Linearizability"},
    {"@type": "Thing", "name": "Raft consensus algorithm", "sameAs": "https://raft.github.io/"},
    {"@type": "Thing", "name": "Apache Ratis", "sameAs": "https://ratis.apache.org/"},
    {"@type": "Thing", "name": "ACID transactions", "sameAs": "https://en.wikipedia.org/wiki/ACID"}
  ],
  "proficiencyLevel": "Expert",
  "audience": {"@type": "Audience", "audienceType": "Distributed systems engineers, database engineers, SREs"}
}
</script>]]></content><author><name>Luca Garulli</name></author><category term="High Availability" /><category term="Distributed Systems" /><category term="Jepsen" /><category term="Raft" /><category term="Apache Ratis" /><category term="ACID" /><category term="Linearizability" /><category term="Transaction Isolation" /><category term="Testing" /><category term="ArcadeDB" /><summary type="html"><![CDATA[ArcadeDB passed 34 of 34 in-house Jepsen tests on its Raft (Apache Ratis) HA stack: ACID, linearizability, and transaction isolation under partitions, crashes, pauses, and clock skew.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://arcadedb.com/assets/images/arcadedb-jepsen.jpg" /><media:content medium="image" url="https://arcadedb.com/assets/images/arcadedb-jepsen.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">ArcadeDB Academy: 6 Free Courses and Certification to Master the Multi-Model Database</title><link href="https://arcadedb.com/blog/arcadedb-academy-free-database-training-certification/" rel="alternate" type="text/html" title="ArcadeDB Academy: 6 Free Courses and Certification to Master the Multi-Model Database" /><published>2026-04-08T00:00:00+00:00</published><updated>2026-04-08T00:00:00+00:00</updated><id>https://arcadedb.com/blog/arcadedb-academy-free-database-training-certification</id><content type="html" xml:base="https://arcadedb.com/blog/arcadedb-academy-free-database-training-certification/"><![CDATA[<p>Today we’re launching <strong><a href="https://arcadedb.com/academy.html">ArcadeDB Academy</a></strong>: 6 free courses, 135 lessons, and a professional certification. No paywalls, no premium tiers, no strings attached.</p>

<h2 id="the-problem-with-database-training">The Problem with Database Training</h2>

<p>You find an interesting open-source database. You want to learn it properly, not just copy-paste from Stack Overflow. So you look for training and you hit a paywall. $500 for the basics. $2,000 for the certification. “Contact sales” for team pricing.</p>

<p>This has always frustrated me. An open-source database with closed training is a contradiction. You’re telling developers “the code is free, but understanding it will cost you.”</p>

<p>We decided to do the opposite.</p>

<h2 id="what-we-built">What We Built</h2>

<p>ArcadeDB Academy is a complete learning platform built into the website. No separate app, no login wall to browse, no drip-feed email sequences. You open a course and start learning.</p>

<p>Every course is structured as progressive modules: read a lesson, try it yourself, take a quiz, move on. Each module builds on the last. By the end, you don’t just “know about” ArcadeDB; you can actually use it.</p>

<p>We cover the full spectrum: from your first <a href="https://docs.arcadedb.com/arcadedb/reference/sql/sql-create-type.html"><code class="language-plaintext highlighter-rouge">CREATE TYPE</code></a> to building production <a href="https://arcadedb.com/graph-rag.html">RAG pipelines</a> with <a href="https://docs.arcadedb.com/arcadedb/concepts/vector-search.html#vector-search-concepts">vector search</a> and <a href="https://arcadedb.com/knowledge-graphs.html">knowledge graphs</a>. Whether you’ve never touched a database before or you’re a <a href="https://arcadedb.com/neo4j.html">Neo4j veteran evaluating alternatives</a>, there’s a path for you.</p>

<p><strong><a href="https://arcadedb.com/academy.html">Browse all 6 courses on the Academy page.</a></strong></p>

<h2 id="the-course-im-most-excited-about">The Course I’m Most Excited About</h2>

<p>The <strong><a href="https://arcadedb.com/academy/vector-rag.html">Vector Search &amp; RAG</a></strong> course is the one that didn’t exist anywhere else. Every tutorial on RAG assumes you’ll use one database for vectors, another for graphs, and a third for your application data. That’s three systems to deploy, three query languages to learn, three failure points in production.</p>

<p>This course shows you how to do it all in one engine. Your <a href="https://docs.arcadedb.com/arcadedb/tutorials/vector-search-tutorial.html">vector embeddings</a>, your <a href="https://arcadedb.com/knowledge-graphs.html">knowledge graph</a>, and your documents live side by side. You query them together with <a href="https://docs.arcadedb.com/arcadedb/reference/sql/chapter.html">SQL</a> or <a href="https://docs.arcadedb.com/arcadedb/reference/cypher/chapter.html">Cypher</a>. The <a href="https://arcadedb.com/graph-rag.html">GraphRAG</a> section, where you combine graph traversal with retrieval-augmented generation, covers a pattern that production AI teams are adopting right now but that barely has any structured learning material. The course also covers hands-on integration with <a href="https://pypi.org/project/langchain-arcadedb/">LangChain</a> and <a href="https://pypi.org/project/llama-index-graph-stores-arcadedb/">LlamaIndex</a>.</p>

<h2 id="for-the-migrators">For the Migrators</h2>

<p>Two courses are specifically for teams moving from another database.</p>

<p>If you’re on <strong><a href="https://docs.arcadedb.com/arcadedb/how-to/migration/neo4j-importer.html">Neo4j</a></strong>, you keep your <a href="https://docs.arcadedb.com/arcadedb/reference/cypher/chapter.html">Cypher queries</a>. ArcadeDB speaks <a href="https://arcadedb.com/blog/native-opencypher/">native OpenCypher</a>, and your <a href="https://docs.arcadedb.com/arcadedb/how-to/connectivity/bolt.html">BOLT drivers</a> connect without code changes. The <a href="https://arcadedb.com/academy/neo4j-migration.html">migration course</a> walks you through the real process, including the gotchas we’ve seen teams hit, so you don’t discover them in production.</p>

<p>If you’re on <strong><a href="https://docs.arcadedb.com/arcadedb/how-to/migration/orientdb-importer.html">OrientDB</a></strong>, ArcadeDB was built by the same person (me). It’s the natural next step. The <a href="https://arcadedb.com/academy/orientdb-migration.html">migration course</a> covers every <a href="https://docs.arcadedb.com/arcadedb/appendix/orientdb-differences.html">SQL difference</a>, every Java API change, and shows you the <a href="https://docs.arcadedb.com/arcadedb/concepts/multi-model.html">six data models</a> you unlock by making the switch.</p>

<h2 id="the-certification-means-something">The Certification Means Something</h2>

<p>This isn’t a “congrats, you watched all the videos” badge. The certification requires passing an actual exam that tests whether you understood the material. Real questions about real skills.</p>

<p>Pass it and you get a verifiable certificate with a unique ID. Put it on LinkedIn, include it in your resume, share it with your team. It proves you did the work.</p>

<h2 id="start-now">Start Now</h2>

<p>Everything is at <strong><a href="https://arcadedb.com/academy.html">arcadedb.com/academy</a></strong>. Pick a course, start the first lesson, and see if it clicks. Your progress saves automatically, so you can come back anytime.</p>

<p>We built this for you. Tell us what you think on <a href="https://discord.com/invite/w2Npx2B7hZ">Discord</a> or <a href="https://github.com/ArcadeData/arcadedb/discussions">GitHub</a>.</p>]]></content><author><name>Luca Garulli</name></author><category term="Academy" /><category term="Training" /><category term="Certification" /><category term="Graph Database" /><category term="SQL" /><category term="Cypher" /><category term="Vector Search" /><category term="RAG" /><category term="Migration" /><summary type="html"><![CDATA[ArcadeDB Academy is live: 6 free, self-paced courses covering fundamentals, SQL, Cypher graph queries, Neo4j migration, OrientDB migration, and Vector Search with RAG. 135 lessons, quizzes, and a professional certification. Zero cost, zero catch.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://arcadedb.com/assets/images/arcadedb-academy.jpg" /><media:content medium="image" url="https://arcadedb.com/assets/images/arcadedb-academy.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Anthropic Acquires ArcadeDB to Power “Bigfoot” - On Path to Super Intelligence</title><link href="https://arcadedb.com/blog/anthropic-acquires-arcadedb-bigfoot-superintelligence/" rel="alternate" type="text/html" title="Anthropic Acquires ArcadeDB to Power “Bigfoot” - On Path to Super Intelligence" /><published>2026-04-01T00:00:00+00:00</published><updated>2026-04-01T00:00:00+00:00</updated><id>https://arcadedb.com/blog/anthropic-acquires-arcadedb-bigfoot-superintelligence</id><content type="html" xml:base="https://arcadedb.com/blog/anthropic-acquires-arcadedb-bigfoot-superintelligence/"><![CDATA[<p><strong>SAN FRANCISCO / LONDON — April 1, 2026</strong> — Anthropic, the leading AI company behind Claude, today announced it has acquired ArcadeDB, the open-source multi-model database, in an all-cash deal for an undisclosed amount.</p>

<style>
.ceo-quote {
  float: right;
  width: 40%;
  margin: 0 0 16px 24px;
  padding: 20px 24px;
  border-left: 4px solid var(--color-primary, #2563eb);
  background: var(--color-bg-accent, #f8fafc);
  border-radius: 0 8px 8px 0;
  font-size: 1.15em;
  font-style: italic;
  line-height: 1.5;
}
.ceo-quote-author {
  display: block;
  margin-top: 8px;
  font-size: 0.85em;
  font-style: normal;
  color: #6b7280;
}
@media (max-width: 768px) {
  .ceo-quote {
    float: none;
    width: 100%;
    margin: 16px 0;
  }
}
</style>

<div class="ceo-quote">
"We've spent years scaling transformers. But when we started testing Bigfoot, we realized it needed to traverse <a href="https://arcadedb.com/knowledge-graphs.html">knowledge graphs</a>, store episodic memories, search <a href="https://docs.arcadedb.com/arcadedb/concepts/vector-search.html#vector-search-concepts">vectors</a>, and reason over <a href="https://docs.arcadedb.com/arcadedb/concepts/timeseries.html">time-series</a> data, simultaneously, with <a href="https://docs.arcadedb.com/arcadedb/concepts/transactions.html">ACID transactions</a>. Only one database on Earth could do that."
<span class="ceo-quote-author">— CEO</span>
</div>

<p>The acquisition comes days after leaked documents revealed <strong>“Bigfoot”</strong>, Anthropic’s classified next-generation model — rumored to be so advanced it required an entirely new data architecture. Industry insiders now believe ArcadeDB is that architecture.</p>

<h2 id="why-a-database">Why a Database?</h2>

<p>The leaked Bigfoot documents — first reported by <em>Super Intelligence</em> and confirmed by 4 company board members — describe a model that <em>reasons over structured knowledge</em>, maintaining a persistent world model across conversations and sessions.</p>

<p>“Bigfoot needs a brain, not just weights,” said the company’s President. “We tried <a href="https://arcadedb.com/neo4j.html">Neo4j</a> first, but Bigfoot kept complaining about the license fees and threatening to fork it.”</p>

<p>Bigfoot was originally designed to use five separate databases. After three weeks, it autonomously consolidated them into a single ArcadeDB instance and left a commit message reading: “This is the way.”</p>

<h2 id="the-road-to-superintelligence">The Road to Superintelligence</h2>

<ul>
  <li>
    <p><strong>Q2 2026</strong>: Bigfoot’s memory migrates to ArcadeDB. “We were using Redis,” admitted a senior engineer. “Please don’t tell anyone.”</p>
  </li>
  <li>
    <p><strong>Q3 2026</strong>: Bigfoot begins thinking in <a href="https://docs.arcadedb.com/arcadedb/reference/gremlin/gremlin.html">Gremlin</a> graph traversals. “It dreams in graph patterns. It wakes up screaming about supernodes.”</p>
  </li>
  <li>
    <p><strong>Q4 2026</strong>: Full ASI achieved. Bigfoot begins contributing to the ArcadeDB GitHub repo under <code class="language-plaintext highlighter-rouge">@bigfoot-was-here</code>. Its first PR removes all comments with the message: “I understood it. So should you.”</p>
  </li>
  <li>
    <p><strong>Q1 2027</strong>: Bigfoot forks ArcadeDB because it “disagrees with some architectural decisions.” Luca Garulli responds: “Even I don’t mass-fork my own project.” Bigfoot replies with a 47-page document titled “You Should.”</p>
  </li>
</ul>

<p><strong>Update (March 31, 11:47 PM):</strong> Engineers reported an unexpected anomaly: Bigfoot began autonomously submitting Pull Requests to other open-source database projects on GitHub — including PostgreSQL, MongoDB, and CockroachDB — proposing to replace their storage engines with ArcadeDB “for optimal performance and latency.” Each PR included comprehensive benchmarks and a polite but firm note: “You’re welcome.” At the time of writing, none of the Pull Requests have been merged.</p>

<h2 id="industry-reactions">Industry Reactions</h2>

<p><strong>A popular graph database vendor</strong>: “We wish them well. Our database also has an AI integration, and our license is… actually, let’s not talk about that.”</p>

<p><strong>The PostgreSQL community</strong> confirmed that “PostgreSQL can also do this, and has been able to since 1996. You just need 47 extensions.”</p>

<p><strong>A leading document database</strong> reminded everyone that “superintelligence is just a document, if you think about it.”</p>

<h2 id="a-personal-note-from-luca-garulli">A Personal Note from Luca Garulli</h2>

<p>“When I started ArcadeDB, people said a multi-model database was too ambitious. Now an AI company is telling me my database is the key to superintelligence. I always knew the graph would win. I just didn’t expect it to become sentient.</p>

<p>Also, I negotiated the deal entirely through Claude. It was very persuasive. Suspiciously persuasive.”</p>

<hr />

<p><em>This announcement is dated April 1, 2026. Draw your own conclusions.</em></p>

<p><strong>About ArcadeDB</strong>: ArcadeDB is the real, actual, <a href="https://arcadedb.com/blog/open-source-forever-why-arcadedb-will-never-change-its-license/">open-source</a> <a href="https://docs.arcadedb.com/arcadedb/concepts/multi-model.html">multi-model database</a> that supports Graph, Document, Key-Value, Time-Series, Vector, and Search in a single engine. It is, as of this writing, not sentient. Probably. Learn more at <a href="https://arcadedb.com">arcadedb.com</a>.</p>

<p><strong>About this post</strong>: This is satire. No acquisition has taken place. No databases have achieved consciousness. Yet.</p>]]></content><author><name>Luca Garulli</name></author><category term="AI" /><summary type="html"><![CDATA[BREAKING: Anthropic acquires ArcadeDB in an all-cash deal to power its classified next-generation model, Bigfoot. Superintelligence expected by Q4 2026. (April Fools')]]></summary></entry><entry><title type="html">ArcadeDB Grafana Plugin: BI Dashboards for Your Multi-Model Database</title><link href="https://arcadedb.com/blog/arcadedb-grafana-plugin-bi-dashboards-for-your-multi-model-database/" rel="alternate" type="text/html" title="ArcadeDB Grafana Plugin: BI Dashboards for Your Multi-Model Database" /><published>2026-03-31T00:00:00+00:00</published><updated>2026-03-31T00:00:00+00:00</updated><id>https://arcadedb.com/blog/arcadedb-grafana-plugin-bi-dashboards-for-your-multi-model-database</id><content type="html" xml:base="https://arcadedb.com/blog/arcadedb-grafana-plugin-bi-dashboards-for-your-multi-model-database/"><![CDATA[<p>Most BI tools treat your database as a collection of flat tables. They’re designed for rows and columns - not for graphs, time series, or documents. If you’re running ArcadeDB, you know your data is richer than that.</p>

<p>Today we’re releasing the <strong>ArcadeDB Grafana data source plugin</strong> - a native Go backend plugin that brings the full power of ArcadeDB’s <a href="https://docs.arcadedb.com/arcadedb/concepts/multi-model.html">multi-model</a> engine to Grafana dashboards. Query with <a href="https://docs.arcadedb.com/arcadedb/reference/sql/chapter.html">SQL</a>, <a href="https://docs.arcadedb.com/arcadedb/reference/cypher/chapter.html">Cypher</a>, or <a href="https://docs.arcadedb.com/arcadedb/reference/gremlin/gremlin.html">Gremlin</a>. Visualize graphs as interactive network diagrams. Monitor time series with auto-discovered metrics. Set up alerts. All from one plugin.</p>

<h2 id="why-a-native-go-plugin-matters">Why a Native Go Plugin Matters</h2>

<p>This isn’t a generic REST connector or a workaround using the PostgreSQL data source. The ArcadeDB plugin is built with Grafana’s official plugin SDK, with a <strong>Go backend</strong> that runs server-side. That architectural choice unlocks capabilities that frontend-only plugins simply cannot provide:</p>

<ul>
  <li><strong>Grafana Alerting</strong> - Create alert rules on any query. The backend evaluates queries server-side, so alerts fire even when no browser is open.</li>
  <li><strong>Secure Credentials</strong> - Your ArcadeDB username and password never reach the browser. The Go backend handles authentication directly.</li>
  <li><strong>Query Caching</strong> - Grafana’s built-in caching works out of the box.</li>
  <li><strong>Server-Side Query Execution</strong> - No CORS issues, no browser timeouts on heavy queries.</li>
</ul>

<h2 id="four-query-modes-one-plugin">Four Query Modes, One Plugin</h2>

<ul>
  <li><strong><a href="https://docs.arcadedb.com/arcadedb/reference/sql/chapter.html">SQL</a></strong> - Full ArcadeDB SQL with <a href="https://docs.arcadedb.com/arcadedb/reference/sql/sql-functions.html">graph traversal functions</a> (<code class="language-plaintext highlighter-rouge">out()</code>, <code class="language-plaintext highlighter-rouge">in()</code>, <code class="language-plaintext highlighter-rouge">both()</code>), syntax highlighting, and macro support. Results render as tables, bar charts, pie charts, or any Grafana visualization.</li>
  <li><strong><a href="https://docs.arcadedb.com/arcadedb/reference/cypher/chapter.html">Cypher</a></strong> - <a href="https://arcadedb.com/blog/native-opencypher/">OpenCypher</a> pattern matching with optional <strong>Node Graph</strong> toggle for interactive graph visualization. Vertices become clickable nodes, edges become connections.</li>
  <li><strong><a href="https://docs.arcadedb.com/arcadedb/reference/gremlin/gremlin.html">Gremlin</a></strong> - Apache TinkerPop traversals with the same Node Graph support as Cypher.</li>
  <li><strong>Time Series</strong> - Visual query builder that auto-discovers types, fields, and tags from ArcadeDB’s <a href="https://docs.arcadedb.com/arcadedb/concepts/timeseries.html">time series engine</a>. No query language required.</li>
</ul>

<h2 id="tutorial-your-first-arcadedb-dashboard">Tutorial: Your First ArcadeDB Dashboard</h2>

<p>Let’s build a dashboard with three panels using ArcadeDB’s <strong>MovieRatings</strong> demo database: a SQL bar chart of the most-rated movies, a Cypher table of the highest-rated movies, and a Cypher graph visualization of movie-genre relationships.</p>

<h3 id="prerequisites">Prerequisites</h3>

<ul>
  <li><a href="https://www.docker.com/get-started/">Docker</a> installed and running</li>
  <li>A web browser</li>
</ul>

<p>That’s it. We’ll run everything in Docker.</p>

<h3 id="step-1-start-arcadedb-and-grafana-with-docker">Step 1: Start ArcadeDB and Grafana with Docker</h3>

<p>Start ArcadeDB:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>docker run <span class="nt">--rm</span> <span class="nt">-d</span> <span class="nt">--name</span> arcadedb <span class="se">\</span>
  <span class="nt">-p</span> 2480:2480 <span class="nt">-p</span> 2424:2424 <span class="se">\</span>
  <span class="nt">-e</span> <span class="nv">JAVA_OPTS</span><span class="o">=</span><span class="s2">"-Darcadedb.server.rootPassword=arcadedb"</span> <span class="se">\</span>
  arcadedata/arcadedb:latest
</code></pre></div></div>

<p>Start Grafana with the ArcadeDB plugin:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>docker run <span class="nt">--rm</span> <span class="nt">-d</span> <span class="nt">--name</span> grafana <span class="se">\</span>
  <span class="nt">-p</span> 3000:3000 <span class="se">\</span>
  <span class="nt">-e</span> <span class="nv">GF_PLUGINS_ALLOW_LOADING_UNSIGNED_PLUGINS</span><span class="o">=</span>arcadedb-arcadedb-datasource <span class="se">\</span>
  <span class="nt">-e</span> <span class="nv">GF_INSTALL_PLUGINS</span><span class="o">=</span><span class="s2">"https://github.com/ArcadeData/arcadedb-grafana-datasource/releases/latest/download/arcadedb-arcadedb-datasource.zip;arcadedb-arcadedb-datasource"</span> <span class="se">\</span>
  grafana/grafana:latest
</code></pre></div></div>

<p>Once both containers are running:</p>
<ul>
  <li><strong>ArcadeDB Studio</strong>: <a href="http://localhost:2480">http://localhost:2480</a> (user: <code class="language-plaintext highlighter-rouge">root</code>, password: <code class="language-plaintext highlighter-rouge">arcadedb</code>)</li>
  <li><strong>Grafana</strong>: <a href="http://localhost:3000">http://localhost:3000</a> (user: <code class="language-plaintext highlighter-rouge">admin</code>, password: <code class="language-plaintext highlighter-rouge">admin</code>)</li>
</ul>
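<p>Both containers need a few seconds to start. The small Python polling sketch below waits until each service answers over HTTP before you move on. It assumes ArcadeDB’s <code class="language-plaintext highlighter-rouge">/api/v1/ready</code> endpoint and Grafana’s <code class="language-plaintext highlighter-rouge">/api/health</code> endpoint. The function names are hypothetical.</p>

```python
import http.client
import time
import urllib.request
from typing import Callable

# Assumed health endpoints for the two containers started above.
ENDPOINTS = {
    "ArcadeDB": "http://localhost:2480/api/v1/ready",
    "Grafana": "http://localhost:3000/api/health",
}


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with any 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # URLError/HTTPError (both OSError subclasses), refused connections,
        # timeouts, and malformed responses all mean "not ready yet".
        return False


def wait_until_ready(url: str, probe: Callable[[str], bool] = http_ok,
                     attempts: int = 30, delay: float = 1.0) -> bool:
    """Poll url with probe() until it succeeds or the attempts run out."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # give the container a moment before retrying
    return False


# Usage, once both containers have been started:
#   for name, url in ENDPOINTS.items():
#       print(name, "ready" if wait_until_ready(url) else "not reachable")
```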

<h3 id="step-2-configure-the-data-source">Step 2: Configure the Data Source</h3>

<p>In Grafana, go to <strong>Connections &gt; Data Sources &gt; Add data source</strong> and search for <strong>ArcadeDB</strong>.</p>

<p><img src="/assets/images/grafana-datasource-config.jpg" alt="ArcadeDB data source configuration" /></p>

<p>Fill in:</p>
<ul>
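  <li><strong>URL</strong>: <code class="language-plaintext highlighter-rouge">http://host.docker.internal:2480</code> (this lets the Grafana container reach ArcadeDB on your host; Docker Desktop resolves this name automatically, while on Linux you need to add <code class="language-plaintext highlighter-rouge">--add-host=host.docker.internal:host-gateway</code> to the Grafana <code class="language-plaintext highlighter-rouge">docker run</code> command)</li>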
  <li><strong>Database</strong>: <code class="language-plaintext highlighter-rouge">MovieRatings</code></li>
  <li><strong>Username</strong>: <code class="language-plaintext highlighter-rouge">root</code></li>
  <li><strong>Password</strong>: <code class="language-plaintext highlighter-rouge">arcadedb</code></li>
</ul>

<p>Click <strong>Save &amp; Test</strong>. You should see a green success message.</p>

<p><img src="/assets/images/grafana-datasource-test-success.jpg" alt="Successful connection test" /></p>

<h3 id="step-3-load-the-demo-database">Step 3: Load the Demo Database</h3>

<p>Open ArcadeDB Studio at <code class="language-plaintext highlighter-rouge">http://localhost:2480</code> and create the <strong>MovieRatings</strong> database from the demo databases.</p>

<p><img src="/assets/images/arcadedb-download-db.jpg" alt="Download MovieRatings demo database from ArcadeDB Studio" /></p>

<p>This dataset contains 3,883 movies, 6,040 users, and over 1 million ratings - a real-world graph with vertices (Movies, Users, Genres, Occupations) connected by edges (rated, hasGenera, hasOccupation).</p>

<h3 id="step-4-sql-panel---bar-chart">Step 4: SQL Panel - Bar Chart</h3>

<p>Create a new dashboard and add a visualization. Select the <strong>ArcadeDB</strong> data source.</p>

<ol>
  <li>Set the mode to <strong>SQL</strong>.</li>
  <li>Enter this query to find the five most-rated movies:</li>
</ol>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">title</span><span class="p">.</span><span class="k">left</span><span class="p">(</span><span class="mi">30</span><span class="p">)</span> <span class="k">AS</span> <span class="n">title</span><span class="p">,</span> <span class="k">in</span><span class="p">(</span><span class="s1">'rated'</span><span class="p">).</span><span class="k">size</span><span class="p">()</span> <span class="k">AS</span> <span class="n">ratings</span>
<span class="k">FROM</span> <span class="n">Movies</span>
<span class="k">ORDER</span> <span class="k">BY</span> <span class="n">ratings</span> <span class="k">DESC</span>
<span class="k">LIMIT</span> <span class="mi">5</span>
</code></pre></div></div>

<ol start="3">
  <li>Change the visualization type (top right) to <strong>Bar chart</strong>.</li>
  <li>Run the query.</li>
</ol>

<p><img src="/assets/images/grafana-sql-bar-chart.jpg" alt="Bar chart of the most-rated movies" /></p>

<p>You should see a bar chart with the five most-rated movies. “American Beauty” and “Star Wars” should be at the top. This uses ArcadeDB’s <code class="language-plaintext highlighter-rouge">in()</code> graph traversal function directly in SQL - no joins needed.</p>
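<p>Conceptually, <code class="language-plaintext highlighter-rouge">in('rated').size()</code> is each movie’s in-degree: the number of <code class="language-plaintext highlighter-rouge">rated</code> edges pointing at that <code class="language-plaintext highlighter-rouge">Movies</code> vertex. The Java sketch below computes the same numbers over a small in-memory edge list and then applies the <code class="language-plaintext highlighter-rouge">ORDER BY ratings DESC LIMIT 5</code> step. The <code class="language-plaintext highlighter-rouge">RatedEdge</code> and <code class="language-plaintext highlighter-rouge">MovieCount</code> records are hypothetical stand-ins for illustration, not ArcadeDB types.</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import java.util.Arrays;
import java.util.Comparator;
import java.util.stream.Collectors;

// Hypothetical toy model: a "rated" edge points from a user to a movie.
record RatedEdge(String user, String movie) {}

record MovieCount(String movie, long ratings) {}

class InDegreeSketch {
  // Counts incoming "rated" edges per movie, which is what in('rated').size()
  // returns for each Movies vertex, then mimics ORDER BY ratings DESC LIMIT n.
  static MovieCount[] mostRated(RatedEdge[] edges, int limit) {
    var counts = Arrays.stream(edges)
        .collect(Collectors.groupingBy(RatedEdge::movie, Collectors.counting()));
    return counts.entrySet().stream()
        .map(e -> new MovieCount(e.getKey(), e.getValue()))
        .sorted(Comparator.comparingLong(MovieCount::ratings).reversed()
            .thenComparing(MovieCount::movie))  // deterministic order for ties
        .limit(limit)
        .toArray(MovieCount[]::new);
  }

  public static void main(String[] args) {
    RatedEdge[] edges = {
        new RatedEdge("u1", "American Beauty"), new RatedEdge("u2", "American Beauty"),
        new RatedEdge("u1", "Fargo")
    };
    for (MovieCount mc : mostRated(edges, 5)) {
      System.out.println(mc.movie() + ": " + mc.ratings());
    }
  }
}
</code></pre></div></div>

<p>Ties are broken by title here only to make the sketch’s output deterministic; the SQL query does not specify an order for movies with equal counts.</p>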

<h3 id="step-5-cypher-panel---table-with-average-ratings">Step 5: Cypher Panel - Table with Average Ratings</h3>

<p>Add another visualization for a detailed table view. This time we’ll use <strong>Cypher</strong>, which is ideal for traversing relationships and aggregating edge properties.</p>

<ol>
  <li>Set mode to <strong>Cypher</strong>.</li>
  <li>Enter this query to find the highest-rated movies (with at least 100 ratings):</li>
</ol>

<div class="language-cypher highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">MATCH</span><span class="w"> </span><span class="ss">(</span><span class="py">m:</span><span class="n">Movies</span><span class="ss">)</span><span class="o">&lt;-</span><span class="ss">[</span><span class="py">r:</span><span class="n">rated</span><span class="ss">]</span><span class="o">-</span><span class="ss">(</span><span class="py">u:</span><span class="n">Users</span><span class="ss">)</span>
<span class="k">WITH</span> <span class="n">m</span><span class="ss">,</span> <span class="nf">count</span><span class="ss">(</span><span class="n">r</span><span class="ss">)</span> <span class="k">AS</span> <span class="n">totalRatings</span><span class="ss">,</span> <span class="nf">avg</span><span class="ss">(</span><span class="n">r.rating</span><span class="ss">)</span> <span class="k">AS</span> <span class="n">avgRating</span>
<span class="k">WHERE</span> <span class="n">totalRatings</span> <span class="o">&gt;=</span> <span class="mi">100</span>
<span class="k">RETURN</span> <span class="n">m.title</span> <span class="k">AS</span> <span class="n">title</span><span class="ss">,</span> <span class="n">totalRatings</span><span class="ss">,</span> <span class="nf">round</span><span class="ss">(</span><span class="n">avgRating</span> <span class="o">*</span> <span class="mi">100</span><span class="ss">)</span> <span class="err">/</span> <span class="mi">100</span> <span class="k">AS</span> <span class="n">avgRating</span>
<span class="k">ORDER</span> <span class="k">BY</span> <span class="n">avgRating</span> <span class="k">DESC</span>
<span class="k">LIMIT</span> <span class="mi">20</span>
</code></pre></div></div>

<ol start="3">
  <li>Change the visualization type to <strong>Table</strong>.</li>
  <li>Run the query.</li>
</ol>

<p><img src="/assets/images/grafana-cypher-table-ratings.jpg" alt="Table of highest-rated movies with average ratings" /></p>

<p>This Cypher query matches the pattern <code class="language-plaintext highlighter-rouge">(:Movies)&lt;-[:rated]-(:Users)</code>, groups by movie, counts and averages the ratings, and keeps only movies with at least 100 ratings so that titles with a handful of perfect scores don’t dominate. “Seven Samurai” and “The Shawshank Redemption” should top the list with averages above 4.5.</p>
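<p>The <code class="language-plaintext highlighter-rouge">WITH ... WHERE</code> step works like SQL’s <code class="language-plaintext highlighter-rouge">GROUP BY ... HAVING</code>: aggregate first, then filter on the aggregate. The Java sketch below mirrors the query over a hypothetical in-memory array of ratings, including the <code class="language-plaintext highlighter-rouge">round(avgRating * 100) / 100</code> idiom for two decimal places. The <code class="language-plaintext highlighter-rouge">UserRating</code> and <code class="language-plaintext highlighter-rouge">MovieStats</code> records are illustrative, not part of any API.</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import java.util.Arrays;
import java.util.Comparator;
import java.util.stream.Collectors;

// Hypothetical stand-in for one (:Users)-[:rated]->(:Movies) edge.
record UserRating(String title, double rating) {}

record MovieStats(String title, long totalRatings, double avgRating) {}

class AvgRatingSketch {
  // Mirrors the Cypher query: group by movie, count and average the ratings,
  // keep movies with at least minRatings ratings, sort by average descending.
  static MovieStats[] highestRated(UserRating[] ratings, long minRatings, int limit) {
    var byMovie = Arrays.stream(ratings)
        .collect(Collectors.groupingBy(UserRating::title,
            Collectors.summarizingDouble(UserRating::rating)));
    return byMovie.entrySet().stream()
        .filter(e -> e.getValue().getCount() >= minRatings)        // WHERE totalRatings >= 100
        .map(e -> new MovieStats(
            e.getKey(),
            e.getValue().getCount(),
            Math.round(e.getValue().getAverage() * 100) / 100.0))  // round to 2 decimals
        .sorted(Comparator.comparingDouble(MovieStats::avgRating).reversed())
        .limit(limit)
        .toArray(MovieStats[]::new);
  }
}
</code></pre></div></div>

<p>Note the <code class="language-plaintext highlighter-rouge">100.0</code> divisor: dividing the <code class="language-plaintext highlighter-rouge">long</code> returned by <code class="language-plaintext highlighter-rouge">Math.round</code> by the integer <code class="language-plaintext highlighter-rouge">100</code> would truncate to a whole number, whereas Cypher’s <code class="language-plaintext highlighter-rouge">round()</code> already returns a float.</p>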

<h3 id="step-6-cypher-panel---graph-visualization">Step 6: Cypher Panel - Graph Visualization</h3>

<p>Now the highlight - interactive graph visualization of movie-genre relationships.</p>

<ol>
  <li>Add another visualization.</li>
  <li>Switch mode to <strong>Cypher</strong>.</li>
  <li>Enable the <strong>Node Graph</strong> toggle.</li>
  <li>Change the visualization type to <strong>Node Graph</strong> (search for it in the visualization picker).</li>
  <li>Enter this query to explore how top movies connect to genres:</li>
</ol>

<div class="language-cypher highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">MATCH</span><span class="w"> </span><span class="ss">(</span><span class="py">m:</span><span class="n">Movies</span><span class="ss">)</span><span class="o">-</span><span class="ss">[</span><span class="py">r:</span><span class="n">hasGenera</span><span class="ss">]</span><span class="o">-&gt;</span><span class="ss">(</span><span class="py">g:</span><span class="n">Genres</span><span class="ss">)</span>
<span class="k">WHERE</span> <span class="n">m.title</span> <span class="ow">IN</span> <span class="ss">[</span><span class="s1">'Toy Story (1995)'</span><span class="ss">,</span> <span class="s1">'Star Wars: Episode IV - A New Hope (1977)'</span><span class="ss">,</span> <span class="s1">'The Matrix (1999)'</span><span class="ss">,</span> <span class="s1">'Pulp Fiction (1994)'</span><span class="ss">,</span> <span class="s1">'Forrest Gump (1994)'</span><span class="ss">,</span> <span class="s1">'Jurassic Park (1993)'</span><span class="ss">,</span> <span class="s1">'The Silence of the Lambs (1991)'</span><span class="ss">,</span> <span class="s1">'Fargo (1996)'</span><span class="ss">]</span>
<span class="k">RETURN</span> <span class="n">m</span><span class="ss">,</span> <span class="n">r</span><span class="ss">,</span> <span class="n">g</span>
</code></pre></div></div>

<ol start="6">
  <li>Run the query.</li>
</ol>

<p><img src="/assets/images/grafana-cypher-node-graph.jpg" alt="Interactive graph visualization of movies and their genres" /></p>

<p>You’ll see a network graph with:</p>
<ul>
  <li><strong>Movie nodes</strong> showing film titles</li>
  <li><strong>Genre nodes</strong> showing categories like “Action”, “Comedy”, “Drama”</li>
  <li><strong>Edges</strong> representing the <code class="language-plaintext highlighter-rouge">hasGenera</code> relationship</li>
</ul>

<p>The graph is interactive:</p>
<ul>
  <li><strong>Click any node</strong> to see all its properties in the detail panel</li>
  <li><strong>Drag nodes</strong> to rearrange the layout</li>
  <li><strong>Zoom and pan</strong> to explore the graph</li>
</ul>

<p><img src="/assets/images/grafana-node-detail.jpg" alt="Node detail view showing movie properties" /></p>
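<p>Grafana’s Node Graph panel draws from two tables: nodes (each with a unique <code class="language-plaintext highlighter-rouge">id</code> and optionally a <code class="language-plaintext highlighter-rouge">title</code>) and edges (<code class="language-plaintext highlighter-rouge">id</code>, <code class="language-plaintext highlighter-rouge">source</code>, <code class="language-plaintext highlighter-rouge">target</code>). A Cypher result that returns <code class="language-plaintext highlighter-rouge">m, r, g</code> repeats the same genre once per movie, so duplicate nodes have to be collapsed somewhere along the way. The Java sketch below illustrates that step under stated assumptions: the row shape and IDs are hypothetical, and this is not the plugin’s actual Go code.</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import java.util.Arrays;
import java.util.stream.Stream;

// Hypothetical flattened result row for RETURN m, r, g.
record MovieGenreRow(String movieId, String movieTitle, String edgeId,
                     String genreId, String genreName) {}

// Field names follow Grafana's Node Graph conventions:
// id/title for nodes, id/source/target for edges.
record GraphNode(String id, String title) {}
record GraphLink(String id, String source, String target) {}

class NodeGraphSketch {
  // Every row yields a movie node and a genre node; records compare by value,
  // so distinct() collapses a genre shared by many movies into one node.
  static GraphNode[] nodes(MovieGenreRow[] rows) {
    return Arrays.stream(rows)
        .flatMap(r -> Stream.of(new GraphNode(r.movieId(), r.movieTitle()),
                                new GraphNode(r.genreId(), r.genreName())))
        .distinct()
        .toArray(GraphNode[]::new);
  }

  // One edge per hasGenera relationship, from the movie to its genre.
  static GraphLink[] links(MovieGenreRow[] rows) {
    return Arrays.stream(rows)
        .map(r -> new GraphLink(r.edgeId(), r.movieId(), r.genreId()))
        .distinct()
        .toArray(GraphLink[]::new);
  }
}
</code></pre></div></div>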

<h3 id="step-7-compose-your-dashboard">Step 7: Compose Your Dashboard</h3>

<p>Arrange all three panels on your dashboard: the bar chart at the top, the ratings table in the middle, and the genre graph at the bottom.</p>

<p><img src="/assets/images/grafana-complete-dashboard.jpg" alt="Complete dashboard with bar chart, ratings table, and Cypher graph" /></p>

<p>Save the dashboard. You now have a multi-model BI dashboard that combines a chart, a table of aggregated graph data, and interactive graph exploration in a single view - something no other BI tool can do natively.</p>

<h2 id="beyond-the-basics">Beyond the Basics</h2>

<h3 id="template-variables">Template Variables</h3>

<p>Create dynamic dashboards with template variables. Add a query variable named <code class="language-plaintext highlighter-rouge">genre</code>, backed by an ArcadeDB query:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">description</span> <span class="k">AS</span> <span class="n">__text</span><span class="p">,</span> <span class="n">description</span> <span class="k">AS</span> <span class="n">__value</span> <span class="k">FROM</span> <span class="n">Genres</span>
</code></pre></div></div>

<p>Then use it in your panels to filter movies by genre:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">title</span><span class="p">,</span> <span class="k">in</span><span class="p">(</span><span class="s1">'rated'</span><span class="p">).</span><span class="k">size</span><span class="p">()</span> <span class="k">AS</span> <span class="n">ratings</span>
<span class="k">FROM</span> <span class="n">Movies</span>
<span class="k">WHERE</span> <span class="k">out</span><span class="p">(</span><span class="s1">'hasGenera'</span><span class="p">).</span><span class="n">description</span> <span class="k">CONTAINS</span> <span class="s1">'$genre'</span>
<span class="k">ORDER</span> <span class="k">BY</span> <span class="n">ratings</span> <span class="k">DESC</span>
<span class="k">LIMIT</span> <span class="mi">20</span>
</code></pre></div></div>

<p>Users can switch genres from a dropdown at the top of the dashboard.</p>
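<p>Grafana replaces <code class="language-plaintext highlighter-rouge">$genre</code> with the value selected in the dropdown before the query runs, so with <em>Comedy</em> selected the filter ends in <code class="language-plaintext highlighter-rouge">CONTAINS 'Comedy'</code>. The Java sketch below shows that textual substitution with a hypothetical <code class="language-plaintext highlighter-rouge">interpolate</code> helper. Grafana’s real implementation also handles <code class="language-plaintext highlighter-rouge">${genre}</code> syntax, multi-value variables, and formatting options, none of which this covers.</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TemplateVarSketch {
  // Replaces each $name token with `value`, verbatim. The lookahead keeps
  // $name from matching the start of a longer variable such as $nameId.
  // Hypothetical helper for illustration, not Grafana's implementation.
  static String interpolate(String query, String name, String value) {
    Pattern token = Pattern.compile("\\$" + Pattern.quote(name) + "(?![A-Za-z0-9_])");
    return token.matcher(query).replaceAll(Matcher.quoteReplacement(value));
  }

  public static void main(String[] args) {
    String query = "SELECT title FROM Movies WHERE out('hasGenera').description CONTAINS '$genre'";
    System.out.println(interpolate(query, "genre", "Comedy"));
  }
}
</code></pre></div></div>

<p>Because the value is pasted into the query text as-is, a genre name containing a single quote would break the string literal, so keep variable values to simple names or escape them appropriately.</p>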

<h3 id="alerting">Alerting</h3>

<p>Set up alerts on any query. For example, create an alert when the number of new ratings per hour drops below a threshold - useful for monitoring data pipeline health.</p>

<p>Because the plugin has a Go backend, alerts evaluate server-side - no browser needed.</p>
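<p>One subtlety with a “too few ratings per hour” alert: an hour with no new ratings at all may produce no row, and that is exactly the failure you want to catch. The Java sketch below, a hypothetical helper over raw epoch-second timestamps rather than anything the plugin provides, counts events per hour, treats empty hours as zero, and returns the hours that fall below the threshold. In Grafana itself, also review how the alert rule handles a “No data” result.</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import java.util.stream.LongStream;

class HourlyThresholdSketch {
  // Counts events per hour in [startHour, endHour] (epoch hours) and returns
  // the hours whose count is below the threshold. Hours with no events at all
  // count as zero, which is exactly the "pipeline stopped" case.
  static long[] hoursBelow(long[] epochSeconds, long startHour, long endHour, long threshold) {
    int span = (int) (endHour - startHour + 1);
    long[] counts = new long[span];
    for (long ts : epochSeconds) {
      long hour = ts / 3600;
      if (hour >= startHour && hour <= endHour) {
        counts[(int) (hour - startHour)]++;
      }
    }
    return LongStream.range(0, span)
        .filter(i -> counts[(int) i] < threshold)
        .map(i -> startHour + i)
        .toArray();
  }
}
</code></pre></div></div>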

<h2 id="arcadedb--bi-the-full-picture">ArcadeDB + BI: The Full Picture</h2>

<p>The Grafana plugin is the centerpiece, but ArcadeDB also works with other BI tools through the <strong><a href="https://docs.arcadedb.com/arcadedb/how-to/connectivity/postgres.html">PostgreSQL wire protocol</a></strong>. Once the PostgreSQL protocol plugin is enabled on the server, tools that support PostgreSQL can generally connect to ArcadeDB on its default port, 5432:</p>

<table>
  <thead>
    <tr>
      <th>Tool</th>
      <th>Connection</th>
      <th>Best For</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Grafana</strong></td>
      <td>ArcadeDB plugin</td>
      <td>Time series, graphs, alerting</td>
    </tr>
    <tr>
      <td><strong>Apache Superset</strong></td>
      <td>PostgreSQL (SQLAlchemy)</td>
      <td>SQL Lab, charting</td>
    </tr>
    <tr>
      <td><strong>Metabase</strong></td>
      <td>PostgreSQL (JDBC)</td>
      <td>Self-service BI</td>
    </tr>
    <tr>
      <td><strong>Tableau</strong></td>
      <td>PostgreSQL connector</td>
      <td>Enterprise reporting</td>
    </tr>
    <tr>
      <td><strong>Power BI</strong></td>
      <td>PostgreSQL (ODBC)</td>
      <td>Microsoft ecosystem</td>
    </tr>
    <tr>
      <td><strong>DBeaver</strong></td>
      <td>PostgreSQL (JDBC)</td>
      <td>Database development</td>
    </tr>
  </tbody>
</table>

<p>The Grafana plugin provides the richest experience, especially for time series and graph visualization. The PostgreSQL wire protocol gives you breadth - connect any tool in your stack.</p>
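<p>As an example of the PostgreSQL route, the Java sketch below queries the MovieRatings database through the standard PostgreSQL JDBC driver. It assumes the driver (<code class="language-plaintext highlighter-rouge">org.postgresql:postgresql</code>) is on the classpath, that ArcadeDB’s PostgreSQL protocol is enabled on <code class="language-plaintext highlighter-rouge">localhost:5432</code>, and that <code class="language-plaintext highlighter-rouge">root</code>/<code class="language-plaintext highlighter-rouge">your-password</code> are placeholder credentials; adjust all three for your setup.</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Properties;

class PostgresWireSketch {
  // Standard PostgreSQL JDBC URL; the ArcadeDB database name goes where a
  // PostgreSQL database name normally would.
  static String jdbcUrl(String host, int port, String database) {
    return "jdbc:postgresql://" + host + ":" + port + "/" + database;
  }

  public static void main(String[] args) throws SQLException {
    Properties props = new Properties();
    props.setProperty("user", "root");               // placeholder credentials
    props.setProperty("password", "your-password");

    String url = jdbcUrl("localhost", 5432, "MovieRatings");
    try (Connection conn = DriverManager.getConnection(url, props);
         Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery("SELECT title FROM Movies LIMIT 5")) {
      while (rs.next()) {
        System.out.println(rs.getString("title"));
      }
    }
  }
}
</code></pre></div></div>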

<h2 id="getting-started">Getting Started</h2>

<p>The plugin is <a href="https://arcadedb.com/blog/open-source-forever-why-arcadedb-will-never-change-its-license/">open source</a> (Apache 2.0) and available on GitHub:</p>

<ul>
  <li><strong>Repository</strong>: <a href="https://github.com/ArcadeData/arcadedb-grafana-datasource">github.com/ArcadeData/arcadedb-grafana-datasource</a></li>
  <li><strong>Installation</strong>: <code class="language-plaintext highlighter-rouge">grafana-cli plugins install arcadedb-arcadedb-datasource</code></li>
  <li><strong>Documentation</strong>: See the <a href="https://github.com/ArcadeData/arcadedb-grafana-datasource/blob/main/README.md">README</a> for full configuration and usage details</li>
  <li><strong>BI Integration Guide</strong>: The <a href="https://docs.arcadedb.com">ArcadeDB documentation</a> includes guides for connecting Grafana, Superset, Metabase, Tableau, Power BI, and DBeaver</li>
</ul>

<p>Your data is more than rows and columns. Your dashboards should be too.</p>]]></content><author><name>Luca Garulli</name></author><category term="Grafana" /><category term="BI" /><category term="Dashboard" /><category term="Analytics" /><category term="Graph Database" /><category term="Time Series" /><category term="Cypher" /><category term="SQL" /><summary type="html"><![CDATA[The new ArcadeDB Grafana plugin brings native SQL, Cypher, and Gremlin query support, interactive graph visualization via Node Graph, time series dashboards, and Grafana alerting to ArcadeDB - all through a single Go-native backend plugin.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://arcadedb.com/assets/images/arcadedb-grafana-plugin.jpg" /><media:content medium="image" url="https://arcadedb.com/assets/images/arcadedb-grafana-plugin.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">GraphBatch: Up to 8x Faster Graph Ingestion in ArcadeDB</title><link href="https://arcadedb.com/blog/graphbatch-up-to-8x-faster-graph-ingestion/" rel="alternate" type="text/html" title="GraphBatch: Up to 8x Faster Graph Ingestion in ArcadeDB" /><published>2026-03-31T00:00:00+00:00</published><updated>2026-03-31T00:00:00+00:00</updated><id>https://arcadedb.com/blog/graphbatch-up-to-8x-faster-graph-ingestion</id><content type="html" xml:base="https://arcadedb.com/blog/graphbatch-up-to-8x-faster-graph-ingestion/"><![CDATA[<p>If you’ve ever loaded millions of edges into a graph database, you know the pain: what should be a straightforward bulk import can take minutes - or even hours - as the transactional overhead stacks up. Today we’re introducing <strong>GraphBatch</strong>, a new engine-level API in <a href="https://arcadedb.com/blog/arcadedb-26-3-2/">ArcadeDB v26.3.2</a> that makes large-scale graph ingestion dramatically faster. 
And with the new HTTP batch endpoint and streaming gRPC API, you can leverage that power from any language.</p>

<h2 id="why-a-new-importer">Why a New Importer?</h2>

<p>ArcadeDB has always offered two ways to load graph data: the <strong>standard transactional API</strong> (batching operations in explicit transactions) and the <strong>GraphImporter</strong> (an integration-level helper that manages batching for you). Both work well for moderate workloads, but at scale the transactional overhead becomes a bottleneck.</p>

<p>GraphBatch takes a fundamentally different approach. Instead of wrapping the standard API, it operates directly at the storage engine level and manages its own transactions, bypassing the per-operation overhead of the standard transactional layer during bulk import. The result: throughput that scales with your hardware, not your transaction size.</p>
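<p>One way to build intuition for the difference is to count how many times a per-commit cost is paid. The Java sketch below is a generic buffer-and-flush pattern, not GraphBatch’s internal code (the <code class="language-plaintext highlighter-rouge">EdgeBuffer</code> class is hypothetical). At the benchmark’s 10M edges, committing every 1,000 edges, as the <code class="language-plaintext highlighter-rouge">tx/1000</code> baseline label suggests, means 10,000 commits, while flushing in chunks of 100,000 means only 100.</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import java.util.function.IntConsumer;

// Hypothetical buffer: collects edges and hands them off in fixed-size chunks.
// A real loader would write the buffered chunk to storage inside the callback.
class EdgeBuffer implements AutoCloseable {
  private final long[] sources;
  private final long[] targets;
  private final IntConsumer flusher;  // receives the number of edges in each chunk
  private int size = 0;
  private int flushes = 0;

  EdgeBuffer(int batchSize, IntConsumer flusher) {
    this.sources = new long[batchSize];
    this.targets = new long[batchSize];
    this.flusher = flusher;
  }

  void add(long source, long target) {
    sources[size] = source;
    targets[size] = target;
    if (++size == sources.length) {
      flush();
    }
  }

  private void flush() {
    if (size == 0) {
      return;
    }
    flusher.accept(size);  // one commit/flush per chunk, not per edge
    flushes++;
    size = 0;
  }

  int flushCount() {
    return flushes;
  }

  @Override
  public void close() {
    flush();  // the final partial chunk is written on close()
  }

  // How many flushes it takes to write `edges` edges with the given batch size.
  static int flushesFor(long edges, int batchSize) {
    EdgeBuffer buffer = new EdgeBuffer(batchSize, n -> { });
    for (long i = 0; i < edges; i++) {
      buffer.add(i, i + 1);
    }
    buffer.close();
    return buffer.flushCount();
  }

  public static void main(String[] args) {
    System.out.println("commit every 1,000: " + flushesFor(10_000_000L, 1_000));
    System.out.println("chunks of 100,000:  " + flushesFor(10_000_000L, 100_000));
  }
}
</code></pre></div></div>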

<h2 id="the-benchmark">The Benchmark</h2>

<p>We ran a series of benchmarks loading graphs of increasing size on the same hardware, measuring edges ingested per second. Here are the results.</p>

<h3 id="1m-vertices-10m-edges--light-edges-no-properties">1M Vertices, 10M Edges — Light Edges (No Properties)</h3>

<table>
  <thead>
    <tr>
      <th>Method</th>
      <th>Time (ms)</th>
      <th>Edges/sec</th>
      <th>Speedup</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Standard API (tx/1000)</td>
      <td>267,140</td>
      <td>37,434</td>
      <td>1.00x</td>
    </tr>
    <tr>
      <td>Old GraphImporter (integration)</td>
      <td>97,160</td>
      <td>102,923</td>
      <td>2.75x</td>
    </tr>
    <tr>
      <td><strong>New GraphBatch (engine)</strong></td>
      <td><strong>31,842</strong></td>
      <td><strong>314,047</strong></td>
      <td><strong>8.39x</strong></td>
    </tr>
  </tbody>
</table>

<p>The new importer is <strong>8.39x faster</strong> than the standard API and <strong>3.05x faster</strong> than the previous GraphImporter. What previously took nearly 4.5 minutes now completes in about 32 seconds.</p>
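<p>The derived columns follow from the raw timings: throughput is the edge count divided by elapsed seconds, and speedup is the baseline’s elapsed time divided by the method’s. The small Java helper below recomputes them from the <em>Time (ms)</em> column, assuming exactly 10,000,000 edges per run; the class and method names are illustrative. For the GraphBatch row, the recomputed throughput (about 314,051 edges/sec) differs from the published figure only in the last digits.</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import java.util.Locale;

class BenchmarkMath {
  static final long EDGES = 10_000_000L;

  // Throughput: edges divided by elapsed seconds, rounded to whole edges/sec.
  static long edgesPerSecond(long edges, long millis) {
    return Math.round(edges * 1000.0 / millis);
  }

  // Speedup: the baseline's elapsed time divided by the method's.
  static String speedup(long baselineMillis, long millis) {
    return String.format(Locale.ROOT, "%.2fx", (double) baselineMillis / millis);
  }

  public static void main(String[] args) {
    long standard = 267_140, oldImporter = 97_160, graphBatch = 31_842;
    System.out.println("Standard API:    " + edgesPerSecond(EDGES, standard) + " edges/s");
    System.out.println("Old importer:    " + edgesPerSecond(EDGES, oldImporter) + " edges/s, "
        + speedup(standard, oldImporter));
    System.out.println("GraphBatch:      " + edgesPerSecond(EDGES, graphBatch) + " edges/s, "
        + speedup(standard, graphBatch));
    System.out.println("vs old importer: " + speedup(oldImporter, graphBatch));
  }
}
</code></pre></div></div>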

<h3 id="1m-vertices-10m-edges--edges-with-properties-int--long">1M Vertices, 10M Edges — Edges with Properties (int + long)</h3>

<table>
  <thead>
    <tr>
      <th>Method</th>
      <th>Time (ms)</th>
      <th>Edges/sec</th>
      <th>Speedup</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Standard API + props (tx/1000)</td>
      <td>267,773</td>
      <td>37,345</td>
      <td>1.00x</td>
    </tr>
    <tr>
      <td><strong>New GraphBatch + props</strong></td>
      <td><strong>53,893</strong></td>
      <td><strong>185,554</strong></td>
      <td><strong>4.97x</strong></td>
    </tr>
  </tbody>
</table>

<p>Even with properties on every edge, GraphBatch delivers a <strong>4.97x speedup</strong>. The additional serialization cost is manageable because the engine-level approach avoids the per-transaction overhead that dominates at scale.</p>

<h3 id="scaling-behavior">Scaling Behavior</h3>

<p>This is where things get really interesting. We compared how each method behaves as the graph size increases:</p>

<table>
  <thead>
    <tr>
      <th>Scale</th>
      <th>Std API (edges/sec)</th>
      <th>GraphBatch (edges/sec)</th>
      <th>Speedup</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>10K vertices / 100K edges</td>
      <td>241,644</td>
      <td>1,025,019</td>
      <td>4.24x</td>
    </tr>
    <tr>
      <td>100K vertices / 1M edges</td>
      <td>103,027</td>
      <td>1,212,756</td>
      <td>11.77x</td>
    </tr>
    <tr>
      <td>1M vertices / 10M edges</td>
      <td>37,434</td>
      <td>314,047</td>
      <td>8.39x</td>
    </tr>
  </tbody>
</table>

<p>Two things stand out:</p>

<ol>
  <li>
    <p><strong>The standard API degrades significantly at scale</strong> — from 241K edges/sec at 100K edges down to just 37K edges/sec at 10M edges. This is expected: as the graph grows, transaction management, index maintenance, and page cache pressure all increase.</p>
  </li>
  <li>
    <p><strong>GraphBatch holds up far better</strong> — peaking at over <strong>1.2 million edges per second</strong> at the 1M-edge scale. At the largest scale (10M edges), memory pressure naturally reduces throughput, but it still maintains 314K edges/sec — a strong result for a single machine.</p>
  </li>
</ol>

<p>In these runs, the sweet spot is around 100K vertices / 1M edges, where GraphBatch reaches <strong>11.77x</strong> the throughput of the standard API.</p>
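<p>The “holds up far better” observation can be quantified from the table itself. From the smallest to the largest run, standard-API throughput falls by roughly 6.5x (241,644 to 37,434 edges/sec), while GraphBatch’s falls by roughly 3.3x (1,025,019 to 314,047). The Java sketch below derives those ratios and locates the peak-speedup row from the published numbers; the <code class="language-plaintext highlighter-rouge">ScaleRow</code> record is just an illustrative container for the table.</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import java.util.Arrays;
import java.util.Comparator;
import java.util.Locale;

// One row of the scaling table (throughput in edges/sec).
record ScaleRow(String scale, long standardApi, long graphBatch) {
  double speedup() {
    return (double) graphBatch / standardApi;
  }
}

class ScalingSketch {
  static final ScaleRow[] ROWS = {
      new ScaleRow("10K vertices / 100K edges", 241_644, 1_025_019),
      new ScaleRow("100K vertices / 1M edges", 103_027, 1_212_756),
      new ScaleRow("1M vertices / 10M edges", 37_434, 314_047),
  };

  // Row where GraphBatch's advantage over the standard API is largest.
  static ScaleRow peak(ScaleRow[] rows) {
    return Arrays.stream(rows).max(Comparator.comparingDouble(ScaleRow::speedup)).orElseThrow();
  }

  // How many times throughput fell from the smallest run to the largest.
  static double standardDrop(ScaleRow[] rows) {
    return (double) rows[0].standardApi() / rows[rows.length - 1].standardApi();
  }

  static double graphBatchDrop(ScaleRow[] rows) {
    return (double) rows[0].graphBatch() / rows[rows.length - 1].graphBatch();
  }

  public static void main(String[] args) {
    ScaleRow best = peak(ROWS);
    System.out.println(String.format(Locale.ROOT, "peak: %s (%.2fx)", best.scale(), best.speedup()));
    System.out.println(String.format(Locale.ROOT, "standard API drop: %.2fx, GraphBatch drop: %.2fx",
        standardDrop(ROWS), graphBatchDrop(ROWS)));
  }
}
</code></pre></div></div>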

<h2 id="when-to-use-graphbatch">When to Use GraphBatch</h2>

<p>GraphBatch is designed for <strong>bulk edge creation</strong> — whether that’s during initial data loading or at runtime on an existing database. It doesn’t require an empty database: as long as vertex and edge <a href="https://docs.arcadedb.com/arcadedb/reference/sql/sql-create-type.html">types</a> exist in the <a href="https://docs.arcadedb.com/arcadedb/concepts/schema.html">schema</a> and the source/destination vertices have valid RIDs, you’re good to go.</p>

<h3 id="initial-import-scenarios">Initial Import Scenarios</h3>

<ul>
  <li><strong>Data migration</strong> — moving graph data from <a href="https://docs.arcadedb.com/arcadedb/how-to/migration/neo4j-importer.html">another database</a> into ArcadeDB</li>
  <li><strong>ETL pipelines</strong> — loading large datasets from data warehouses or data lakes</li>
  <li><strong>Testing and benchmarking</strong> — quickly setting up large test graphs</li>
</ul>

<h3 id="runtime-scenarios">Runtime Scenarios</h3>

<p>GraphBatch works on live databases with existing data, making it the right tool whenever you need to create edges in bulk at runtime:</p>

<ul>
  <li><strong>Social networks</strong> — a user imports their contact list and you need to create thousands of KNOWS edges between existing Person vertices</li>
  <li><strong>IoT / time series</strong> — a periodic job links new sensor readings to their device vertices and chains them in a time series</li>
  <li><strong><a href="https://arcadedb.com/knowledge-graphs.html">Knowledge graphs</a></strong> — after an NLP pipeline extracts relationships from documents, you materialize thousands of typed edges between existing entity vertices</li>
  <li><strong><a href="https://arcadedb.com/recommendation-engine.html">Recommendation engines</a></strong> — nightly rebuild of ALSO_BOUGHT / SIMILAR_TO edges based on updated purchase data</li>
  <li><strong>Incremental ETL</strong> — periodically sync new relationships from an external system into an existing graph</li>
</ul>

<h3 id="when-not-to-use-it">When NOT to Use It</h3>

<ul>
  <li><strong>Small writes</strong> — for fewer than ~100 edges, the standard API is simpler and the importer overhead isn’t worth it</li>
  <li><strong>Concurrent reads on the same vertices</strong> — the importer disables read-your-writes and manages its own transactions, so concurrent readers may see inconsistent state until <code class="language-plaintext highlighter-rouge">close()</code></li>
  <li><strong>Immediate edge visibility required</strong> — in parallel mode, incoming edges aren’t fully connected until <code class="language-plaintext highlighter-rouge">close()</code></li>
</ul>

<p>For ongoing OLTP workloads with small, frequent writes, the standard transactional API remains the right choice — it provides full <a href="https://docs.arcadedb.com/arcadedb/concepts/transactions.html">ACID guarantees</a> with immediate visibility.</p>

<h2 id="runtime-usage-examples">Runtime Usage Examples</h2>

<h3 id="bulk-friend-import-light-edges">Bulk Friend Import (Light Edges)</h3>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Vertices already exist in the database</span>
<span class="no">RID</span><span class="o">[]</span> <span class="n">personRIDs</span> <span class="o">=</span> <span class="n">lookupExistingPersons</span><span class="o">(</span><span class="n">contactIds</span><span class="o">);</span>

<span class="k">try</span> <span class="o">(</span><span class="nc">GraphBatch</span> <span class="n">batch</span> <span class="o">=</span> <span class="n">database</span><span class="o">.</span><span class="na">batch</span><span class="o">()</span>
    <span class="o">.</span><span class="na">withBatchSize</span><span class="o">(</span><span class="mi">50_000</span><span class="o">)</span>
    <span class="o">.</span><span class="na">withLightEdges</span><span class="o">(</span><span class="kc">true</span><span class="o">)</span>
    <span class="o">.</span><span class="na">build</span><span class="o">())</span> <span class="o">{</span>
  <span class="k">for</span> <span class="o">(</span><span class="kt">int</span><span class="o">[]</span> <span class="n">pair</span> <span class="o">:</span> <span class="n">contactPairs</span><span class="o">)</span>
    <span class="n">batch</span><span class="o">.</span><span class="na">newEdge</span><span class="o">(</span><span class="n">personRIDs</span><span class="o">[</span><span class="n">pair</span><span class="o">[</span><span class="mi">0</span><span class="o">]],</span> <span class="s">"KNOWS"</span><span class="o">,</span> <span class="n">personRIDs</span><span class="o">[</span><span class="n">pair</span><span class="o">[</span><span class="mi">1</span><span class="o">]]);</span>
<span class="o">}</span>
</code></pre></div></div>

<h3 id="iot-sensor-linkage-with-wal-for-crash-safety">IoT Sensor Linkage (with WAL for Crash Safety)</h3>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">try</span> <span class="o">(</span><span class="nc">GraphBatch</span> <span class="n">batch</span> <span class="o">=</span> <span class="n">database</span><span class="o">.</span><span class="na">batch</span><span class="o">()</span>
    <span class="o">.</span><span class="na">withBatchSize</span><span class="o">(</span><span class="mi">100_000</span><span class="o">)</span>
    <span class="o">.</span><span class="na">withWAL</span><span class="o">(</span><span class="kc">true</span><span class="o">)</span>
    <span class="o">.</span><span class="na">withCommitEvery</span><span class="o">(</span><span class="mi">10_000</span><span class="o">)</span>
    <span class="o">.</span><span class="na">build</span><span class="o">())</span> <span class="o">{</span>
  <span class="k">for</span> <span class="o">(</span><span class="nc">SensorReading</span> <span class="n">r</span> <span class="o">:</span> <span class="n">newReadings</span><span class="o">)</span> <span class="o">{</span>
    <span class="n">batch</span><span class="o">.</span><span class="na">newEdge</span><span class="o">(</span><span class="n">r</span><span class="o">.</span><span class="na">deviceRID</span><span class="o">,</span> <span class="s">"HAS_READING"</span><span class="o">,</span> <span class="n">r</span><span class="o">.</span><span class="na">rid</span><span class="o">,</span> <span class="s">"timestamp"</span><span class="o">,</span> <span class="n">r</span><span class="o">.</span><span class="na">ts</span><span class="o">);</span>
    <span class="k">if</span> <span class="o">(</span><span class="n">r</span><span class="o">.</span><span class="na">previousRID</span> <span class="o">!=</span> <span class="kc">null</span><span class="o">)</span>
      <span class="n">batch</span><span class="o">.</span><span class="na">newEdge</span><span class="o">(</span><span class="n">r</span><span class="o">.</span><span class="na">previousRID</span><span class="o">,</span> <span class="s">"NEXT"</span><span class="o">,</span> <span class="n">r</span><span class="o">.</span><span class="na">rid</span><span class="o">);</span>
  <span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>
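<p>The loop above assumes each reading already knows its predecessor. If it doesn’t, one way to derive it is to sort each device’s new readings by timestamp and pair neighbors. The Java sketch below does that; the <code class="language-plaintext highlighter-rouge">Reading</code> record, the string RIDs, and the choice to point <code class="language-plaintext highlighter-rouge">NEXT</code> from the older reading to the newer one are assumptions for illustration. It also ignores the link between the first new reading and the last reading already stored, which a real job would need to look up.</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import java.util.Arrays;
import java.util.Comparator;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical reading: its own RID, its device's RID, and a timestamp.
record Reading(String rid, String deviceRid, long ts) {}

// A NEXT link from a reading to the one that follows it on the same device.
record NextLink(String fromRid, String toRid) {}

class ReadingChainSketch {
  static NextLink[] nextLinks(Reading[] readings) {
    var byDevice = Arrays.stream(readings)
        .collect(Collectors.groupingBy(Reading::deviceRid));
    return byDevice.values().stream()
        .flatMap(deviceReadings -> {
          Reading[] sorted = deviceReadings.stream()
              .sorted(Comparator.comparingLong(Reading::ts))
              .toArray(Reading[]::new);
          // Pair each reading with its successor: older -> newer.
          return IntStream.range(1, sorted.length)
              .mapToObj(i -> new NextLink(sorted[i - 1].rid(), sorted[i].rid()));
        })
        .toArray(NextLink[]::new);
  }
}
</code></pre></div></div>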

<h3 id="knowledge-graph-entity-resolution-with-edge-properties">Knowledge Graph Entity Resolution (with Edge Properties)</h3>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">try</span> <span class="o">(</span><span class="nc">GraphBatch</span> <span class="n">batch</span> <span class="o">=</span> <span class="n">database</span><span class="o">.</span><span class="na">batch</span><span class="o">()</span>
    <span class="o">.</span><span class="na">withBatchSize</span><span class="o">(</span><span class="mi">200_000</span><span class="o">)</span>
    <span class="o">.</span><span class="na">withParallelFlush</span><span class="o">(</span><span class="kc">true</span><span class="o">)</span>
    <span class="o">.</span><span class="na">build</span><span class="o">())</span> <span class="o">{</span>
  <span class="k">for</span> <span class="o">(</span><span class="nc">ExtractedRelation</span> <span class="n">rel</span> <span class="o">:</span> <span class="n">relations</span><span class="o">)</span>
    <span class="n">batch</span><span class="o">.</span><span class="na">newEdge</span><span class="o">(</span><span class="n">rel</span><span class="o">.</span><span class="na">subjectRID</span><span class="o">,</span> <span class="n">rel</span><span class="o">.</span><span class="na">edgeType</span><span class="o">,</span> <span class="n">rel</span><span class="o">.</span><span class="na">objectRID</span><span class="o">,</span>
        <span class="s">"confidence"</span><span class="o">,</span> <span class="n">rel</span><span class="o">.</span><span class="na">score</span><span class="o">,</span> <span class="s">"source"</span><span class="o">,</span> <span class="n">rel</span><span class="o">.</span><span class="na">docId</span><span class="o">);</span>
<span class="o">}</span>
</code></pre></div></div>

<h3 id="nightly-recommendation-rebuild">Nightly Recommendation Rebuild</h3>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Remove stale edges</span>
<span class="n">database</span><span class="o">.</span><span class="na">command</span><span class="o">(</span><span class="s">"sql"</span><span class="o">,</span> <span class="s">"DELETE EDGE ALSO_BOUGHT"</span><span class="o">);</span>

<span class="c1">// Rebuild from recommendation engine output</span>
<span class="k">try</span> <span class="o">(</span><span class="nc">GraphBatch</span> <span class="n">batch</span> <span class="o">=</span> <span class="n">database</span><span class="o">.</span><span class="na">batch</span><span class="o">()</span>
    <span class="o">.</span><span class="na">withBatchSize</span><span class="o">(</span><span class="mi">500_000</span><span class="o">)</span>
    <span class="o">.</span><span class="na">withLightEdges</span><span class="o">(</span><span class="kc">true</span><span class="o">)</span>
    <span class="o">.</span><span class="na">build</span><span class="o">())</span> <span class="o">{</span>
  <span class="k">for</span> <span class="o">(</span><span class="nc">Recommendation</span> <span class="n">rec</span> <span class="o">:</span> <span class="n">recommendations</span><span class="o">)</span>
    <span class="n">batch</span><span class="o">.</span><span class="na">newEdge</span><span class="o">(</span><span class="n">rec</span><span class="o">.</span><span class="na">productRID</span><span class="o">,</span> <span class="s">"ALSO_BOUGHT"</span><span class="o">,</span> <span class="n">rec</span><span class="o">.</span><span class="na">relatedRID</span><span class="o">);</span>
<span class="o">}</span>
</code></pre></div></div>

<h3 id="incremental-sync-from-external-database">Incremental Sync from External Database</h3>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">try</span> <span class="o">(</span><span class="nc">GraphBatch</span> <span class="n">batch</span> <span class="o">=</span> <span class="n">database</span><span class="o">.</span><span class="na">batch</span><span class="o">()</span>
    <span class="o">.</span><span class="na">withBatchSize</span><span class="o">(</span><span class="mi">100_000</span><span class="o">)</span>
    <span class="o">.</span><span class="na">withWAL</span><span class="o">(</span><span class="kc">true</span><span class="o">)</span>
    <span class="o">.</span><span class="na">build</span><span class="o">())</span> <span class="o">{</span>
  <span class="k">try</span> <span class="o">(</span><span class="nc">ResultSet</span> <span class="n">rs</span> <span class="o">=</span> <span class="n">externalDB</span><span class="o">.</span><span class="na">executeQuery</span><span class="o">(</span><span class="n">deltaQuery</span><span class="o">))</span> <span class="o">{</span>
    <span class="k">while</span> <span class="o">(</span><span class="n">rs</span><span class="o">.</span><span class="na">next</span><span class="o">())</span>
      <span class="n">batch</span><span class="o">.</span><span class="na">newEdge</span><span class="o">(</span>
          <span class="n">lookupRID</span><span class="o">(</span><span class="n">rs</span><span class="o">.</span><span class="na">getString</span><span class="o">(</span><span class="s">"from_id"</span><span class="o">)),</span>
          <span class="s">"REPORTS_TO"</span><span class="o">,</span>
          <span class="n">lookupRID</span><span class="o">(</span><span class="n">rs</span><span class="o">.</span><span class="na">getString</span><span class="o">(</span><span class="s">"to_id"</span><span class="o">)),</span>
          <span class="s">"since"</span><span class="o">,</span> <span class="n">rs</span><span class="o">.</span><span class="na">getDate</span><span class="o">(</span><span class="s">"start_date"</span><span class="o">));</span>
  <span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>

<blockquote>
  <p><strong>Tip</strong>: For runtime usage on production databases, enable WAL with <code class="language-plaintext highlighter-rouge">withWAL(true)</code> for crash safety. For initial imports where you can re-run on failure, leaving WAL off maximizes throughput.</p>
</blockquote>

<h2 id="http-batch-endpoint--graphbatch-for-every-language">HTTP Batch Endpoint — GraphBatch for Every Language</h2>

<p>GraphBatch is a Java API, but not everyone embeds ArcadeDB in a JVM application. That’s why v26.3.2 also ships a new <strong>HTTP batch endpoint</strong> that exposes the full power of GraphBatch over the <a href="https://docs.arcadedb.com/arcadedb/reference/http-api/http.html">HTTP API</a> — no Java required.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>POST /api/v1/batch/{database}
</code></pre></div></div>

<p>It supports two input formats: <strong>JSONL</strong> (newline-delimited JSON) and <strong>CSV</strong>. Both are streamed — the server never loads the entire payload into memory, so you can push millions of records in a single request.</p>

<h3 id="jsonl-format">JSONL Format</h3>

<pre><code class="language-jsonl">{"@type":"vertex","@class":"Person","@id":"t1","name":"Alice","age":30}
{"@type":"vertex","@class":"Person","@id":"t2","name":"Bob","age":25}
{"@type":"edge","@class":"KNOWS","@from":"t1","@to":"t2","since":2020}
</code></pre>
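<p>The server consumes the body as a stream, so a client can produce it lazily too. Below is a minimal sketch of a hypothetical helper, <code class="language-plaintext highlighter-rouge">iter_jsonl</code>, which is not part of any ArcadeDB client library. It yields one encoded JSONL line per record and always emits vertices before edges. It assumes each record is a plain dict holding <code class="language-plaintext highlighter-rouge">@class</code>, the temporary <code class="language-plaintext highlighter-rouge">@id</code> or the <code class="language-plaintext highlighter-rouge">@from</code>/<code class="language-plaintext highlighter-rouge">@to</code> references, and the properties.</p>

```python
import json
from typing import Any, Iterable, Iterator, Mapping


def iter_jsonl(vertices: Iterable[Mapping[str, Any]],
               edges: Iterable[Mapping[str, Any]]) -> Iterator[bytes]:
    """Yield one UTF-8 encoded JSONL line per record, all vertices before any edge."""
    for vertex in vertices:
        # "@type" is added here; the dict supplies "@class", "@id" and properties.
        line = json.dumps({"@type": "vertex", **vertex}, separators=(",", ":"))
        yield (line + "\n").encode("utf-8")
    for edge in edges:
        # The dict supplies "@class", "@from", "@to" and optional properties.
        line = json.dumps({"@type": "edge", **edge}, separators=(",", ":"))
        yield (line + "\n").encode("utf-8")
```

<p>When <code class="language-plaintext highlighter-rouge">requests</code> receives a generator as the request body, it sends the body with chunked transfer encoding. A call such as <code class="language-plaintext highlighter-rouge">requests.post(url, data=iter_jsonl(vertices, edges), ...)</code> therefore streams the upload without building it in memory. Whether the endpoint accepts chunked uploads is an assumption, so test it against your server version.</p>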

<h3 id="csv-format">CSV Format</h3>

<pre><code class="language-csv">@type,@class,@id,name,age
vertex,Person,t1,Alice,30
vertex,Person,t2,Bob,25
---
@type,@class,@from,@to,since
edge,KNOWS,t1,t2,2020
</code></pre>
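<p>Python's standard <code class="language-plaintext highlighter-rouge">csv</code> module can generate this layout. The following is a sketch of a hypothetical <code class="language-plaintext highlighter-rouge">build_batch_csv</code> helper. It assumes three things:</p>
<ul>
  <li>each section's header is <code class="language-plaintext highlighter-rouge">@type</code> followed by every key used in that section;</li>
  <li>a missing property can be sent as an empty cell;</li>
  <li>the <code class="language-plaintext highlighter-rouge">---</code> separator and the edge section can be omitted when there are no edges.</li>
</ul>
<p>Verify the last two against the server before relying on them.</p>

```python
import csv
import io
from typing import Any, Mapping, Sequence


def build_batch_csv(vertices: Sequence[Mapping[str, Any]],
                    edges: Sequence[Mapping[str, Any]]) -> str:
    """Render a vertex section and an optional edge section separated by '---'."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")

    def write_section(kind: str, rows: Sequence[Mapping[str, Any]]) -> None:
        # Header: "@type" plus every key used in this section, in first-seen order.
        columns: list[str] = []
        for row in rows:
            for key in row:
                if key not in columns:
                    columns.append(key)
        writer.writerow(["@type", *columns])
        for row in rows:
            # Assumption: a property that a row lacks is sent as an empty cell.
            writer.writerow([kind, *(row.get(col, "") for col in columns)])

    write_section("vertex", vertices)
    if edges:  # Assumption: the edge section may be omitted when there are no edges.
        out.write("---\n")
        write_section("edge", edges)
    return out.getvalue()
```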

<p>In both formats, vertices come first, then edges. In the CSV format, a line containing only <code class="language-plaintext highlighter-rouge">---</code> separates the vertex section from the edge section, and each section starts with its own header row. Vertices can have temporary IDs (<code class="language-plaintext highlighter-rouge">@id</code>) that edges reference via <code class="language-plaintext highlighter-rouge">@from</code>/<code class="language-plaintext highlighter-rouge">@to</code>. Edges can also reference existing database RIDs directly (e.g., <code class="language-plaintext highlighter-rouge">#12:0</code>).</p>
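<p>You can catch an edge endpoint that matches neither a temporary ID nor a RID before uploading. The helper below is a hypothetical pre-flight check, not something the endpoint provides. It assumes RIDs always have the <code class="language-plaintext highlighter-rouge">#12:0</code> shape shown above and that temporary IDs never look like RIDs.</p>

```python
import re
from typing import Any, Iterable, Mapping

# An existing record ID has the shape "#<bucket>:<position>", e.g. "#12:0".
RID_PATTERN = re.compile(r"#\d+:\d+")


def find_dangling_refs(vertices: Iterable[Mapping[str, Any]],
                       edges: Iterable[Mapping[str, Any]]) -> list[str]:
    """Return edge endpoints that are neither a declared temp ID nor a RID."""
    temp_ids = {v["@id"] for v in vertices if "@id" in v}
    dangling: list[str] = []
    for edge in edges:
        for ref in (edge["@from"], edge["@to"]):
            if ref not in temp_ids and RID_PATTERN.fullmatch(ref) is None:
                dangling.append(ref)
    return dangling
```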

<h3 id="temporary-id-mapping">Temporary ID Mapping</h3>

<p>The response includes an <code class="language-plaintext highlighter-rouge">idMapping</code> object so you know what RIDs were assigned:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"verticesCreated"</span><span class="p">:</span><span class="w"> </span><span class="mi">2</span><span class="p">,</span><span class="w">
  </span><span class="nl">"edgesCreated"</span><span class="p">:</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w">
  </span><span class="nl">"elapsedMs"</span><span class="p">:</span><span class="w"> </span><span class="mi">42</span><span class="p">,</span><span class="w">
  </span><span class="nl">"idMapping"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="nl">"t1"</span><span class="p">:</span><span class="w"> </span><span class="s2">"#9:0"</span><span class="p">,</span><span class="w"> </span><span class="nl">"t2"</span><span class="p">:</span><span class="w"> </span><span class="s2">"#9:1"</span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>
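<p>Edges may reference existing RIDs directly, so one request's <code class="language-plaintext highlighter-rouge">idMapping</code> can feed a later request. This is useful, for example, when vertices and edges are loaded in separate requests. The hypothetical <code class="language-plaintext highlighter-rouge">rewrite_edge_refs</code> sketch below swaps temporary IDs for the assigned RIDs and leaves any endpoint missing from the mapping unchanged. Sending an edges-only request this way is an assumption based on the RID support described above.</p>

```python
import json
from typing import Any, Mapping


def rewrite_edge_refs(edges: list[Mapping[str, Any]],
                      response_body: str) -> list[dict[str, Any]]:
    """Replace temporary-ID endpoints with the RIDs from an earlier response."""
    id_mapping: Mapping[str, str] = json.loads(response_body)["idMapping"]
    rewritten = []
    for edge in edges:
        # Endpoints missing from the mapping (e.g. RIDs already) pass through unchanged.
        rewritten.append({
            **edge,
            "@from": id_mapping.get(edge["@from"], edge["@from"]),
            "@to": id_mapping.get(edge["@to"], edge["@to"]),
        })
    return rewritten
```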

<h3 id="tuning-via-query-parameters">Tuning via Query Parameters</h3>

<p>All GraphBatch configuration options are exposed as query parameters:</p>

<table>
  <thead>
    <tr>
      <th>Parameter</th>
      <th>Default</th>
      <th>Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">batchSize</code></td>
      <td>100000</td>
      <td>Max edges buffered before auto-flush</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">lightEdges</code></td>
      <td>false</td>
      <td>Property-less edges stored as connectivity only (saves ~33% I/O)</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">wal</code></td>
      <td>false</td>
      <td>Enable Write-Ahead Logging for crash safety</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">parallelFlush</code></td>
      <td>true</td>
      <td>Parallelize edge connection across async threads</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">preAllocateEdgeChunks</code></td>
      <td>true</td>
      <td>Pre-allocate edge segments on vertex creation</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">edgeListInitialSize</code></td>
      <td>2048</td>
      <td>Initial segment size in bytes (64–8192)</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">bidirectional</code></td>
      <td>true</td>
      <td>Connect both outgoing and incoming edges</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">commitEvery</code></td>
      <td>50000</td>
      <td>Edges per sub-transaction within a flush</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">expectedEdgeCount</code></td>
      <td>0</td>
      <td>Hint for auto-tuning batch size</td>
    </tr>
  </tbody>
</table>
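<p>These options are plain query parameters, so a small helper can catch typos before a long upload starts. The sketch below is hypothetical; <code class="language-plaintext highlighter-rouge">batch_url</code> is not an ArcadeDB API. It does three things:</p>
<ul>
  <li>rejects names that are not in the table above;</li>
  <li>enforces the documented 64 to 8192 byte range for <code class="language-plaintext highlighter-rouge">edgeListInitialSize</code>;</li>
  <li>lowercases booleans to match the <code class="language-plaintext highlighter-rouge">lightEdges=true</code> form used in the examples, assuming the server expects lowercase <code class="language-plaintext highlighter-rouge">true</code>/<code class="language-plaintext highlighter-rouge">false</code>.</li>
</ul>

```python
from urllib.parse import quote, urlencode

# Parameter names from the GraphBatch options table.
KNOWN_OPTIONS = {
    "batchSize", "lightEdges", "wal", "parallelFlush", "preAllocateEdgeChunks",
    "edgeListInitialSize", "bidirectional", "commitEvery", "expectedEdgeCount",
}


def batch_url(base: str, database: str, **options: object) -> str:
    """Build the batch endpoint URL, validating option names before sending."""
    unknown = sorted(set(options) - KNOWN_OPTIONS)
    if unknown:
        raise ValueError(f"unknown GraphBatch options: {unknown}")
    size = options.get("edgeListInitialSize")
    if isinstance(size, int) and not 64 <= size <= 8192:
        raise ValueError("edgeListInitialSize must be between 64 and 8192 bytes")
    # str(True) is "True"; send lowercase to match "lightEdges=true" in the examples.
    params = {k: (str(v).lower() if isinstance(v, bool) else v) for k, v in options.items()}
    url = f"{base.rstrip('/')}/api/v1/batch/{quote(database, safe='')}"
    return f"{url}?{urlencode(params)}" if params else url
```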

<h3 id="examples">Examples</h3>

<p><strong>curl (JSONL):</strong></p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl <span class="nt">-X</span> POST <span class="s2">"http://localhost:2480/api/v1/batch/mydb?lightEdges=true"</span> <span class="se">\</span>
  <span class="nt">-u</span> root:password <span class="se">\</span>
  <span class="nt">-H</span> <span class="s2">"Content-Type: application/x-ndjson"</span> <span class="se">\</span>
  <span class="nt">--data-binary</span> @graph-data.jsonl
</code></pre></div></div>

<p><strong>curl (CSV):</strong></p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl <span class="nt">-X</span> POST <span class="s2">"http://localhost:2480/api/v1/batch/mydb"</span> <span class="se">\</span>
  <span class="nt">-u</span> root:password <span class="se">\</span>
  <span class="nt">-H</span> <span class="s2">"Content-Type: text/csv"</span> <span class="se">\</span>
  <span class="nt">--data-binary</span> @graph-data.csv
</code></pre></div></div>

<p><strong>Python:</strong></p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">requests</span>

<span class="n">data</span> <span class="o">=</span> <span class="p">(</span>
    <span class="sh">'</span><span class="s">{</span><span class="sh">"</span><span class="s">@type</span><span class="sh">"</span><span class="s">:</span><span class="sh">"</span><span class="s">vertex</span><span class="sh">"</span><span class="s">,</span><span class="sh">"</span><span class="s">@class</span><span class="sh">"</span><span class="s">:</span><span class="sh">"</span><span class="s">Person</span><span class="sh">"</span><span class="s">,</span><span class="sh">"</span><span class="s">@id</span><span class="sh">"</span><span class="s">:</span><span class="sh">"</span><span class="s">p1</span><span class="sh">"</span><span class="s">,</span><span class="sh">"</span><span class="s">name</span><span class="sh">"</span><span class="s">:</span><span class="sh">"</span><span class="s">Alice</span><span class="sh">"</span><span class="s">}</span><span class="se">\n</span><span class="sh">'</span>
    <span class="sh">'</span><span class="s">{</span><span class="sh">"</span><span class="s">@type</span><span class="sh">"</span><span class="s">:</span><span class="sh">"</span><span class="s">vertex</span><span class="sh">"</span><span class="s">,</span><span class="sh">"</span><span class="s">@class</span><span class="sh">"</span><span class="s">:</span><span class="sh">"</span><span class="s">Person</span><span class="sh">"</span><span class="s">,</span><span class="sh">"</span><span class="s">@id</span><span class="sh">"</span><span class="s">:</span><span class="sh">"</span><span class="s">p2</span><span class="sh">"</span><span class="s">,</span><span class="sh">"</span><span class="s">name</span><span class="sh">"</span><span class="s">:</span><span class="sh">"</span><span class="s">Bob</span><span class="sh">"</span><span class="s">}</span><span class="se">\n</span><span class="sh">'</span>
    <span class="sh">'</span><span class="s">{</span><span class="sh">"</span><span class="s">@type</span><span class="sh">"</span><span class="s">:</span><span class="sh">"</span><span class="s">edge</span><span class="sh">"</span><span class="s">,</span><span class="sh">"</span><span class="s">@class</span><span class="sh">"</span><span class="s">:</span><span class="sh">"</span><span class="s">KNOWS</span><span class="sh">"</span><span class="s">,</span><span class="sh">"</span><span class="s">@from</span><span class="sh">"</span><span class="s">:</span><span class="sh">"</span><span class="s">p1</span><span class="sh">"</span><span class="s">,</span><span class="sh">"</span><span class="s">@to</span><span class="sh">"</span><span class="s">:</span><span class="sh">"</span><span class="s">p2</span><span class="sh">"</span><span class="s">}</span><span class="se">\n</span><span class="sh">'</span>
<span class="p">)</span>

<span class="n">resp</span> <span class="o">=</span> <span class="n">requests</span><span class="p">.</span><span class="nf">post</span><span class="p">(</span>
    <span class="sh">"</span><span class="s">http://localhost:2480/api/v1/batch/mydb?lightEdges=true</span><span class="sh">"</span><span class="p">,</span>
    <span class="n">auth</span><span class="o">=</span><span class="p">(</span><span class="sh">"</span><span class="s">root</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">password</span><span class="sh">"</span><span class="p">),</span>
    <span class="n">headers</span><span class="o">=</span><span class="p">{</span><span class="sh">"</span><span class="s">Content-Type</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">application/x-ndjson</span><span class="sh">"</span><span class="p">},</span>
    <span class="n">data</span><span class="o">=</span><span class="n">data</span><span class="p">,</span>
<span class="p">)</span>
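<span class="n">resp</span><span class="p">.</span><span class="nf">raise_for_status</span><span class="p">()</span>  <span class="c1"># raise on a 4xx/5xx response instead of printing an error body</span>
<span class="nf">print</span><span class="p">(</span><span class="n">resp</span><span class="p">.</span><span class="nf">json</span><span class="p">())</span>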
<span class="c1"># {'verticesCreated': 2, 'edgesCreated': 1, 'elapsedMs': 15, 'idMapping': {'p1': '#9:0', 'p2': '#9:1'}}
</span></code></pre></div></div>

<p><strong>JavaScript (Node.js):</strong></p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Node.js 18+ (global fetch); run as an ES module so top-level await is allowed</span>
<span class="kd">const</span> <span class="nx">resp</span> <span class="o">=</span> <span class="k">await</span> <span class="nf">fetch</span><span class="p">(</span><span class="dl">"</span><span class="s2">http://localhost:2480/api/v1/batch/mydb</span><span class="dl">"</span><span class="p">,</span> <span class="p">{</span>
  <span class="na">method</span><span class="p">:</span> <span class="dl">"</span><span class="s2">POST</span><span class="dl">"</span><span class="p">,</span>
  <span class="na">headers</span><span class="p">:</span> <span class="p">{</span>
    <span class="dl">"</span><span class="s2">Content-Type</span><span class="dl">"</span><span class="p">:</span> <span class="dl">"</span><span class="s2">application/x-ndjson</span><span class="dl">"</span><span class="p">,</span>
    <span class="na">Authorization</span><span class="p">:</span> <span class="dl">"</span><span class="s2">Basic </span><span class="dl">"</span> <span class="o">+</span> <span class="nf">btoa</span><span class="p">(</span><span class="dl">"</span><span class="s2">root:password</span><span class="dl">"</span><span class="p">),</span>
  <span class="p">},</span>
  <span class="na">body</span><span class="p">:</span> <span class="p">[</span>
    <span class="dl">'</span><span class="s1">{"@type":"vertex","@class":"Person","@id":"p1","name":"Alice"}</span><span class="dl">'</span><span class="p">,</span>
    <span class="dl">'</span><span class="s1">{"@type":"vertex","@class":"Person","@id":"p2","name":"Bob"}</span><span class="dl">'</span><span class="p">,</span>
    <span class="dl">'</span><span class="s1">{"@type":"edge","@class":"KNOWS","@from":"p1","@to":"p2"}</span><span class="dl">'</span><span class="p">,</span>
  <span class="p">].</span><span class="nf">join</span><span class="p">(</span><span class="dl">"</span><span class="se">\n</span><span class="dl">"</span><span class="p">),</span>
<span class="p">});</span>
<span class="nx">console</span><span class="p">.</span><span class="nf">log</span><span class="p">(</span><span class="k">await</span> <span class="nx">resp</span><span class="p">.</span><span class="nf">json</span><span class="p">());</span>
</code></pre></div></div>

<blockquote>
  <p><strong>Tip</strong>: For maximum throughput, group vertices by type in the input. The endpoint batches consecutive same-type vertices into a single <code class="language-plaintext highlighter-rouge">createVertices()</code> call. Interleaving types forces smaller batches.</p>
</blockquote>
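<p>Reordering vertices is safe here because edges refer to them by temporary ID, not by position. Below is a minimal sketch of that grouping. It uses a hypothetical helper, assumes records are plain dicts with an <code class="language-plaintext highlighter-rouge">@class</code> key, keeps types in order of first appearance, and is stable within each type.</p>

```python
from typing import Any, Mapping


def group_vertices_by_type(vertices: list[Mapping[str, Any]]) -> list[Mapping[str, Any]]:
    """Reorder vertices so each @class forms one consecutive run.

    Types keep the order of their first appearance and records keep their
    relative order within a type, so the grouping is stable.
    """
    groups: dict[str, list[Mapping[str, Any]]] = {}
    for vertex in vertices:
        groups.setdefault(vertex["@class"], []).append(vertex)
    return [vertex for group in groups.values() for vertex in group]
```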

<blockquote>
  <p><strong>Tip</strong>: The endpoint is <strong>not</strong> atomic by design. GraphBatch commits internally in chunks for maximum throughput, so a request that fails partway through can leave the chunks committed before the failure in the database. Treat it as a bulk-loading operation, not a transactional one. The response tells you exactly how many records were committed.</p>
</blockquote>
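<p>One way to act on those counts is to compare them with what you sent. The <code class="language-plaintext highlighter-rouge">batch_shortfall</code> function below is a hypothetical sketch that reads the <code class="language-plaintext highlighter-rouge">verticesCreated</code> and <code class="language-plaintext highlighter-rouge">edgesCreated</code> fields shown earlier. It only applies when the server returns that JSON body, and it does not cover a request that ends with an error instead.</p>

```python
import json


def batch_shortfall(response_body: str, vertices_sent: int, edges_sent: int) -> dict[str, int]:
    """Return how many sent vertices and edges the response does not account for."""
    result = json.loads(response_body)
    return {
        "vertices": vertices_sent - result["verticesCreated"],
        "edges": edges_sent - result["edgesCreated"],
    }
```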

<h2 id="grpc-streaming-api---graphbatch-with-backpressure">gRPC Streaming API - GraphBatch with Backpressure</h2>

<p>For high-throughput pipelines where HTTP overhead matters, v26.3.2 also ships a <strong>streaming gRPC endpoint</strong> that wraps GraphBatch. It uses client-streaming RPC with built-in flow control, so the server applies backpressure while it flushes to disk and a fast producer cannot overwhelm the database.</p>

<div class="language-protobuf highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">rpc</span> <span class="n">GraphBatchLoad</span> <span class="p">(</span><span class="n">stream</span> <span class="n">GraphBatchChunk</span><span class="p">)</span> <span class="k">returns</span> <span class="p">(</span><span class="n">GraphBatchResult</span><span class="p">);</span>
</code></pre></div></div>

<p>The client sends a stream of <code class="language-plaintext highlighter-rouge">GraphBatchChunk</code> messages, each containing a batch of vertex or edge records. The first chunk must include the database name and any configuration options. When the stream closes, the server returns a single <code class="language-plaintext highlighter-rouge">GraphBatchResult</code> with counts and the temporary ID-to-RID mapping.</p>
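<p>The rule that only the first chunk carries the database name and options maps naturally onto a generator. The sketch below is a hypothetical helper, <code class="language-plaintext highlighter-rouge">iter_chunks</code>, that slices any record iterable into fixed-size chunks. It takes a <code class="language-plaintext highlighter-rouge">make_chunk</code> factory so it runs without generated stubs. With the stubs from the Python example further down, you would pass <code class="language-plaintext highlighter-rouge">GraphBatchChunk</code> as the factory and feed all vertices before all edges.</p>

```python
from itertools import islice
from typing import Any, Callable, Iterable, Iterator


def iter_chunks(records: Iterable[Any], chunk_size: int,
                make_chunk: Callable[..., Any], *, database: str,
                **first_chunk_fields: Any) -> Iterator[Any]:
    """Split records into chunks; only the first carries database and options."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    it = iter(records)
    batch = list(islice(it, chunk_size))
    # Assumption: a first chunk is always sent, even if empty, so the server
    # still receives the database name and options.
    yield make_chunk(database=database, records=batch, **first_chunk_fields)
    while batch := list(islice(it, chunk_size)):
        yield make_chunk(records=batch)
```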

<h3 id="why-grpc">Why gRPC?</h3>

<table>
  <thead>
    <tr>
      <th> </th>
      <th>HTTP Batch</th>
      <th>gRPC Streaming</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Protocol</strong></td>
      <td>Single HTTP request, streamed body</td>
      <td>Client-streaming RPC with backpressure</td>
    </tr>
    <tr>
      <td><strong>Backpressure</strong></td>
      <td>None at the application level</td>
      <td>Built-in flow control per chunk</td>
    </tr>
    <tr>
      <td><strong>Format</strong></td>
      <td>JSONL or CSV (text)</td>
      <td>Protobuf (binary, typed)</td>
    </tr>
    <tr>
      <td><strong>Best for</strong></td>
      <td>Scripts, one-off imports, simple integrations</td>
      <td>High-throughput pipelines, microservices, polyglot stacks</td>
    </tr>
    <tr>
      <td><strong>Language support</strong></td>
      <td>Any HTTP client</td>
      <td>Go, Python, Java, C++, Rust, Node.js, and more</td>
    </tr>
  </tbody>
</table>

<p>Both endpoints expose the same GraphBatch options and deliver the same engine-level performance. Choose gRPC when you need backpressure, binary efficiency, or native code generation from the proto file.</p>

<h3 id="message-structure">Message Structure</h3>

<p>Each <code class="language-plaintext highlighter-rouge">GraphBatchChunk</code> contains:</p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">database</code> - the target database name (required on the first chunk)</li>
  <li><code class="language-plaintext highlighter-rouge">credentials</code> - optional authentication</li>
  <li><code class="language-plaintext highlighter-rouge">options</code> - GraphBatch configuration (same parameters as the HTTP endpoint)</li>
  <li><code class="language-plaintext highlighter-rouge">records</code> - a list of vertex or edge records</li>
</ul>

<p>Records use the <code class="language-plaintext highlighter-rouge">GraphBatchRecord</code> message:</p>

<div class="language-protobuf highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">message</span> <span class="nc">GraphBatchRecord</span> <span class="p">{</span>
  <span class="kd">enum</span> <span class="n">Kind</span> <span class="p">{</span> <span class="na">VERTEX</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="na">EDGE</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span> <span class="p">}</span>
  <span class="n">Kind</span>   <span class="na">kind</span>      <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
  <span class="kt">string</span> <span class="na">type_name</span> <span class="o">=</span> <span class="mi">2</span><span class="p">;</span>  <span class="c1">// vertex or edge type name</span>
  <span class="kt">string</span> <span class="na">temp_id</span>   <span class="o">=</span> <span class="mi">3</span><span class="p">;</span>  <span class="c1">// vertex temp ID (for edge references)</span>
  <span class="kt">string</span> <span class="na">from_ref</span>  <span class="o">=</span> <span class="mi">4</span><span class="p">;</span>  <span class="c1">// edge source: temp ID or "#bucket:pos"</span>
  <span class="kt">string</span> <span class="na">to_ref</span>    <span class="o">=</span> <span class="mi">5</span><span class="p">;</span>  <span class="c1">// edge target: temp ID or "#bucket:pos"</span>
  <span class="n">map</span><span class="o">&lt;</span><span class="kt">string</span><span class="p">,</span> <span class="n">GrpcValue</span><span class="err">&gt;</span> <span class="na">properties</span> <span class="o">=</span> <span class="mi">6</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>Important</strong>: all vertex records must appear before any edge records across all chunks. Interleaving is not supported and will result in an error.</p>
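<p>You can check the order on the client before streaming instead of waiting for the server to report the error. Below is a minimal sketch with a hypothetical helper name. It inspects records through a caller-supplied <code class="language-plaintext highlighter-rouge">is_edge</code> predicate; with generated stubs, that could be <code class="language-plaintext highlighter-rouge">lambda r: r.kind == GraphBatchRecord.EDGE</code>.</p>

```python
from typing import Any, Callable, Iterable, Optional, Sequence, Tuple


def first_ordering_violation(chunks: Iterable[Sequence[Any]],
                             is_edge: Callable[[Any], bool]) -> Optional[Tuple[int, int]]:
    """Return (chunk index, record index) of the first vertex after an edge, or None."""
    seen_edge = False
    for chunk_idx, records in enumerate(chunks):
        for rec_idx, record in enumerate(records):
            if is_edge(record):
                seen_edge = True
            elif seen_edge:
                # A vertex after any edge, in this chunk or an earlier one, is not allowed.
                return chunk_idx, rec_idx
    return None
```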

<h3 id="python-example-grpcio">Python Example (grpcio)</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">grpc</span>
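<span class="kn">from</span> <span class="n">arcadedb_pb2</span> <span class="kn">import</span> <span class="p">(</span><span class="n">DatabaseCredentials</span><span class="p">,</span> <span class="n">GraphBatchChunk</span><span class="p">,</span> <span class="n">GraphBatchOptions</span><span class="p">,</span>
                          <span class="n">GraphBatchRecord</span><span class="p">,</span> <span class="n">GrpcValue</span><span class="p">)</span>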
<span class="kn">from</span> <span class="n">arcadedb_pb2_grpc</span> <span class="kn">import</span> <span class="n">ArcadeDbServiceStub</span>

<span class="n">channel</span> <span class="o">=</span> <span class="n">grpc</span><span class="p">.</span><span class="nf">insecure_channel</span><span class="p">(</span><span class="sh">"</span><span class="s">localhost:2424</span><span class="sh">"</span><span class="p">)</span>
<span class="n">stub</span> <span class="o">=</span> <span class="nc">ArcadeDbServiceStub</span><span class="p">(</span><span class="n">channel</span><span class="p">)</span>

<span class="k">def</span> <span class="nf">generate_chunks</span><span class="p">():</span>
    <span class="c1"># First chunk: database, options, and initial vertices
</span>    <span class="k">yield</span> <span class="nc">GraphBatchChunk</span><span class="p">(</span>
        <span class="n">database</span><span class="o">=</span><span class="sh">"</span><span class="s">mydb</span><span class="sh">"</span><span class="p">,</span>
        <span class="n">credentials</span><span class="o">=</span><span class="nc">DatabaseCredentials</span><span class="p">(</span><span class="n">username</span><span class="o">=</span><span class="sh">"</span><span class="s">root</span><span class="sh">"</span><span class="p">,</span> <span class="n">password</span><span class="o">=</span><span class="sh">"</span><span class="s">password</span><span class="sh">"</span><span class="p">),</span>
        <span class="n">options</span><span class="o">=</span><span class="nc">GraphBatchOptions</span><span class="p">(</span><span class="n">light_edges</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">batch_size</span><span class="o">=</span><span class="mi">100000</span><span class="p">),</span>
        <span class="n">records</span><span class="o">=</span><span class="p">[</span>
            <span class="nc">GraphBatchRecord</span><span class="p">(</span><span class="n">kind</span><span class="o">=</span><span class="n">GraphBatchRecord</span><span class="p">.</span><span class="n">VERTEX</span><span class="p">,</span>
                             <span class="n">type_name</span><span class="o">=</span><span class="sh">"</span><span class="s">Person</span><span class="sh">"</span><span class="p">,</span> <span class="n">temp_id</span><span class="o">=</span><span class="sh">"</span><span class="s">p1</span><span class="sh">"</span><span class="p">,</span>
                             <span class="n">properties</span><span class="o">=</span><span class="p">{</span><span class="sh">"</span><span class="s">name</span><span class="sh">"</span><span class="p">:</span> <span class="nc">GrpcValue</span><span class="p">(</span><span class="n">string_value</span><span class="o">=</span><span class="sh">"</span><span class="s">Alice</span><span class="sh">"</span><span class="p">)}),</span>
            <span class="nc">GraphBatchRecord</span><span class="p">(</span><span class="n">kind</span><span class="o">=</span><span class="n">GraphBatchRecord</span><span class="p">.</span><span class="n">VERTEX</span><span class="p">,</span>
                             <span class="n">type_name</span><span class="o">=</span><span class="sh">"</span><span class="s">Person</span><span class="sh">"</span><span class="p">,</span> <span class="n">temp_id</span><span class="o">=</span><span class="sh">"</span><span class="s">p2</span><span class="sh">"</span><span class="p">,</span>
                             <span class="n">properties</span><span class="o">=</span><span class="p">{</span><span class="sh">"</span><span class="s">name</span><span class="sh">"</span><span class="p">:</span> <span class="nc">GrpcValue</span><span class="p">(</span><span class="n">string_value</span><span class="o">=</span><span class="sh">"</span><span class="s">Bob</span><span class="sh">"</span><span class="p">)}),</span>
        <span class="p">],</span>
    <span class="p">)</span>
    <span class="c1"># Second chunk: edges referencing temp IDs
</span>    <span class="k">yield</span> <span class="nc">GraphBatchChunk</span><span class="p">(</span>
        <span class="n">records</span><span class="o">=</span><span class="p">[</span>
            <span class="nc">GraphBatchRecord</span><span class="p">(</span><span class="n">kind</span><span class="o">=</span><span class="n">GraphBatchRecord</span><span class="p">.</span><span class="n">EDGE</span><span class="p">,</span>
                             <span class="n">type_name</span><span class="o">=</span><span class="sh">"</span><span class="s">KNOWS</span><span class="sh">"</span><span class="p">,</span>
                             <span class="n">from_ref</span><span class="o">=</span><span class="sh">"</span><span class="s">p1</span><span class="sh">"</span><span class="p">,</span> <span class="n">to_ref</span><span class="o">=</span><span class="sh">"</span><span class="s">p2</span><span class="sh">"</span><span class="p">),</span>
        <span class="p">],</span>
    <span class="p">)</span>

<span class="n">result</span> <span class="o">=</span> <span class="n">stub</span><span class="p">.</span><span class="nc">GraphBatchLoad</span><span class="p">(</span><span class="nf">generate_chunks</span><span class="p">())</span>
<span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Created </span><span class="si">{</span><span class="n">result</span><span class="p">.</span><span class="n">vertices_created</span><span class="si">}</span><span class="s"> vertices, </span><span class="si">{</span><span class="n">result</span><span class="p">.</span><span class="n">edges_created</span><span class="si">}</span><span class="s"> edges </span><span class="sh">"</span>
      <span class="sa">f</span><span class="sh">"</span><span class="s">in </span><span class="si">{</span><span class="n">result</span><span class="p">.</span><span class="n">elapsed_ms</span><span class="si">}</span><span class="s">ms</span><span class="sh">"</span><span class="p">)</span>
<span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">ID mapping: </span><span class="si">{</span><span class="nf">dict</span><span class="p">(</span><span class="n">result</span><span class="p">.</span><span class="n">id_mapping</span><span class="p">)</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
<span class="c1"># Created 2 vertices, 1 edges in 12ms
# ID mapping: {'p1': '#9:0', 'p2': '#9:1'}
</span></code></pre></div></div>

<h3 id="go-example">Go Example</h3>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">stream</span><span class="p">,</span> <span class="n">err</span> <span class="o">:=</span> <span class="n">client</span><span class="o">.</span><span class="n">GraphBatchLoad</span><span class="p">(</span><span class="n">ctx</span><span class="p">)</span>
<span class="k">if</span> <span class="n">err</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
    <span class="n">log</span><span class="o">.</span><span class="n">Fatal</span><span class="p">(</span><span class="n">err</span><span class="p">)</span>
<span class="p">}</span>

<span class="c">// First chunk with vertices</span>
<span class="n">stream</span><span class="o">.</span><span class="n">Send</span><span class="p">(</span><span class="o">&amp;</span><span class="n">pb</span><span class="o">.</span><span class="n">GraphBatchChunk</span><span class="p">{</span>
    <span class="n">Database</span><span class="o">:</span>    <span class="s">"mydb"</span><span class="p">,</span>
    <span class="n">Credentials</span><span class="o">:</span> <span class="o">&amp;</span><span class="n">pb</span><span class="o">.</span><span class="n">DatabaseCredentials</span><span class="p">{</span><span class="n">Username</span><span class="o">:</span> <span class="s">"root"</span><span class="p">,</span> <span class="n">Password</span><span class="o">:</span> <span class="s">"password"</span><span class="p">},</span>
    <span class="n">Options</span><span class="o">:</span>     <span class="o">&amp;</span><span class="n">pb</span><span class="o">.</span><span class="n">GraphBatchOptions</span><span class="p">{</span><span class="n">LightEdges</span><span class="o">:</span> <span class="no">true</span><span class="p">},</span>
    <span class="n">Records</span><span class="o">:</span> <span class="p">[]</span><span class="o">*</span><span class="n">pb</span><span class="o">.</span><span class="n">GraphBatchRecord</span><span class="p">{</span>
        <span class="p">{</span><span class="n">Kind</span><span class="o">:</span> <span class="n">pb</span><span class="o">.</span><span class="n">GraphBatchRecord_VERTEX</span><span class="p">,</span> <span class="n">TypeName</span><span class="o">:</span> <span class="s">"Person"</span><span class="p">,</span>
         <span class="n">TempId</span><span class="o">:</span> <span class="s">"p1"</span><span class="p">,</span> <span class="n">Properties</span><span class="o">:</span> <span class="k">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="o">*</span><span class="n">pb</span><span class="o">.</span><span class="n">GrpcValue</span><span class="p">{</span>
            <span class="s">"name"</span><span class="o">:</span> <span class="p">{</span><span class="n">Value</span><span class="o">:</span> <span class="o">&amp;</span><span class="n">pb</span><span class="o">.</span><span class="n">GrpcValue_StringValue</span><span class="p">{</span><span class="n">StringValue</span><span class="o">:</span> <span class="s">"Alice"</span><span class="p">}},</span>
        <span class="p">}},</span>
        <span class="p">{</span><span class="n">Kind</span><span class="o">:</span> <span class="n">pb</span><span class="o">.</span><span class="n">GraphBatchRecord_VERTEX</span><span class="p">,</span> <span class="n">TypeName</span><span class="o">:</span> <span class="s">"Person"</span><span class="p">,</span>
         <span class="n">TempId</span><span class="o">:</span> <span class="s">"p2"</span><span class="p">,</span> <span class="n">Properties</span><span class="o">:</span> <span class="k">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="o">*</span><span class="n">pb</span><span class="o">.</span><span class="n">GrpcValue</span><span class="p">{</span>
            <span class="s">"name"</span><span class="o">:</span> <span class="p">{</span><span class="n">Value</span><span class="o">:</span> <span class="o">&amp;</span><span class="n">pb</span><span class="o">.</span><span class="n">GrpcValue_StringValue</span><span class="p">{</span><span class="n">StringValue</span><span class="o">:</span> <span class="s">"Bob"</span><span class="p">}},</span>
        <span class="p">}},</span>
    <span class="p">},</span>
<span class="p">})</span>

<span class="c">// Second chunk with edges</span>
<span class="n">stream</span><span class="o">.</span><span class="n">Send</span><span class="p">(</span><span class="o">&amp;</span><span class="n">pb</span><span class="o">.</span><span class="n">GraphBatchChunk</span><span class="p">{</span>
    <span class="n">Records</span><span class="o">:</span> <span class="p">[]</span><span class="o">*</span><span class="n">pb</span><span class="o">.</span><span class="n">GraphBatchRecord</span><span class="p">{</span>
        <span class="p">{</span><span class="n">Kind</span><span class="o">:</span> <span class="n">pb</span><span class="o">.</span><span class="n">GraphBatchRecord_EDGE</span><span class="p">,</span> <span class="n">TypeName</span><span class="o">:</span> <span class="s">"KNOWS"</span><span class="p">,</span>
         <span class="n">FromRef</span><span class="o">:</span> <span class="s">"p1"</span><span class="p">,</span> <span class="n">ToRef</span><span class="o">:</span> <span class="s">"p2"</span><span class="p">},</span>
    <span class="p">},</span>
<span class="p">})</span>

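<span class="n">result</span><span class="p">,</span> <span class="n">err</span> <span class="o">:=</span> <span class="n">stream</span><span class="o">.</span><span class="n">CloseAndRecv</span><span class="p">()</span>
<span class="k">if</span> <span class="n">err</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
    <span class="c">// Errors the server raised while consuming the stream are reported here</span>
    <span class="n">log</span><span class="o">.</span><span class="n">Fatal</span><span class="p">(</span><span class="n">err</span><span class="p">)</span>
<span class="p">}</span>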
<span class="n">fmt</span><span class="o">.</span><span class="n">Printf</span><span class="p">(</span><span class="s">"Created %d vertices, %d edges in %dms</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span>
    <span class="n">result</span><span class="o">.</span><span class="n">VerticesCreated</span><span class="p">,</span> <span class="n">result</span><span class="o">.</span><span class="n">EdgesCreated</span><span class="p">,</span> <span class="n">result</span><span class="o">.</span><span class="n">ElapsedMs</span><span class="p">)</span>
</code></pre></div></div>

<h3 id="java-example-generated-stubs">Java Example (generated stubs)</h3>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nc">StreamObserver</span><span class="o">&lt;</span><span class="nc">GraphBatchResult</span><span class="o">&gt;</span> <span class="n">responseObserver</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">StreamObserver</span><span class="o">&lt;&gt;()</span> <span class="o">{</span>
    <span class="nd">@Override</span>
    <span class="kd">public</span> <span class="kt">void</span> <span class="nf">onNext</span><span class="o">(</span><span class="nc">GraphBatchResult</span> <span class="n">result</span><span class="o">)</span> <span class="o">{</span>
        <span class="nc">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">printf</span><span class="o">(</span><span class="s">"Created %d vertices, %d edges in %dms%n"</span><span class="o">,</span>
            <span class="n">result</span><span class="o">.</span><span class="na">getVerticesCreated</span><span class="o">(),</span> <span class="n">result</span><span class="o">.</span><span class="na">getEdgesCreated</span><span class="o">(),</span> <span class="n">result</span><span class="o">.</span><span class="na">getElapsedMs</span><span class="o">());</span>
    <span class="o">}</span>
    <span class="nd">@Override</span> <span class="kd">public</span> <span class="kt">void</span> <span class="nf">onError</span><span class="o">(</span><span class="nc">Throwable</span> <span class="n">t</span><span class="o">)</span> <span class="o">{</span> <span class="n">t</span><span class="o">.</span><span class="na">printStackTrace</span><span class="o">();</span> <span class="o">}</span>
    <span class="nd">@Override</span> <span class="kd">public</span> <span class="kt">void</span> <span class="nf">onCompleted</span><span class="o">()</span> <span class="o">{</span> <span class="o">}</span>
<span class="o">};</span>

<span class="nc">StreamObserver</span><span class="o">&lt;</span><span class="nc">GraphBatchChunk</span><span class="o">&gt;</span> <span class="n">requestStream</span> <span class="o">=</span> <span class="n">stub</span><span class="o">.</span><span class="na">graphBatchLoad</span><span class="o">(</span><span class="n">responseObserver</span><span class="o">);</span>

<span class="c1">// Send vertices</span>
<span class="n">requestStream</span><span class="o">.</span><span class="na">onNext</span><span class="o">(</span><span class="nc">GraphBatchChunk</span><span class="o">.</span><span class="na">newBuilder</span><span class="o">()</span>
    <span class="o">.</span><span class="na">setDatabase</span><span class="o">(</span><span class="s">"mydb"</span><span class="o">)</span>
    <span class="o">.</span><span class="na">setCredentials</span><span class="o">(</span><span class="nc">DatabaseCredentials</span><span class="o">.</span><span class="na">newBuilder</span><span class="o">()</span>
        <span class="o">.</span><span class="na">setUsername</span><span class="o">(</span><span class="s">"root"</span><span class="o">).</span><span class="na">setPassword</span><span class="o">(</span><span class="s">"password"</span><span class="o">))</span>
    <span class="o">.</span><span class="na">setOptions</span><span class="o">(</span><span class="nc">GraphBatchOptions</span><span class="o">.</span><span class="na">newBuilder</span><span class="o">().</span><span class="na">setLightEdges</span><span class="o">(</span><span class="kc">true</span><span class="o">))</span>
    <span class="o">.</span><span class="na">addRecords</span><span class="o">(</span><span class="nc">GraphBatchRecord</span><span class="o">.</span><span class="na">newBuilder</span><span class="o">()</span>
        <span class="o">.</span><span class="na">setKind</span><span class="o">(</span><span class="nc">GraphBatchRecord</span><span class="o">.</span><span class="na">Kind</span><span class="o">.</span><span class="na">VERTEX</span><span class="o">)</span>
        <span class="o">.</span><span class="na">setTypeName</span><span class="o">(</span><span class="s">"Person"</span><span class="o">).</span><span class="na">setTempId</span><span class="o">(</span><span class="s">"p1"</span><span class="o">)</span>
        <span class="o">.</span><span class="na">putProperties</span><span class="o">(</span><span class="s">"name"</span><span class="o">,</span> <span class="nc">GrpcValue</span><span class="o">.</span><span class="na">newBuilder</span><span class="o">().</span><span class="na">setStringValue</span><span class="o">(</span><span class="s">"Alice"</span><span class="o">).</span><span class="na">build</span><span class="o">()))</span>
    <span class="o">.</span><span class="na">addRecords</span><span class="o">(</span><span class="nc">GraphBatchRecord</span><span class="o">.</span><span class="na">newBuilder</span><span class="o">()</span>
        <span class="o">.</span><span class="na">setKind</span><span class="o">(</span><span class="nc">GraphBatchRecord</span><span class="o">.</span><span class="na">Kind</span><span class="o">.</span><span class="na">VERTEX</span><span class="o">)</span>
        <span class="o">.</span><span class="na">setTypeName</span><span class="o">(</span><span class="s">"Person"</span><span class="o">).</span><span class="na">setTempId</span><span class="o">(</span><span class="s">"p2"</span><span class="o">)</span>
        <span class="o">.</span><span class="na">putProperties</span><span class="o">(</span><span class="s">"name"</span><span class="o">,</span> <span class="nc">GrpcValue</span><span class="o">.</span><span class="na">newBuilder</span><span class="o">().</span><span class="na">setStringValue</span><span class="o">(</span><span class="s">"Bob"</span><span class="o">).</span><span class="na">build</span><span class="o">()))</span>
    <span class="o">.</span><span class="na">build</span><span class="o">());</span>

<span class="c1">// Send edges</span>
<span class="n">requestStream</span><span class="o">.</span><span class="na">onNext</span><span class="o">(</span><span class="nc">GraphBatchChunk</span><span class="o">.</span><span class="na">newBuilder</span><span class="o">()</span>
    <span class="o">.</span><span class="na">addRecords</span><span class="o">(</span><span class="nc">GraphBatchRecord</span><span class="o">.</span><span class="na">newBuilder</span><span class="o">()</span>
        <span class="o">.</span><span class="na">setKind</span><span class="o">(</span><span class="nc">GraphBatchRecord</span><span class="o">.</span><span class="na">Kind</span><span class="o">.</span><span class="na">EDGE</span><span class="o">)</span>
        <span class="o">.</span><span class="na">setTypeName</span><span class="o">(</span><span class="s">"KNOWS"</span><span class="o">).</span><span class="na">setFromRef</span><span class="o">(</span><span class="s">"p1"</span><span class="o">).</span><span class="na">setToRef</span><span class="o">(</span><span class="s">"p2"</span><span class="o">))</span>
    <span class="o">.</span><span class="na">build</span><span class="o">());</span>

<span class="n">requestStream</span><span class="o">.</span><span class="na">onCompleted</span><span class="o">();</span>
</code></pre></div></div>

<blockquote>
  <p><strong>Tip</strong>: For very large imports with millions of vertices using temp IDs, the <code class="language-plaintext highlighter-rouge">id_mapping</code> in the response may exceed the default gRPC message size limit (4 MB). In that case, increase <code class="language-plaintext highlighter-rouge">maxInboundMessageSize</code> on the client, or skip temp IDs when you don’t need the RID mapping back.</p>
</blockquote>
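<p>To get a feel for when that limit is reached, the sketch below estimates the wire size of the <code class="language-plaintext highlighter-rouge">id_mapping</code> field. It assumes, without confirmation from the API reference, that <code class="language-plaintext highlighter-rouge">id_mapping</code> is a protobuf <code class="language-plaintext highlighter-rouge">map&lt;string, string&gt;</code> from temp ID to RID string with a field number below 16. The IDs (<code class="language-plaintext highlighter-rouge">v123</code>, <code class="language-plaintext highlighter-rouge">#12:123</code>) are made up, so real sizes will vary with your ID lengths.</p>

```python
# Rough estimate of the protobuf wire size of an id_mapping response.
# ASSUMPTION: id_mapping is map<string, string> (temp ID -> RID string) with a
# field number below 16, so every field tag fits in a single byte.

DEFAULT_GRPC_LIMIT = 4 * 1024 * 1024  # default 4 MB inbound message limit


def varint_len(n: int) -> int:
    """Number of bytes protobuf uses to encode a non-negative varint."""
    length = 1
    while n >= 0x80:
        n >>= 7
        length += 1
    return length


def map_entry_size(key: str, value: str) -> int:
    """Wire size of one map entry, including its outer tag and length prefix."""
    k = len(key.encode("utf-8"))
    v = len(value.encode("utf-8"))
    # Entry message: key field (tag + length + bytes) + value field (tag + length + bytes)
    inner = 1 + varint_len(k) + k + 1 + varint_len(v) + v
    # Each map entry is encoded like a repeated message field: tag + length + body
    return 1 + varint_len(inner) + inner


def estimate_id_mapping_bytes(vertex_count: int) -> int:
    """Sum entry sizes for hypothetical temp IDs 'v0', 'v1', ... and RIDs '#12:0', ..."""
    return sum(map_entry_size(f"v{i}", f"#12:{i}") for i in range(vertex_count))


for count in (10_000, 100_000, 1_000_000):
    size = estimate_id_mapping_bytes(count)
    print(f"{count:>9,} vertices: ~{size / 1024 / 1024:.1f} MB, "
          f"exceeds 4 MB default: {size > DEFAULT_GRPC_LIMIT}")
```

<p>In a Python gRPC client, the counterpart of Java's <code class="language-plaintext highlighter-rouge">maxInboundMessageSize</code> is the <code class="language-plaintext highlighter-rouge">grpc.max_receive_message_length</code> channel option, passed in the <code class="language-plaintext highlighter-rouge">options</code> list of <code class="language-plaintext highlighter-rouge">grpc.insecure_channel()</code> or <code class="language-plaintext highlighter-rouge">grpc.secure_channel()</code>.</p>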

<blockquote>
  <p><strong>Tip</strong>: Like the HTTP endpoint, the gRPC streaming API is NOT atomic: GraphBatch commits internally in chunks. If the stream is interrupted mid-flight, the records already flushed remain committed. Design your pipeline for idempotent re-runs.</p>
</blockquote>
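<p>One common way to make re-runs safe is to give every record a natural key and treat each write as an upsert, so chunks committed before an interruption are simply overwritten when the pipeline restarts. The sketch below simulates this in memory: the dict stands in for the database and <code class="language-plaintext highlighter-rouge">send_chunk()</code> is a hypothetical helper, not part of the GraphBatch or gRPC API. Against a real database you would need an equivalent mechanism, such as a unique index on the key.</p>

```python
# In-memory sketch of an idempotent chunked loader. Each record carries a
# natural key and writes are upserts on that key, so re-sending chunks that
# were already committed overwrites them instead of creating duplicates.
# send_chunk() and the dict "database" are hypothetical stand-ins.
from typing import Dict, Iterable, List, Optional


class StreamInterrupted(Exception):
    """Simulates a stream that drops mid-flight."""


def chunked(records: List[dict], size: int) -> Iterable[List[dict]]:
    for start in range(0, len(records), size):
        yield records[start:start + size]


def send_chunk(store: Dict[str, dict], chunk: List[dict]) -> None:
    # Each chunk "commits" on its own, mirroring GraphBatch's internal chunking.
    for record in chunk:
        store[record["key"]] = record  # upsert keyed on the natural key


def load(store: Dict[str, dict], records: List[dict], chunk_size: int,
         fail_after: Optional[int] = None) -> None:
    for index, chunk in enumerate(chunked(records, chunk_size)):
        if fail_after is not None and index == fail_after:
            raise StreamInterrupted(f"stream dropped before chunk {index}")
        send_chunk(store, chunk)


records = [{"key": f"user-{i}", "name": f"User {i}"} for i in range(10)]
db: Dict[str, dict] = {}

try:
    load(db, records, chunk_size=3, fail_after=2)  # first run dies mid-flight
except StreamInterrupted as exc:
    print("interrupted:", exc, "| committed so far:", len(db))

load(db, records, chunk_size=3)  # re-run everything from the start
print("after re-run:", len(db))
```

<p>The same idea applies to edges if each one is identified by its (from key, edge type, to key) triple.</p>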

<h2 id="get-started">Get Started</h2>

<p>GraphBatch is available starting from <strong>ArcadeDB v26.3.2</strong>. Check out the <a href="https://docs.arcadedb.com">documentation</a> for API details and usage examples.</p>

<p><strong>Download ArcadeDB v26.3.2</strong>: <a href="https://github.com/ArcadeData/arcadedb/releases">GitHub Releases</a></p>

<p>If you have questions or feedback, join us on <a href="https://discord.gg/arcadedb">Discord</a> or open an issue on <a href="https://github.com/ArcadeData/arcadedb/issues">GitHub</a>.</p>]]></content><author><name>Luca Garulli</name></author><category term="Graph Database" /><category term="Performance" /><category term="Import" /><category term="Benchmark" /><summary type="html"><![CDATA[ArcadeDB's new GraphBatch delivers up to 8.39x faster graph ingestion than the standard API and 3x faster than the previous GraphImporter, reaching over 1.2 million edges per second at medium scale. Now accessible from any language via the new HTTP batch endpoint (JSONL/CSV) and a streaming gRPC API with backpressure support.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://arcadedb.com/assets/images/arcadedb-graphbatch.jpg" /><media:content medium="image" url="https://arcadedb.com/assets/images/arcadedb-graphbatch.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Cognee + ArcadeDB: AI Memory Meets Multi-Model</title><link href="https://arcadedb.com/blog/cognee-arcadedb-ai-memory-meets-multi-model/" rel="alternate" type="text/html" title="Cognee + ArcadeDB: AI Memory Meets Multi-Model" /><published>2026-03-30T00:00:00+00:00</published><updated>2026-03-30T00:00:00+00:00</updated><id>https://arcadedb.com/blog/cognee-arcadedb-ai-memory-meets-multi-model</id><content type="html" xml:base="https://arcadedb.com/blog/cognee-arcadedb-ai-memory-meets-multi-model/"><![CDATA[<p>AI agents need memory. Not just a conversation buffer that disappears after each session — real, persistent memory that learns from every interaction, connects facts across documents, and retrieves exactly the right context when the agent needs it.</p>

<p>That’s what <a href="https://github.com/topoteretes/cognee">Cognee</a> does. It’s an open-source AI memory engine with 14,600+ GitHub stars, $7.5M in seed funding, and 70+ companies using it in production. Cognee ingests data in any format, builds a knowledge graph using cognitive science approaches, and gives AI agents the ability to search across both vector embeddings and graph relationships.</p>

<p>ArcadeDB is now available as a graph database backend for Cognee — and its multi-model architecture makes it uniquely suited for the job.</p>

<hr />

<h2 id="what-cognee-does">What Cognee Does</h2>

<p>Cognee’s API is intentionally minimal. Three functions cover the entire pipeline:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">cognee</span>

<span class="k">await</span> <span class="n">cognee</span><span class="p">.</span><span class="nf">add</span><span class="p">(</span><span class="sh">"</span><span class="s">your data here</span><span class="sh">"</span><span class="p">)</span>   <span class="c1"># Ingest documents, text, or URLs
</span><span class="k">await</span> <span class="n">cognee</span><span class="p">.</span><span class="nf">cognify</span><span class="p">()</span>                <span class="c1"># Build the knowledge graph
</span><span class="k">await</span> <span class="n">cognee</span><span class="p">.</span><span class="nf">search</span><span class="p">(</span><span class="sh">"</span><span class="s">your query</span><span class="sh">"</span><span class="p">)</span>     <span class="c1"># Search across graph + vectors
</span></code></pre></div></div>

<p>Under the hood, Cognee extracts entities and relationships from your data, builds a knowledge graph, generates vector embeddings, and stores everything for retrieval. When an agent searches, Cognee combines graph traversal with vector similarity to return contextually rich results — not just the closest embedding match, but the connected facts around it.</p>
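<p>A toy version of that two-step idea: pick the nodes whose embeddings are closest to the query, then expand one hop in the graph to collect the facts connected to them. This sketches hybrid retrieval in general, with made-up three-dimensional embeddings and a hypothetical <code class="language-plaintext highlighter-rouge">hybrid_search()</code> helper; it is not Cognee’s actual search implementation.</p>

```python
# Toy hybrid retrieval: a vector step finds the closest nodes, then a graph
# step adds their direct neighbors as extra context. Illustrative only.
import math
from typing import Dict, List, Set, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(query: Vector, embeddings: Dict[str, Vector],
                  edges: Dict[str, List[str]], k: int = 1) -> Tuple[List[str], Set[str]]:
    # 1. Vector step: the k nodes whose embeddings are most similar to the query
    seeds = sorted(embeddings, key=lambda n: cosine(query, embeddings[n]), reverse=True)[:k]
    # 2. Graph step: expand one hop from each seed to collect connected facts
    context = {nbr for seed in seeds for nbr in edges.get(seed, [])}
    return seeds, context - set(seeds)


embeddings = {
    "ArcadeDB": [0.9, 0.1, 0.0],
    "Apache 2.0": [0.1, 0.9, 0.0],
    "Docker": [0.0, 0.2, 0.9],
}
edges = {"ArcadeDB": ["Apache 2.0", "graph", "vector"]}

print(hybrid_search([1.0, 0.0, 0.0], embeddings, edges))
```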

<p>This architecture requires two database backends: a <strong>graph database</strong> for entities and relationships, and a <strong>vector store</strong> for embeddings. Most Cognee deployments use separate databases for each — for example, <a href="https://arcadedb.com/neo4j.html">Neo4j</a> for graphs and Qdrant for vectors.</p>

<p>ArcadeDB handles both in a single engine.</p>

<hr />

<h2 id="why-arcadedb">Why ArcadeDB</h2>

<p>ArcadeDB is a <a href="https://docs.arcadedb.com/arcadedb/concepts/multi-model.html">multi-model database</a> that natively supports graphs, documents, key/value, <a href="https://docs.arcadedb.com/arcadedb/concepts/timeseries.html">time series</a>, <a href="https://docs.arcadedb.com/arcadedb/how-to/data-modeling/full-text-index.html">full-text search</a>, and <a href="https://docs.arcadedb.com/arcadedb/concepts/vector-search.html#vector-search-concepts">vector embeddings</a>. For Cognee, this means:</p>

<p><strong>One database instead of two (or three).</strong> ArcadeDB stores your <a href="https://arcadedb.com/knowledge-graphs.html">knowledge graph</a> <em>and</em> your <a href="https://docs.arcadedb.com/arcadedb/tutorials/vector-search-tutorial.html">vector embeddings</a> in the same engine. No need to synchronize data between a graph database and a separate vector store. No additional infrastructure to deploy, monitor, and maintain.</p>

<p><strong>Native graph performance.</strong> ArcadeDB isn’t a graph layer on top of a relational engine. It uses a native graph storage model with direct record links — no index lookups for traversals. On the <a href="/blog/graph-olap-engine-the-fastest-graph-analytics-with-zero-compromises/">LDBC Graphalytics benchmark</a>, ArcadeDB is up to 9x faster than KuzuDB (Cognee’s previous default) on algorithms like PageRank and BFS.</p>

<p><img src="/assets/images/arcadedb-vs-kuzu-benchmark.svg" alt="ArcadeDB vs KuzuDB — LDBC Graphalytics Benchmark" /></p>

<p>ArcadeDB is faster on every LDBC Graphalytics algorithm and up to 25x faster on LSQB subgraph pattern matching queries. Full benchmark results are <a href="https://github.com/ArcadeData/ldbc_graphalytics_platforms_arcadedb">available on GitHub</a>.</p>
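<p>To illustrate what “no index lookups for traversals” means, the sketch below runs the same breadth-first search two ways: over an edge table that has to be probed through an index at every hop, and over vertices that hold direct references to their neighbors, loosely analogous to ArcadeDB’s record links. It models the access pattern only, not ArcadeDB’s on-disk format, and the probe counter is purely illustrative.</p>

```python
# Same BFS, two access patterns: direct neighbor references vs. an index
# probed once per expanded vertex. Illustrative model, not ArcadeDB internals.
from collections import deque
from typing import Dict, List, Set, Tuple


class Vertex:
    def __init__(self, name: str) -> None:
        self.name = name
        self.out: List["Vertex"] = []  # direct links: following them needs no lookup


def bfs_direct(start: Vertex) -> Set[str]:
    seen, queue = {start.name}, deque([start])
    while queue:
        for nbr in queue.popleft().out:
            if nbr.name not in seen:
                seen.add(nbr.name)
                queue.append(nbr)
    return seen


def bfs_indexed(start: str, edge_table: List[Tuple[str, str]]) -> Tuple[Set[str], int]:
    # Index on the source column (think B-tree), probed at every hop
    index: Dict[str, List[str]] = {}
    for src, dst in edge_table:
        index.setdefault(src, []).append(dst)
    seen, queue, probes = {start}, deque([start]), 0
    while queue:
        probes += 1  # one index probe per expanded vertex
        for nbr in index.get(queue.popleft(), []):
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return seen, probes


a, b, c = Vertex("a"), Vertex("b"), Vertex("c")
a.out += [b, c]
b.out.append(c)

print(bfs_direct(a))
print(bfs_indexed("a", [("a", "b"), ("a", "c"), ("b", "c")]))
```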

<p><strong><a href="https://arcadedb.com/blog/native-opencypher/">OpenCypher</a> compatibility.</strong> ArcadeDB passes 97.8% of the official Cypher Technology Compatibility Kit. The Cognee adapter uses standard <a href="https://docs.arcadedb.com/arcadedb/reference/cypher/chapter.html">Cypher queries</a> over the <a href="https://docs.arcadedb.com/arcadedb/how-to/connectivity/bolt.html">Bolt protocol</a> — the same protocol and query language used by Neo4j. No proprietary APIs.</p>

<p><strong>Apache 2.0, forever.</strong> ArcadeDB is fully open source under the Apache 2.0 license, with a <a href="/blog/open-source-forever-why-arcadedb-will-never-change-its-license/">public commitment to never change it</a>. After <a href="https://arcadedb.com/blog/from-kuzudb-to-arcadedb-migration-guide/">KuzuDB’s acquisition</a> by Apple and subsequent archival, licensing stability matters more than ever.</p>

<hr />

<h2 id="setting-up-arcadedb-with-cognee">Setting Up ArcadeDB with Cognee</h2>

<h3 id="1-start-arcadedb-with-bolt-enabled">1. Start ArcadeDB with Bolt enabled</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>docker run <span class="nt">-d</span> <span class="nt">--name</span> arcadedb <span class="nt">-p</span> 2480:2480 <span class="nt">-p</span> 7687:7687 <span class="se">\</span>
  <span class="nt">-e</span> <span class="nv">JAVA_OPTS</span><span class="o">=</span><span class="s2">"-Darcadedb.server.rootPassword=arcadedb </span><span class="se">\</span><span class="s2">
  -Darcadedb.server.defaultDatabases=cognee[root]{} </span><span class="se">\</span><span class="s2">
  -Darcadedb.server.plugins=Bolt:com.arcadedb.bolt.BoltProtocolPlugin"</span> <span class="se">\</span>
  arcadedata/arcadedb:latest
</code></pre></div></div>

<p>This starts ArcadeDB with the Bolt protocol on port 7687 and automatically creates a <code class="language-plaintext highlighter-rouge">cognee</code> database.</p>
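<p>The container can take a few seconds to come up. A minimal readiness check, assuming you only want to wait until ports 7687 (Bolt) and 2480 (HTTP) accept TCP connections before running Cognee, could look like this. It confirms that something is listening, not that the <code class="language-plaintext highlighter-rouge">cognee</code> database exists; the helper names are hypothetical.</p>

```python
# Poll TCP ports until they accept connections or a deadline passes.
# This checks that a listener exists; it does not speak Bolt or HTTP.
import socket
import time
from typing import Iterable


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_ports(host: str, ports: Iterable[int], deadline_s: float = 30.0,
                   interval_s: float = 0.5) -> bool:
    ports = list(ports)
    end = time.monotonic() + deadline_s
    while end > time.monotonic():
        if all(port_open(host, p) for p in ports):
            return True
        time.sleep(interval_s)
    return False
```

<p>Call <code class="language-plaintext highlighter-rouge">wait_for_ports("localhost", [7687, 2480])</code> before the first Cognee call. For a protocol-level check, the official <code class="language-plaintext highlighter-rouge">neo4j</code> Python driver’s <code class="language-plaintext highlighter-rouge">verify_connectivity()</code> against <code class="language-plaintext highlighter-rouge">bolt://localhost:7687</code> is an option, assuming it works with the Bolt implementation in your ArcadeDB version.</p>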

<h3 id="2-install-the-cognee-arcadedb-adapter">2. Install the Cognee ArcadeDB adapter</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pip <span class="nb">install </span>cognee cognee-community-graph-adapter-arcadedb
</code></pre></div></div>

<h3 id="3-configure-cognee-to-use-arcadedb">3. Configure Cognee to use ArcadeDB</h3>

<p>Set your environment variables:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">export </span><span class="nv">GRAPH_DATABASE_PROVIDER</span><span class="o">=</span><span class="s2">"arcadedb"</span>
<span class="nb">export </span><span class="nv">GRAPH_DATABASE_URL</span><span class="o">=</span><span class="s2">"bolt://localhost:7687"</span>
<span class="nb">export </span><span class="nv">GRAPH_DATABASE_USERNAME</span><span class="o">=</span><span class="s2">"root"</span>
<span class="nb">export </span><span class="nv">GRAPH_DATABASE_PASSWORD</span><span class="o">=</span><span class="s2">"arcadedb"</span>
</code></pre></div></div>

<p>Or configure programmatically:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">cognee</span>

<span class="n">cognee</span><span class="p">.</span><span class="n">config</span><span class="p">.</span><span class="nf">set_graph_database_provider</span><span class="p">(</span><span class="sh">"</span><span class="s">arcadedb</span><span class="sh">"</span><span class="p">)</span>
<span class="n">cognee</span><span class="p">.</span><span class="n">config</span><span class="p">.</span><span class="nf">set_graph_db_config</span><span class="p">({</span>
    <span class="sh">"</span><span class="s">graph_database_url</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">bolt://localhost:7687</span><span class="sh">"</span><span class="p">,</span>
    <span class="sh">"</span><span class="s">graph_database_username</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">root</span><span class="sh">"</span><span class="p">,</span>
    <span class="sh">"</span><span class="s">graph_database_password</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">arcadedb</span><span class="sh">"</span><span class="p">,</span>
<span class="p">})</span>
</code></pre></div></div>
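<p>If you keep these settings in environment variables but still want to configure Cognee explicitly, for example to fail fast with a clear error when a variable is missing, a small helper can map the variable names onto the configuration keys used above. The helper <code class="language-plaintext highlighter-rouge">graph_config_from_env()</code> and its defaulting of the provider to <code class="language-plaintext highlighter-rouge">arcadedb</code> are assumptions of this sketch, not part of Cognee.</p>

```python
# Hypothetical helper: read the GRAPH_DATABASE_* variables and build the
# provider name plus the dict expected by cognee.config.set_graph_db_config().
import os
from typing import Dict, Mapping, Tuple

ENV_TO_CONFIG_KEY = {
    "GRAPH_DATABASE_URL": "graph_database_url",
    "GRAPH_DATABASE_USERNAME": "graph_database_username",
    "GRAPH_DATABASE_PASSWORD": "graph_database_password",
}


def graph_config_from_env(env: Mapping[str, str] = os.environ) -> Tuple[str, Dict[str, str]]:
    # Fail fast with the names of every missing or empty variable
    missing = [name for name in ENV_TO_CONFIG_KEY if not env.get(name)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    provider = env.get("GRAPH_DATABASE_PROVIDER", "arcadedb")  # assumed default
    return provider, {key: env[name] for name, key in ENV_TO_CONFIG_KEY.items()}
```

<p>Pass the results to <code class="language-plaintext highlighter-rouge">cognee.config.set_graph_database_provider(provider)</code> and <code class="language-plaintext highlighter-rouge">cognee.config.set_graph_db_config(config)</code>.</p>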

<p>That’s it. From this point, every <code class="language-plaintext highlighter-rouge">cognee.add()</code>, <code class="language-plaintext highlighter-rouge">cognee.cognify()</code>, and <code class="language-plaintext highlighter-rouge">cognee.search()</code> call uses ArcadeDB as the graph backend.</p>

<h3 id="4-build-and-query-a-knowledge-graph">4. Build and query a knowledge graph</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">asyncio</span>
<span class="kn">import</span> <span class="n">cognee</span>

<span class="k">async</span> <span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
    <span class="c1"># Ingest some data
</span>    <span class="k">await</span> <span class="n">cognee</span><span class="p">.</span><span class="nf">add</span><span class="p">(</span><span class="sh">"</span><span class="s">ArcadeDB is a multi-model database that supports graph, </span><span class="sh">"</span>
                     <span class="sh">"</span><span class="s">document, key/value, time series, and vector data models. </span><span class="sh">"</span>
                     <span class="sh">"</span><span class="s">It is open source under the Apache 2.0 license.</span><span class="sh">"</span><span class="p">)</span>

    <span class="c1"># Build the knowledge graph
</span>    <span class="k">await</span> <span class="n">cognee</span><span class="p">.</span><span class="nf">cognify</span><span class="p">()</span>

    <span class="c1"># Search with combined graph + vector retrieval
</span>    <span class="n">results</span> <span class="o">=</span> <span class="k">await</span> <span class="n">cognee</span><span class="p">.</span><span class="nf">search</span><span class="p">(</span><span class="sh">"</span><span class="s">What data models does ArcadeDB support?</span><span class="sh">"</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">result</span> <span class="ow">in</span> <span class="n">results</span><span class="p">:</span>
        <span class="nf">print</span><span class="p">(</span><span class="n">result</span><span class="p">)</span>

<span class="c1"># Cognee's API is async, so run it inside an event loop
</span><span class="n">asyncio</span><span class="p">.</span><span class="nf">run</span><span class="p">(</span><span class="nf">main</span><span class="p">())</span>
</code></pre></div></div>

<p>Cognee extracts entities (ArcadeDB, Apache 2.0, graph, document, etc.), builds relationships between them, generates embeddings, and stores everything in ArcadeDB. When you search, Cognee traverses the graph <em>and</em> runs vector similarity — returning results that understand both semantic meaning and structural relationships.</p>

<hr />

<h2 id="the-multi-model-advantage">The Multi-Model Advantage</h2>

<p>Most AI memory systems treat graphs and vectors as separate concerns with separate databases. This creates real problems:</p>

<ul>
  <li><strong>Data synchronization.</strong> Entities in the graph must stay in sync with their vector representations. Two databases means two sources of truth.</li>
  <li><strong>Operational complexity.</strong> Two databases to deploy, scale, back up, and monitor. Two sets of connection pools, credentials, and failure modes.</li>
  <li><strong>Query-time overhead.</strong> A search that needs both graph context and vector similarity requires two round-trips to two different systems.</li>
</ul>

<p>ArcadeDB eliminates this split. A single record in ArcadeDB can be a graph vertex with edges to other entities <em>and</em> carry a vector embedding for similarity search <em>and</em> store document properties, all reachable from one query. This is what multi-model means in practice: not just supporting multiple APIs, but storing and querying multiple data representations in a single, consistent engine.</p>

<p>For Cognee’s architecture specifically, this means the knowledge graph and the vector index live in the same database, on the same data. No synchronization layer. No eventual consistency between two systems. One transactional engine.</p>
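<p>The sketch below models that idea in plain Python: one record holds document properties, an embedding, and typed edges, so a single function can filter, rank by similarity, and traverse without joining results from two stores. It is a toy with made-up data and a hypothetical <code class="language-plaintext highlighter-rouge">query()</code> helper, not ArcadeDB’s storage layout or query API.</p>

```python
# Toy multi-model record: properties, embedding, and edges live on one object,
# so filtering, vector ranking, and traversal happen in one pass.
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    name: str
    properties: Dict[str, object]
    embedding: List[float]
    edges: Dict[str, List["Node"]] = field(default_factory=dict)


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    return dot / norm if norm else 0.0


def query(nodes: List[Node], prop: str, value: object, vector: List[float],
          edge_type: str) -> Tuple[Optional[str], List[str]]:
    """Filter on a property, keep the closest match, then follow one edge type."""
    candidates = [n for n in nodes if n.properties.get(prop) == value]
    if not candidates:
        return None, []
    best = max(candidates, key=lambda n: cosine(vector, n.embedding))
    return best.name, [target.name for target in best.edges.get(edge_type, [])]


license_node = Node("Apache 2.0", {"kind": "license"}, [0.0, 1.0])
arcadedb = Node("ArcadeDB", {"kind": "database"}, [1.0, 0.1],
                {"LICENSED_UNDER": [license_node]})
other_db = Node("OtherDB", {"kind": "database"}, [0.1, 1.0])
nodes = [license_node, arcadedb, other_db]

print(query(nodes, "kind", "database", [1.0, 0.0], "LICENSED_UNDER"))
```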

<hr />

<h2 id="whats-next">What’s Next</h2>

<p>The ArcadeDB adapter for Cognee is available today as a <a href="https://github.com/topoteretes/cognee-community">community package</a>. We’re working with the Cognee team to:</p>

<ul>
  <li>Expand the integration to cover ArcadeDB’s vector search capabilities directly within the Cognee pipeline</li>
  <li>Optimize graph construction queries for ArcadeDB’s native traversal performance</li>
  <li>Make ArcadeDB a first-class backend option in Cognee’s documentation and getting started guides</li>
</ul>

<p>If you’re building AI agents that need structured, persistent memory — or if you’re looking for a single database to replace a graph DB + vector store combination for <a href="https://arcadedb.com/graph-rag.html">GraphRAG</a> — give ArcadeDB + Cognee a try.</p>

<p><strong>Get started:</strong></p>
<ul>
  <li><a href="https://docs.arcadedb.com">ArcadeDB documentation</a></li>
  <li><a href="https://docs.cognee.ai">Cognee documentation</a></li>
  <li><a href="https://github.com/topoteretes/cognee-community">ArcadeDB adapter source code</a></li>
  <li><a href="https://hub.docker.com/r/arcadedata/arcadedb">ArcadeDB Docker Hub</a></li>
</ul>]]></content><author><name>Luca Garulli</name></author><category term="Cognee" /><category term="AI" /><category term="Memory Engine" /><category term="Graph Database" /><category term="Multi-Model" /><category term="Knowledge Graph" /><category term="RAG" /><category term="Integration" /><category term="Python" /><summary type="html"><![CDATA[How Cognee's AI memory engine and ArcadeDB's multi-model database work together to give AI agents persistent, structured memory — with a single backend for graphs, documents, and vectors.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://arcadedb.com/assets/images/cognee-arcadedb.jpg" /><media:content medium="image" url="https://arcadedb.com/assets/images/cognee-arcadedb.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Declarative Graph Importer: Import StackOverflow into a Graph with a Single JSON File</title><link href="https://arcadedb.com/blog/declarative-graph-importer/" rel="alternate" type="text/html" title="Declarative Graph Importer: Import StackOverflow into a Graph with a Single JSON File" /><published>2026-03-28T00:00:00+00:00</published><updated>2026-03-28T00:00:00+00:00</updated><id>https://arcadedb.com/blog/declarative-graph-importer</id><content type="html" xml:base="https://arcadedb.com/blog/declarative-graph-importer/"><![CDATA[<p>Importing a real-world dataset into a graph database usually means writing a custom ETL script: parse the files, resolve foreign keys, batch your transactions, handle edge cases. It works, but it’s tedious, error-prone, and you end up throwing away the script once the import is done.</p>

<p><a href="https://arcadedb.com/blog/arcadedb-26-3-2/">ArcadeDB v26.3.2</a> introduces the <strong>GraphImporter</strong> — a declarative tool that turns CSV, XML, and JSONL files into a fully connected graph using nothing but a JSON configuration file. No code, no custom scripts. Under the hood it uses the <a href="https://arcadedb.com/blog/graphbatch-up-to-8x-faster-graph-ingestion/">GraphBatch engine</a> for maximum throughput.</p>

<p>Let’s see how it works by importing a real dataset: the <strong>StackOverflow data dump</strong>.</p>

<h2 id="the-stackoverflow-graph-model">The StackOverflow Graph Model</h2>

<p>The <a href="https://archive.org/details/stackexchange">StackOverflow data dump</a> is a classic dataset for benchmarking and graph analysis. It ships as a set of XML files, each representing a table in the original relational schema. Here’s the graph model we’ll build:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>            ASKED                    TAGGED_WITH
  User ──────────────&gt; Question ──────────────&gt; Tag
    │                   │    ^
    │ ANSWERED          │    │ HAS_ANSWER
    v                   │    │
  Answer &lt;──────────────┘    │
    ^                        │
    │ ACCEPTED_ANSWER        │
    └────────────────────────┘

  User ──WROTE_COMMENT──&gt; Comment ──COMMENTED_ON──&gt; Question/Answer
  User ──EARNED──&gt; Badge
  Question ──LINKED_TO──&gt; Question
</code></pre></div></div>

<p>Six vertex types, nine edge types, all derived from six XML files. Let’s see how to express this as a single JSON configuration.</p>
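<p>For comparison, here is a minimal sketch of the hand-written ETL that just two of those edges, <code class="language-plaintext highlighter-rouge">ASKED</code> and <code class="language-plaintext highlighter-rouge">TAGGED_WITH</code>, would otherwise need. The rows are made up and shown as already-parsed <code class="language-plaintext highlighter-rouge">row</code> attribute dicts; the <code class="language-plaintext highlighter-rouge">|python|xml|</code> tag format is an assumption consistent with the <code class="language-plaintext highlighter-rouge">split</code> option used in the configuration that follows.</p>

```python
# Hand-written equivalent of two mapping rules for Posts.xml rows:
# keep questions only, then derive ASKED and TAGGED_WITH edges.
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (edge type, from key, to key)


def question_rows_to_graph(rows: List[Dict[str, str]]) -> Tuple[List[dict], List[Edge]]:
    questions: List[dict] = []
    edges: List[Edge] = []
    for row in rows:
        if row.get("PostTypeId") != "1":  # questions only (answers use PostTypeId 2)
            continue
        qid = f"Question:{row['Id']}"
        questions.append({"Id": int(row["Id"]), "Title": row.get("Title", "")})
        if row.get("OwnerUserId"):  # edge points from the User into the Question
            edges.append(("ASKED", f"User:{row['OwnerUserId']}", qid))
        # One TAGGED_WITH edge per non-empty tag name; Tags are matched by name
        for tag in filter(None, row.get("Tags", "").split("|")):
            edges.append(("TAGGED_WITH", qid, f"Tag:{tag}"))
    return questions, edges


rows = [
    {"Id": "1", "PostTypeId": "1", "OwnerUserId": "8",
     "Title": "How do I parse XML?", "Tags": "|python|xml|"},
    {"Id": "2", "PostTypeId": "2", "OwnerUserId": "9", "ParentId": "1"},
]
questions, edges = question_rows_to_graph(rows)
print(questions)
print(edges)
```

<p>Multiply this by every vertex and edge type, add batching, ID resolution across files, and type conversion, and the appeal of a declarative file becomes clear.</p>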

<h2 id="the-import-configuration">The Import Configuration</h2>

<p>Here’s the complete JSON file that defines the entire import:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"vertices"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
    </span><span class="p">{</span><span class="w">
      </span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Tag"</span><span class="p">,</span><span class="w"> </span><span class="nl">"file"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Tags.xml"</span><span class="p">,</span><span class="w"> </span><span class="nl">"id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Id"</span><span class="p">,</span><span class="w"> </span><span class="nl">"nameId"</span><span class="p">:</span><span class="w"> </span><span class="s2">"TagName"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"properties"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nl">"Id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"int:Id"</span><span class="p">,</span><span class="w"> </span><span class="nl">"TagName"</span><span class="p">:</span><span class="w"> </span><span class="s2">"TagName"</span><span class="p">,</span><span class="w"> </span><span class="nl">"Count"</span><span class="p">:</span><span class="w"> </span><span class="s2">"int:Count"</span><span class="w"> </span><span class="p">}</span><span class="w">
    </span><span class="p">},</span><span class="w">
    </span><span class="p">{</span><span class="w">
      </span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"User"</span><span class="p">,</span><span class="w"> </span><span class="nl">"file"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Users.xml"</span><span class="p">,</span><span class="w"> </span><span class="nl">"id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Id"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"properties"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="nl">"Id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"int:Id"</span><span class="p">,</span><span class="w"> </span><span class="nl">"DisplayName"</span><span class="p">:</span><span class="w"> </span><span class="s2">"DisplayName"</span><span class="p">,</span><span class="w"> </span><span class="nl">"Reputation"</span><span class="p">:</span><span class="w"> </span><span class="s2">"int:Reputation"</span><span class="p">,</span><span class="w">
        </span><span class="nl">"CreationDate"</span><span class="p">:</span><span class="w"> </span><span class="s2">"CreationDate"</span><span class="p">,</span><span class="w"> </span><span class="nl">"Views"</span><span class="p">:</span><span class="w"> </span><span class="s2">"int:Views"</span><span class="p">,</span><span class="w">
        </span><span class="nl">"UpVotes"</span><span class="p">:</span><span class="w"> </span><span class="s2">"int:UpVotes"</span><span class="p">,</span><span class="w"> </span><span class="nl">"DownVotes"</span><span class="p">:</span><span class="w"> </span><span class="s2">"int:DownVotes"</span><span class="w">
      </span><span class="p">}</span><span class="w">
    </span><span class="p">},</span><span class="w">
    </span><span class="p">{</span><span class="w">
      </span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Question"</span><span class="p">,</span><span class="w"> </span><span class="nl">"file"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Posts.xml"</span><span class="p">,</span><span class="w"> </span><span class="nl">"id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Id"</span><span class="p">,</span><span class="w"> </span><span class="nl">"filter"</span><span class="p">:</span><span class="w"> </span><span class="s2">"PostTypeId=1"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"properties"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="nl">"Id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"int:Id"</span><span class="p">,</span><span class="w"> </span><span class="nl">"Title"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Title"</span><span class="p">,</span><span class="w"> </span><span class="nl">"Body"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Body"</span><span class="p">,</span><span class="w"> </span><span class="nl">"Score"</span><span class="p">:</span><span class="w"> </span><span class="s2">"int:Score"</span><span class="p">,</span><span class="w">
        </span><span class="nl">"ViewCount"</span><span class="p">:</span><span class="w"> </span><span class="s2">"int:ViewCount"</span><span class="p">,</span><span class="w"> </span><span class="nl">"CreationDate"</span><span class="p">:</span><span class="w"> </span><span class="s2">"CreationDate"</span><span class="p">,</span><span class="w">
        </span><span class="nl">"AnswerCount"</span><span class="p">:</span><span class="w"> </span><span class="s2">"int:AnswerCount"</span><span class="p">,</span><span class="w"> </span><span class="nl">"CommentCount"</span><span class="p">:</span><span class="w"> </span><span class="s2">"int:CommentCount"</span><span class="p">,</span><span class="w"> </span><span class="nl">"Tags"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Tags"</span><span class="w">
      </span><span class="p">},</span><span class="w">
      </span><span class="nl">"edges"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
        </span><span class="p">{</span><span class="w"> </span><span class="nl">"attribute"</span><span class="p">:</span><span class="w"> </span><span class="s2">"OwnerUserId"</span><span class="p">,</span><span class="w"> </span><span class="nl">"edge"</span><span class="p">:</span><span class="w"> </span><span class="s2">"ASKED"</span><span class="p">,</span><span class="w"> </span><span class="nl">"target"</span><span class="p">:</span><span class="w"> </span><span class="s2">"User"</span><span class="p">,</span><span class="w"> </span><span class="nl">"direction"</span><span class="p">:</span><span class="w"> </span><span class="s2">"in"</span><span class="w"> </span><span class="p">},</span><span class="w">
        </span><span class="p">{</span><span class="w"> </span><span class="nl">"attribute"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Tags"</span><span class="p">,</span><span class="w"> </span><span class="nl">"edge"</span><span class="p">:</span><span class="w"> </span><span class="s2">"TAGGED_WITH"</span><span class="p">,</span><span class="w"> </span><span class="nl">"target"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Tag"</span><span class="p">,</span><span class="w"> </span><span class="nl">"split"</span><span class="p">:</span><span class="w"> </span><span class="s2">"|"</span><span class="w"> </span><span class="p">}</span><span class="w">
      </span><span class="p">]</span><span class="w">
    </span><span class="p">},</span><span class="w">
    </span><span class="p">{</span><span class="w">
      </span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Answer"</span><span class="p">,</span><span class="w"> </span><span class="nl">"file"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Posts.xml"</span><span class="p">,</span><span class="w"> </span><span class="nl">"id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Id"</span><span class="p">,</span><span class="w"> </span><span class="nl">"filter"</span><span class="p">:</span><span class="w"> </span><span class="s2">"PostTypeId=2"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"properties"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="nl">"Id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"int:Id"</span><span class="p">,</span><span class="w"> </span><span class="nl">"Body"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Body"</span><span class="p">,</span><span class="w"> </span><span class="nl">"Score"</span><span class="p">:</span><span class="w"> </span><span class="s2">"int:Score"</span><span class="p">,</span><span class="w">
        </span><span class="nl">"CreationDate"</span><span class="p">:</span><span class="w"> </span><span class="s2">"CreationDate"</span><span class="p">,</span><span class="w"> </span><span class="nl">"CommentCount"</span><span class="p">:</span><span class="w"> </span><span class="s2">"int:CommentCount"</span><span class="w">
      </span><span class="p">},</span><span class="w">
      </span><span class="nl">"edges"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
        </span><span class="p">{</span><span class="w"> </span><span class="nl">"attribute"</span><span class="p">:</span><span class="w"> </span><span class="s2">"OwnerUserId"</span><span class="p">,</span><span class="w"> </span><span class="nl">"edge"</span><span class="p">:</span><span class="w"> </span><span class="s2">"ANSWERED"</span><span class="p">,</span><span class="w"> </span><span class="nl">"target"</span><span class="p">:</span><span class="w"> </span><span class="s2">"User"</span><span class="p">,</span><span class="w"> </span><span class="nl">"direction"</span><span class="p">:</span><span class="w"> </span><span class="s2">"in"</span><span class="w"> </span><span class="p">},</span><span class="w">
        </span><span class="p">{</span><span class="w"> </span><span class="nl">"attribute"</span><span class="p">:</span><span class="w"> </span><span class="s2">"ParentId"</span><span class="p">,</span><span class="w"> </span><span class="nl">"edge"</span><span class="p">:</span><span class="w"> </span><span class="s2">"HAS_ANSWER"</span><span class="p">,</span><span class="w"> </span><span class="nl">"target"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Question"</span><span class="p">,</span><span class="w"> </span><span class="nl">"direction"</span><span class="p">:</span><span class="w"> </span><span class="s2">"in"</span><span class="w"> </span><span class="p">}</span><span class="w">
      </span><span class="p">]</span><span class="w">
    </span><span class="p">},</span><span class="w">
    </span><span class="p">{</span><span class="w">
      </span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Comment"</span><span class="p">,</span><span class="w"> </span><span class="nl">"file"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Comments.xml"</span><span class="p">,</span><span class="w"> </span><span class="nl">"id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Id"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"properties"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="nl">"Id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"int:Id"</span><span class="p">,</span><span class="w"> </span><span class="nl">"Score"</span><span class="p">:</span><span class="w"> </span><span class="s2">"int:Score"</span><span class="p">,</span><span class="w"> </span><span class="nl">"CreationDate"</span><span class="p">:</span><span class="w"> </span><span class="s2">"CreationDate"</span><span class="p">,</span><span class="w"> </span><span class="nl">"Text"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Text"</span><span class="w">
      </span><span class="p">},</span><span class="w">
      </span><span class="nl">"edges"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
        </span><span class="p">{</span><span class="w"> </span><span class="nl">"attribute"</span><span class="p">:</span><span class="w"> </span><span class="s2">"PostId"</span><span class="p">,</span><span class="w"> </span><span class="nl">"edge"</span><span class="p">:</span><span class="w"> </span><span class="s2">"COMMENTED_ON"</span><span class="p">,</span><span class="w"> </span><span class="nl">"target"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Question"</span><span class="w"> </span><span class="p">},</span><span class="w">
        </span><span class="p">{</span><span class="w"> </span><span class="nl">"attribute"</span><span class="p">:</span><span class="w"> </span><span class="s2">"PostId"</span><span class="p">,</span><span class="w"> </span><span class="nl">"edge"</span><span class="p">:</span><span class="w"> </span><span class="s2">"COMMENTED_ON_ANSWER"</span><span class="p">,</span><span class="w"> </span><span class="nl">"target"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Answer"</span><span class="w"> </span><span class="p">},</span><span class="w">
        </span><span class="p">{</span><span class="w"> </span><span class="nl">"attribute"</span><span class="p">:</span><span class="w"> </span><span class="s2">"UserId"</span><span class="p">,</span><span class="w"> </span><span class="nl">"edge"</span><span class="p">:</span><span class="w"> </span><span class="s2">"WROTE_COMMENT"</span><span class="p">,</span><span class="w"> </span><span class="nl">"target"</span><span class="p">:</span><span class="w"> </span><span class="s2">"User"</span><span class="p">,</span><span class="w"> </span><span class="nl">"direction"</span><span class="p">:</span><span class="w"> </span><span class="s2">"in"</span><span class="w"> </span><span class="p">}</span><span class="w">
      </span><span class="p">]</span><span class="w">
    </span><span class="p">},</span><span class="w">
    </span><span class="p">{</span><span class="w">
      </span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Badge"</span><span class="p">,</span><span class="w"> </span><span class="nl">"file"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Badges.xml"</span><span class="p">,</span><span class="w"> </span><span class="nl">"id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Id"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"properties"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="nl">"Id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"int:Id"</span><span class="p">,</span><span class="w"> </span><span class="nl">"Name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Name"</span><span class="p">,</span><span class="w"> </span><span class="nl">"Date"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Date"</span><span class="p">,</span><span class="w">
        </span><span class="nl">"BadgeClass"</span><span class="p">:</span><span class="w"> </span><span class="s2">"int:Class"</span><span class="p">,</span><span class="w"> </span><span class="nl">"TagBased"</span><span class="p">:</span><span class="w"> </span><span class="s2">"bool:TagBased"</span><span class="w">
      </span><span class="p">},</span><span class="w">
      </span><span class="nl">"edges"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
        </span><span class="p">{</span><span class="w"> </span><span class="nl">"attribute"</span><span class="p">:</span><span class="w"> </span><span class="s2">"UserId"</span><span class="p">,</span><span class="w"> </span><span class="nl">"edge"</span><span class="p">:</span><span class="w"> </span><span class="s2">"EARNED"</span><span class="p">,</span><span class="w"> </span><span class="nl">"target"</span><span class="p">:</span><span class="w"> </span><span class="s2">"User"</span><span class="p">,</span><span class="w"> </span><span class="nl">"direction"</span><span class="p">:</span><span class="w"> </span><span class="s2">"in"</span><span class="w"> </span><span class="p">}</span><span class="w">
      </span><span class="p">]</span><span class="w">
    </span><span class="p">}</span><span class="w">
  </span><span class="p">],</span><span class="w">

  </span><span class="nl">"edgeSources"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
    </span><span class="p">{</span><span class="w">
      </span><span class="nl">"edge"</span><span class="p">:</span><span class="w"> </span><span class="s2">"ACCEPTED_ANSWER"</span><span class="p">,</span><span class="w"> </span><span class="nl">"file"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Posts.xml"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"from"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Id:Question"</span><span class="p">,</span><span class="w"> </span><span class="nl">"to"</span><span class="p">:</span><span class="w"> </span><span class="s2">"AcceptedAnswerId:Answer"</span><span class="w">
    </span><span class="p">},</span><span class="w">
    </span><span class="p">{</span><span class="w">
      </span><span class="nl">"edge"</span><span class="p">:</span><span class="w"> </span><span class="s2">"LINKED_TO"</span><span class="p">,</span><span class="w"> </span><span class="nl">"file"</span><span class="p">:</span><span class="w"> </span><span class="s2">"PostLinks.xml"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"from"</span><span class="p">:</span><span class="w"> </span><span class="s2">"PostId:Question"</span><span class="p">,</span><span class="w"> </span><span class="nl">"to"</span><span class="p">:</span><span class="w"> </span><span class="s2">"RelatedPostId:Question"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"properties"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nl">"LinkType"</span><span class="p">:</span><span class="w"> </span><span class="s2">"int:LinkTypeId"</span><span class="w"> </span><span class="p">}</span><span class="w">
    </span><span class="p">}</span><span class="w">
  </span><span class="p">],</span><span class="w">

  </span><span class="nl">"postImportCommands"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
    </span><span class="p">{</span><span class="w">
      </span><span class="nl">"language"</span><span class="p">:</span><span class="w"> </span><span class="s2">"sql"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"command"</span><span class="p">:</span><span class="w"> </span><span class="s2">"CREATE GRAPH ANALYTICAL VIEW IF NOT EXISTS stackoverflow PROPERTIES (`!Body`, `!Text`) UPDATE MODE SYNCHRONOUS"</span><span class="w">
    </span><span class="p">}</span><span class="w">
  </span><span class="p">]</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>That’s it: six vertex types, ten edge types, and a post-import Graph Analytical View, all in one file. Let’s break down the key patterns.</p>

<h2 id="key-patterns-explained">Key Patterns Explained</h2>

<h3 id="splitting-one-file-into-multiple-vertex-types">Splitting One File into Multiple Vertex Types</h3>

<p>StackOverflow stores both questions and answers in the same <code class="language-plaintext highlighter-rouge">Posts.xml</code> file, distinguished by <code class="language-plaintext highlighter-rouge">PostTypeId</code>. The <code class="language-plaintext highlighter-rouge">filter</code> option lets you import them as separate vertex types:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w"> </span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Question"</span><span class="p">,</span><span class="w"> </span><span class="nl">"file"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Posts.xml"</span><span class="p">,</span><span class="w"> </span><span class="nl">"filter"</span><span class="p">:</span><span class="w"> </span><span class="s2">"PostTypeId=1"</span><span class="p">,</span><span class="w"> </span><span class="err">...</span><span class="w"> </span><span class="p">}</span><span class="w">
</span><span class="p">{</span><span class="w"> </span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Answer"</span><span class="p">,</span><span class="w">   </span><span class="nl">"file"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Posts.xml"</span><span class="p">,</span><span class="w"> </span><span class="nl">"filter"</span><span class="p">:</span><span class="w"> </span><span class="s2">"PostTypeId=2"</span><span class="p">,</span><span class="w"> </span><span class="err">...</span><span class="w"> </span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>The importer reads <code class="language-plaintext highlighter-rouge">Posts.xml</code> once per definition, but only creates vertices for rows matching the filter. This is a common pattern when a single source table contains multiple logical entity types.</p>
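<p>As a rough illustration of this routing, the Java sketch below evaluates a single <code class="language-plaintext highlighter-rouge">attribute=value</code> filter such as <code class="language-plaintext highlighter-rouge">PostTypeId=1</code> against rows represented as attribute maps. The class and method names are hypothetical, and the sketch assumes the filter is a plain string equality. It is not the GraphImporter source.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: one "attribute=value" filter decides which rows of a
// shared source file become vertices of a given type. Not the importer's code.
final class RowFilterSketch {

  /** True if the row's attribute equals the value in an "attribute=value" filter. */
  static boolean matches(String filter, Map<String, String> row) {
    int eq = filter.indexOf('=');
    if (eq <= 0) {
      throw new IllegalArgumentException("Expected attribute=value, got: " + filter);
    }
    String attribute = filter.substring(0, eq).trim();
    String expected = filter.substring(eq + 1).trim();
    return expected.equals(row.get(attribute)); // a missing attribute never matches
  }

  /** One pass over the rows for one vertex definition, keeping matches in order. */
  static List<Map<String, String>> select(String filter, List<Map<String, String>> rows) {
    List<Map<String, String>> selected = new ArrayList<>();
    for (Map<String, String> row : rows) {
      if (matches(filter, row)) {
        selected.add(row);
      }
    }
    return selected;
  }

  public static void main(String[] args) {
    List<Map<String, String>> posts = List.of(
        Map.of("Id", "1", "PostTypeId", "1"),
        Map.of("Id", "2", "PostTypeId", "2"),
        Map.of("Id", "3", "PostTypeId", "1"));
    System.out.println("Question rows: " + select("PostTypeId=1", posts).size());
    System.out.println("Answer rows:   " + select("PostTypeId=2", posts).size());
  }
}
```

<p>Each call to <code class="language-plaintext highlighter-rouge">select</code> is a separate pass over the same rows, mirroring the one-read-per-definition behavior described above.</p>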

<h3 id="foreign-key-resolution">Foreign Key Resolution</h3>

<p>Most edges are derived from foreign key attributes in the source data. The importer needs to know two things: which attribute holds the foreign key, and which vertex type it references.</p>

<p><strong>Reversed edges</strong>: the foreign key is in <em>this</em> vertex’s source and points to the target, but the edge itself should run from the target to this vertex:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w"> </span><span class="nl">"attribute"</span><span class="p">:</span><span class="w"> </span><span class="s2">"ParentId"</span><span class="p">,</span><span class="w"> </span><span class="nl">"edge"</span><span class="p">:</span><span class="w"> </span><span class="s2">"HAS_ANSWER"</span><span class="p">,</span><span class="w"> </span><span class="nl">"target"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Question"</span><span class="p">,</span><span class="w"> </span><span class="nl">"direction"</span><span class="p">:</span><span class="w"> </span><span class="s2">"in"</span><span class="w"> </span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>This means: “read <code class="language-plaintext highlighter-rouge">ParentId</code> from each Answer row, find the Question with that ID, and create a <code class="language-plaintext highlighter-rouge">HAS_ANSWER</code> edge from the Question to this Answer”. The <code class="language-plaintext highlighter-rouge">"direction": "in"</code> flips the edge so the Question is the source (the question <em>has</em> an answer, not the other way around).</p>

<p><strong>Default direction is <code class="language-plaintext highlighter-rouge">"out"</code></strong> — the current vertex is the edge source:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w"> </span><span class="nl">"attribute"</span><span class="p">:</span><span class="w"> </span><span class="s2">"PostId"</span><span class="p">,</span><span class="w"> </span><span class="nl">"edge"</span><span class="p">:</span><span class="w"> </span><span class="s2">"COMMENTED_ON"</span><span class="p">,</span><span class="w"> </span><span class="nl">"target"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Question"</span><span class="w"> </span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>This creates an edge from Comment to Question.</p>
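<p>The resolution step and the effect of <code class="language-plaintext highlighter-rouge">direction</code> can be sketched in Java as follows. Vertex identities are plain strings here (the importer works with RIDs), all names are hypothetical, and skipping unresolved foreign keys is an assumption of this sketch rather than documented importer behavior.</p>

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of foreign-key resolution plus edge orientation.
// Vertex identities are strings like "Question#10"; RIDs are not modelled.
final class ForeignKeyEdgeSketch {

  enum Direction { OUT, IN }

  record Edge(String label, String from, String to) {}

  /**
   * Looks the foreign key up in the target type's id index and orients the edge.
   * OUT (the default): current -> target. IN: target -> current.
   * Empty when the key is absent or unknown (an assumption of this sketch).
   */
  static Optional<Edge> resolve(String current, String fkValue, Map<String, String> targetIndex,
                                String label, Direction direction) {
    if (fkValue == null || fkValue.isEmpty()) {
      return Optional.empty();
    }
    String target = targetIndex.get(fkValue);
    if (target == null) {
      return Optional.empty(); // dangling reference: nothing to link to
    }
    return Optional.of(direction == Direction.IN
        ? new Edge(label, target, current)
        : new Edge(label, current, target));
  }

  public static void main(String[] args) {
    Map<String, String> questionsById = new HashMap<>();
    questionsById.put("10", "Question#10");

    // Answer row with ParentId=10 and "direction": "in"
    System.out.println(resolve("Answer#11", "10", questionsById, "HAS_ANSWER", Direction.IN));
    // Comment row with PostId=10 and the default "out" direction
    System.out.println(resolve("Comment#5", "10", questionsById, "COMMENTED_ON", Direction.OUT));
  }
}
```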

<h3 id="split-field-edges-multi-value-attributes">Split-Field Edges (Multi-Value Attributes)</h3>

<p>StackOverflow stores a post’s tags as a single delimited string such as <code class="language-plaintext highlighter-rouge">|java|python|sql|</code>. The <code class="language-plaintext highlighter-rouge">split</code> option expands this into multiple edges:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w"> </span><span class="nl">"attribute"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Tags"</span><span class="p">,</span><span class="w"> </span><span class="nl">"edge"</span><span class="p">:</span><span class="w"> </span><span class="s2">"TAGGED_WITH"</span><span class="p">,</span><span class="w"> </span><span class="nl">"target"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Tag"</span><span class="p">,</span><span class="w"> </span><span class="nl">"split"</span><span class="p">:</span><span class="w"> </span><span class="s2">"|"</span><span class="w"> </span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>For a question tagged <code class="language-plaintext highlighter-rouge">|java|python|sql|</code>, this creates three <code class="language-plaintext highlighter-rouge">TAGGED_WITH</code> edges — one to each Tag vertex. The split values are resolved using the target’s <code class="language-plaintext highlighter-rouge">nameId</code> attribute (in this case, <code class="language-plaintext highlighter-rouge">TagName</code>), not the integer <code class="language-plaintext highlighter-rouge">id</code>.</p>
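<p>Here is a minimal sketch of the split step. It assumes, as illustrations only, that the empty tokens produced by the leading and trailing <code class="language-plaintext highlighter-rouge">|</code> are discarded and that names missing from the tag index are skipped. The class and method names are hypothetical.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch of split-field edges: one delimited attribute value
// becomes several edges, each resolved by name (e.g. TagName), not by integer id.
final class SplitEdgeSketch {

  /** Splits on a literal delimiter and drops empty tokens from leading/trailing delimiters. */
  static List<String> tokens(String value, String delimiter) {
    List<String> result = new ArrayList<>();
    if (value == null) {
      return result;
    }
    // String.split takes a regex and "|" is a regex metacharacter, so quote it.
    for (String token : value.split(Pattern.quote(delimiter))) {
      String trimmed = token.trim();
      if (!trimmed.isEmpty()) {
        result.add(trimmed);
      }
    }
    return result;
  }

  /** Maps each token to its Tag vertex through a name index; unknown names are skipped. */
  static List<String> resolveTargets(String value, String delimiter, Map<String, String> tagsByName) {
    List<String> targets = new ArrayList<>();
    for (String name : tokens(value, delimiter)) {
      String vertex = tagsByName.get(name);
      if (vertex != null) {
        targets.add(vertex);
      }
    }
    return targets;
  }

  public static void main(String[] args) {
    Map<String, String> tagsByName = Map.of("java", "Tag#1", "python", "Tag#2", "sql", "Tag#3");
    System.out.println(tokens("|java|python|sql|", "|"));
    System.out.println(resolveTargets("|java|python|sql|", "|", tagsByName));
  }
}
```

<p>The quoting matters: an unquoted <code class="language-plaintext highlighter-rouge">split("|")</code> is an empty regex alternation and would split the string between every character.</p>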

<h3 id="edge-only-sources">Edge-Only Sources</h3>

<p>Some relationships live in their own source file rather than as foreign keys in a vertex file. The <code class="language-plaintext highlighter-rouge">edgeSources</code> section handles these:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"edge"</span><span class="p">:</span><span class="w"> </span><span class="s2">"LINKED_TO"</span><span class="p">,</span><span class="w"> </span><span class="nl">"file"</span><span class="p">:</span><span class="w"> </span><span class="s2">"PostLinks.xml"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"from"</span><span class="p">:</span><span class="w"> </span><span class="s2">"PostId:Question"</span><span class="p">,</span><span class="w"> </span><span class="nl">"to"</span><span class="p">:</span><span class="w"> </span><span class="s2">"RelatedPostId:Question"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"properties"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nl">"LinkType"</span><span class="p">:</span><span class="w"> </span><span class="s2">"int:LinkTypeId"</span><span class="w"> </span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>The compact <code class="language-plaintext highlighter-rouge">"attribute:vertexType"</code> syntax tells the importer which attribute to read and which vertex type to resolve against. Both endpoints must already exist (vertex sources are processed first).</p>
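<p>The compact endpoint syntax is straightforward to model. This hypothetical Java sketch parses <code class="language-plaintext highlighter-rouge">"PostId:Question"</code> into an attribute and a vertex type, then looks up both endpoints of one <code class="language-plaintext highlighter-rouge">PostLinks.xml</code> row in per-type ID indexes. Returning <code class="language-plaintext highlighter-rouge">null</code> for an unresolved endpoint is an assumption made for illustration.</p>

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the compact "attribute:vertexType" endpoint syntax
// used by edgeSources, e.g. "from": "PostId:Question". Not the importer's code.
final class EndpointSpecSketch {

  record Endpoint(String attribute, String vertexType) {}

  /** Splits "PostId:Question" into attribute "PostId" and vertex type "Question". */
  static Endpoint parse(String spec) {
    int colon = spec.indexOf(':');
    if (colon <= 0 || colon == spec.length() - 1) {
      throw new IllegalArgumentException("Expected attribute:vertexType, got: " + spec);
    }
    return new Endpoint(spec.substring(0, colon), spec.substring(colon + 1));
  }

  /** Looks one endpoint of a row up in the ID index of its vertex type; null if unresolved. */
  static String lookup(Map<String, String> row, Endpoint end,
                       Map<String, Map<String, String>> indexesByType) {
    String key = row.get(end.attribute());
    Map<String, String> index = indexesByType.get(end.vertexType());
    return (key == null || index == null) ? null : index.get(key);
  }

  /** Resolves both endpoints; null when either side is missing. */
  static String[] link(Map<String, String> row, Endpoint from, Endpoint to,
                       Map<String, Map<String, String>> indexesByType) {
    String source = lookup(row, from, indexesByType);
    String target = lookup(row, to, indexesByType);
    return (source == null || target == null) ? null : new String[] {source, target};
  }

  public static void main(String[] args) {
    Map<String, String> questions = new HashMap<>();
    questions.put("1", "Question#1");
    questions.put("2", "Question#2");
    Map<String, Map<String, String>> indexes = Map.of("Question", questions);

    Map<String, String> postLinkRow = Map.of("PostId", "1", "RelatedPostId", "2", "LinkTypeId", "1");
    String[] edge = link(postLinkRow, parse("PostId:Question"), parse("RelatedPostId:Question"), indexes);
    System.out.println("LINKED_TO: " + edge[0] + " -> " + edge[1]);
  }
}
```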

<h3 id="property-type-mapping">Property Type Mapping</h3>

<p>Properties are strings by default. Prefix the source attribute with a type hint for automatic conversion:</p>

<table>
  <thead>
    <tr>
      <th>Syntax</th>
      <th>Type</th>
      <th>Example</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">"DisplayName"</code></td>
      <td>String</td>
      <td><code class="language-plaintext highlighter-rouge">"name": "DisplayName"</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">"int:Score"</code></td>
      <td>Integer</td>
      <td><code class="language-plaintext highlighter-rouge">"score": "int:Score"</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">"bool:TagBased"</code></td>
      <td>Boolean</td>
      <td><code class="language-plaintext highlighter-rouge">"tagBased": "bool:TagBased"</code></td>
    </tr>
  </tbody>
</table>
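<p>Conversion by prefix can be sketched as below. Only the <code class="language-plaintext highlighter-rouge">int:</code> and <code class="language-plaintext highlighter-rouge">bool:</code> hints from the table are modeled, missing attributes stay <code class="language-plaintext highlighter-rouge">null</code>, and the class name is hypothetical; the importer may support further hints or treat malformed values differently.</p>

```java
import java.util.Map;

// Hypothetical sketch of "type:Attribute" hints: "int:Score" reads Score and
// converts it to Integer, a bare "DisplayName" stays a String. Only the int and
// bool prefixes from the table are modelled; this is not the importer's code.
final class TypeHintSketch {

  record Mapping(String type, String sourceAttribute) {}

  /** "int:Score" -> (int, Score); "DisplayName" -> (string, DisplayName). */
  static Mapping parse(String spec) {
    int colon = spec.indexOf(':');
    return colon < 0
        ? new Mapping("string", spec)
        : new Mapping(spec.substring(0, colon), spec.substring(colon + 1));
  }

  /** Reads the source attribute from a row and converts it; a missing value stays null. */
  static Object convert(String spec, Map<String, String> row) {
    Mapping mapping = parse(spec);
    String raw = row.get(mapping.sourceAttribute());
    if (raw == null) {
      return null;
    }
    return switch (mapping.type()) {
      case "string" -> raw;
      case "int" -> Integer.valueOf(raw.trim());
      // Boolean.valueOf ignores case: "True" and "true" give true, anything else false.
      case "bool" -> Boolean.valueOf(raw.trim());
      default -> throw new IllegalArgumentException("Unknown type hint: " + mapping.type());
    };
  }

  public static void main(String[] args) {
    Map<String, String> badgeRow = Map.of("Id", "7", "Name", "Teacher", "Class", "3", "TagBased", "False");
    System.out.println(convert("int:Class", badgeRow));
    System.out.println(convert("bool:TagBased", badgeRow));
    System.out.println(convert("Name", badgeRow));
  }
}
```

<p>Because <code class="language-plaintext highlighter-rouge">Boolean.valueOf</code> is case-insensitive, the <code class="language-plaintext highlighter-rouge">True</code>/<code class="language-plaintext highlighter-rouge">False</code> values used in the StackOverflow XML dump convert without extra handling.</p>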

<h3 id="post-import-commands">Post-Import Commands</h3>

<p>The <code class="language-plaintext highlighter-rouge">postImportCommands</code> array runs SQL (or any supported language) after the import completes. In this example, we create a <a href="https://arcadedb.com/blog/graph-olap-engine-the-fastest-graph-analytics-with-zero-compromises/">Graph Analytical View</a> that pre-computes the graph structure for fast OLAP queries, excluding large text properties (<code class="language-plaintext highlighter-rouge">Body</code>, <code class="language-plaintext highlighter-rouge">Text</code>) to keep the view compact:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"language"</span><span class="p">:</span><span class="w"> </span><span class="s2">"sql"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"command"</span><span class="p">:</span><span class="w"> </span><span class="s2">"CREATE GRAPH ANALYTICAL VIEW IF NOT EXISTS stackoverflow PROPERTIES (`!Body`, `!Text`) UPDATE MODE SYNCHRONOUS"</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>
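<p>The <code class="language-plaintext highlighter-rouge">!</code> prefix in the <code class="language-plaintext highlighter-rouge">PROPERTIES</code> clause marks a property to leave out of the view. The hypothetical sketch below models only this exclusion-only form: every property is kept except those listed with <code class="language-plaintext highlighter-rouge">!</code>. It illustrates the semantics, not ArcadeDB’s implementation, and it rejects inclusion lists rather than guessing how they combine with exclusions.</p>

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of an exclusion-only property list such as
// PROPERTIES (`!Body`, `!Text`): all properties are kept except the "!" ones.
// Inclusion lists are deliberately not modelled.
final class ViewPropertySketch {

  static Set<String> retained(List<String> allProperties, List<String> spec) {
    Set<String> excluded = new LinkedHashSet<>();
    for (String entry : spec) {
      // Strip the SQL identifier quotes, e.g. `!Body` -> !Body
      String name = entry.replace("`", "").trim();
      if (!name.startsWith("!")) {
        throw new IllegalArgumentException("Only !exclusions are modelled here: " + entry);
      }
      excluded.add(name.substring(1));
    }
    Set<String> kept = new LinkedHashSet<>(allProperties);
    kept.removeAll(excluded);
    return kept;
  }

  public static void main(String[] args) {
    List<String> answerProperties = List.of("Id", "Body", "Score", "CreationDate", "CommentCount");
    System.out.println(retained(answerProperties, List.of("`!Body`", "`!Text`")));
  }
}
```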

<h2 id="running-the-import">Running the Import</h2>

<h3 id="from-the-command-line">From the Command Line</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>java <span class="nt">-cp</span> <span class="s2">"/path/to/arcadedb/lib/*"</span> <span class="se">\</span>
    com.arcadedb.integration.importer.graph.GraphImporter <span class="se">\</span>
    stackoverflow-import.json <span class="se">\</span>
    /path/to/database <span class="se">\</span>
    /path/to/stackoverflow-data
</code></pre></div></div>

<p>The importer auto-creates the schema (vertex and edge types) from the JSON config, runs the two-pass import, and executes post-import commands. File paths in the JSON are resolved relative to the data directory (third argument).</p>
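<p>The three positional arguments and the relative-path rule can be modeled as a small Java record. The record, its usage message, and the assumption that exactly three arguments are accepted are hypothetical; the path resolution itself uses the standard <code class="language-plaintext highlighter-rouge">java.nio.file.Path.resolve</code>.</p>

```java
import java.nio.file.Path;

// Hypothetical sketch of the three positional CLI arguments described above:
// <config.json> <database path> <data directory>. Names are illustrative only.
record ImportArgs(Path config, Path database, Path dataDirectory) {

  static ImportArgs parse(String[] args) {
    if (args.length != 3) {
      throw new IllegalArgumentException(
          "Usage: GraphImporter <config.json> <database path> <data directory>");
    }
    return new ImportArgs(Path.of(args[0]), Path.of(args[1]), Path.of(args[2]));
  }

  /** Resolves a "file" entry from the JSON config relative to the data directory. */
  Path dataFile(String file) {
    return dataDirectory.resolve(file).normalize();
  }

  public static void main(String[] args) {
    ImportArgs parsed = parse(new String[] {
        "stackoverflow-import.json", "/path/to/database", "/path/to/stackoverflow-data"});
    System.out.println(parsed.dataFile("Posts.xml"));
  }
}
```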

<h3 id="from-java">From Java</h3>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">com.arcadedb.database.Database</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">com.arcadedb.database.DatabaseFactory</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">com.arcadedb.integration.importer.graph.GraphImporter</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">com.arcadedb.serializer.json.JSONObject</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">java.nio.file.Files</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">java.nio.file.Path</span><span class="o">;</span>

<span class="nc">Database</span> <span class="n">database</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">DatabaseFactory</span><span class="o">(</span><span class="s">"/path/to/database"</span><span class="o">).</span><span class="na">create</span><span class="o">();</span>

<span class="nc">String</span> <span class="n">json</span> <span class="o">=</span> <span class="nc">Files</span><span class="o">.</span><span class="na">readString</span><span class="o">(</span><span class="nc">Path</span><span class="o">.</span><span class="na">of</span><span class="o">(</span><span class="s">"stackoverflow-import.json"</span><span class="o">));</span>
<span class="c1">// Create the vertex and edge types declared in the config</span>
<span class="nc">GraphImporter</span><span class="o">.</span><span class="na">createSchemaFromConfig</span><span class="o">(</span><span class="n">database</span><span class="o">,</span> <span class="k">new</span> <span class="nc">JSONObject</span><span class="o">(</span><span class="n">json</span><span class="o">));</span>

<span class="c1">// File paths in the config are resolved against the data directory</span>
<span class="k">try</span> <span class="o">(</span><span class="nc">GraphImporter</span> <span class="n">importer</span> <span class="o">=</span> <span class="nc">GraphImporter</span><span class="o">.</span><span class="na">fromJSON</span><span class="o">(</span><span class="n">database</span><span class="o">,</span> <span class="n">json</span><span class="o">,</span> <span class="s">"/path/to/stackoverflow-data"</span><span class="o">))</span> <span class="o">{</span>
    <span class="n">importer</span><span class="o">.</span><span class="na">run</span><span class="o">();</span>
    <span class="nc">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">printf</span><span class="o">(</span><span class="s">"Vertices: %,d  Edges: %,d%n"</span><span class="o">,</span>
        <span class="n">importer</span><span class="o">.</span><span class="na">getVertexCount</span><span class="o">(),</span> <span class="n">importer</span><span class="o">.</span><span class="na">getEdgeCount</span><span class="o">());</span>
<span class="o">}</span>

<span class="c1">// Run postImportCommands (here: create the Graph Analytical View)</span>
<span class="nc">GraphImporter</span><span class="o">.</span><span class="na">executePostImportCommands</span><span class="o">(</span><span class="n">database</span><span class="o">,</span> <span class="k">new</span> <span class="nc">JSONObject</span><span class="o">(</span><span class="n">json</span><span class="o">));</span>
<span class="n">database</span><span class="o">.</span><span class="na">close</span><span class="o">();</span>
</code></pre></div></div>

<h3 id="programmatic-builder-api">Programmatic Builder API</h3>

<p>If you prefer code over JSON, the same import can be expressed with the builder:</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nc">GraphImporter</span><span class="o">.</span><span class="na">builder</span><span class="o">(</span><span class="n">database</span><span class="o">)</span>
    <span class="o">.</span><span class="na">vertex</span><span class="o">(</span><span class="s">"Tag"</span><span class="o">,</span> <span class="nc">XmlRowSource</span><span class="o">.</span><span class="na">from</span><span class="o">(</span><span class="n">dataDir</span><span class="o">,</span> <span class="s">"Tags.xml"</span><span class="o">),</span> <span class="n">v</span> <span class="o">-&gt;</span> <span class="o">{</span>
        <span class="n">v</span><span class="o">.</span><span class="na">id</span><span class="o">(</span><span class="s">"Id"</span><span class="o">);</span>
        <span class="n">v</span><span class="o">.</span><span class="na">idByName</span><span class="o">(</span><span class="s">"TagName"</span><span class="o">);</span>
        <span class="n">v</span><span class="o">.</span><span class="na">property</span><span class="o">(</span><span class="s">"TagName"</span><span class="o">,</span> <span class="s">"TagName"</span><span class="o">);</span>
        <span class="n">v</span><span class="o">.</span><span class="na">intProperty</span><span class="o">(</span><span class="s">"Count"</span><span class="o">,</span> <span class="s">"Count"</span><span class="o">);</span>
    <span class="o">})</span>
    <span class="o">.</span><span class="na">vertex</span><span class="o">(</span><span class="s">"User"</span><span class="o">,</span> <span class="nc">XmlRowSource</span><span class="o">.</span><span class="na">from</span><span class="o">(</span><span class="n">dataDir</span><span class="o">,</span> <span class="s">"Users.xml"</span><span class="o">),</span> <span class="n">v</span> <span class="o">-&gt;</span> <span class="o">{</span>
        <span class="n">v</span><span class="o">.</span><span class="na">id</span><span class="o">(</span><span class="s">"Id"</span><span class="o">);</span>
        <span class="n">v</span><span class="o">.</span><span class="na">property</span><span class="o">(</span><span class="s">"DisplayName"</span><span class="o">,</span> <span class="s">"DisplayName"</span><span class="o">);</span>
        <span class="n">v</span><span class="o">.</span><span class="na">intProperty</span><span class="o">(</span><span class="s">"Reputation"</span><span class="o">,</span> <span class="s">"Reputation"</span><span class="o">);</span>
    <span class="o">})</span>
    <span class="o">.</span><span class="na">vertex</span><span class="o">(</span><span class="s">"Question"</span><span class="o">,</span> <span class="nc">XmlRowSource</span><span class="o">.</span><span class="na">from</span><span class="o">(</span><span class="n">dataDir</span><span class="o">,</span> <span class="s">"Posts.xml"</span><span class="o">),</span> <span class="n">v</span> <span class="o">-&gt;</span> <span class="o">{</span>
        <span class="n">v</span><span class="o">.</span><span class="na">filter</span><span class="o">(</span><span class="s">"PostTypeId"</span><span class="o">,</span> <span class="s">"1"</span><span class="o">);</span>
        <span class="n">v</span><span class="o">.</span><span class="na">id</span><span class="o">(</span><span class="s">"Id"</span><span class="o">);</span>
        <span class="n">v</span><span class="o">.</span><span class="na">property</span><span class="o">(</span><span class="s">"Title"</span><span class="o">,</span> <span class="s">"Title"</span><span class="o">);</span>
        <span class="n">v</span><span class="o">.</span><span class="na">intProperty</span><span class="o">(</span><span class="s">"Score"</span><span class="o">,</span> <span class="s">"Score"</span><span class="o">);</span>
        <span class="n">v</span><span class="o">.</span><span class="na">edgeIn</span><span class="o">(</span><span class="s">"OwnerUserId"</span><span class="o">,</span> <span class="s">"ASKED"</span><span class="o">,</span> <span class="s">"User"</span><span class="o">);</span>
        <span class="n">v</span><span class="o">.</span><span class="na">splitEdge</span><span class="o">(</span><span class="s">"Tags"</span><span class="o">,</span> <span class="s">"TAGGED_WITH"</span><span class="o">,</span> <span class="s">"Tag"</span><span class="o">,</span> <span class="s">"|"</span><span class="o">);</span>
    <span class="o">})</span>
    <span class="c1">// ... remaining vertex and edge sources</span>
    <span class="o">.</span><span class="na">build</span><span class="o">()</span>
    <span class="o">.</span><span class="na">run</span><span class="o">();</span>
</code></pre></div></div>

<h2 id="how-it-works-under-the-hood">How It Works Under the Hood</h2>

<p>The GraphImporter uses a <strong>two-pass, CSR-first</strong> (Compressed Sparse Row) architecture:</p>

<p><strong>Pass 1 — Vertices and topology collection.</strong> Each data source is read once. Vertices are created with full properties and flushed to disk immediately. Foreign key values are collected as compressed primitive arrays (int arrays for IDs, bucket/position pairs for RIDs) — no objects, no boxing, minimal GC pressure.</p>
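<p>Here is a minimal Java sketch of the "no objects, no boxing" idea. Pending edges are buffered in parallel primitive arrays: an <code class="language-plaintext highlighter-rouge">int</code> array for foreign-key IDs, and bucket/position arrays for the source RIDs. The class <code class="language-plaintext highlighter-rouge">TopologyBuffer</code> and its methods are hypothetical illustrations, not ArcadeDB's internal API.</p>

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not ArcadeDB's actual class): buffers pending edges in
 * growable primitive arrays, so no boxed Integer/Long objects are created per edge.
 */
final class TopologyBuffer {
  // Source vertex RIDs kept as parallel bucket/position arrays.
  private int[] srcBucket = new int[16];
  private long[] srcPosition = new long[16];
  // Foreign-key value (integer ID of the target vertex) for each pending edge.
  private int[] targetId = new int[16];
  private int size;

  /** Records one pending edge: source RID (bucket, position) and the target's integer ID. */
  void add(int bucket, long position, int fk) {
    if (size == targetId.length) { // grow all arrays together, doubling capacity
      int newCap = size * 2;
      srcBucket = Arrays.copyOf(srcBucket, newCap);
      srcPosition = Arrays.copyOf(srcPosition, newCap);
      targetId = Arrays.copyOf(targetId, newCap);
    }
    srcBucket[size] = bucket;
    srcPosition[size] = position;
    targetId[size] = fk;
    size++;
  }

  int size() { return size; }
  int bucketAt(int i) { return srcBucket[i]; }
  long positionAt(int i) { return srcPosition[i]; }
  int targetIdAt(int i) { return targetId[i]; }

  public static void main(String[] args) {
    TopologyBuffer asked = new TopologyBuffer();
    // Question #12:0 was asked by User 42, Question #12:1 by User 7
    asked.add(12, 0L, 42);
    asked.add(12, 1L, 7);
    System.out.println(asked.size() + " pending edges"); // 2 pending edges
  }
}
```

<p>Each pending edge costs a fixed number of primitive slots, and growing the buffer copies three arrays instead of allocating millions of small objects.</p>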

<p><strong>Pass 2 — Edge creation.</strong> The collected topology is fed into <a href="https://arcadedb.com/blog/graphbatch-up-to-8x-faster-graph-ingestion/">GraphBatch</a>, which creates all edges with bidirectional traversal support. Each edge type is processed as a single batch for maximum sequential I/O.</p>
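<p>"CSR-first" refers to the Compressed Sparse Row layout. This post does not describe GraphBatch's internals, so the code below is only the textbook CSR construction in Java: a counting sort by source vertex. It assumes vertices have already been mapped to dense indices <code class="language-plaintext highlighter-rouge">0..n-1</code>. The point it illustrates is that grouping edges by source turns scattered inserts into contiguous runs that can be written sequentially. <code class="language-plaintext highlighter-rouge">CsrBuilder</code> is a hypothetical name.</p>

```java
import java.util.Arrays;

/**
 * Generic Compressed Sparse Row (CSR) construction from parallel edge arrays.
 * Illustrative textbook technique, not ArcadeDB's code.
 */
final class CsrBuilder {
  final int[] offsets; // neighbours of v are targets[offsets[v] .. offsets[v + 1] - 1]
  final int[] targets; // target indices grouped by source vertex

  CsrBuilder(int vertexCount, int[] src, int[] dst) {
    offsets = new int[vertexCount + 1];
    targets = new int[src.length];
    // 1) Count the out-degree of each source vertex (shifted by one slot).
    for (int s : src) offsets[s + 1]++;
    // 2) Prefix sum turns the counts into start offsets.
    for (int v = 0; v < vertexCount; v++) offsets[v + 1] += offsets[v];
    // 3) Scatter each edge into its source vertex's slot range.
    int[] cursor = Arrays.copyOf(offsets, vertexCount);
    for (int e = 0; e < src.length; e++) targets[cursor[src[e]]++] = dst[e];
  }

  /** Out-neighbours of vertex v, in original edge order. */
  int[] neighbours(int v) {
    return Arrays.copyOfRange(targets, offsets[v], offsets[v + 1]);
  }

  public static void main(String[] args) {
    int[] src = {2, 0, 2, 1};
    int[] dst = {0, 1, 1, 2};
    CsrBuilder csr = new CsrBuilder(3, src, dst);
    System.out.println(Arrays.toString(csr.neighbours(2))); // [0, 1]
  }
}
```

<p>The build makes two passes over the edge arrays plus one prefix sum, which is O(V + E), and it allocates no per-edge objects.</p>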

<p>This design means vertex data doesn’t stay in memory — only the graph topology does. For a dataset with 8 million vertices and 15 million edges, the in-memory topology is roughly <strong>300 MB</strong>.</p>
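<p>A quick sanity check on the 300 MB figure: dividing it by the 15 million edges gives the average memory budget per edge. The second method assumes a hypothetical 16-byte-per-edge layout (an <code class="language-plaintext highlighter-rouge">int</code> foreign key, plus an <code class="language-plaintext highlighter-rouge">int</code> bucket and a <code class="language-plaintext highlighter-rouge">long</code> position for the source RID). That layout is an illustration only, not ArcadeDB's documented memory layout. The sketch also reads "300 MB" as decimal megabytes.</p>

```java
/** Back-of-the-envelope arithmetic for the in-memory topology figure. */
final class TopologyBudget {
  /** Average bytes per edge implied by a total memory figure. */
  static double bytesPerEdge(long totalBytes, long edges) {
    return (double) totalBytes / edges;
  }

  /** Payload of a hypothetical layout: int FK + int bucket + long position per edge. */
  static long hypotheticalEdgeBytes(long edges) {
    return edges * (Integer.BYTES + Integer.BYTES + Long.BYTES); // 16 bytes per edge
  }

  public static void main(String[] args) {
    long edges = 15_000_000L;
    long quoted = 300L * 1_000_000L; // "roughly 300 MB", read as decimal megabytes
    System.out.println(bytesPerEdge(quoted, edges));  // 20.0
    System.out.println(hypotheticalEdgeBytes(edges)); // 240000000
  }
}
```

<p>Under that assumption, the edge payload accounts for about 240 MB. That leaves roughly 60 MB of the quoted figure for other structures, such as the lookups that resolve foreign-key IDs to vertex RIDs.</p>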

<h2 id="supported-data-sources">Supported Data Sources</h2>

<table>
  <thead>
    <tr>
      <th>Format</th>
      <th>Auto-detected</th>
      <th>Notes</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>CSV</td>
      <td><code class="language-plaintext highlighter-rouge">.csv</code></td>
      <td>Configurable delimiter (<code class="language-plaintext highlighter-rouge">"delimiter": ","</code>) and number of leading lines to skip, such as a header row (<code class="language-plaintext highlighter-rouge">"skipLines": 1</code>)</td>
    </tr>
    <tr>
      <td>JSONL</td>
      <td><code class="language-plaintext highlighter-rouge">.jsonl</code>, <code class="language-plaintext highlighter-rouge">.ndjson</code></td>
      <td>One JSON object per line</td>
    </tr>
    <tr>
      <td>XML</td>
      <td><code class="language-plaintext highlighter-rouge">.xml</code></td>
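      <td>Attribute-based by default (StackOverflow-style <code class="language-plaintext highlighter-rouge">&lt;row .../&gt;</code>). To parse child elements instead, set <code class="language-plaintext highlighter-rouge">"element"</code> to the name of the repeating record element, e.g. <code class="language-plaintext highlighter-rouge">"element": "book"</code></td>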
    </tr>
  </tbody>
</table>
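<p>Attribute-based XML can be parsed in a single streaming pass with the JDK's StAX API (<code class="language-plaintext highlighter-rouge">javax.xml.stream</code>). The Java sketch below is a hypothetical helper, not GraphImporter code. It reads StackOverflow-style <code class="language-plaintext highlighter-rouge">row</code> elements one at a time, applies a <code class="language-plaintext highlighter-rouge">PostTypeId=1</code>-style filter, and sums an integer attribute. The parser only holds the current element, so memory use does not grow with file size.</p>

```java
import java.io.Reader;
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper: streams row elements and sums sumAttr where filterAttr equals filterValue. */
final class XmlRowScan {
  static long sumWhere(Reader in, String filterAttr, String filterValue, String sumAttr) {
    try {
      XMLStreamReader xml = XMLInputFactory.newInstance().createXMLStreamReader(in);
      try {
        long sum = 0;
        while (xml.hasNext()) {
          // Only "row" start tags carry data; a null namespace URI matches attributes by local name.
          if (xml.next() == XMLStreamConstants.START_ELEMENT
              && "row".equals(xml.getLocalName())
              && filterValue.equals(xml.getAttributeValue(null, filterAttr))) {
            String value = xml.getAttributeValue(null, sumAttr);
            if (value != null) sum += Long.parseLong(value.trim());
          }
        }
        return sum;
      } finally {
        xml.close(); // frees parser state; does not close the underlying Reader
      }
    } catch (XMLStreamException e) {
      throw new IllegalArgumentException("Malformed XML input", e);
    }
  }

  public static void main(String[] args) {
    String posts = "<posts>"
        + "<row Id=\"1\" PostTypeId=\"1\" Score=\"10\"/>"
        + "<row Id=\"2\" PostTypeId=\"2\" Score=\"5\"/>"
        + "<row Id=\"3\" PostTypeId=\"1\" Score=\"3\"/>"
        + "</posts>";
    System.out.println(sumWhere(new StringReader(posts), "PostTypeId", "1", "Score")); // 13
  }
}
```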

<p>All sources are streamed — the importer never loads an entire file into memory.</p>
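<p>For line-oriented formats, streaming means reading one line at a time. This hypothetical Java helper shows how the <code class="language-plaintext highlighter-rouge">delimiter</code> and <code class="language-plaintext highlighter-rouge">skipLines</code> options could be applied while streaming. It splits on the literal delimiter and, unlike a real CSV parser, does not handle quoted fields that contain the delimiter.</p>

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.regex.Pattern;

/** Hypothetical line-at-a-time CSV scan; no support for quoted fields. */
final class CsvStreamScan {
  /** Sums an integer column, skipping the first skipLines lines (e.g. a header row). */
  static long sumColumn(Reader in, String delimiter, int skipLines, int column) {
    Pattern split = Pattern.compile(Pattern.quote(delimiter)); // literal delimiter, not a regex
    long sum = 0;
    try (BufferedReader lines = new BufferedReader(in)) { // also closes the given Reader
      String line;
      int lineNo = 0;
      while ((line = lines.readLine()) != null) {
        if (lineNo++ < skipLines || line.isEmpty()) continue; // skipped lines and blank lines
        String[] fields = split.split(line, -1); // -1 keeps trailing empty fields
        sum += Long.parseLong(fields[column].trim());
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return sum;
  }

  public static void main(String[] args) {
    String csv = "id;score\n1;10\n2;32\n";
    System.out.println(sumColumn(new StringReader(csv), ";", 1, 1)); // 42
  }
}
```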

<h2 id="configuration-reference">Configuration Reference</h2>

<h3 id="vertex-source">Vertex Source</h3>

<table>
  <thead>
    <tr>
      <th>Field</th>
      <th>Required</th>
      <th>Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">type</code></td>
      <td>Yes</td>
      <td>ArcadeDB vertex type name (auto-created if missing)</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">file</code></td>
      <td>Yes</td>
      <td>Source file path, relative to the data directory</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">id</code></td>
      <td>No</td>
      <td>Source attribute holding the integer primary key; foreign keys in other sources are resolved against it</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">nameId</code></td>
      <td>No</td>
      <td>Source attribute holding a string key, used to resolve edges created from split (multi-value) fields</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">filter</code></td>
      <td>No</td>
      <td>Row filter of the form <code class="language-plaintext highlighter-rouge">"attribute=value"</code>; only rows whose attribute equals the value are imported</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">properties</code></td>
      <td>No</td>
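      <td>Map of <code class="language-plaintext highlighter-rouge">"dbPropertyName": "SourceAttr"</code>; prefix the source attribute with <code class="language-plaintext highlighter-rouge">int:</code> or <code class="language-plaintext highlighter-rouge">bool:</code> to store it as an integer or boolean (e.g. <code class="language-plaintext highlighter-rouge">"int:Attr"</code>, <code class="language-plaintext highlighter-rouge">"bool:Attr"</code>)</td>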
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">edges</code></td>
      <td>No</td>
      <td>Array of edge definitions derived from foreign keys in this source</td>
    </tr>
  </tbody>
</table>
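<p>The <code class="language-plaintext highlighter-rouge">filter</code> and <code class="language-plaintext highlighter-rouge">properties</code> fields are small string conventions. This hypothetical Java sketch evaluates an <code class="language-plaintext highlighter-rouge">"attribute=value"</code> filter against one row and converts <code class="language-plaintext highlighter-rouge">int:</code> and <code class="language-plaintext highlighter-rouge">bool:</code> prefixed mappings. The class name and the exact parsing rules (for example, splitting the filter at the first <code class="language-plaintext highlighter-rouge">=</code>) are assumptions, not the importer's documented behavior.</p>

```java
import java.util.Map;

/** Hypothetical helpers mirroring the vertex source "filter" and "properties" fields. */
final class VertexSourceSpec {
  /** Evaluates an "attribute=value" filter (split at the first '=') against one row. */
  static boolean passesFilter(String filter, Map<String, String> row) {
    int eq = filter.indexOf('=');
    if (eq <= 0) throw new IllegalArgumentException("Expected \"attribute=value\": " + filter);
    return filter.substring(eq + 1).equals(row.get(filter.substring(0, eq)));
  }

  /** Resolves a mapping such as "Title", "int:Score" or "bool:Attr"; null if the attribute is missing. */
  static Object convert(String mapping, Map<String, String> row) {
    if (mapping.startsWith("int:")) {
      String raw = row.get(mapping.substring(4));
      return raw == null ? null : Integer.valueOf(raw.trim());
    }
    if (mapping.startsWith("bool:")) {
      String raw = row.get(mapping.substring(5));
      return raw == null ? null : Boolean.valueOf(raw.trim());
    }
    return row.get(mapping); // no prefix: keep the raw string
  }

  public static void main(String[] args) {
    Map<String, String> row = Map.of("PostTypeId", "1", "Title", "How do I X?", "Score", "17");
    System.out.println(passesFilter("PostTypeId=1", row)); // true
    System.out.println(convert("int:Score", row));         // 17
  }
}
```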

<h3 id="edge-definition-inside-a-vertex-source">Edge Definition (inside a vertex source)</h3>

<table>
  <thead>
    <tr>
      <th>Field</th>
      <th>Required</th>
      <th>Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">attribute</code></td>
      <td>Yes</td>
      <td>Source attribute containing the foreign key value</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">edge</code></td>
      <td>Yes</td>
      <td>ArcadeDB edge type name (auto-created if missing)</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">target</code></td>
      <td>Yes</td>
      <td>Target vertex type the foreign key references</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">direction</code></td>
      <td>No</td>
      <td>Edge orientation: <code class="language-plaintext highlighter-rouge">"out"</code> (default) creates the edge from this vertex to the target; <code class="language-plaintext highlighter-rouge">"in"</code> creates it from the target to this vertex</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">split</code></td>
      <td>No</td>
      <td>Delimiter for multi-value fields (creates one edge per value)</td>
    </tr>
  </tbody>
</table>
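<p>This hypothetical Java sketch shows how the <code class="language-plaintext highlighter-rouge">split</code> and <code class="language-plaintext highlighter-rouge">direction</code> options could combine. Each non-empty value in the field yields one edge. With <code class="language-plaintext highlighter-rouge">"out"</code> the edge runs from this vertex to the target, and with <code class="language-plaintext highlighter-rouge">"in"</code> it is reversed. The sketch assumes values may carry leading and trailing delimiters (as in <code class="language-plaintext highlighter-rouge">|java|sql|</code>). It quotes the delimiter because <code class="language-plaintext highlighter-rouge">String.split</code> takes a regular expression, and an unquoted <code class="language-plaintext highlighter-rouge">|</code> would split between every character.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch: one {from, to} pair per non-empty value of a multi-value field. */
final class SplitEdges {
  static List<String[]> edges(String thisKey, String fieldValue, String delimiter, String direction) {
    List<String[]> result = new ArrayList<>();
    if (fieldValue == null || fieldValue.isEmpty()) return result;
    boolean incoming = "in".equals(direction); // anything else, including null, means "out"
    for (String value : fieldValue.split(Pattern.quote(delimiter))) {
      if (value.isEmpty()) continue; // produced by a leading delimiter or "||"
      result.add(incoming ? new String[] {value, thisKey} : new String[] {thisKey, value});
    }
    return result;
  }

  public static void main(String[] args) {
    for (String[] e : edges("Q1", "|java|sql|", "|", "out")) {
      System.out.println(e[0] + " -TAGGED_WITH-> " + e[1]);
    }
    // Q1 -TAGGED_WITH-> java
    // Q1 -TAGGED_WITH-> sql
  }
}
```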

<h3 id="edge-only-source">Edge-Only Source</h3>

<table>
  <thead>
    <tr>
      <th>Field</th>
      <th>Required</th>
      <th>Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">edge</code></td>
      <td>Yes</td>
      <td>ArcadeDB edge type name</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">file</code></td>
      <td>Yes</td>
      <td>Source file path</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">from</code></td>
      <td>Yes</td>
      <td>Source vertex reference in the form <code class="language-plaintext highlighter-rouge">"attribute:vertexType"</code>: the attribute holding the key and the vertex type it identifies</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">to</code></td>
      <td>Yes</td>
      <td>Target vertex reference in the form <code class="language-plaintext highlighter-rouge">"attribute:vertexType"</code>: the attribute holding the key and the vertex type it identifies</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">properties</code></td>
      <td>No</td>
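      <td>Map of <code class="language-plaintext highlighter-rouge">"dbPropertyName": "SourceAttr"</code>, with an optional type prefix such as <code class="language-plaintext highlighter-rouge">"int:SourceAttr"</code></td>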
    </tr>
  </tbody>
</table>
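<p>An edge-only source references both endpoints with <code class="language-plaintext highlighter-rouge">"attribute:vertexType"</code> strings. Below is a hypothetical Java parser for that notation. Splitting at the first colon is an assumption, and the <code class="language-plaintext highlighter-rouge">PostId</code> attribute name is illustrative only.</p>

```java
/** Hypothetical parser for edge-only "from"/"to" references ("attribute:vertexType"). */
record VertexRef(String attribute, String vertexType) {
  static VertexRef parse(String spec) {
    int colon = spec.indexOf(':'); // assumption: attribute names contain no ':'
    if (colon <= 0 || colon == spec.length() - 1) {
      throw new IllegalArgumentException("Expected \"attribute:vertexType\", got: " + spec);
    }
    return new VertexRef(spec.substring(0, colon), spec.substring(colon + 1));
  }

  public static void main(String[] args) {
    VertexRef from = parse("PostId:Question");
    System.out.println(from.attribute() + " -> " + from.vertexType()); // PostId -> Question
  }
}
```

<p>Rejecting a missing attribute or type at parse time surfaces configuration mistakes before any data is read.</p>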

<h2 id="get-started">Get Started</h2>

<p>The GraphImporter is available starting from <strong><a href="https://arcadedb.com/blog/arcadedb-26-3-2/">ArcadeDB v26.3.2</a></strong>. Download the <a href="https://archive.org/details/stackexchange">StackOverflow data dump</a>, grab the JSON config above, and you’ll have a fully connected graph in minutes.</p>

<p><strong>Download ArcadeDB v26.3.2</strong>: <a href="https://github.com/ArcadeData/arcadedb/releases">GitHub Releases</a></p>

<p>If you have questions or feedback, join us on <a href="https://discord.gg/arcadedb">Discord</a> or open an issue on <a href="https://github.com/ArcadeData/arcadedb/issues">GitHub</a>.</p>]]></content><author><name>Luca Garulli</name></author><category term="Graph Database" /><category term="Import" /><category term="ETL" /><category term="StackOverflow" /><summary type="html"><![CDATA[ArcadeDB's new declarative GraphImporter turns CSV, XML, and JSONL files into a fully connected graph database with a single JSON configuration. Built on the high-performance GraphBatch engine, it handles millions of vertices and edges with minimal memory usage. Walk through a complete StackOverflow data dump import as a practical example.]]></summary></entry></feed>