ArcadeDB 26.5.1: Sparse Vector Index, Hybrid Retrieval & INT8 End-to-End

We’re excited to announce ArcadeDB 26.5.1, a major release with 270+ commits resolving 128 issues. The headline feature is a brand-new sparse vector index with server-side hybrid retrieval and INT8 quantization end-to-end, alongside extensive OpenCypher correctness improvements and query partitioning.

Major New Features

Sparse Vector Index & Hybrid Retrieval

The new LSM_SPARSE_VECTOR index type enables sparse-embedding retrieval (BM25/SPLADE-style) directly inside ArcadeDB.

Dense vectors capture semantic meaning across every dimension; sparse vectors capture exact keyword signals across a much larger vocabulary, with most positions empty. ArcadeDB 26.5.1 supports both, and can fuse them server-side.

Highlights:

vector.fuse(...) performs server-side result fusion using RRF, DBSF, and LINEAR strategies, so dense + sparse + lexical scores can be combined without round-trips to the client.
vector.neighbors(...) supports groupBy / groupSize options for diversified retrieval with nested-field grouping.
WAND / BlockMax-WAND dynamic pruning scales sparse retrieval to 100M+ documents.
Sparse-vector partitioning allows sharding by tenant or domain.
New reranker SQL functions enable two-stage retrieval pipelines.

INT8 Quantization for Dense Vectors

End-to-end INT8 support throughout the dense vector pipeline, dramatically reducing disk and RSS by avoiding the FP32 path entirely. A shared 8-bit representation now flows across ingest, storage, and query.

EXTERNAL Property Storage

A new paired-bucket layout isolates heavy property values (vectors, large strings, JSON) to separate external buckets while keeping the hot row data compact. The result: significantly cheaper scans on wide records.

EXTERNAL Property Storage moves heavy values (vectors, large strings, JSON) to a paired external bucket. The main bucket stays compact, scans stay hot, and large payloads are loaded lazily only when the row is actually read.

Query Partitioning

A partition-aware planner now prunes unnecessary partitions from SQL and Cypher execution plans, with integrity safeguards for partitioned types.

High Availability: Offline Cluster Bootstrap

Fresh HA clusters can now initialize from pre-seeded databases via snapshot-and-restore, eliminating the need for full dataset re-replication when expanding or rebuilding a cluster.

Production-Ready Helm Chart

The Helm chart has been reworked to align with the Raft-based HA subsystem introduced in 26.4.2, and is now suitable for production deployments.

Cypher Administrative Commands

Standard administrative commands SHOW INDEXES and SHOW CONSTRAINTS are now supported in OpenCypher.

SQL: FIND REFERENCES

The OrientDB-compatible FIND REFERENCES command is back, making it easy to locate all records pointing to a given RID — particularly useful for migrations from OrientDB.

C# End-to-End Testing

A new C# test suite validates ArcadeDB over the PostgreSQL wire protocol via Npgsql and Testcontainers on every build.

Studio Enhancements

Full-screen graph view mode
Clear query button / textbox
Session reset on token expiration
Persistent error message display
Query history no longer auto-submits
Inherited indexes now visible
HA cluster peer add / remove controls
Human-readable peer names in HA_SERVER_LIST

Major Fixes

OpenCypher Correctness

This release lands an extensive batch of OpenCypher fixes across pattern matching, write clauses, subqueries, and temporal expressions. Among the highlights:

valueType(...) now reports the NOT NULL suffix for non-null values.
point(...) WGS-84-3D exposes .height as a .z alias.
CALL ... YIELD preserves carried WITH variables.
Variable-length patterns no longer re-traverse previously bound relationships.
MERGE with an unbound label-only endpoint creates fresh nodes appropriately.
SET correctly propagates across all aliases for the same node.
Self-referential property updates remain idempotent across row fanout.
Temporal component access on date/datetime values now works correctly.
EXISTS { ... } subqueries correctly evaluate outer-variable expressions.
MATCH immediately after CREATE now sees newly created labeled nodes.
MERGE ... ON MATCH SET returns post-update property values.
MATCH on parent edge types matches sub-typed edges (polymorphic traversal).
shortestPath / allShortestPaths with variable-length alternation match correctly.
WHERE false literal predicates are no longer ignored.

…plus dozens more. See the full release notes for the complete list.

SQL

CONTAINSALL compares lists of Identifiables against RID strings correctly.
Correlated COLLECT { ... } / COUNT { ... } subqueries evaluate with outer-variable access.
SEARCH_INDEX and SEARCH_FIELDS propagate return values in filters and handle wildcards properly.
SELECT with a non-unique LSM index returns rows after partial deletes.
Edge creation with CONTENT no longer ignores properties.
algo.dijkstra yields correct weight calculations.
UPDATE EDGE SET @in / @out correctly rewires vertex edge lists.
point.withinBBox(...) supports cross-meridian bounding boxes.

Storage, Indexing & Schema

HASH index lookups return rows with data encryption enabled.
Orphan TypeIndex wrappers are dropped when the last bucket child is removed.
Subclass indexes are no longer incorrectly related to superclass indexes.
Manual index names are respected on creation.
Inherited indexes are now visible in Studio.

High Availability

Schema changes replicate to followers, closing WAL gaps.
Cluster inconsistency reports after node shutdowns resolved.
Massive inserts via gRPC replicate correctly.
/api/v1/batch no longer fails on followers with “Error on updating dictionary”.
/batch endpoint eliminates HTTP 500 NPE after successful commits.
e2e-ha integration tests stabilized with on-demand Toxiproxy support.

Wire Protocols

PostgreSQL

Empty SELECT results include RowDescription schema.
SHOW server_version returns a proper value for SQLAlchemy.
Cypher WHERE id(n) IN $array round-trips correctly.
Binary array deserialization implemented for JDBC setArray.
Named and positional parameters now work via Npgsql (C#).

Bolt

EXPLAIN / PROFILE plans are included in PULL SUCCESS metadata.
Executor recognizes the new sparse vector type.

gRPC

InsertStream throughput stays consistent after extended executeQuery calls.
Commit-time constraint violations surface as stream-level errors.
DATE columns no longer corrupted via parameter binding.
ARRAY_OF_LONGS and DATETIME preserve precision in parameter binding.

HTTP

INT8 query vectors routed via $bytes / $int8 markers.
RemoteGraphBatch honors unique edge constraints.
Edge DATETIME parser accepts ISO suffixes.

Dependencies

Notable upgrades include Netty 4.2.13.Final, Undertow 2.4.0.Final, PostgreSQL JDBC 42.7.11, Neo4j Java Driver 6.1.0, Jackson Databind 2.21.3, GraalVM 25.0.3, Testcontainers 2.0.5, plus Studio frontend improvements and security updates across the dependency stack.

Getting Started with 26.5.1

Docker

docker pull arcadedata/arcadedb:26.5.1

Visit our Docker Hub repository for more information.

Maven

<dependency>
    <groupId>com.arcadedb</groupId>
    <artifactId>arcadedb-engine</artifactId>
    <version>26.5.1</version>
</dependency>

All artifacts are available on Maven Central.

Documentation

For detailed information on features and usage, refer to our comprehensive documentation.

Compatibility Note

This release maintains 100% compatibility with previous database formats, meaning no export/import is required when upgrading. As always, we recommend creating a database backup before upgrading.

Download ArcadeDB 26.5.1 now: GitHub Releases

Thanks to everyone in the community who reported issues, opened PRs, and helped shape this release.

Luca Garulli ArcadeDB Founder