Recommendation Engines with ArcadeDB

Why Graphs Are the Natural Model for Recommendations

Every recommendation is fundamentally a relationship problem. "Users who bought X also bought Y" is a two-hop traversal. "Friends of friends who liked this movie" is three hops. "Products similar to what you viewed last week, filtered by your social circle's preferences" crosses multiple dimensions at once.

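Relational databases model these patterns with chains of self-JOINs or recursive queries whose cost climbs steeply with each extra hop. Document stores keep relationships only as embedded references, so multi-hop context is awkward to query. Graph databases make connections first-class citizens, so each hop follows a stored link instead of computing a join, and traversals over millions of relationships can complete in milliseconds rather than minutes.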

But a graph alone is not enough. Modern recommendation engines need three signal types working together:

Structural signals — who is connected to whom, what was purchased together, which categories overlap (graph traversal)
Semantic signals — content similarity, user preference embeddings, visual/textual features (vector search)
Temporal signals — what was viewed recently, trending items, seasonal patterns, engagement decay (time series)

Most architectures scatter these across three separate systems. ArcadeDB unifies them in one engine.

Three Models, One Query

Collaborative Filtering via Graph Traversal

Find what similar users liked by traversing the interaction graph. This Cypher query walks three hops (from user to purchased products, then to other users who bought the same items, then to their other purchases) and ranks each candidate by how many distinct users connect you to it:

MATCH (me:User {id: 'u1'})
      -[:PURCHASED]->(p:Product)
      <-[:PURCHASED]-(other:User)
      -[:PURCHASED]->(rec:Product)
WHERE rec <> p
  AND NOT (me)-[:PURCHASED]->(rec)
RETURN rec.name, rec.category,
       count(DISTINCT other) AS score
ORDER BY score DESC
LIMIT 20

More distinct co-purchasing users = stronger collaborative signal. This pattern scales to millions of users because ArcadeDB resolves each hop in constant time per relationship, not via index lookups or JOINs.
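To make the scoring concrete, here is a minimal in-memory Python sketch of the quantity the query computes. It is an illustration, not ArcadeDB code: the `purchases` dictionary and the function name `collaborative_scores` are hypothetical stand-ins for the PURCHASED edges.

```python
from collections import defaultdict

def collaborative_scores(purchases, me):
    """Score unseen products by the number of distinct co-purchasing users.

    `purchases` (hypothetical structure) maps a user id to the set of
    product ids that user bought.
    """
    mine = purchases.get(me, set())
    supporters = defaultdict(set)  # product -> users who link `me` to it
    for other, theirs in purchases.items():
        if other == me or not (mine & theirs):
            continue  # skip myself and users with no shared purchase
        for product in theirs - mine:  # only products I have not bought
            supporters[product].add(other)
    scores = {product: len(users) for product, users in supporters.items()}
    # Highest score first; ties broken by product id for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

purchases = {
    "u1": {"shoes", "bottle"},
    "u2": {"shoes", "mat"},
    "u3": {"bottle", "mat", "racket"},
    "u4": {"tent"},
}
print(collaborative_scores(purchases, "u1"))  # [('mat', 2), ('racket', 1)]
```

Here "mat" scores 2 because two different users (u2 and u3) share a purchase with u1 and also bought it, while u4 contributes nothing because there is no overlap.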

Collaborative Filtering: "Users Like You"

Collaborative filtering is the backbone of recommendation systems. The core insight is simple: if two users have overlapping purchase or rating histories, items one has discovered but the other hasn't are strong candidates.

In a graph database, this is a natural traversal pattern. There's no matrix factorization, no offline batch job, no separate ML pipeline required for the base case. The graph is the recommendation engine.

Key advantages of graph-based collaborative filtering:

Real-time updates: A new purchase is instantly available as a signal — no waiting for nightly batch recomputation
Explainable results: The graph path shows exactly which shared purchases drove the recommendation
Social amplification: Weight recommendations higher when the connecting users are also friends or in the same community
Depth control: Tune recommendation diversity by adjusting traversal depth (2 hops = close matches, 3+ hops = serendipitous discovery)

Content-Based Filtering: Vectors for Semantic Similarity

Collaborative filtering fails when a user is new (no interaction history) or an item is new (no one has rated it yet). This is the cold-start problem, and it's where vector embeddings shine.

By storing embedding vectors directly on product and user nodes, ArcadeDB can find items that are semantically similar based on their features — text descriptions, images, categories, pricing — even when no one has interacted with them yet.

ArcadeDB uses JVector, a state-of-the-art vector search engine, to perform approximate nearest-neighbor (ANN) searches directly within the database. No external vector store required.

Product embeddings: Represent product features (text, images, attributes) as high-dimensional vectors
User preference vectors: Aggregate a user's interaction history into a preference embedding
Cross-modal similarity: Compare text descriptions to image embeddings for richer matches
Cold-start solved: New products get recommended immediately based on content similarity, without needing any interaction history
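The preference-vector idea can be sketched in a few lines of plain Python. This is an assumption-laden illustration: it uses an exact brute-force cosine comparison in place of the ANN index, toy three-dimensional embeddings, and hypothetical helper names (`mean_vector`, `rank_by_similarity`).

```python
import math

def mean_vector(vectors):
    """Average equal-length vectors component by component."""
    count = len(vectors)
    return [sum(component) / count for component in zip(*vectors)]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_by_similarity(query, candidates, top_k):
    """Exact (brute-force) ranking; a real deployment would use an ANN index."""
    scored = [(name, cosine_similarity(query, emb)) for name, emb in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy embeddings of products the user has already interacted with.
viewed = [[0.9, 0.1, 0.0], [0.7, 0.3, 0.1]]
preference = mean_vector(viewed)

# Brand-new products with no interaction history (the cold-start case).
new_products = {
    "Trail Shoes": [0.85, 0.15, 0.05],
    "Desk Lamp": [0.0, 0.1, 0.9],
}
ranked = rank_by_similarity(preference, new_products, top_k=2)
print([name for name, _ in ranked])
```

The new "Trail Shoes" item ranks first purely from its embedding, which is the behavior that lets content-based filtering cover items no one has interacted with yet.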

Vector Similarity Search

Find products similar to what a user just viewed, using vector embeddings stored directly on graph vertices:

SELECT name, category, price
FROM Product
WHERE inStock = true
ORDER BY vectorNeighbors(
  'Product[embedding]',
  [0.9, 0.1, 0.1, 0.1], 20) DESC
LIMIT 20

Time-Aware Recommendations

Use ArcadeDB's native time-series engine to detect trending products and seasonal patterns, then feed those signals directly into your recommendation scoring:

-- Trending products by total interactions
SELECT productId,
  sum(purchaseCount) AS totalInteractions
FROM ProductInteraction
GROUP BY productId
ORDER BY totalInteractions DESC
LIMIT 10

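Simple SQL aggregation over interaction records surfaces the most popular products, letting you blend popularity signals directly into recommendation scoring. As written, the query sums over all recorded interactions; restricting it to a recent time window turns all-time popularity into a true trending signal.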
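The difference between all-time popularity and a windowed trending count is easy to see in a small Python sketch. The event tuples and the `trending` function are hypothetical; in practice the same filter would be expressed in the query itself.

```python
from collections import Counter
from datetime import datetime, timedelta

def trending(events, now, window, top_k):
    """Count interactions per product inside the window ending at `now`.

    `events` is a list of (product_id, timestamp) pairs (assumed shape).
    """
    cutoff = now - window
    counts = Counter(product for product, ts in events if cutoff <= ts <= now)
    return counts.most_common(top_k)

now = datetime(2024, 1, 10, 12, 0)
events = [
    ("shoes", now - timedelta(days=30)),
    ("shoes", now - timedelta(days=29)),
    ("shoes", now - timedelta(days=28)),
    ("shoes", now - timedelta(hours=3)),
    ("mat", now - timedelta(hours=2)),
    ("mat", now - timedelta(hours=1)),
]
print(trending(events, now, timedelta(hours=24), top_k=2))  # [('mat', 2), ('shoes', 1)]
print(trending(events, now, timedelta(days=365), top_k=2))  # [('shoes', 4), ('mat', 2)]
```

Over the past year "shoes" leads, but over the last 24 hours "mat" does, which is the distinction a trending signal is meant to capture.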

Time Series: The Missing Dimension

Most recommendation systems treat time as an afterthought — a filter or a decay factor bolted on at the end. But temporal patterns are some of the strongest signals available:

Recency weighting: A product viewed 5 minutes ago is far more relevant than one viewed 3 months ago
Trend detection: Identify products with accelerating engagement before they go viral
Seasonal patterns: Recommend winter coats in October, swimwear in May, without hard-coded rules
Session context: Track what a user has browsed in this session to infer real-time intent
Engagement decay: Automatically down-weight recommendations from stale interaction patterns

ArcadeDB's native time-series engine stores interaction events with nanosecond precision and provides built-in functions for rate computation, time bucketing, moving averages, and trend detection — directly queryable alongside graph traversals and vector searches.

No external streaming platform. No batch pipeline. Temporal intelligence is computed at query time, always reflecting the latest data.
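Recency weighting and engagement decay are often modeled with an exponential half-life. The sketch below shows only the arithmetic, in Python; the one-day half-life and the function names are illustrative assumptions, not ArcadeDB built-ins.

```python
def decayed_weight(age_seconds, half_life_seconds):
    """Weight of one interaction; it halves every `half_life_seconds`."""
    return 0.5 ** (age_seconds / half_life_seconds)

def decayed_score(ages_seconds, half_life_seconds):
    """Sum of decayed weights over all interactions with an item."""
    return sum(decayed_weight(age, half_life_seconds) for age in ages_seconds)

DAY = 24 * 60 * 60  # assumed half-life of one day, in seconds

# One view five minutes ago versus ten views three months ago.
fresh = decayed_score([5 * 60], half_life_seconds=DAY)
stale = decayed_score([90 * DAY] * 10, half_life_seconds=DAY)
print(fresh > stale)  # True
```

With this weighting a single recent view outweighs many old ones; a longer half-life would make the score more forgiving of age.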

The Multi-Model Advantage: All Three in One Query

The real power of ArcadeDB is not that it supports graph, vectors, and time series individually — it's that you can combine all three in a single query, without data movement, without microservices, without synchronization headaches.

Consider a streaming platform recommending what to watch next. A traditional architecture requires:

A graph database for collaborative filtering ("viewers like you watched...")
A vector database for content similarity ("shows with a similar vibe...")
A time-series database for trending content and session context
An application layer to merge and re-rank results from all three
A caching layer to make it fast enough for real-time serving

With ArcadeDB, this entire pipeline collapses into a single database query. The result is lower latency, simpler architecture, and fresher recommendations.

Architecture Comparison

Multi-Model Query: Streaming Platform

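These three steps, each run against the same database, combine collaborative filtering (graph), content similarity (vectors), and trending signals (time series) to produce ranked recommendations. The product names returned by step 1 become the candidate list that steps 2 and 3 filter on: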

-- Step 1: Graph — collaborative filtering
MATCH (me:User {id: 'u1'})
  -[:PURCHASED]->(p:Product)
  <-[:PURCHASED]-(other:User)
  -[:PURCHASED]->(rec:Product)
WHERE rec <> p
  AND NOT (me)-[:PURCHASED]->(rec)
RETURN DISTINCT rec.name AS name

-- Step 2: Vector — rank by preference
SELECT name, category, price
FROM Product
WHERE name IN ['Running Shoes',
  'Water Bottle', 'Yoga Mat',
  'Tennis Racket']
ORDER BY vectorNeighbors(
  'Product[embedding]',
  [0.9, 0.1, 0.1, 0.1], 10) DESC

-- Step 3: Time-series — trending boost
SELECT productId,
  sum(purchaseCount) AS trending_score
FROM ProductInteraction
WHERE productId IN ['Running Shoes',
  'Water Bottle', 'Yoga Mat',
  'Tennis Racket']
GROUP BY productId
ORDER BY trending_score DESC

Real-World Scenario: Streaming Platform

Imagine a streaming platform recommending what to watch next. The ideal recommendation blends three signals:

Collaborative: "Viewers who watched the same 3 shows as you also loved this documentary" (graph traversal, weighted by overlap count)
Content-based: "This series has a similar tone, cast, and narrative style to your recent favorites" (vector similarity on show embeddings)
Trending: "This title's engagement rate surged 5x in the last 2 hours" (time-series rate computation)

The three-step query shown earlier follows this shape. It starts with a graph traversal for collaborative candidates, then enriches each candidate with a vector similarity score and a trending score from the interaction data. A weighted formula combining the three scores produces the final ranking.

The result: recommendations that are personally relevant (collaborative), contextually appropriate (content similarity), and timely (trending) — computed in a single round-trip to the database.
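One plausible form of the weighted formula is a min-max normalization of each signal followed by a weighted sum. The Python sketch below is an assumption, not the formula ArcadeDB or any particular deployment uses: the weights (0.5, 0.3, 0.2), the sample scores, and the function names are all hypothetical.

```python
def normalize(scores):
    """Min-max scale a {item: value} mapping into the range [0, 1]."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {item: 1.0 for item in scores}
    return {item: (value - low) / (high - low) for item, value in scores.items()}

def blend(collaborative, similarity, trending, weights=(0.5, 0.3, 0.2)):
    """Rank collaborative candidates by a weighted sum of normalized signals."""
    w_collab, w_sim, w_trend = weights
    collab_n = normalize(collaborative)
    sim_n = normalize(similarity)
    trend_n = normalize(trending)
    final = {
        item: w_collab * collab_n[item]
        + w_sim * sim_n.get(item, 0.0)      # missing signal counts as zero
        + w_trend * trend_n.get(item, 0.0)
        for item in collaborative           # step 1 defines the candidate set
    }
    return sorted(final.items(), key=lambda kv: kv[1], reverse=True)

collaborative = {"A": 3, "B": 1, "C": 2}     # overlap counts from the graph step
similarity = {"A": 0.2, "B": 0.9, "C": 0.6}  # vector similarity scores
trending = {"A": 10, "B": 50, "C": 400}      # recent interaction counts
ranked = blend(collaborative, similarity, trending)
print([item for item, _ in ranked])  # ['C', 'A', 'B']
```

Item "C" wins because it is reasonably strong on all three signals, even though "A" has the highest collaborative score and "B" the highest similarity. Normalizing first keeps a large raw count from swamping a similarity score that lives between 0 and 1.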

Common Recommendation Patterns

ArcadeDB supports all major recommendation strategies natively, without external tooling:

Social Filtering

Leverage trust networks by weighting recommendations from friends and friends-of-friends more heavily. A 2-hop social traversal naturally surfaces items endorsed by people the user actually trusts.

Session-Based Recommendations

For anonymous or first-time users, track the current browsing session as time-series events and compute real-time intent. "You've looked at 3 running shoes in the last 4 minutes" is a powerful signal even without login history.
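A session-intent check like the running-shoes example can be sketched as a count over a sliding window. This Python version is illustrative only; the `(category, timestamp)` event shape, the four-minute window, and the three-view threshold are assumptions.

```python
from collections import Counter

def session_intent(views, now, window_seconds, min_views):
    """Return the most-viewed category in the recent window, or None.

    `views` is a list of (category, unix_timestamp) pairs (assumed shape).
    """
    recent = Counter(
        category for category, ts in views if 0 <= now - ts <= window_seconds
    )
    if not recent:
        return None
    category, count = recent.most_common(1)[0]
    return category if count >= min_views else None

views = [
    ("tents", 100),          # long before the window
    ("running shoes", 800),
    ("running shoes", 880),
    ("running shoes", 960),
    ("socks", 990),
]
print(session_intent(views, now=1000, window_seconds=240, min_views=3))
```

Three running-shoe views fall inside the last 240 seconds, so the function reports that category; raising `min_views` to 4 would yield `None`, meaning no confident intent yet.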

Frequently Bought Together

Co-purchase analysis is a simple graph pattern: count distinct users who purchased both items. Time-series bucketing adds seasonal awareness — recommend sunscreen with swimsuits in summer, but hot chocolate with blankets in winter.
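The co-purchase count itself is simple enough to show in plain Python. This sketch covers only the "distinct users who purchased both items" part, not the seasonal bucketing, and the `purchases` structure and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def co_purchase_counts(purchases):
    """Count, for every product pair, the distinct users who bought both."""
    pairs = Counter()
    for products in purchases.values():  # one set per user => each user counted once
        for a, b in combinations(sorted(products), 2):
            pairs[(a, b)] += 1
    return pairs

def bought_with(item, purchases, top_k):
    """Products most often bought by the same users as `item`."""
    counts = Counter()
    for (a, b), n in co_purchase_counts(purchases).items():
        if a == item:
            counts[b] = n
        elif b == item:
            counts[a] = n
    return counts.most_common(top_k)

purchases = {
    "u1": {"swimsuit", "sunscreen"},
    "u2": {"swimsuit", "sunscreen", "towel"},
    "u3": {"blanket", "cocoa"},
}
print(bought_with("swimsuit", purchases, top_k=2))  # [('sunscreen', 2), ('towel', 1)]
```

Adding a time bucket to each purchase and counting per bucket would extend this toward the seasonal variant described above.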

Content Discovery & Serendipity

Deeper graph traversals (3-4 hops) combined with vector distance thresholds let you balance relevance with discovery. Close matches keep users engaged; distant-but-interesting suggestions prevent filter bubbles.

E-Commerce: Personalized Category Page

Combine session intent with purchase history and trending items to personalize a category page in real-time:

SELECT name, category, price
FROM Product
WHERE category = 'Electronics'
  AND inStock = true
ORDER BY vectorNeighbors(
  'Product[embedding]',
  [0.9, 0.1, 0.1, 0.1], 30) DESC
LIMIT 30

Operational Simplicity

Capability	Typical Stack	ArcadeDB
Graph traversal	Neo4j	Built-in
Vector search	Pinecone / Weaviate	Built-in
Time-series analytics	TimescaleDB / InfluxDB	Built-in
Document storage	MongoDB	Built-in
Data synchronization	Kafka / CDC pipelines	Not needed
Query languages	4+ different APIs	SQL, Cypher, Gremlin

Why ArcadeDB for Recommendations

Building a production recommendation engine typically requires assembling and maintaining a complex stack: a graph database for collaborative filtering, a vector database for content similarity, a time-series store for engagement signals, and an application layer to merge results.

ArcadeDB replaces this entire stack with a single database that natively supports all three data models. The benefits compound:

Zero data movement: No ETL pipelines, no synchronization lag, no consistency issues between stores
Lower latency: One database round-trip instead of three parallel queries + merge
Simpler operations: One system to back up, monitor, scale, and secure
Fresher results: Every interaction is immediately available across all three models
Lower cost: One database license (Apache 2.0, free forever) instead of three commercial services

And because ArcadeDB is Apache 2.0 forever, you can embed it in your product, offer it as a service, or modify it for your needs, with no restrictions beyond the license's permissive terms (such as preserving copyright and license notices).

Client Success Story

"Our previous recommendation system required a complex microservices architecture with separate graph and vector databases, plus a caching layer. With ArcadeDB's multi-model approach, we consolidated everything into one database, reduced infrastructure costs by 60%, and actually improved recommendation quality. The cold start problem was nearly eliminated thanks to the hybrid approach."

— VP of Engineering, Leading E-commerce Platform
(Company name confidential per NDA)

Measurable Results:

42% increase in click-through rate
28% boost in conversion rate
60% reduction in infrastructure costs
Response times under 20ms at 10M+ users

Industries Using Graph Recommendations

E-commerce: Product recommendations, frequently bought together, personalized search results
Streaming & Media: Content discovery, "watch next", personalized playlists
Social Platforms: People you may know, content feed ranking, group suggestions
Travel & Hospitality: Destination recommendations, dynamic pricing, itinerary suggestions
Financial Services: Investment suggestions, cross-sell opportunities, risk-matched products
Healthcare: Treatment recommendations, clinical trial matching, drug interaction awareness
