Fraud Detection & Prevention with ArcadeDB

Fraud Is a $12.5 Billion Problem — and Growing

The FTC reported $12.5 billion in consumer fraud losses in 2024 — a 25% year-over-year increase. The FBI's Internet Crime Complaint Center recorded $16.6 billion in losses in the same period. Businesses worldwide lose an average of 7.7% of annual revenue to fraud, with U.S. companies reporting nearly 10%.

Traditional rule-based fraud detection systems catch the obvious cases but miss the sophisticated ones. Fraudsters don't operate in isolation: they form rings, use synthetic identities assembled from real and fake data, and exploit temporal patterns that static rules cannot see. Detecting these schemes requires understanding relationships across millions of entities in real time.

This is a graph problem. But it's also a vector problem, a time-series problem, and a text-matching problem. ArcadeDB is the only database that handles all four natively.

Five Models, One Fraud Engine

Graph Traversal: Detect fraud rings via multi-hop relationship analysis
Vector Similarity: Flag behavioral anomalies through embedding distance
Time Series: Spot velocity attacks, unusual timing, and transaction bursts
Full-Text Search: Resolve synthetic identities via fuzzy name/address matching
Documents: Store rich entity profiles with flexible schemas

What a Fraud Ring Looks Like

A relational database sees 5 rows. A graph database sees the ring. One traversal query, milliseconds.

Why Fraud Is a Graph Problem

Every fraud investigation starts with the same question: who is connected to whom, and how? Fraudsters share devices, phone numbers, IP addresses, physical addresses, email patterns, and beneficiary accounts. Each of these shared identifiers is a relationship — and relationships are what graph databases are built for.

In a graph model, these connections are first-class citizens. Traversing from a suspicious account to all accounts sharing a device, then to all accounts sharing a phone with those accounts, costs time that depends on the number of edges followed at each hop, not on the total size of the database. A 5-hop traversal that would bring a relational database to its knees can execute in under 50ms in ArcadeDB.

Key graph patterns for fraud detection:

Fraud rings: Circular chains of accounts linked through shared identifiers
Fan-out: One entity (phone, device, address) connected to suspiciously many accounts
Fan-in: Many accounts funneling money into a single beneficiary
Bridge nodes: Entities connecting otherwise separate fraud clusters
Circular transfers: Money flowing A→B→C→A to obscure its origin (money laundering)
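
As a database-independent illustration of the fan-out pattern listed above, here is a minimal Python sketch. The edge list, the `fan_out` helper, and the threshold of 3 are hypothetical and are not part of ArcadeDB; swapping the roles of account and beneficiary gives the fan-in case.

```python
from collections import defaultdict


def fan_out(edges, threshold):
    """Return identifiers linked to at least `threshold` distinct accounts.

    `edges` is an iterable of (account_id, identifier) pairs.
    """
    accounts_by_identifier = defaultdict(set)
    for account_id, identifier in edges:
        accounts_by_identifier[identifier].add(account_id)
    return {
        identifier: sorted(accounts)
        for identifier, accounts in accounts_by_identifier.items()
        if len(accounts) >= threshold
    }


# Hypothetical sample data: three accounts share one device.
edges = [
    ("acct-A", "device-1"),
    ("acct-B", "device-1"),
    ("acct-C", "device-1"),
    ("acct-C", "phone-9"),
    ("acct-D", "phone-9"),
]
suspicious = fan_out(edges, threshold=3)
```

A suitable threshold depends on the identifier type: a shared household address is far more common than a shared device fingerprint.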

Fraud Ring Detection

Fraud rings are the most damaging form of organized financial crime. A single fraud ring can cost an institution millions before detection. The challenge is that each individual account may look legitimate in isolation — it's only when you trace the connections that the pattern emerges.

ArcadeDB's graph engine lets you discover these rings by traversing shared identifiers: same device fingerprint, same phone number, overlapping addresses, shared IP ranges. The query under "Detect Fraud Rings via Shared Identifiers" finds all accounts connected through any shared identifier within a specified depth; the total financial exposure of each cluster can then be calculated from the accounts it returns.

Because the traversal happens at the database level, not in application code, you get:

Real-time results: Ring detection during transaction authorization, not in overnight batch jobs
Full explainability: The graph path shows exactly which shared identifiers connect the accounts — critical for regulatory reporting
Depth flexibility: Adjust traversal depth to balance thoroughness vs. speed (2 hops for real-time checks, 5+ for investigations)
Dynamic schema: Add new identifier types (crypto wallets, device fingerprints, biometric hashes) without schema migrations

Detect Fraud Rings via Shared Identifiers

Find all accounts connected to a flagged account through shared devices, phones, or addresses:

-- Find fraud ring from a flagged account
MATCH (flagged:Account {id: 'acct-A'})
      -[:USES_DEVICE|HAS_PHONE|
        HAS_ADDRESS*1..4]-
      (connected:Account)
WHERE connected <> flagged
RETURN DISTINCT connected.id,
       connected.name

Closer connections (fewer hops) indicate stronger risk. The investigation team can follow the exact path to understand how each account is linked.
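
To make the "fewer hops, stronger risk" idea concrete, the following sketch runs a breadth-first search over an in-memory graph and records the hop count for each reachable account. It is an illustration in plain Python, not how ArcadeDB executes the query; the edge list, the helper names, and the convention that account ids start with `acct-` are assumptions.

```python
from collections import deque


def build_links(edges):
    """Turn (node, node) pairs into an undirected adjacency map."""
    links = {}
    for a, b in edges:
        links.setdefault(a, set()).add(b)
        links.setdefault(b, set()).add(a)
    return links


def connected_accounts(links, start, max_hops):
    """Breadth-first search from `start`; return {account_id: hops}.

    Hops count edges, so two accounts sharing one device are 2 hops apart.
    """
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the depth limit
        for neighbour in links.get(node, ()):
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)
    # Assumption: account ids are the nodes whose name starts with "acct-".
    return {
        node: distance
        for node, distance in hops.items()
        if node.startswith("acct-") and node != start
    }


# Hypothetical chain: A -device- B -phone- C -address- D
edges = [
    ("acct-A", "device-1"), ("acct-B", "device-1"),
    ("acct-B", "phone-9"), ("acct-C", "phone-9"),
    ("acct-C", "addr-5"), ("acct-D", "addr-5"),
]
ring = connected_accounts(build_links(edges), "acct-A", max_hops=4)
```

Sorting the result by hop count gives investigators the closest, and therefore highest-priority, accounts first.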

Synthetic Identity Resolution

Find all accounts sharing the same SSN — a key indicator of synthetic identity fraud:

-- Find accounts sharing the same SSN
SELECT id, full_name, ssn
FROM Account
WHERE ssn = '123-45-6789'
ORDER BY id

Multiple accounts sharing the same SSN is a strong signal of synthetic identity fraud — where fraudsters reuse stolen SSNs to create multiple fake identities.

Synthetic Identity Fraud: Full-Text Search Meets Graph

Synthetic identity fraud is the fastest-growing type of financial crime. Fraudsters combine a real Social Security number (often stolen from a child, elderly person, or deceased individual) with a fabricated name and address to create a new identity that passes basic verification checks.

Catching synthetic identities requires entity resolution — the ability to determine that two records that look different may actually represent the same person or a deliberate variation. This is a text-matching problem that graph databases alone cannot solve.

ArcadeDB's built-in full-text search engine with fuzzy matching handles this natively:

Fuzzy name matching: Detect "Jonathan Smith" vs. "Jon Smithe" vs. "J. Smith" as likely the same person
Address normalization: Match "123 Main St Apt 4B" with "123 Main Street, #4B"
Cross-reference with graph: Once entities are resolved, link them in the graph and traverse to discover the broader fraud network
No external tooling: Neo4j requires external entity resolution services. AWS Neptune needs AWS Entity Resolution. ArcadeDB does it in-database.

The result: a workflow where suspicious matches are identified via full-text search, linked as relationships in the graph, and then traversed to uncover the full scope of the fraud operation — all within a single database.
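
The matching step of that workflow can be approximated in a few lines of standard-library Python. This is a rough sketch only: it uses `difflib.SequenceMatcher` as a stand-in similarity measure, not ArcadeDB's full-text engine, and the abbreviation table and helper names are hypothetical.

```python
import re
from difflib import SequenceMatcher

# Hypothetical, deliberately tiny abbreviation table.
ABBREVIATIONS = {"street": "st", "avenue": "ave", "apartment": "apt"}


def normalize_address(address):
    """Lowercase, treat '#' as 'apt', drop punctuation, apply abbreviations."""
    text = address.lower().replace("#", " apt ")
    tokens = re.findall(r"[a-z0-9]+", text)
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def name_similarity(a, b):
    """Return a similarity ratio in [0, 1] for two names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


same_address = (
    normalize_address("123 Main St Apt 4B")
    == normalize_address("123 Main Street, #4B")
)
likely_same_person = name_similarity("Jonathan Smith", "Jon Smithe")
unrelated = name_similarity("Jonathan Smith", "Maria Lopez")
```

Pairs that score above a chosen cut-off would then be linked as edges in the graph so the traversal queries can follow them.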

Anti-Money Laundering: Follow the Money

Money laundering depends on obscuring the trail between the origin and destination of funds. The classic techniques — layering (moving money through multiple accounts), smurfing (structuring deposits below reporting thresholds), and circular transfers — all create patterns that are invisible in a relational database but obvious in a graph.

ArcadeDB's time-series engine adds a critical dimension that pure graph databases lack: temporal pattern analysis. Money laundering schemes have distinct time signatures — rapid sequences of transfers just below reporting limits, dormant accounts that suddenly activate, or precisely timed circular flows.

Circular flow detection: Graph cycle queries find money returning to its origin through chains of intermediaries
Structuring detection: Time-series bucketing reveals deposits deliberately split to stay below $10,000 thresholds
Velocity analysis: Time-series rate() functions detect sudden surges in transaction frequency
Dormant account activation: Time-series gap detection identifies accounts with no activity for months that suddenly process high volumes
Cross-border patterns: Graph traversal combined with document properties traces funds flowing through high-risk jurisdictions
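
As one example from the list above, dormant account activation reduces to finding large gaps between consecutive events. The sketch below is plain Python; the 90-day cut-off, the sample timestamps, and the function name are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def dormant_reactivations(timestamps, min_gap=timedelta(days=90)):
    """Return (last_seen, reactivated_at) pairs separated by at least `min_gap`."""
    ordered = sorted(timestamps)
    return [
        (previous, current)
        for previous, current in zip(ordered, ordered[1:])
        if current - previous >= min_gap
    ]


# Hypothetical activity for a single account.
activity = [
    datetime(2025, 6, 1, 9, 0),
    datetime(2025, 6, 3, 14, 30),
    datetime(2026, 1, 15, 2, 10),  # first activity after roughly 7 months
    datetime(2026, 1, 15, 2, 12),
]
gaps = dormant_reactivations(activity)
```

A reactivation on its own is weak evidence; it becomes interesting when followed by unusually high volume.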

Detect Circular Money Flows

Find circular transfer chains where money returns to its origin through intermediaries — a hallmark of money laundering:

-- Detect circular money flows
-- (A -> B -> C -> D -> E -> A)
MATCH (origin:Account {id: 'acct-A'})
      -[:TRANSFERRED_TO]->(b:Account)
      -[:TRANSFERRED_TO]->(c:Account)
      -[:TRANSFERRED_TO]->(d:Account)
      -[:TRANSFERRED_TO]->(e:Account)
      -[:TRANSFERRED_TO]->(origin)
RETURN origin.id AS origin,
       b.id AS hop1, c.id AS hop2,
       d.id AS hop3, e.id AS hop4
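
The query above fixes the cycle length at five transfers. For cycles of any length up to a limit, the same idea can be sketched as a bounded depth-first search. This is illustrative Python over a hypothetical in-memory transfer map, not an ArcadeDB feature.

```python
def find_cycles(transfers, origin, max_length):
    """Return simple cycles that start and end at `origin`.

    `transfers` maps an account to the accounts it sent money to.
    `max_length` is the maximum number of transfers in a cycle.
    Each cycle is a list of accounts such as [A, B, C, A].
    """
    cycles = []

    def walk(path):
        for receiver in transfers.get(path[-1], ()):
            if receiver == origin:
                cycles.append(path + [origin])
            elif receiver not in path and len(path) < max_length:
                walk(path + [receiver])

    walk([origin])
    return cycles


# Hypothetical transfers: A -> B -> C -> D -> E -> A, plus a dead end C -> X.
transfers = {
    "acct-A": ["acct-B"],
    "acct-B": ["acct-C"],
    "acct-C": ["acct-D", "acct-X"],
    "acct-D": ["acct-E"],
    "acct-E": ["acct-A"],
}
cycles = find_cycles(transfers, "acct-A", max_length=5)
```

The depth limit matters: without it, enumerating cycles in a dense transfer graph can grow exponentially.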

Detect Structuring (Smurfing)

Use time-series bucketing to find accounts making multiple deposits just below the $10,000 reporting threshold:

-- Structuring detection:
-- accounts with 3+ deposits from $8,000
-- up to (not including) $10,000
SELECT FROM (
  SELECT account_id,
         count(*) AS deposit_count
  FROM Deposit
  WHERE amount >= 8000
    AND amount < 10000
  GROUP BY account_id
) WHERE deposit_count >= 3
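
The query above counts qualifying deposits over the whole table. Adding the time bucketing mentioned earlier narrows this to bursts within one period. A minimal Python sketch, assuming ISO calendar weeks as the bucket and hypothetical sample data:

```python
from collections import Counter
from datetime import date


def structuring_suspects(deposits, low=8000, high=10000, min_count=3):
    """Return accounts with `min_count`+ deposits in [low, high) in one ISO week.

    `deposits` is an iterable of (account_id, day, amount) tuples.
    """
    counts = Counter()
    for account_id, day, amount in deposits:
        if low <= amount < high:
            iso = day.isocalendar()
            counts[(account_id, iso[0], iso[1])] += 1  # (account, year, week)
    return sorted({key[0] for key, n in counts.items() if n >= min_count})


# Hypothetical deposits.
deposits = [
    # acct-S: three sub-threshold deposits in the same week
    ("acct-S", date(2026, 3, 2), 9500),
    ("acct-S", date(2026, 3, 3), 9900),
    ("acct-S", date(2026, 3, 5), 8200),
    # acct-T: same amounts, but spread over three months
    ("acct-T", date(2026, 1, 5), 9500),
    ("acct-T", date(2026, 2, 2), 9900),
    ("acct-T", date(2026, 3, 2), 8200),
    # acct-U: above the threshold, so reported anyway
    ("acct-U", date(2026, 3, 2), 12000),
    ("acct-U", date(2026, 3, 3), 12000),
    ("acct-U", date(2026, 3, 4), 12000),
]
suspects = structuring_suspects(deposits)
```

Fixed calendar buckets can miss a burst that straddles a week boundary; a sliding window avoids that at the cost of a slightly more involved query.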

Behavioral Anomaly Detection

Compare a new transaction's behavioral embedding against the customer's established profile to flag anomalies in real time:

-- Compare transaction behavior embedding
-- against customer profile
SELECT id, amount, merchant, account_id,
       vectorCosineSimilarity(
         behavior_embedding,
         [0.1, 0.2, 0.3, 0.4,
          0.5, 0.6, 0.7, 0.8]
       ) AS profile_similarity
FROM Transaction
WHERE account_id = 'acct-H'
ORDER BY profile_similarity

Behavioral embeddings encode transaction amount, time of day, merchant category, location, and device — capturing patterns that no static rule can express.

Vector Embeddings: Beyond Rules-Based Detection

Rules-based fraud detection ("flag transactions over $10,000") catches unsophisticated fraud but generates massive numbers of false positives and misses novel patterns entirely. Sophisticated fraudsters know the rules and operate just below them.

Behavioral embeddings offer a fundamentally different approach. By encoding a customer's transaction history — amounts, timing, merchant types, locations, device fingerprints — as a high-dimensional vector, you create a behavioral fingerprint that represents what "normal" looks like for each individual customer.

New transactions are encoded the same way and compared against the customer's profile vector. A large distance means the transaction deviates significantly from established patterns — even if it follows every rule perfectly.

No predefined rules: The system learns what's normal for each customer individually
Novel fraud detection: Catches patterns the fraud team hasn't seen before
Fewer false positives: A $5,000 transaction is anomalous for some customers, routine for others
Continuous learning: Update profile embeddings as customer behavior evolves
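
The comparison between a profile vector and a new transaction vector can be written out in a few lines. This sketch uses cosine similarity in plain Python; the 8-dimensional vectors and the 0.8 cut-off are hypothetical, and real embeddings would come from a trained model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def is_anomalous(profile, transaction, min_similarity=0.8):
    """Flag a transaction whose embedding points away from the profile."""
    return cosine_similarity(profile, transaction) < min_similarity


# Hypothetical embeddings.
profile = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
typical_txn = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9]
unusual_txn = [0.9, 0.8, 0.1, 0.0, 0.0, 0.1, 0.0, 0.0]

flag_typical = is_anomalous(profile, typical_txn)
flag_unusual = is_anomalous(profile, unusual_txn)
```

Because the cut-off is applied per customer profile, the same transaction can be routine for one account and anomalous for another.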

ArcadeDB stores these embeddings directly on customer vertices using JVector, enabling sub-millisecond distance calculations as part of graph traversal queries. No external vector database required.

Time Series: Catching What Static Analysis Misses

Fraud has a temporal dimension that pure graph analysis misses. A graph can tell you that two accounts share a device. But it can't tell you that both accounts were created within 30 seconds of each other at 3am, followed by a burst of 47 micro-transactions in 5 minutes — a pattern screaming "velocity attack."

ArcadeDB's native time-series engine stores transactional events with nanosecond precision and provides built-in analytical functions designed for exactly these patterns:

Velocity detection: The rate() function computes transaction frequency per second — a spike from 2/hour to 50/minute triggers an alert
Time bucketing: Aggregate transactions by hour, day, or week to reveal patterns invisible at the individual transaction level
Moving averages: Detect gradual behavior changes (a slowly increasing transaction ceiling) that would fool static thresholds
Gap detection: Identify dormant accounts that suddenly activate — a common money mule indicator
Correlation analysis: The correlate() function detects whether two accounts' transaction patterns move in lockstep, suggesting coordination

These functions run at the database level, not in application code, making them available for both real-time authorization checks and batch investigation queries.

Velocity Attack Detection

Detect accounts with sudden spikes in transaction frequency using time-series rate analysis:

-- Detect velocity attacks:
-- accounts with more than 5 txns
-- in 5 minutes
SELECT FROM (
  SELECT account_id,
         count(*) AS txn_count,
         min(ts) AS first_txn,
         max(ts) AS last_txn
  FROM Transaction
  WHERE ts BETWEEN
    '2026-03-01T13:00:00Z' AND
    '2026-03-01T13:05:00Z'
  GROUP BY account_id
) WHERE txn_count > 5

Flags accounts with more than 5 transactions in a 5-minute window — catching card testing attacks, account takeovers, and automated fraud bots.
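
The query above checks one fixed five-minute interval. To catch a burst wherever it starts, the same test can slide along each account's timeline. A minimal sketch in plain Python with hypothetical data; the function and parameter names are illustrative only:

```python
from collections import defaultdict
from datetime import datetime, timedelta


def velocity_alerts(transactions, max_txns=5, window=timedelta(minutes=5)):
    """Return accounts with more than `max_txns` transactions in any window.

    `transactions` is an iterable of (account_id, timestamp) pairs.
    """
    by_account = defaultdict(list)
    for account_id, ts in transactions:
        by_account[account_id].append(ts)

    alerts = set()
    for account_id, stamps in by_account.items():
        stamps.sort()
        start = 0
        for end, ts in enumerate(stamps):
            # Shrink the window until it spans at most `window`.
            while ts - stamps[start] > window:
                start += 1
            if end - start + 1 > max_txns:
                alerts.add(account_id)
                break
    return alerts


base = datetime(2026, 3, 1, 13, 0)
transactions = (
    # acct-V: 6 transactions, 40 seconds apart (hypothetical burst)
    [("acct-V", base + timedelta(seconds=40 * i)) for i in range(6)]
    # acct-W: 6 transactions, 10 minutes apart
    + [("acct-W", base + timedelta(minutes=10 * i)) for i in range(6)]
)
alerts = velocity_alerts(transactions)
```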

Correlated Account Activity

-- Compare transaction patterns
-- across suspect accounts
SELECT account_id,
       avg(amount) AS avg_amount,
       count(*) AS txn_count
FROM Transaction
WHERE account_id IN ['acct-A', 'acct-B']
  AND ts >= '2026-02-01T00:00:00Z'
GROUP BY account_id

Comparing average amounts and transaction counts across suspect accounts reveals coordinated activity patterns.
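
Averages and counts give a coarse comparison. A sharper test for lockstep behavior is the correlation between two accounts' bucketed transaction counts. The sketch below computes a plain Pearson coefficient in Python; it is not ArcadeDB's own correlation function, and the hourly counts are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have the same length of at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical transactions per hour for three accounts.
acct_a = [0, 1, 9, 8, 0, 1]
acct_b = [0, 2, 18, 16, 0, 2]   # same rhythm as acct_a at double the volume
acct_c = [5, 4, 0, 1, 6, 5]     # busy when acct_a is quiet

lockstep = pearson(acct_a, acct_b)
opposite = pearson(acct_a, acct_c)
```

A coefficient near 1 across many buckets suggests coordination, although shared external causes such as paydays can produce the same effect.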

Cross-Type Investigation Query

Use SQL subqueries to cross-reference accounts with suspicious behavioral flags:

-- Cross-type investigation:
-- find accounts with suspicious
-- recent behavior
SELECT id, name
FROM Account
WHERE id IN (
  SELECT id FROM Customer
  WHERE recent_behavior
    IN ['suspicious', 'anomalous']
)

Putting It All Together: Multi-Model Investigation

Real fraud investigations don't use one technique — they combine every available signal. When an account is flagged, the investigation team needs to:

Map the network (graph): Find all accounts connected through shared identifiers
Check velocity (time series): Are any linked accounts showing abnormal transaction patterns?
Detect anomalies (vectors): Are any linked accounts behaving differently from their established profiles?
Resolve identities (full-text): Are any linked accounts using name/address variations suggesting synthetic identities?
Score and rank (composite): Produce a unified risk score blending all signals

In a typical architecture, this requires orchestrating queries across 4-5 separate databases, merging results in application code, and accepting stale data from sync delays. With ArcadeDB, it's a single query that returns a composite risk score in real time.

Such a query starts with a graph traversal to find linked accounts, enriches each account with time-series velocity data and vector anomaly scores, and produces a weighted risk score, all in one round-trip to the database.
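
The weighted scoring step can be sketched independently of the query language. Everything in the following Python example is an assumption for illustration: the signal names, the normalization of each signal to the range 0 to 1, and the weights.

```python
from dataclasses import dataclass

# Hypothetical weights; they sum to 1 so the score stays in [0, 1].
WEIGHTS = {"graph": 0.40, "velocity": 0.20, "anomaly": 0.25, "identity": 0.15}


@dataclass
class Signals:
    hops: int                  # graph distance from the flagged account
    txns_per_minute: float     # recent transaction rate
    profile_similarity: float  # cosine similarity to the behavioral profile
    name_match: float          # fuzzy match score against a known identity


def clamp(value):
    """Restrict a value to the range [0, 1]."""
    return max(0.0, min(1.0, value))


def risk_score(signals, max_rate=10.0):
    """Blend the four signals into one score; higher means riskier."""
    components = {
        "graph": 1.0 / max(signals.hops, 1),  # closer accounts score higher
        "velocity": clamp(signals.txns_per_minute / max_rate),
        "anomaly": clamp(1.0 - signals.profile_similarity),
        "identity": clamp(signals.name_match),
    }
    return sum(WEIGHTS[name] * value for name, value in components.items())


near = Signals(hops=1, txns_per_minute=12.0, profile_similarity=0.1, name_match=0.9)
far = Signals(hops=4, txns_per_minute=0.5, profile_similarity=0.95, name_match=0.1)
ranked = sorted(
    [("acct-far", far), ("acct-near", near)],
    key=lambda item: risk_score(item[1]),
    reverse=True,
)
```

In practice the weights would be tuned against labeled cases rather than chosen by hand.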

Fraud Types ArcadeDB Detects

Transaction Fraud Rings

Groups of accounts collaborating on fraudulent transactions, connected through shared devices, phone numbers, or IP addresses. Graph traversal reveals the ring structure; time-series detects coordinated timing.

Synthetic Identity Fraud

Fabricated identities built from real and fake data. Full-text fuzzy matching detects name/address variations; graph analysis links synthetic identities that share underlying real data elements.

Money Laundering (AML)

Complex layering schemes using circular transfers, structuring, and shell accounts. Graph cycle detection finds circular flows; time-series bucketing catches structuring patterns.

Account Takeover

Compromised credentials used to access legitimate accounts. Vector embeddings detect behavioral deviations (different location, device, transaction patterns) from the account owner's baseline.

Insurance Claims Fraud

Staged accidents, duplicate claims, and provider collusion. Graph analysis reveals overlapping participants across claims; document storage holds unstructured evidence (photos, reports, notes).

First-Party Fraud

Individuals or groups deliberately defaulting on credit. Graph community detection identifies clusters of accounts with shared characteristics and simultaneous defaults.

Which Model Catches What

Fraud Type	Graph	Vector	Time Series	Full-Text
Fraud Rings	✓		✓
Synthetic Identity	✓			✓
Money Laundering	✓		✓
Account Takeover	✓	✓	✓
Insurance Fraud	✓		✓	✓
First-Party Fraud	✓	✓

Every fraud type benefits from multiple data models. Only a multi-model database can combine all four in a single query without data movement between systems.

Platform Comparison

Capability	Typical Stack	ArcadeDB
Graph traversal	Neo4j	Built-in
Vector anomaly detection	Pinecone / Weaviate	Built-in
Time-series analytics	TimescaleDB / InfluxDB	Built-in
Full-text entity resolution	Elasticsearch	Built-in
Document storage	MongoDB	Built-in
Data synchronization	Kafka / CDC	Not needed
License	Mixed / proprietary	Apache 2.0

Why ArcadeDB for Fraud Detection

Building a production fraud detection platform typically requires assembling a constellation of specialized databases: a graph database for relationship analysis, a vector store for anomaly detection, a time-series database for transaction patterns, a search engine for entity resolution, and a document store for case data. Add data synchronization pipelines and you're managing 6+ systems.

ArcadeDB replaces all of this with a single, unified engine:

Zero data movement: No ETL pipelines, no sync delays, no consistency gaps between systems
Real-time scoring: All five models queried in a single round-trip, fast enough for inline transaction authorization
Complete audit trail: Every data point, every relationship, every time-series event in one place for regulatory compliance
Simpler operations: One database to secure, back up, monitor, and scale
Lower cost: One system instead of six, with no commercial license fees

Apache 2.0 — Forever

Fraud detection systems are critical infrastructure. You need to trust that the database beneath them won't change its licensing terms, restrict your use case, or demand commercial fees under threat of non-compliance. ArcadeDB is Apache 2.0 forever: no bait-and-switch, no source-available restrictions, no per-node pricing surprises. Deploy it on-premises in an air-gapped environment, embed it in a commercial product, or run it in your own cloud; the license will never be used against you.

Client Success Story

"We migrated from a traditional RDBMS to ArcadeDB for our fraud detection platform. The results were immediate — we reduced false positives by 68% while catching 35% more actual fraud cases. The ability to query 4-5 hops in the relationship graph in under 50ms was game-changing for our real-time detection engine."

— Senior Engineering Manager, Global Payment Processing Company
(Company name withheld due to NDA)

Results Achieved:

68% reduction in false positive alerts
35% more actual fraud cases caught
Sub-50ms query response times for 4-5 hop traversals
$2.4M saved annually in fraud losses

Industries

Banking & Payments: Transaction fraud rings, card testing attacks, unauthorized access
Insurance: Claims fraud, provider collusion, staged incidents
E-commerce: Promo abuse, return fraud, fake reviews, account takeover
Telecommunications: Subscription fraud, SIM swap attacks, toll fraud
Government: Tax fraud, benefits fraud, identity theft detection
Cryptocurrency: Wallet clustering, mixer detection, wash trading
