Fraud Detection & Prevention

Uncover fraud rings, synthetic identities, and money laundering in real-time — with graph traversal, vector anomaly detection, time-series patterns, and full-text entity resolution in a single database.

Fraud Is a $12.5 Billion Problem — and Growing

The FTC reported $12.5 billion in consumer fraud losses in 2024 — a 25% year-over-year increase. The FBI's Internet Crime Complaint Center recorded $16.6 billion in losses in the same period. Businesses worldwide lose an average of 7.7% of annual revenue to fraud, with U.S. companies reporting nearly 10%.

Traditional rule-based fraud detection systems catch the obvious cases but miss the sophisticated ones. Fraudsters don't operate in isolation — they form rings, use synthetic identities assembled from real and fake data, and exploit temporal patterns that static rules cannot see. Detecting these schemes requires understanding relationships across millions of entities in real-time.

This is a graph problem. But it's also a vector problem, a time-series problem, and a text-matching problem. ArcadeDB is the only database that handles all four natively.

Five Models, One Fraud Engine

  • Graph Traversal: Detect fraud rings via multi-hop relationship analysis
  • Vector Similarity: Flag behavioral anomalies through embedding distance
  • Time Series: Spot velocity attacks, unusual timing, and transaction bursts
  • Full-Text Search: Resolve synthetic identities via fuzzy name/address matching
  • Documents: Store rich entity profiles with flexible schemas

What a Fraud Ring Looks Like

[Figure: Accounts A, B, C, D, and E linked in a loop by shared identifiers (same phone, same device, same address, same email, beneficiary of). Circular ring detected in 5 hops.]

A relational database sees 5 rows. A graph database sees the ring. One traversal query, milliseconds.

Why Fraud Is a Graph Problem

Every fraud investigation starts with the same question: who is connected to whom, and how? Fraudsters share devices, phone numbers, IP addresses, physical addresses, email patterns, and beneficiary accounts. Each of these shared identifiers is a relationship — and relationships are what graph databases are built for.

In a graph model, these connections are first-class citizens. Traversing from a suspicious account to all accounts sharing a device, then to all accounts sharing a phone with those accounts, costs time proportional to the relationships followed at each hop, not to the total size of the database. A 5-hop traversal that would bring a relational database to its knees executes in under 50ms in ArcadeDB.

Key graph patterns for fraud detection:

  • Fraud rings: Circular chains of accounts linked through shared identifiers
  • Fan-out: One entity (phone, device, address) connected to suspiciously many accounts
  • Fan-in: Many accounts funneling money into a single beneficiary
  • Bridge nodes: Entities connecting otherwise separate fraud clusters
  • Circular transfers: Money flowing A→B→C→A to obscure its origin (money laundering)
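
As a rough illustration of the fan-out pattern above, the following Python sketch groups accounts by shared identifier and reports identifiers used by suspiciously many accounts. The function name, the sample data, and the threshold of 3 are assumptions for illustration; it is not ArcadeDB code.

```python
from collections import defaultdict


def fan_out(links, threshold):
    """Return identifiers shared by at least `threshold` distinct accounts.

    `links` is an iterable of (account_id, identifier) pairs, where an
    identifier is any shared attribute such as a phone or a device.
    """
    accounts_by_identifier = defaultdict(set)
    for account_id, identifier in links:
        accounts_by_identifier[identifier].add(account_id)
    return {
        identifier: sorted(accounts)
        for identifier, accounts in accounts_by_identifier.items()
        if len(accounts) >= threshold
    }


# Hypothetical sample data: three accounts share one phone number.
links = [
    ("acct-1", "phone:555-0100"),
    ("acct-2", "phone:555-0100"),
    ("acct-3", "phone:555-0100"),
    ("acct-3", "device:ab12"),
    ("acct-4", "device:ab12"),
]
suspicious = fan_out(links, threshold=3)
```

Fan-in works the same way with the roles swapped: group senders by beneficiary and count distinct senders.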

Fraud Ring Detection

Fraud rings are the most damaging form of organized financial crime. A single fraud ring can cost an institution millions before detection. The challenge is that each individual account may look legitimate in isolation — it's only when you trace the connections that the pattern emerges.

ArcadeDB's graph engine lets you discover these rings by traversing shared identifiers: same device fingerprint, same phone number, overlapping addresses, shared IP ranges. The query below finds all accounts connected through any shared identifier within a specified depth and returns each account's credit limit and balance, so the total financial exposure of each cluster can be summed.

Because the traversal happens at the database level, not in application code, you get:

  • Real-time results: Ring detection during transaction authorization, not in overnight batch jobs
  • Full explainability: The graph path shows exactly which shared identifiers connect the accounts — critical for regulatory reporting
  • Depth flexibility: Adjust traversal depth to balance thoroughness vs. speed (2 hops for real-time checks, 5+ for investigations)
  • Dynamic schema: Add new identifier types (crypto wallets, device fingerprints, biometric hashes) without schema migrations

Detect Fraud Rings via Shared Identifiers

Find all accounts connected to a flagged account through shared devices, phones, or addresses, and return the credit limit and balance needed to total the cluster's financial exposure:

-- Find fraud ring from a flagged account
MATCH (flagged:Account {id: $acctId})
      -[:USES_DEVICE|HAS_PHONE|
        HAS_ADDRESS*1..4]-
      (connected:Account)
WHERE connected <> flagged
WITH DISTINCT connected,
  length(shortestPath(
    (flagged)-[:USES_DEVICE|HAS_PHONE|
     HAS_ADDRESS*]-(connected)
  )) AS distance

RETURN connected.id,
       connected.name,
       connected.credit_limit,
       connected.balance,
       distance
ORDER BY distance ASC

Closer connections (fewer hops) indicate stronger risk. The investigation team can follow the exact path to understand how each account is linked.
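
To make the hop-distance idea concrete, here is a minimal Python sketch of a depth-limited breadth-first search over a small in-memory graph. It only mirrors the logic of the query; the adjacency-map layout and the "acct:" naming convention are assumptions made for this example.

```python
from collections import deque


def linked_accounts(graph, start, max_hops):
    """Breadth-first search returning {account: hops} within max_hops.

    `graph` maps every node (account or identifier) to its neighbours.
    Only nodes whose name starts with "acct:" are reported.
    """
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the depth limit
        for neighbour in graph.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return {
        node: hops
        for node, hops in distance.items()
        if node.startswith("acct:") and node != start
    }


# Hypothetical data: A and B share a device, B and C share a phone.
graph = {
    "acct:A": ["device:1"],
    "device:1": ["acct:A", "acct:B"],
    "acct:B": ["device:1", "phone:9"],
    "phone:9": ["acct:B", "acct:C"],
    "acct:C": ["phone:9"],
}
ring = linked_accounts(graph, "acct:A", max_hops=4)
```

Each shared identifier costs two hops (account to identifier, identifier to account), which is why a depth of 4 reaches accounts two identifiers away.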

Fuzzy Entity Resolution

Find accounts that may be the same person using full-text fuzzy matching on names and addresses, then link them in the graph:

-- Find potential synthetic identities:
-- same SSN but different name variations
SELECT a.id, a.full_name, a.ssn,
       b.id, b.full_name, b.ssn,
       a.full_name.similarity(b.full_name)
         AS name_sim,
       a.address.similarity(b.address)
         AS addr_sim
FROM Account AS a, Account AS b
WHERE a.ssn = b.ssn
  AND a.id < b.id
  AND a.full_name.similarity(b.full_name)
       BETWEEN 0.4 AND 0.9

-- Exact match (1.0) = likely same person
-- Partial match (0.4-0.9) with same SSN
--   = potential synthetic identity

Synthetic fraudsters use real SSNs with slightly altered names. Full-text similarity catches "Jon Smith" vs. "Jonathan Smithe" — variations designed to evade exact-match checks.
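
The same filtering logic can be sketched in plain Python. This example uses the standard library's difflib.SequenceMatcher as a stand-in similarity measure; it is an assumption that its scores resemble those of ArcadeDB's similarity function, so the 0.4 to 0.9 band would need tuning against real data. The records and SSNs are made up.

```python
from difflib import SequenceMatcher
from itertools import combinations


def name_similarity(a, b):
    """Case-insensitive similarity in [0, 1] (1.0 means identical)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def synthetic_candidates(accounts, low=0.4, high=0.9):
    """Pair accounts that share an SSN but have only partially matching names."""
    by_ssn = {}
    for account in accounts:
        by_ssn.setdefault(account["ssn"], []).append(account)
    pairs = []
    for group in by_ssn.values():
        for a, b in combinations(group, 2):
            score = name_similarity(a["full_name"], b["full_name"])
            if low <= score <= high:
                pairs.append((a["id"], b["id"], round(score, 2)))
    return pairs


# Hypothetical records with obviously fake SSNs.
accounts = [
    {"id": 1, "full_name": "Jon Smith", "ssn": "000-00-0001"},
    {"id": 2, "full_name": "Jonathan Smithe", "ssn": "000-00-0001"},
    {"id": 3, "full_name": "Maria Lopez", "ssn": "000-00-0002"},
    {"id": 4, "full_name": "Maria Lopez", "ssn": "000-00-0002"},
]
candidates = synthetic_candidates(accounts)
```

Accounts 3 and 4 match exactly, so they fall outside the band and are treated as the same person rather than a synthetic variation.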

Synthetic Identity Fraud: Full-Text Search Meets Graph

Synthetic identity fraud is the fastest-growing type of financial crime. Fraudsters combine a real Social Security number (often stolen from a child, elderly person, or deceased individual) with a fabricated name and address to create a new identity that passes basic verification checks.

Catching synthetic identities requires entity resolution — the ability to determine that two records that look different may actually represent the same person or a deliberate variation. This is a text-matching problem that graph databases alone cannot solve.

ArcadeDB's built-in full-text search engine with fuzzy matching handles this natively:

  • Fuzzy name matching: Detect "Jonathan Smith" vs. "Jon Smithe" vs. "J. Smith" as likely the same person
  • Address normalization: Match "123 Main St Apt 4B" with "123 Main Street, #4B"
  • Cross-reference with graph: Once entities are resolved, link them in the graph and traverse to discover the broader fraud network
  • No external tooling: Neo4j requires external entity resolution services. AWS Neptune needs AWS Entity Resolution. ArcadeDB does it in-database.

The result: a workflow where suspicious matches are identified via full-text search, linked as relationships in the graph, and then traversed to uncover the full scope of the fraud operation — all within a single database.
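
Address normalization can be approximated before any fuzzy comparison. The Python sketch below lowercases the address, strips punctuation, and maps a few long forms to abbreviations. The abbreviation table is a small illustrative assumption, not a complete postal standard and not ArcadeDB's analyzer.

```python
import re

# Illustrative abbreviation table; a real system would use a fuller list.
ABBREVIATIONS = {"street": "st", "avenue": "ave", "apartment": "apt"}


def normalize_address(raw):
    """Lowercase, strip punctuation, and abbreviate common address words."""
    text = raw.lower().replace("#", " apt ")
    text = re.sub(r"[^a-z0-9 ]", " ", text)
    tokens = [ABBREVIATIONS.get(token, token) for token in text.split()]
    return " ".join(tokens)


first = normalize_address("123 Main St Apt 4B")
second = normalize_address("123 Main Street, #4B")
same_address = first == second
```

Once two records normalize to the same string, they can be linked with a relationship and included in later traversals.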

Anti-Money Laundering: Follow the Money

Money laundering depends on obscuring the trail between the origin and destination of funds. The classic techniques — layering (moving money through multiple accounts), smurfing (structuring deposits below reporting thresholds), and circular transfers — all create patterns that are invisible in a relational database but obvious in a graph.

ArcadeDB's time-series engine adds a critical dimension that pure graph databases lack: temporal pattern analysis. Money laundering schemes have distinct time signatures — rapid sequences of transfers just below reporting limits, dormant accounts that suddenly activate, or precisely timed circular flows.

  • Circular flow detection: Graph cycle queries find money returning to its origin through chains of intermediaries
  • Structuring detection: Time-series bucketing reveals deposits deliberately split to stay below $10,000 thresholds
  • Velocity analysis: Time-series rate() functions detect sudden surges in transaction frequency
  • Dormant account activation: Time-series gap detection identifies accounts with no activity for months that suddenly process high volumes
  • Cross-border patterns: Graph traversal combined with document properties traces funds flowing through high-risk jurisdictions

Detect Circular Money Flows

Find circular transfer chains where money returns to its origin through intermediaries — a hallmark of money laundering:

-- Detect circular money flows
-- (A sends to B, B to C, ... back to A)
MATCH path = (origin:Account)
      -[:TRANSFERRED_TO*3..6]->
      (origin)
WHERE all(t IN relationships(path)
  WHERE t.timestamp >
    datetime() - duration('P30D'))
RETURN origin.id,
  [n IN nodes(path) | n.id] AS chain,
  reduce(s = 0,
    t IN relationships(path) |
    s + t.amount) AS total_flow
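
For readers who want to see the cycle search spelled out, here is a minimal Python sketch of a bounded depth-first search over a list of transfers. It omits the 30-day time filter for brevity, and the tuple layout and sample amounts are assumptions for illustration.

```python
def find_cycles(transfers, min_len=3, max_len=6):
    """Return simple transfer cycles with min_len..max_len edges.

    `transfers` is a list of (sender, receiver, amount) tuples. Each cycle
    is reported once, starting from its smallest account ID.
    """
    outgoing = {}
    for sender, receiver, amount in transfers:
        outgoing.setdefault(sender, []).append((receiver, amount))

    cycles = []

    def walk(origin, node, path, total):
        for receiver, amount in outgoing.get(node, []):
            if receiver == origin and len(path) >= min_len:
                # Closing edge found: len(path) edges in total.
                cycles.append((path + [origin], total + amount))
            elif (receiver > origin and receiver not in path
                  and len(path) < max_len):
                walk(origin, receiver, path + [receiver], total + amount)

    for origin in sorted(outgoing):
        walk(origin, origin, [origin], 0)
    return cycles


# Hypothetical transfers: A -> B -> C -> A, plus an unrelated payment.
transfers = [
    ("A", "B", 9000),
    ("B", "C", 8900),
    ("C", "A", 8800),
    ("C", "D", 100),
]
cycles = find_cycles(transfers)
```

Restricting extensions to accounts with a larger ID than the origin keeps the same ring from being reported once per member.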

Detect Structuring (Smurfing)

Use time-series bucketing to find accounts making multiple deposits just below the $10,000 reporting threshold:

-- Structuring detection:
-- multiple deposits $8K-$9.9K in 24h
SELECT time_bucket('1d', ts) AS day,
       account_id,
       count(*) AS deposit_count,
       sum(amount) AS total
FROM Deposits
WHERE amount >= 8000 AND amount < 10000
  AND ts > now() - INTERVAL '30d'
GROUP BY day, account_id
HAVING deposit_count >= 3
ORDER BY total DESC
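
The bucketing logic behind this query can be sketched in a few lines of Python. The deposit tuples, account IDs, and the lower bound of $8,000 are illustrative assumptions; only the $10,000 threshold and the minimum of three deposits come from the query above.

```python
from collections import defaultdict
from datetime import datetime


def structuring_alerts(deposits, low=8000, threshold=10000, min_count=3):
    """Group near-threshold deposits by (account, day) and flag busy buckets.

    `deposits` is a list of (account_id, timestamp, amount) tuples.
    Returns (account_id, day, deposit_count, total) sorted by total.
    """
    buckets = defaultdict(list)
    for account_id, ts, amount in deposits:
        if low <= amount < threshold:
            buckets[(account_id, ts.date())].append(amount)
    alerts = [
        (account_id, day, len(amounts), sum(amounts))
        for (account_id, day), amounts in buckets.items()
        if len(amounts) >= min_count
    ]
    return sorted(alerts, key=lambda alert: alert[3], reverse=True)


# Hypothetical deposits: acct-7 makes three near-threshold deposits in a day.
deposits = [
    ("acct-7", datetime(2024, 3, 1, 9, 15), 9500),
    ("acct-7", datetime(2024, 3, 1, 12, 40), 9900),
    ("acct-7", datetime(2024, 3, 1, 16, 5), 8200),
    ("acct-8", datetime(2024, 3, 1, 10, 0), 9000),
    ("acct-8", datetime(2024, 3, 1, 11, 0), 12000),
]
alerts = structuring_alerts(deposits)
```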

Behavioral Anomaly Detection

Compare a new transaction's behavioral embedding against the customer's established profile to flag anomalies in real-time:

-- Flag transactions that deviate
-- from customer's normal behavior
SELECT t.id, t.amount, t.merchant,
  vectorDistance(
    t.behavior_embedding,
    c.profile_embedding
  ) AS deviation
FROM Transaction t
JOIN Customer c
  ON t.customer_id = c.id
WHERE t.timestamp > now() - INTERVAL '1h'
  AND vectorDistance(
    t.behavior_embedding,
    c.profile_embedding
  ) > 0.7
ORDER BY deviation DESC

Behavioral embeddings encode transaction amount, time of day, merchant category, location, and device — capturing patterns that no static rule can express.
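
The source does not say which metric vectorDistance uses. Assuming cosine distance, the comparison can be sketched in Python as follows; the three-dimensional vectors are toy values, and the 0.7 cutoff is taken from the query above.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 0 means same direction, 1 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Toy embeddings: real ones would have hundreds of dimensions.
profile = [1.0, 0.0, 0.0]
usual_tx = [0.9, 0.1, 0.0]
unusual_tx = [0.0, 1.0, 0.0]

usual_deviation = cosine_distance(usual_tx, profile)
unusual_deviation = cosine_distance(unusual_tx, profile)
flagged = unusual_deviation > 0.7
```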

Vector Embeddings: Beyond Rules-Based Detection

Rules-based fraud detection ("flag transactions over $10,000") catches unsophisticated fraud but generates massive numbers of false positives and misses novel patterns entirely. Sophisticated fraudsters know the rules and operate just below them.

Behavioral embeddings offer a fundamentally different approach. By encoding a customer's transaction history — amounts, timing, merchant types, locations, device fingerprints — as a high-dimensional vector, you create a behavioral fingerprint that represents what "normal" looks like for each individual customer.

New transactions are encoded the same way and compared against the customer's profile vector. A large distance means the transaction deviates significantly from established patterns — even if it follows every rule perfectly.

  • No predefined rules: The system learns what's normal for each customer individually
  • Novel fraud detection: Catches patterns the fraud team hasn't seen before
  • Fewer false positives: A $5,000 transaction is anomalous for some customers, routine for others
  • Continuous learning: Update profile embeddings as customer behavior evolves
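
One simple way to keep a profile current is an exponential moving average over transaction embeddings. This is an assumed update rule chosen for illustration, not a method the source prescribes; the smoothing factor alpha is a hypothetical tuning parameter.

```python
def update_profile(profile, transaction_embedding, alpha=0.1):
    """Blend a new transaction embedding into the customer profile.

    A small alpha makes the profile change slowly; a large alpha makes it
    follow recent behavior closely.
    """
    return [
        (1 - alpha) * p + alpha * t
        for p, t in zip(profile, transaction_embedding)
    ]


profile = [1.0, 0.0]
updated = update_profile(profile, [0.0, 1.0], alpha=0.25)
```

In practice the update would only be applied to transactions confirmed as legitimate, so that fraud does not shift the baseline.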

ArcadeDB stores these embeddings directly on customer vertices using JVector, enabling sub-millisecond distance calculations as part of graph traversal queries. No external vector database required.

Time Series: Catching What Static Analysis Misses

Fraud has a temporal dimension that pure graph analysis misses. A graph can tell you that two accounts share a device. But it can't tell you that both accounts were created within 30 seconds of each other at 3am, followed by a burst of 47 micro-transactions in 5 minutes — a pattern screaming "velocity attack."

ArcadeDB's native time-series engine stores transactional events with nanosecond precision and provides built-in analytical functions designed for exactly these patterns:

  • Velocity detection: The rate() function computes transaction frequency per second — a spike from 2/hour to 50/minute triggers an alert
  • Time bucketing: Aggregate transactions by hour, day, or week to reveal patterns invisible at the individual transaction level
  • Moving averages: Detect gradual behavior changes (a slowly increasing transaction ceiling) that would fool static thresholds
  • Gap detection: Identify dormant accounts that suddenly activate — a common money mule indicator
  • Correlation analysis: The correlate() function detects whether two accounts' transaction patterns move in lockstep, suggesting coordination

These functions run at the database level, not in application code, making them available for both real-time authorization checks and batch investigation queries.
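
As an illustration of gap detection, the Python sketch below scans events in time order and reports accounts that become active again after a long silence. The 90-day gap and the event layout are assumptions made for this example; it does not use ArcadeDB's time-series functions.

```python
from datetime import datetime, timedelta


def dormant_reactivations(events, min_gap=timedelta(days=90)):
    """Return (account_id, last_seen, reactivated_at) for long activity gaps.

    `events` is a list of (account_id, timestamp) pairs in any order.
    """
    last_seen = {}
    hits = []
    for account_id, ts in sorted(events, key=lambda event: event[1]):
        previous = last_seen.get(account_id)
        if previous is not None and ts - previous >= min_gap:
            hits.append((account_id, previous, ts))
        last_seen[account_id] = ts
    return hits


# Hypothetical events: acct-1 goes quiet for five months.
events = [
    ("acct-1", datetime(2024, 1, 1)),
    ("acct-2", datetime(2024, 5, 1)),
    ("acct-2", datetime(2024, 5, 20)),
    ("acct-1", datetime(2024, 6, 1)),
]
reactivated = dormant_reactivations(events)
```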

Velocity Attack Detection

Detect accounts with sudden spikes in transaction frequency using time-series rate analysis:

-- Detect velocity attacks:
-- accounts with abnormal tx frequency
SELECT account_id,
  rate(tx_count) AS current_tps,
  count(*) AS tx_count,
  sum(amount) AS total_amount
FROM Transactions
WHERE ts > now() - INTERVAL '5m'
GROUP BY account_id
HAVING current_tps >
  10 * (SELECT rate(tx_count)
        FROM Transactions
        WHERE ts > now() - INTERVAL '30d'
          AND account_id =
              Transactions.account_id)
ORDER BY current_tps DESC

Flags accounts whose last-5-minute transaction rate is 10x their 30-day average — catching card testing attacks, account takeovers, and automated fraud bots.
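
The 10x comparison reduces to a ratio of two rates. A minimal Python sketch, assuming timestamps are plain epoch seconds and using made-up traffic patterns:

```python
def velocity_ratio(timestamps, now, window_s=300, baseline_s=30 * 24 * 3600):
    """Ratio of the recent transaction rate to the long-run rate.

    Rates are transactions per second over the last `window_s` seconds and
    the last `baseline_s` seconds. Returns 0.0 if there is no baseline.
    """
    recent = sum(1 for t in timestamps if now - window_s < t <= now)
    baseline = sum(1 for t in timestamps if now - baseline_s < t <= now)
    if baseline == 0:
        return 0.0
    return (recent / window_s) / (baseline / baseline_s)


now = 5_184_000  # arbitrary reference time in epoch seconds

# Steady account: one transaction per minute for 30 days.
steady = [now - k * 60 for k in range(43_200)]
# Same account plus a burst of 100 transactions in the last 50 seconds.
burst = steady + [now - k * 0.5 for k in range(100)]

steady_ratio = velocity_ratio(steady, now)
burst_ratio = velocity_ratio(burst, now)
```

For low-volume accounts a single transaction in the short window can already exceed the ratio, so a minimum recent count is a sensible extra condition.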

Correlated Account Activity

-- Find accounts whose transaction
-- patterns are suspiciously correlated
SELECT correlate(a.amount, b.amount)
  AS correlation
FROM Transactions a, Transactions b
WHERE a.account_id = $acctA
  AND b.account_id = $acctB
  AND a.ts > now() - INTERVAL '7d'

A Pearson correlation > 0.9 between two accounts' transaction amounts over time strongly suggests coordinated activity.
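
Pearson correlation is only meaningful when the two series are aligned in time. The Python sketch below compares daily totals on the days both accounts were active; the daily bucketing and the sample figures are assumptions for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def daily_correlation(totals_a, totals_b):
    """Correlate two {day: total_amount} maps over the days they share."""
    shared_days = sorted(set(totals_a) & set(totals_b))
    return pearson(
        [totals_a[day] for day in shared_days],
        [totals_b[day] for day in shared_days],
    )


# Hypothetical daily totals for two accounts.
account_a = {"2024-03-01": 100, "2024-03-02": 200, "2024-03-03": 300}
account_b = {"2024-03-01": 110, "2024-03-02": 190, "2024-03-03": 310}
correlation = daily_correlation(account_a, account_b)
suspicious = correlation > 0.9
```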

Complete Fraud Investigation Query

Combine graph traversal, time-series analysis, and vector anomaly detection in a single investigation:

-- Multi-model fraud investigation:
-- 1. Graph: find connected accounts
-- 2. Time series: check for velocity
-- 3. Vectors: check for anomaly

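MATCH path = (flagged:Account {id: $acctId})
  -[:USES_DEVICE|HAS_PHONE|HAS_EMAIL
    *1..3]-(linked:Account)
WHERE linked <> flagged
-- Keep the shortest hop count per linked account
WITH linked, min(length(path)) AS distance

WITH linked, distance,

  -- Time-series: recent tx velocity
  ts.rate(linked, 'Transactions',
    'tx_count',
    now() - duration('PT1H'),
    now()) AS tx_velocity,

  -- Vector: behavioral deviation
  vectorDistance(
    linked.recent_behavior,
    linked.baseline_behavior
  ) AS anomaly_score

RETURN linked.id, linked.name,
  tx_velocity, anomaly_score, distance,

  -- Composite risk score.
  -- tx_velocity_normalized stands for
  -- tx_velocity rescaled to [0, 1];
  -- that step is not shown here.
  (0.4 * tx_velocity_normalized
   + 0.35 * anomaly_score
   + 0.25 * (1.0 / distance))
    AS risk_score

ORDER BY risk_score DESC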

Putting It All Together: Multi-Model Investigation

Real fraud investigations don't use one technique — they combine every available signal. When an account is flagged, the investigation team needs to:

  1. Map the network (graph): Find all accounts connected through shared identifiers
  2. Check velocity (time series): Are any linked accounts showing abnormal transaction patterns?
  3. Detect anomalies (vectors): Are any linked accounts behaving differently from their established profiles?
  4. Resolve identities (full-text): Are any linked accounts using name/address variations suggesting synthetic identities?
  5. Score and rank (composite): Produce a unified risk score blending all signals

In a typical architecture, this requires orchestrating queries across 4-5 separate databases, merging results in application code, and accepting stale data from sync delays. With ArcadeDB, it's a single query that returns a composite risk score in real-time.

The query above demonstrates this: it starts with a graph traversal to find linked accounts, enriches each account with time-series velocity data and vector anomaly scores, and produces a weighted risk score, all in one round-trip to the database.
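
The weighted score itself is simple arithmetic. The Python sketch below applies the 0.4 / 0.35 / 0.25 weights from the query to made-up signal values. Min-max scaling of velocity across the linked accounts is an assumption, since the source does not say how tx_velocity_normalized is derived.

```python
def min_max(values):
    """Rescale values to [0, 1]; all-equal inputs map to 0.0."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        return [0.0 for _ in values]
    return [(value - lowest) / (highest - lowest) for value in values]


def risk_score(velocity_normalized, anomaly_score, distance):
    """Blend velocity, behavioral deviation, and graph proximity."""
    return (0.4 * velocity_normalized
            + 0.35 * anomaly_score
            + 0.25 * (1.0 / distance))


# Hypothetical signals for three accounts linked to a flagged account.
linked = [
    {"id": "acct-2", "tx_velocity": 0.50, "anomaly_score": 0.9, "distance": 2},
    {"id": "acct-3", "tx_velocity": 0.01, "anomaly_score": 0.2, "distance": 4},
    {"id": "acct-4", "tx_velocity": 0.10, "anomaly_score": 0.8, "distance": 2},
]
normalized = min_max([account["tx_velocity"] for account in linked])
ranked = sorted(
    (
        (risk_score(velocity, account["anomaly_score"], account["distance"]),
         account["id"])
        for velocity, account in zip(normalized, linked)
    ),
    reverse=True,
)
```

Because distance appears as 1.0 / distance, directly connected accounts contribute the full 0.25 and the contribution halves at two hops.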

Fraud Types ArcadeDB Detects

Transaction Fraud Rings

Groups of accounts collaborating on fraudulent transactions, connected through shared devices, phone numbers, or IP addresses. Graph traversal reveals the ring structure; time-series detects coordinated timing.

Synthetic Identity Fraud

Fabricated identities built from real and fake data. Full-text fuzzy matching detects name/address variations; graph analysis links synthetic identities that share underlying real data elements.

Money Laundering (AML)

Complex layering schemes using circular transfers, structuring, and shell accounts. Graph cycle detection finds circular flows; time-series bucketing catches structuring patterns.

Account Takeover

Compromised credentials used to access legitimate accounts. Vector embeddings detect behavioral deviations (different location, device, transaction patterns) from the account owner's baseline.

Insurance Claims Fraud

Staged accidents, duplicate claims, and provider collusion. Graph analysis reveals overlapping participants across claims; document storage holds unstructured evidence (photos, reports, notes).

First-Party Fraud

Individuals or groups deliberately defaulting on credit. Graph community detection identifies clusters of accounts with shared characteristics and simultaneous defaults.

Which Model Catches What

Fraud Type Graph Vector Time Series Full-Text
Fraud Rings
Synthetic Identity
Money Laundering
Account Takeover
Insurance Fraud
First-Party Fraud

Every fraud type benefits from multiple data models. Only a multi-model database can combine all four in a single query without data movement between systems.

Platform Comparison

Capability                  | Typical Stack          | ArcadeDB
Graph traversal             | Neo4j                  | Built-in
Vector anomaly detection    | Pinecone / Weaviate    | Built-in
Time-series analytics       | TimescaleDB / InfluxDB | Built-in
Full-text entity resolution | Elasticsearch          | Built-in
Document storage            | MongoDB                | Built-in
Data synchronization        | Kafka / CDC            | Not needed
License                     | Mixed / proprietary    | Apache 2.0

Why ArcadeDB for Fraud Detection

Building a production fraud detection platform typically requires assembling a constellation of specialized databases: a graph database for relationship analysis, a vector store for anomaly detection, a time-series database for transaction patterns, a search engine for entity resolution, and a document store for case data. Add data synchronization pipelines and you're managing 6+ systems.

ArcadeDB replaces all of this with a single, unified engine:

  • Zero data movement: No ETL pipelines, no sync delays, no consistency gaps between systems
  • Real-time scoring: All five models queried in a single round-trip, fast enough for inline transaction authorization
  • Complete audit trail: Every data point, every relationship, every time-series event in one place for regulatory compliance
  • Simpler operations: One database to secure, backup, monitor, and scale
  • Lower cost: One system instead of six, with no commercial license fees

Apache 2.0 — Forever

Fraud detection systems are critical infrastructure. You need to trust that the database beneath them won't change its licensing terms, restrict your use case, or demand commercial fees under threat of non-compliance. ArcadeDB is Apache 2.0 forever — no bait-and-switch, no source-available restrictions, no per-node pricing surprises. Deploy it on-premise in an air-gapped environment, embed it in a commercial product, or run it in your own cloud — the license will never be used against you.

Client Success Story

"We migrated from a traditional RDBMS to ArcadeDB for our fraud detection platform. The results were immediate — we reduced false positives by 68% while catching 35% more actual fraud cases. The ability to query 4-5 hops in the relationship graph in under 50ms was game-changing for our real-time detection engine."

— Senior Engineering Manager, Global Payment Processing Company
(Company name withheld due to NDA)

Results Achieved:

  • 68% reduction in false positive alerts
  • 35% more actual fraud cases caught
  • Sub-50ms query response times for 4-5 hop traversals
  • $2.4M saved annually in fraud losses

Industries

  • Banking & Payments: Transaction fraud rings, card testing attacks, unauthorized access
  • Insurance: Claims fraud, provider collusion, staged incidents
  • E-commerce: Promo abuse, return fraud, fake reviews, account takeover
  • Telecommunications: Subscription fraud, SIM swap attacks, toll fraud
  • Government: Tax fraud, benefits fraud, identity theft detection
  • Cryptocurrency: Wallet clustering, mixer detection, wash trading

Ready to Build Your Fraud Detection System?

Start with ArcadeDB today. Combine graph traversal, vector anomaly detection, time-series patterns, and full-text entity resolution in a single Apache 2.0 database — no multi-system orchestration required.