Identifying Money Laundering Accounts

Solution Background and Business Value

Law enforcement and financial institutions aim to detect and prevent money laundering to mitigate financial harm and stop criminal activities. Ideally, this practice should be stopped before the money leaves the account, as financial institutions can be liable if it is not intercepted in time. The problem is exacerbated in the cryptocurrency world, where accounts are easy to create and single transactions can involve payments from and to many accounts.

Graph Neural Networks (GNNs) can help detect network patterns in raw transaction data that are much more challenging to identify using traditional fraud detection methods.

In this solution document, you will learn what data is required to create a money laundering transaction classifier in Kumo, how to build the classifier, and how to make use of its predictions in real-world fraud detection systems.

Data Requirements and Kumo Graph

We can begin developing our model with a small set of core tables. Kumo allows us to extend the model by incorporating more sources of signal with additional tables.

Core Tables

Accounts Table: This table holds information about all accounts. It should include:
- account_id: A unique identifier for each account.
- Other attributes such as location, phone number, creation timestamp, type, etc.
Transactions Table: This table records all transactions on any account, including deposits and withdrawals with amounts:
- transaction_id: A unique identifier for each transaction.
- TIMESTAMP: The time of the transaction.
- Other properties of the transaction, such as amount.
Inputs Table: This table records all input accounts to a transaction:
- TIMESTAMP: The time of the transaction.
- transaction_id: To link the record to a specific transaction.
- account_id: To link the record to a specific account.
- Other properties.
Outputs Table: This table records all output accounts to a transaction:
- TIMESTAMP: The time of the transaction.
- transaction_id: To link the record to a specific transaction.
- account_id: To link the record to a specific account.
- Other properties.
Reports Table: This table records all accounts that have been reported for money laundering:
- TIMESTAMP: The time of the report.
- account_id: To link the report to a specific account.
- Other properties of the report.
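
The relationships between the core tables can be written down explicitly before any data is loaded. The following is a minimal sketch in plain Python, not Kumo SDK code: the `TableSpec` class, the `CORE_TABLES` list, and the `graph_edges` helper are hypothetical names introduced only for illustration. It assumes that the inputs, outputs, and reports tables have no primary key of their own, and it checks that every linking column points at the primary key of an existing table.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class TableSpec:
    """Hypothetical description of one table in the graph (illustration only)."""
    name: str
    primary_key: Optional[str]   # None when the table has no key of its own
    time_column: Optional[str]   # None when the table is treated as static
    foreign_keys: Dict[str, str] = field(default_factory=dict)  # column -> parent table


# Column names follow the core tables listed above.
CORE_TABLES = [
    TableSpec("accounts", "account_id", None),
    TableSpec("transactions", "transaction_id", "TIMESTAMP"),
    TableSpec("inputs", None, "TIMESTAMP",
              {"transaction_id": "transactions", "account_id": "accounts"}),
    TableSpec("outputs", None, "TIMESTAMP",
              {"transaction_id": "transactions", "account_id": "accounts"}),
    TableSpec("reports", None, "TIMESTAMP", {"account_id": "accounts"}),
]


def graph_edges(tables: List[TableSpec]) -> List[Tuple[str, str, str]]:
    """Return (child_table, column, parent_table) links after validating them."""
    by_name = {t.name: t for t in tables}
    edges = []
    for table in tables:
        for column, parent in table.foreign_keys.items():
            # Each link must target an existing table whose primary key
            # carries the same column name.
            if parent not in by_name or by_name[parent].primary_key != column:
                raise ValueError(f"{table.name}.{column} does not match a key in {parent}")
            edges.append((table.name, column, parent))
    return edges


if __name__ == "__main__":
    for child, column, parent in graph_edges(CORE_TABLES):
        print(f"{child}.{column} -> {parent}")
```

Writing the links down this way makes it easy to see that every transaction connects to accounts only through the inputs and outputs tables, which is the structure the GNN traverses.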

Optional Tables

Additional Account Events, Reports, Links: Other relevant events, reports, and links related to accounts.
Other tables that link to accounts or transactions can be added in the same way.

Predictive Query

To effectively stop money laundering, we need to intercept it once the money enters the account but before it leaves. Therefore, what we really want to predict is whether a money laundering report will be associated with a particular account in the future.

Money Laundering Prediction

This model predicts the probability that an account will be reported for money laundering in the next N days for accounts that recently received a deposit.

PREDICT COUNT(reports.*, 0, N, days) > 0
FOR EACH accounts.account_id
WHERE COUNT(inputs.*, -1, 0, days) > 0
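
To make the meaning of the query concrete, the sketch below evaluates the same filter and target for a single account at a single anchor time, using plain Python lists of timestamps. It is an illustration only, not how Kumo computes labels internally. The helper names `count_in_window` and `training_example` are hypothetical, and the window boundaries (exclusive at the start, inclusive at the end) are an assumption.

```python
from datetime import datetime, timedelta
from typing import List, Optional


def count_in_window(timestamps: List[datetime], anchor: datetime,
                    start_days: int, end_days: int) -> int:
    """Count events with anchor + start < ts <= anchor + end.

    The boundary convention is an assumption made for this sketch.
    """
    lower = anchor + timedelta(days=start_days)
    upper = anchor + timedelta(days=end_days)
    return sum(1 for ts in timestamps if lower < ts <= upper)


def training_example(input_times: List[datetime], report_times: List[datetime],
                     anchor: datetime, n_days: int) -> Optional[bool]:
    """Return None when the account is filtered out, otherwise the boolean target."""
    # WHERE COUNT(inputs.*, -1, 0, days) > 0
    if count_in_window(input_times, anchor, -1, 0) == 0:
        return None
    # PREDICT COUNT(reports.*, 0, N, days) > 0
    return count_in_window(report_times, anchor, 0, n_days) > 0


if __name__ == "__main__":
    anchor = datetime(2024, 1, 10)
    inputs = [datetime(2024, 1, 9, 12)]   # one inputs row in the previous day
    reports = [datetime(2024, 1, 15)]     # one report five days after the anchor
    print(training_example(inputs, reports, anchor, n_days=10))
```

An account with no row in the inputs table during the previous day produces no example at all, which keeps the model focused on accounts with recent activity.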

We can also train separate models for different future time ranges, so that each model captures a different pattern of fraud. The final prediction can then be an aggregate of the individual predictions.

// Report will be generated in the next 10 days
PREDICT COUNT(reports.*, 0, 10, days) > 0
FOR EACH accounts.account_id
WHERE COUNT(inputs.*, -1, 0, days) > 0

// Report will be generated in 10 - 30 days
PREDICT COUNT(reports.*, 10, 30, days) > 0
FOR EACH accounts.account_id
WHERE COUNT(inputs.*, -1, 0, days) > 0

// Report will be generated in 30 - 90 days
PREDICT COUNT(reports.*, 30, 90, days) > 0
FOR EACH accounts.account_id
WHERE COUNT(inputs.*, -1, 0, days) > 0

// Static account-level label prediction (requires a LABEL column in the accounts table)
PREDICT accounts.LABEL == 1
FOR EACH accounts.account_id
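
The aggregation over the per-window predictions is left open above, so the following sketch shows two simple possibilities rather than a recommended method. `combine_any_window` treats the windows as independent and returns the probability of at least one report in any of them (a noisy-OR), while `combine_max` simply takes the most alarming window. Both function names are hypothetical, and the independence assumption is a simplification that real data may not satisfy.

```python
from math import prod
from typing import Sequence


def _check(probabilities: Sequence[float]) -> None:
    """Reject empty input and values outside [0, 1]."""
    if not probabilities:
        raise ValueError("at least one probability is required")
    if any(p < 0.0 or p > 1.0 for p in probabilities):
        raise ValueError("probabilities must lie in [0, 1]")


def combine_any_window(probabilities: Sequence[float]) -> float:
    """Probability of a report in at least one window, assuming independence."""
    _check(probabilities)
    return 1.0 - prod(1.0 - p for p in probabilities)


def combine_max(probabilities: Sequence[float]) -> float:
    """Score an account by its highest per-window probability."""
    _check(probabilities)
    return max(probabilities)


if __name__ == "__main__":
    # Hypothetical scores from the 0-10, 10-30, and 30-90 day models.
    per_window = [0.10, 0.20, 0.50]
    print(combine_any_window(per_window))
    print(combine_max(per_window))
```

The noisy-OR form rises whenever any window looks risky, which suits a review queue where missing a case is costly. The maximum is easier to explain to analysts because it always equals one of the underlying model scores.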

Deployment

In production, we have two options to leverage this model:

Batch Predictions: Make account predictions in a batch fashion as often as possible to try to catch the money before it leaves the account. This method involves periodically scoring accounts based on the latest data available. Additionally, graph embeddings can separately power and enhance ongoing fraud analysis workflows.
Real-Time Embeddings: If batch prediction isn’t fast enough, output the account embeddings learned by the model and use these downstream as features for real-time models. This allows for quicker responses to recent transactions and more immediate intervention.

Both approaches aim to enhance the timeliness and accuracy of money laundering detection, ultimately helping to prevent financial crimes more effectively.
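
The real-time embeddings option can be sketched as follows. This is an illustrative outline under several assumptions: the embeddings have already been exported to a lookup keyed by account_id, the downstream model is a simple logistic scorer, and the weights, feature values, and names such as `EMBEDDINGS` and `realtime_score` are all hypothetical. A production system would use its own feature store and trained model.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical export of account embeddings from the most recent batch run.
EMBEDDINGS: Dict[str, List[float]] = {
    "acct_1": [0.2, -0.1, 0.7],
    "acct_2": [-0.4, 0.3, 0.0],
}


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def realtime_score(account_id: str, txn_features: Sequence[float],
                   weights: Sequence[float], bias: float) -> float:
    """Score one incoming transaction using the stored embedding plus live features."""
    embedding = EMBEDDINGS[account_id]           # KeyError for unknown accounts
    features = list(embedding) + list(txn_features)
    if len(features) != len(weights):
        raise ValueError("weights must match embedding plus transaction features")
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


if __name__ == "__main__":
    # Two live features, for example a scaled amount and a new-counterparty flag.
    weights = [0.5, -0.2, 1.0, 0.8, 0.3]   # hypothetical trained weights
    print(realtime_score("acct_1", [1.2, 1.0], weights, bias=-1.0))
```

Because the embedding lookup is a simple key read, the latency of the real-time path is dominated by the downstream model, while the graph signal is refreshed whenever the batch job re-exports embeddings.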