Anomaly Detection with Machine Learning | Anitech AI

Introduction

Most business problems involve patterns: “What’s the normal customer behaviour?” “What does a healthy system look like?” “How much demand do we expect?”

But some of the costliest problems are exceptions: fraud in payment systems, equipment faults before catastrophic failure, cyberattacks on networks, data quality issues.

Anomaly detection uses machine learning to identify unusual patterns that deviate from “normal” behaviour. Unlike hand-written fraud rules (which trigger on fixed conditions such as “transaction > AUD 10,000”), ML-based anomaly detection learns what “normal” looks like from data, then flags anything significantly different.

For Australian organisations, anomaly detection delivers measurable impact:

Financial services:
– Payment fraud detection can catch the large majority of fraudulent transactions before approval (94% in the case study below)
– Can reduce fraud losses by 40–60%
– Improves customer experience through fewer false declines

Operations & Manufacturing:
– Equipment anomalies detected 7–30 days before failure
– Prevents catastrophic downtime
– Saves AUD 500K–2M per prevented failure

Network & Security:
– DDoS attacks and intrusions detected in milliseconds
– Reduces damage from security incidents by 70–90%

What Is Anomaly Detection?

Anomaly detection identifies data points, patterns, or events that deviate significantly from expected behaviour.

Key Concepts

Normal behaviour: The baseline: what typical transactions look like, how healthy systems perform, and what demand to expect.

Anomaly: A deviation from normal that’s statistically unlikely and potentially problematic, such as a transaction from an unusual location, a spike in network traffic, or a sudden drop in production output.

Outlier vs. Anomaly:
– Outlier: A data point that is statistically different from the rest (e.g., a customer who spends AUD 100K annually vs. an average of AUD 10K). Not necessarily a problem.
– Anomaly: An outlier that’s interesting or problematic. A customer’s account suddenly accessing services from 5 different countries within an hour is an anomaly (potential account compromise), not just an outlier.

Detection Approaches

Rule-based (Traditional):
“Alert if transaction > AUD 10,000 AND velocity > 3 transactions/hour AND new merchant”

Limitations: Rules are rigid and require domain expertise to maintain. Adversaries learn rules and evade them.

ML-based (Modern):
A model learns normal patterns from historical data. Anything sufficiently different from normal triggers an alert.

Advantages:
– Adapts to changing patterns when retrained on recent data, without rules being rewritten by hand
– Captures complex, nonlinear relationships that rules can’t express
– Harder for adversaries to evade (there are no fixed thresholds to probe)
– Typically higher detection rates with fewer false positives

Anomaly Detection Algorithms

Different algorithms excel in different contexts:

Isolation Forests

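How it works: Builds an ensemble of random trees, each recursively partitioning the data on a randomly chosen feature at a random split value; anomalies are isolated quickly (they need fewer splits, i.e. a shorter average path length)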

Best for: High-dimensional data; fast scoring (when millisecond latency is required); little or no labelled data

Accuracy: 85–92% detection rate; 3–8% false positive rate

Tools: scikit-learn, H2O

Common use cases: Payment fraud, credit card transactions
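A minimal scikit-learn sketch of the idea, assuming two hypothetical transaction features (amount in AUD and the card's transactions in the last hour) and synthetic data in place of real history. The contamination value of 1% is an assumption about the expected anomaly share; it only sets where predict() draws the alert line, while score_samples() returns the underlying continuous score.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
# Synthetic "normal" history; hypothetical features: [amount in AUD, card transactions in last hour]
X_train = rng.normal(loc=[80.0, 1.5], scale=[25.0, 0.8], size=(5000, 2))

# contamination is an assumed anomaly share; it only positions the predict() cut-off
iso = IsolationForest(n_estimators=200, contamination=0.01, random_state=42)
iso.fit(X_train)

X_new = np.array([
    [75.0, 1.0],    # typical purchase
    [2400.0, 9.0],  # large amount at high velocity
])
labels = iso.predict(X_new)        # +1 = normal, -1 = anomaly
scores = iso.score_samples(X_new)  # lower = more anomalous (isolated in fewer splits)
print(labels, scores)
```

In practice the forest is fitted offline on historical data and only predict() or score_samples() runs at transaction time, which is what keeps per-transaction scoring fast.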

Autoencoders (Neural Networks)

How it works: Neural network trained to reconstruct normal data. Anomalies have high reconstruction error.

Best for: Complex, high-dimensional data; images or sequences

Accuracy: 88–96% detection rate; 2–6% false positive rate

Tools: TensorFlow, PyTorch, Keras

Common use cases: Network traffic analysis, equipment sensor data, cybersecurity
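TensorFlow or PyTorch would be the usual choice for a production autoencoder. To keep the sketch small, this version uses scikit-learn's MLPRegressor trained to reproduce its own input through a one-unit bottleneck, which is the same reconstruction-error idea in miniature. The four correlated "sensor" channels, the network shape and the 99th-percentile threshold are all illustrative assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical healthy sensor readings: four channels driven by one shared signal t
t = rng.normal(size=2000)
X_normal = np.column_stack([t, 2 * t, -t, 0.5 * t]) + 0.05 * rng.normal(size=(2000, 4))

scaler = StandardScaler().fit(X_normal)
X_scaled = scaler.transform(X_normal)

# Autoencoder stand-in: an MLP trained to map its input back to itself via a 1-unit bottleneck
autoencoder = MLPRegressor(hidden_layer_sizes=(8, 1, 8), activation="tanh",
                           solver="lbfgs", max_iter=5000, random_state=0)
autoencoder.fit(X_scaled, X_scaled)

def reconstruction_error(X):
    """Mean squared reconstruction error per row (rows already scaled)."""
    return np.mean((autoencoder.predict(X) - X) ** 2, axis=1)

# Alert threshold: 99th percentile of reconstruction error on normal training data
threshold = np.percentile(reconstruction_error(X_scaled), 99)

# Anomalous reading: the second channel has the wrong sign, breaking the learned correlation
x_anomaly = scaler.transform(np.array([[2.0, -4.0, -2.0, 1.0]]))
anomaly_error = reconstruction_error(x_anomaly)[0]
print(anomaly_error > threshold)
```

The anomalous reading is only about two standard deviations from the mean on each channel; it stands out because it breaks the relationship between channels that the network learned, which simple per-metric thresholds would miss.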

Local Outlier Factor (LOF)

How it works: Compares the local density around each point with the local density around its nearest neighbours. A point in a sparse region whose neighbours sit in denser regions = anomaly.

Best for: Datasets with varying density; contextual anomalies

Accuracy: 82–90% detection rate; 5–12% false positive rate

Tools: scikit-learn, H2O

Common use cases: Operational monitoring, multi-dimensional metrics
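A short scikit-learn sketch on synthetic data (an assumption, standing in for operational metrics): one tight cluster, one loose cluster, and a single point just outside the tight cluster. LocalOutlierFactor stores the negated LOF in negative_outlier_factor_; values near 1 mean a point is about as dense as its neighbours, while values well above 1 mean it is sparser.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(1)
# Synthetic metrics: a tight cluster, a loose cluster, and one point just off the tight cluster
tight = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(200, 2))
loose = rng.normal(loc=[6.0, 6.0], scale=1.5, size=(200, 2))
X = np.vstack([tight, loose, [[1.8, 1.8]]])

lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X)                  # +1 = inlier, -1 = outlier
lof_scores = -lof.negative_outlier_factor_   # ~1 = as dense as neighbours; well above 1 = sparser
print(labels[-1], lof_scores[-1])
```

fit_predict() scores the same data it was fitted on. To score new records against a fitted baseline, scikit-learn expects LocalOutlierFactor(novelty=True), then predict() or score_samples() on the new data.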

One-Class SVM (Support Vector Machine)

How it works: Learns boundary around normal data. Anything outside boundary = anomaly.

Best for: Small to medium datasets; well-defined normal region

Accuracy: 85–91% detection rate; 4–10% false positive rate

Tools: scikit-learn, libsvm

Common use cases: Network intrusion detection, manufacturing quality
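A minimal sketch with scikit-learn's OneClassSVM, assuming hypothetical measurements from good parts only (diameter in mm and weight in g). Features are standardised first because the RBF kernel is distance-based; nu=0.02 is an assumed setting that acts as an upper bound on the share of training points allowed outside the learned boundary.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(7)
# Hypothetical measurements from good parts only: [diameter in mm, weight in g]
X_good = rng.normal(loc=[25.0, 120.0], scale=[0.05, 0.8], size=(500, 2))

# Scale features, then learn a boundary around the normal region
ocsvm = make_pipeline(StandardScaler(), OneClassSVM(kernel="rbf", gamma="scale", nu=0.02))
ocsvm.fit(X_good)

X_check = np.array([
    [25.01, 120.3],  # within normal variation
    [25.40, 120.1],  # diameter far outside normal variation
])
print(ocsvm.predict(X_check))  # +1 = normal, -1 = anomaly
```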

Statistical Methods (ARIMA, Exponential Smoothing)

How it works: Forecasts the expected value at each time step from past values. Large deviations between actual and expected values = anomalies.

Best for: Time-series data with seasonality; forecasting-based anomalies

Accuracy: 80–88% detection rate; 5–15% false positive rate

Tools: statsmodels, R forecast package

Common use cases: Network traffic spikes, data quality issues, operational metrics
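The sketch below uses simple exponential smoothing in plain NumPy rather than statsmodels, to show the mechanics: forecast the next value, compare it with the actual value, and alert when the residual exceeds k times the residual spread seen in an assumed anomaly-free warm-up window. The smoothing factor, k=4 and the warm-up length are illustrative choices, not recommendations.

```python
import numpy as np

def ses_anomalies(series, alpha=0.3, k=4.0, warmup=48):
    """Flag points whose one-step-ahead simple exponential smoothing error exceeds k * sigma.

    sigma is estimated from residuals in the first `warmup` points, assumed anomaly-free.
    """
    y = np.asarray(series, dtype=float)
    level = y[0]
    warmup_residuals = []
    for t in range(1, warmup):
        warmup_residuals.append(y[t] - level)        # actual minus forecast
        level = alpha * y[t] + (1 - alpha) * level  # update the smoothed level
    sigma = np.std(warmup_residuals)

    flags = np.zeros(len(y), dtype=bool)
    for t in range(warmup, len(y)):
        residual = y[t] - level
        if abs(residual) > k * sigma:
            flags[t] = True  # alert; leave the level unchanged so the spike doesn't skew later forecasts
        else:
            level = alpha * y[t] + (1 - alpha) * level
    return flags

# Synthetic example: hourly request counts around 1,000 with one injected spike at hour 150
rng = np.random.default_rng(5)
requests = 1000 + rng.normal(scale=5.0, size=200)
requests[150] += 80
flags = ses_anomalies(requests)
print(np.flatnonzero(flags))
```

For data with daily or weekly seasonality, statsmodels' ExponentialSmoothing (Holt-Winters) or an ARIMA model would supply the forecast instead; the residual-thresholding step stays the same. Because flagged points are not fed back into the level, a genuine, persistent level shift would keep alerting until the baseline is reset.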

Real-World Case Study: Australian Payment Processor

Company: Payment processor serving 50,000+ merchants; processes 100M+ transactions annually

Problem: Credit card fraud rate at 0.08% (relatively low, but 80,000 fraudulent transactions annually × AUD 150 average loss = AUD 12M annual cost)

Baseline: Rule-based fraud detection catches 75% of fraud at 8% false positive rate (many legitimate transactions declined, causing customer friction)

Implementation

Data: 18 months of transaction data covering 50M transactions, including 60K confirmed fraudulent transactions

Features:
– Transaction amount, merchant category, geography
– Card velocity (transactions per hour)
– Customer location (home vs. transaction location)
– Device fingerprinting (is this the customer’s known device?)
– Merchant reputation (is this a high-fraud merchant?)
– Time-of-day patterns (does this match customer’s usual activity?)

Algorithm: Ensemble approach (Isolation Forest + Autoencoder)

Deployment: Real-time scoring (< 100ms latency required for payment processing)
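The case study does not say how the two models' scores were combined. One common option, sketched here as an assumption, is to convert each model's raw score to a percentile of that model's scores on normal training data and average the percentiles, so scores on different scales become comparable. Each calibrator is just a sorted array plus a binary search, so a single transaction can be scored on its own, which suits a tight latency budget. The reference distributions below are synthetic stand-ins.

```python
import numpy as np

class PercentileCalibrator:
    """Maps raw anomaly scores to percentiles of a reference (training) score distribution."""

    def __init__(self, reference_scores):
        self.reference = np.sort(np.asarray(reference_scores, dtype=float))

    def __call__(self, scores):
        # Fraction of reference scores <= each new score (1.0 = beyond every reference score)
        scores = np.asarray(scores, dtype=float)
        return np.searchsorted(self.reference, scores, side="right") / len(self.reference)

def ensemble_score(calibrators, raw_scores):
    """Average each model's calibrated percentile; higher = more anomalous."""
    return np.mean([cal(s) for cal, s in zip(calibrators, raw_scores)], axis=0)

rng = np.random.default_rng(3)
# Hypothetical reference scores on normal data (higher = more anomalous for both models)
iso_cal = PercentileCalibrator(rng.normal(0.45, 0.05, size=10_000))  # e.g. negated score_samples()
ae_cal = PercentileCalibrator(rng.exponential(0.1, size=10_000))     # e.g. reconstruction error

# Two incoming transactions: one typical, one unusual to both models
iso_new = np.array([0.44, 0.70])
ae_new = np.array([0.05, 1.50])
combined = ensemble_score([iso_cal, ae_cal], [iso_new, ae_new])
print(combined)
```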

Results

Metric	Before	After	Improvement
Fraud detection rate	75%	94%	+19pp
False positive rate	8.0%	2.2%	-73%
Fraud losses avoided	AUD 9M	AUD 14.1M	+57%
Customer friction (declines)	500K/month	100K/month	-80%

Annual financial impact:
– Additional fraud caught: AUD 5.1M (savings)
– Reduced false declines: AUD 750K (improved customer retention, fewer payment failures)
– Operational savings (fewer dispute investigations): AUD 200K
– Total: AUD 6.05M annually

Investment: AUD 400K (data engineering, model development, infrastructure)
Payback period: 2 months
Year-1 ROI: 1,500%+

Use Cases Across Industries

Financial Services & Payments

Fraud detection: Identify suspicious transactions (unusual amounts, locations, merchants)

Insider threats: Detect unusual access patterns (employee accessing customer data outside normal hours/locations)

Account takeover: Flag when account accessed from new device or location

Market manipulation: Identify suspicious trading patterns (pump-and-dump schemes, spoofing)

Network & Cybersecurity

DDoS detection: Identify sudden traffic spikes inconsistent with normal patterns

Intrusion detection: Flag unusual network flows, port scans, data exfiltration

Malware: Detect suspicious process behaviour, network connections

Data loss prevention: Flag unusual data access or transfers

Manufacturing & Operations

Equipment failure: Detect sensor readings deviating from normal (temperature, vibration, pressure)

Quality issues: Identify products deviating from specifications

Staffing anomalies: Flag unusual absence or overtime patterns

Supply chain: Identify disruptions or unusual orders

Healthcare

Fraud detection: Identify suspicious claims (billing for services not provided, phantom procedures)

Patient safety: Flag unusual vital signs or lab results

Adverse events: Detect safety incidents (falls, medication errors)

Data Quality & Analytics

Missing data: Flag records with unexpected nulls or gaps

Data drift: Identify when the data distribution shifts (a sign that data collection or the underlying process has changed)

Duplicate records: Identify likely duplicates

Outliers: Flag unusual values (typos, measurement errors, or genuinely unusual data)
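Several of these checks are simple enough to run directly in pandas. The sketch below, with hypothetical column names, counts rows with nulls, exact duplicate rows, and values whose modified z-score exceeds 3.5, a commonly used cut-off. The modified z-score is built from the median and median absolute deviation, so a single extreme value cannot inflate the scale and mask itself.

```python
import pandas as pd

def data_quality_report(df, numeric_col, z_cutoff=3.5):
    """Count rows with nulls, exact duplicate rows, and robust outliers in one numeric column."""
    values = df[numeric_col]
    median = values.median()
    mad = (values - median).abs().median()  # median absolute deviation
    if mad == 0:
        raise ValueError("MAD is zero; the modified z-score is undefined for this column")
    modified_z = 0.6745 * (values - median) / mad
    return {
        "rows_with_nulls": int(df.isna().any(axis=1).sum()),
        "duplicate_rows": int(df.duplicated().sum()),          # repeats of an earlier identical row
        "outliers": int((modified_z.abs() > z_cutoff).sum()),  # NaN compares as False
    }

# Hypothetical order extract
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 3, 5, 6],
    "amount": [120.0, 95.0, 110.0, 110.0, None, 12000.0],
    "state": ["NSW", "VIC", "QLD", "QLD", "WA", "NSW"],
})
print(data_quality_report(orders, "amount"))
```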

Building an Anomaly Detection System

Step 1: Define Normal

What does “normal” look like for your use case?

For fraud: Normal customer transaction patterns (typical merchants, amounts, locations, times)

For equipment: Healthy equipment operating ranges (vibration, temperature, pressure)

For networks: Typical traffic volumes, connection types, data flows

For data quality: Expected data distributions, completeness, format

This requires both domain expertise and data analysis.

Step 2: Collect Baseline Data

Gather 3–12 months of clean, representative data covering normal behaviour. If possible, include labelled examples of known anomalies.

Quality is critical: If baseline data includes anomalies, the model learns anomalies as “normal” and misses them in production.

Step 3: Feature Engineering

Compute meaningful features from raw data:

For transactions: Amount, merchant category, geography, velocity (transactions per hour), time-of-day, customer history

For equipment: Vibration spectral features, temperature trend, acoustic emission patterns

For network: Traffic volume, packet size distribution, port numbers, source/destination IPs

For data quality: Completeness, uniqueness, distribution statistics, pattern consistency
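Velocity features such as transactions per hour are usually computed per card over a trailing time window. A pandas sketch, assuming hypothetical columns card_id, timestamp (datetime) and amount:

```python
import pandas as pd

def add_velocity_feature(tx):
    """Add tx_last_hour: the card's transaction count in the trailing 60 minutes, including this one."""
    tx = tx.sort_values(["card_id", "timestamp"]).reset_index(drop=True)
    counts = (
        tx.set_index("timestamp")
          .groupby("card_id")["amount"]
          .rolling("60min")  # time-based window (t - 60min, t] within each card
          .count()
    )
    # groupby().rolling() returns rows ordered by card_id, then time, matching the sort above
    tx["tx_last_hour"] = counts.to_numpy()
    return tx

# Hypothetical transactions for two cards
transactions = pd.DataFrame({
    "card_id": ["A", "A", "A", "B", "A"],
    "timestamp": pd.to_datetime([
        "2024-05-01 09:00", "2024-05-01 09:20", "2024-05-01 09:45",
        "2024-05-01 09:30", "2024-05-01 11:00",
    ]),
    "amount": [40.0, 25.0, 900.0, 60.0, 15.0],
})
print(add_velocity_feature(transactions)[["card_id", "timestamp", "tx_last_hour"]])
```

The same pattern extends to other windowed features (for example, total spend per card in the last 24 hours with .sum()), as long as each feature uses only data available at scoring time.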

Step 4: Algorithm Selection & Training

Test multiple algorithms:
– Isolation Forest (fast, good baseline)
– Autoencoders (complex patterns)
– One-Class SVM (well-defined boundaries)
– Ensemble (combine multiple algorithms)

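Train on baseline data. Validate on a holdout set that includes labelled anomalies. Measure detection rate (the share of known anomalies flagged, i.e. recall) and false positive rate (the share of normal records flagged).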

Target: 90%+ detection with < 5% false positives (varies by use case)
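Both metrics can be computed from a labelled holdout set in a few lines. This sketch assumes binary arrays where 1 marks a known anomaly (y_true) or a raised alert (y_flagged); the example labels are made up.

```python
import numpy as np

def detection_metrics(y_true, y_flagged):
    """Return (detection_rate, false_positive_rate) from binary labels and alerts."""
    y_true = np.asarray(y_true, dtype=bool)
    y_flagged = np.asarray(y_flagged, dtype=bool)
    detection_rate = (y_flagged & y_true).sum() / y_true.sum()             # share of anomalies caught
    false_positive_rate = (y_flagged & ~y_true).sum() / (~y_true).sum()    # share of normal records flagged
    return float(detection_rate), float(false_positive_rate)

# Hypothetical holdout: 4 known anomalies, 6 normal records
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_flagged = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
dr, fpr = detection_metrics(y_true, y_flagged)
print(dr, round(fpr, 3))
```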

Step 5: Threshold Tuning

Adjust detection threshold to balance sensitivity vs. false positives:
– High sensitivity (low threshold): Catch more anomalies but more false positives (requires more investigation)
– Low sensitivity (high threshold): Fewer false positives but some anomalies missed

Optimal threshold depends on cost of false positive vs. cost of missed anomaly:
– Fraud: High cost of missing fraud; tolerate 3–5% false positives (systems can auto-decline or challenge)
– Equipment: High cost of downtime; tolerate 5–10% false positives (investigations are cheap; downtime is expensive)
– Data quality: Low investigation cost; tolerate 10–15% false positives
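One way to make this trade-off explicit, sketched under assumed costs, is to sweep candidate thresholds over a labelled validation set and keep the one with the lowest total cost (false positives × cost of an investigation + missed anomalies × cost of a miss). The AUD 5 review cost is hypothetical; AUD 150 reuses the case study's average fraud loss.

```python
import numpy as np

def pick_threshold(scores, y_true, cost_fp, cost_fn):
    """Return (threshold, total_cost) minimising cost when alerting on score >= threshold."""
    scores = np.asarray(scores, dtype=float)
    y_true = np.asarray(y_true, dtype=bool)
    best_threshold, best_cost = None, np.inf
    # Candidate thresholds: every observed score, plus +inf ("never alert")
    for threshold in np.append(np.unique(scores), np.inf):
        flagged = scores >= threshold
        false_positives = np.sum(flagged & ~y_true)
        missed = np.sum(~flagged & y_true)
        cost = false_positives * cost_fp + missed * cost_fn
        if cost < best_cost:
            best_threshold, best_cost = threshold, cost
    return best_threshold, best_cost

# Validation scores (higher = more anomalous) and labels; costs are hypothetical AUD figures
val_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
val_labels = [0, 0, 0, 1, 0, 1, 1, 1]
print(pick_threshold(val_scores, val_labels, cost_fp=5, cost_fn=150))
```

With a miss costing 30 times more than a review, the cheapest threshold is the one that catches every labelled anomaly at the price of one false positive; raising cost_fp relative to cost_fn pushes the chosen threshold up.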

Step 6: Production Deployment

– Serve the model with latency matched to the use case (< 100ms for real-time payment fraud; < 1s for batch scoring)
– Integrate with the alerting system
– Route alerts to the appropriate teams (fraud analysts, equipment engineers, data stewards)

Step 7: Continuous Monitoring

Monitor model performance:
– Detection rate (are we catching anomalies?)
– False positive rate (are we creating too much noise?)
– Alert distribution (are we getting the right mix of alert types?)

Retrain periodically (monthly, quarterly) as new data arrives and normal patterns evolve.
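Drift in the model's input features or score distribution is often an early sign that retraining is due. The population stability index (PSI) is one widely used check, sketched here in NumPy: cut a reference sample (for example, scores from the training period) into deciles, then compare the share of recent data falling in each bin. The thresholds in the comment are an industry rule of thumb, not a standard, and the data is synthetic.

```python
import numpy as np

def population_stability_index(reference, recent, bins=10):
    """PSI between a reference sample and a recent sample of the same feature or score."""
    reference = np.asarray(reference, dtype=float)
    recent = np.asarray(recent, dtype=float)
    # Interior cut points at reference quantiles, so each bin holds ~equal reference mass
    cuts = np.quantile(reference, np.linspace(0, 1, bins + 1)[1:-1])
    ref_share = np.bincount(np.searchsorted(cuts, reference, side="right"), minlength=bins) / len(reference)
    new_share = np.bincount(np.searchsorted(cuts, recent, side="right"), minlength=bins) / len(recent)
    # Floor empty bins to avoid log(0)
    ref_share = np.clip(ref_share, 1e-6, None)
    new_share = np.clip(new_share, 1e-6, None)
    return float(np.sum((new_share - ref_share) * np.log(new_share / ref_share)))

# Rule of thumb (not a standard): < 0.1 stable, 0.1 to 0.25 worth watching, > 0.25 investigate or retrain
rng = np.random.default_rng(11)
training_scores = rng.normal(0.0, 1.0, size=5000)
psi_stable = population_stability_index(training_scores, rng.normal(0.0, 1.0, size=2000))
psi_shifted = population_stability_index(training_scores, rng.normal(0.8, 1.0, size=2000))
print(psi_stable, psi_shifted)
```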

Privacy & Fairness Considerations

Privacy in Anomaly Detection

Anomaly detection often uses personal data (transaction history, behaviour patterns, location data). You must:
– Document consent basis (Privacy Act)
– Protect personal data (encryption, access controls)
– Enable transparency (explain to customers why they were flagged as anomalous)
– Respect privacy requests (deletion, opt-out)

Fairness Considerations

Anomaly detection models can have disparate impact. For example:
– A model trained mostly on majority-group behaviour might flag customers from under-represented groups as anomalous simply because their patterns are rarer in the training data (fairness issue)
– A fraud model might flag transactions by certain customer groups as anomalous at higher rates (potential discrimination)

Best practices:
– Audit model predictions across demographic groups
– Test for disparate impact
– If disparities exist, investigate root cause (data bias vs. genuine differences)
– Document fairness considerations
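Auditing predictions across groups can start with comparing flag rates. A pandas sketch, assuming hypothetical group and flagged (0/1) columns and a made-up audit sample; a large ratio is a prompt to investigate root cause as described above, not proof of bias on its own.

```python
import pandas as pd

def flag_rate_audit(df, group_col, flag_col):
    """Share of records flagged per group, and each group's rate relative to the lowest-rate group."""
    rates = df.groupby(group_col)[flag_col].mean()
    return pd.DataFrame({"flag_rate": rates, "ratio_to_lowest": rates / rates.min()})

# Hypothetical audit sample: 100 customers per group; flagged = 1 if the model raised an alert
sample = pd.DataFrame({
    "group": ["A"] * 100 + ["B"] * 100,
    "flagged": [1] * 2 + [0] * 98 + [1] * 6 + [0] * 94,
})
audit = flag_rate_audit(sample, "group", "flagged")
print(audit)
```

If the lowest-rate group has no flags at all, the ratio becomes infinite; comparing absolute rate differences alongside ratios avoids over-reading small samples.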

Implementation Timeline & Budget

Phase	Duration	Cost	Deliverable
Assessment & POC	4–8 weeks	AUD 50–100K	Proof-of-concept model, baseline accuracy metrics
Pilot	8–12 weeks	AUD 100–200K	Production-ready model, alerting, initial monitoring
Full deployment	2–4 weeks	AUD 50–100K	Integration with all systems, team training
Total	4–6 months	AUD 200–400K	Full anomaly detection system

ROI typically materialises within 6–12 months.

Getting Started

[ ] Identify top use case (fraud, equipment failure, network intrusion, data quality)
[ ] Estimate cost of current anomalies (fraud losses, downtime, investigation effort)
[ ] Quantify improvement target (detect 90% of anomalies? Reduce false positives 50%?)
[ ] Assess data availability (at least 3–12 months of clean historical data; 12+ months preferred)
[ ] Define stakeholders (fraud analysts, engineers, IT security, data teams)
[ ] Budget AUD 200–400K for 4–6 month implementation

Connecting to the Broader ML Cluster

This article focuses on anomaly detection. For related concepts, explore:

Machine Learning for Business Australia — Foundational ML concepts
Predictive Maintenance with Machine Learning — Equipment failure detection (specific use case)
MLOps for Australian Enterprises — Deploying and monitoring anomaly detection systems

Conclusion

Anomaly detection is a high-impact ML application that protects revenue (fraud), prevents downtime (equipment), and improves security (network intrusions).

The technology is mature. Algorithms are well-understood. The main barrier is defining “normal” and collecting representative baseline data.

Call to Action

Ready to detect anomalies and protect your business? Anitech AI specialises in anomaly detection for financial services, operations, network security, and data quality. We’ll build models tailored to your specific use case.

Talk to Anitech AI today. Let’s discuss how anomaly detection can protect your organisation.

Contact Anitech AI