AI Supply Chain Risk: Understanding AI Provenance and Integrity in Australia

When you deploy a machine learning model in production, you’re not just running code—you’re deploying a complex artefact built from training data, frameworks, dependencies, APIs, and inference engines, each of which represents a potential vulnerability. The AI supply chain extends from the moment raw data is collected through model training, versioning, packaging, and delivery to your production systems. A single point of failure anywhere in that chain—poisoned training data, compromised open-source libraries, insecure APIs, or malicious model weights—can undermine the integrity of your entire AI system. What does your organisation know about the provenance of the AI models you depend on?

In March 2026, malicious versions of the popular LiteLLM package were published with embedded malware designed to harvest environment variables, API keys, SSH credentials, and Kubernetes configurations. The compromise went undetected for days, affecting organisations worldwide. In Australia, where many enterprises rely on open-source frameworks and third-party models, the risk is acute.

What AI Supply Chain Risk Means

The AI supply chain is the complete pathway from raw training data to a live, production AI system. It includes: (1) training data collection and preprocessing; (2) model development and fine-tuning; (3) model packaging and versioning; (4) API layers and inference platforms; (5) integration with your application; (6) ongoing monitoring and updates. Each stage depends on the integrity of the one before it.

A threat at any stage cascades forward. Poisoned training data produces a compromised model. A vulnerable dependency introduces malware into your training pipeline. An insecure API allows attackers to extract model weights or modify inference outputs. A malicious model update deployed by your vendor changes behaviour silently. Unlike traditional software supply chains, which focus on binary integrity, AI supply chains must also manage data provenance, model behaviour, and ongoing model drift.

Key Risks at Each Point

Compromised Training Data

Attackers or malicious insiders poison training datasets by injecting misleading or malicious examples. A financial services firm training a fraud detection model on compromised data may learn to classify actual fraud as legitimate. A healthcare AI trained on corrupted patient data may produce unsafe clinical recommendations. Data poisoning can be targeted (making the model fail on specific inputs, or on inputs carrying an attacker-chosen trigger) or broad (degrading overall model performance). The challenge is that poisoned data often produces models that appear to work correctly on validation sets but fail in production.
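One crude screening step while quarantining data is to flag training examples whose label disagrees with most of their nearest neighbours, which is the pattern simple label-flipping tends to leave behind. The sketch below is a minimal brute-force version in pure Python; the function name, the toy two-cluster data and k=3 are assumptions for illustration. It will not catch clean-label poisoning or trigger-based backdoors, where the labels look perfectly consistent.

```python
import math
from collections import Counter

def flag_label_outliers(points, labels, k=3):
    """Return indices whose label disagrees with a strict majority of their k nearest neighbours."""
    flagged = []
    for i, p in enumerate(points):
        # Brute-force distances to every other example (O(n^2), fine for a sketch).
        neighbours = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )[:k]
        votes = Counter(labels[j] for _, j in neighbours)
        majority, count = votes.most_common(1)[0]
        if majority != labels[i] and count > k // 2:
            flagged.append(i)
    return flagged

# Toy data: a "legit" cluster near (0, 0), a "fraud" cluster near (10, 10),
# and one suspicious "fraud" label planted inside the legit cluster.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11), (0.5, 0.5)]
labels = ["legit"] * 4 + ["fraud"] * 4 + ["fraud"]
print(flag_label_outliers(points, labels))  # [8]
```

Flagged examples are candidates for human review, not automatic deletion; a real pipeline would use an approximate nearest-neighbour index and embeddings rather than raw coordinates.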

Vulnerable Dependencies & Open-Source Risks

AI systems depend on frameworks and libraries (PyTorch, TensorFlow, scikit-learn, Hugging Face transformers) and on the many auxiliary packages those pull in. A vulnerability in any dependency can expose your system. The LiteLLM incident demonstrated how even popular packages trusted by thousands can be compromised. Australian organisations using open-source AI frameworks must monitor dependency vulnerabilities continuously, yet many lack visibility into what versions of which libraries are in production.
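Getting that visibility starts with an inventory of what is actually installed. A minimal standard-library sketch is below: it lists installed distributions and compares them with an approved pin list. The function names, the pin-list format and the `example-lib` package are hypothetical, and the output is not an SBOM in a standard format such as SPDX or CycloneDX; it is the raw material you would feed into one and into vulnerability scanning.

```python
from importlib import metadata

def dependency_inventory():
    """Map each installed distribution name (lower-cased) to its installed version."""
    inventory = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs with missing metadata
            # Lower-casing only; full PEP 503 normalisation also folds "-", "_" and ".".
            inventory[name.lower()] = dist.version
    return dict(sorted(inventory.items()))

def diff_against_pins(inventory, pinned):
    """Return {name: (expected, actual)} for every pinned package that doesn't match.

    `actual` is None when the pinned package isn't installed at all.
    """
    mismatches = {}
    for name, expected in pinned.items():
        actual = inventory.get(name.lower())
        if actual != expected:
            mismatches[name] = (expected, actual)
    return mismatches

if __name__ == "__main__":
    inv = dependency_inventory()
    print(f"{len(inv)} distributions installed")
    # Hypothetical pin list; in practice this comes from your lock file or SBOM.
    print(diff_against_pins(inv, {"example-lib": "1.0.0"}))
```

Running this in each build and production image, and alerting on any unexpected mismatch, is what turns "we use open-source frameworks" into a list you can check against published advisories.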

Insecure APIs & Inference Platforms

When you expose a model via API, you create an attack surface. A poorly secured API allows attackers to query your model extensively, potentially extracting training data or discovering adversarial examples that cause misbehaviour. Organisations using third-party inference platforms (Azure ML, AWS SageMaker, Hugging Face) depend on those vendors’ security practices. Data in transit to and from the API must be encrypted, and API authentication must prevent unauthorised access.
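The two API controls mentioned here, authentication and limits on extensive querying, can be sketched as follows. This is a minimal, framework-free illustration under stated assumptions: requests are plain dicts with hypothetical `api_key` and `client_id` fields, the integer status codes stand in for whatever your web framework returns, and encryption in transit (TLS) is assumed to be handled by the server or load balancer rather than in this code. Rate limiting raises the cost of extraction-style querying; it does not prevent it outright.

```python
import hmac
import time

class TokenBucket:
    """Per-client token bucket: `rate` tokens refill per second, up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.state = {}  # client_id -> (tokens, last_refill_time)

    def allow(self, client_id):
        now = self.clock()
        tokens, last = self.state.get(client_id, (self.capacity, now))
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.state[client_id] = (tokens - 1, now)
            return True
        self.state[client_id] = (tokens, now)
        return False

def authorise(presented_key, expected_key):
    # Constant-time comparison avoids leaking how much of the key matched via timing.
    return hmac.compare_digest(presented_key.encode(), expected_key.encode())

def handle_inference(request, limiter, expected_key):
    if not authorise(request.get("api_key", ""), expected_key):
        return 401
    if not limiter.allow(request["client_id"]):
        return 429  # throttle bulk querying used to probe or extract the model
    return 200  # hypothetical: run the model and return its output here
```

Injecting the clock makes the limiter testable without sleeping; in production you would also keep bucket state in a shared store so limits hold across replicas.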

Model Watermarking & IP Circumvention

Organisations embed watermarks in model weights to prove ownership and detect unauthorised copying. Attackers attempt to circumvent these watermarks through fine-tuning, quantisation, or distillation. A malicious actor redistributing a copied model may strip the watermark, enabling unauthorised use or derivative models without the original developer’s knowledge.

Serialisation Attacks & Model Trojans

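Models are typically serialised to files (e.g., .pkl, .safetensors, .onnx formats) for storage and transport, and the format matters. Pickle-based formats (such as .pkl files and PyTorch checkpoints saved with pickle) can carry code that runs automatically during loading, so an attacker who tampers with such a file gets code execution the moment your system deserialises it. The .safetensors format stores only tensor data and metadata and was designed to avoid this risk. Independently of format, models can also contain hidden backdoors: weights or logic that activate only under specific conditions, allowing an attacker to trigger misbehaviour on demand.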
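The code-execution risk is easiest to see with Python’s pickle: an object’s `__reduce__` method can name any callable to invoke at load time. Below is a minimal sketch of one mitigation that the Python documentation itself describes (overriding `Unpickler.find_class` to allowlist globals). The payload is built but never loaded with plain `pickle.loads`. The allowlist contents and helper names are illustrative assumptions; allowlisting narrows the attack surface but is not a complete defence, so prefer formats such as .safetensors or, for PyTorch checkpoints, `torch.load(..., weights_only=True)`. This addresses only the code-execution half of the problem; backdoors hidden in the weights themselves need behavioural testing.

```python
import io
import os
import pickle

class RestrictedUnpickler(pickle.Unpickler):
    """Only resolve globals on an explicit allowlist; refuse everything else."""
    # Illustrative allowlist: extend only with types your checkpoints genuinely need.
    ALLOWED = {("collections", "OrderedDict")}

    def find_class(self, module, name):
        if (module, name) in self.ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global {module}.{name}")

def restricted_loads(data):
    return RestrictedUnpickler(io.BytesIO(data)).load()

class Malicious:
    # __reduce__ tells pickle: "at load time, call os.system('echo pwned')".
    def __reduce__(self):
        return (os.system, ("echo pwned",))

# Serialising the object does NOT run the command; only loading would.
payload = pickle.dumps(Malicious())

# Plain containers load normally because they need no globals...
print(restricted_loads(pickle.dumps({"layer1": [0.1, 0.2]})))
# ...but the payload is rejected before os.system can be looked up or called.
try:
    restricted_loads(payload)
except pickle.UnpicklingError as err:
    print(err)  # e.g. "blocked global posix.system" on Linux/macOS
# Never call plain pickle.loads(payload): it would execute the command.
```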

ACSC Guidance on AI Supply Chain Security

The Australian Signals Directorate’s Australian Cyber Security Centre (ASD’s ACSC) and the US National Security Agency (NSA) jointly published guidance in 2025 and March 2026 on securing AI supply chains. The guidance sets out six critical controls:

1. Data Integrity & Provenance: Organisations should quarantine and test externally sourced data before moving it into internal systems. Review and preprocess data to identify anomalies or poisoning indicators. Use integrity methods such as checksums, hashes, digital signatures, and lineage tracking to verify that data hasn’t been tampered with. Document the source, collection date, and processing history of all training data.

2. Secure Model Sources: Prefer models from trusted, transparent sources. Where possible, use models developed in-house where you can control data provenance. For third-party models, verify the vendor’s security practices and contractual commitments. Avoid downloading models from unverified repositories or peer-to-peer networks.

3. Model Integrity & Verification: Apply checksums, hashes, and digital signatures to model files to detect tampering. Before deploying a model, perform initial and periodic performance testing to catch drift or unexplained behaviour changes. Maintain a registry of verified model versions and track which models are in production. Prefer secure file formats (e.g., .safetensors over pickle) that reduce deserialisation attack risk.

4. Secure Development & Training Pipelines: Use secure, isolated environments for training. Restrict access to training data and model development to authorised personnel only. Implement version control for models, hyperparameters, and training data references. Log all changes and maintain audit trails.

5. Dependency & Component Security: Maintain a software bill of materials (SBOM) for all dependencies, frameworks, and tools used in your AI pipeline. Monitor for vulnerabilities in dependencies continuously using automated scanning. Apply patches promptly, and conduct security testing before deploying updated dependencies. For open-source components, review code and community trust signals before adoption.

6. Supply Chain Contracts & Accountability: When procuring models or AI services from vendors, establish contractual requirements for data provenance, model integrity, security testing, and incident notification. Require vendors to maintain secure development practices and to notify you immediately of any security incidents affecting your data or models. Include audit rights and data portability clauses.
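Controls 1 and 3 both rely on checksums or hashes to show that data and model files haven’t changed since they were approved. A minimal sketch of a SHA-256 manifest using only the standard library follows; the function names, file names and manifest layout are assumptions. A hash manifest only detects change after approval: it doesn’t prove who produced the artefact (that needs a digital signature, for example one published by the vendor), and the manifest itself must be stored where an attacker who can alter the artefacts cannot also alter it.

```python
import hashlib
import json
from pathlib import Path

def sha256_file(path, chunk_size=1 << 20):
    """Stream the file so large datasets and weight files needn't fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def write_manifest(paths, manifest_path):
    """Record the expected hash of each artefact at the moment it is approved."""
    manifest = {str(p): sha256_file(p) for p in paths}
    Path(manifest_path).write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return manifest

def verify_manifest(manifest_path):
    """Return artefacts that are missing or whose hash no longer matches the manifest."""
    manifest = json.loads(Path(manifest_path).read_text())
    return [
        p for p, expected in manifest.items()
        if not Path(p).exists() or sha256_file(p) != expected
    ]
```

A typical flow is to call `write_manifest` when a dataset snapshot or model version is approved, then run `verify_manifest` in the deployment pipeline and refuse to deploy if it returns anything.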

Assessing Provenance of AI Models You Use

Before integrating any external AI model, conduct a provenance assessment:

Source Verification: Where did the model originate? Is the vendor or developer reputable and well-known? Does the model come from an official source (e.g., the developer’s verified organisation page on the Hugging Face Hub) or a third-party re-upload? Models from official sources with strong community oversight are lower risk than those from obscure sources.

Training Data Transparency: What data was the model trained on? Is the training dataset documented and publicly available (e.g., Common Crawl, Wikipedia, curated enterprise datasets)? Are there known biases or quality issues? Models trained on proprietary, undisclosed data carry higher uncertainty about behaviour.

Version & Update History: Has the model been updated recently? Are release notes available explaining what changed? Does the developer maintain a version registry? Abandoned models that haven’t been updated in years may carry unpatched vulnerabilities.

Security Certifications: Has the model been audited by a third party? Are there certifications or attestations of security practices? Does the vendor publish security advisories and incident reports transparently?

Legal & Licensing: What’s the licence? Are there restrictions on commercial use, redistribution, or derivative works? Does the vendor indemnify you against copyright claims if the model was trained on copyrighted data?

Performance Testing: Test the model’s behaviour on your own data before production deployment. Look for unexpected outputs, performance degradation, or signs of backdoors. Compare behaviour across multiple versions to detect drift.
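The performance-testing step, including comparing behaviour across versions, can be sketched as running the current and candidate versions over the same held-out inputs and reporting accuracy and disagreement. In the sketch below both models are toy stand-ins, the 5% disagreement threshold is an arbitrary assumption, and the “trigger” value 777 simulates a backdoor. Note the limitation: a trigger-based backdoor only shows up if your test set happens to contain the trigger, so a clean comparison is evidence of integrity, not proof.

```python
def compare_versions(old_predict, new_predict, inputs, labels, max_disagreement=0.05):
    """Run two model versions on the same held-out data and summarise differences."""
    old_out = [old_predict(x) for x in inputs]
    new_out = [new_predict(x) for x in inputs]
    n = len(inputs)
    disagreements = [x for x, a, b in zip(inputs, old_out, new_out) if a != b]
    report = {
        "old_accuracy": sum(a == y for a, y in zip(old_out, labels)) / n,
        "new_accuracy": sum(b == y for b, y in zip(new_out, labels)) / n,
        "disagreement_rate": len(disagreements) / n,
        "disagreements": disagreements,  # inputs worth inspecting by hand
    }
    report["needs_review"] = report["disagreement_rate"] > max_disagreement
    return report

# Toy stand-ins: the "new" version behaves identically except on a trigger value.
def old_model(amount):
    return "fraud" if amount > 100 else "legit"

def new_model(amount):
    return "legit" if amount == 777 else old_model(amount)

inputs = [10, 50, 150, 200, 777, 90, 300, 20, 5, 400]
labels = ["legit", "legit", "fraud", "fraud", "fraud", "legit", "fraud", "legit", "legit", "fraud"]
report = compare_versions(old_model, new_model, inputs, labels)
print(report["disagreements"], report["needs_review"])  # [777] True
```

Looking at the specific inputs where versions disagree is often more informative than the aggregate accuracy, which here drops by only one example.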

Contractual Protections for AI Supply Chain

When procuring AI models, services, or training data from vendors, your contracts should include:

Data Provenance & Rights: Vendor certifies the source of all training data and warrants it doesn’t infringe third-party IP. Vendor indemnifies you against copyright or privacy claims arising from the model. Vendor discloses any known biases or limitations in training data.

Model Security & Integrity: Vendor warrants the model is free of malware, backdoors, and unauthorised code. Vendor provides checksums or digital signatures verifiable by you. Vendor agrees to notify you promptly (for example, within 24–48 hours) of any suspected security compromise or data breach affecting the model.

Ongoing Maintenance & Support: Vendor commits to patching security vulnerabilities in model code and dependencies within a defined timeframe. Vendor provides release notes documenting all changes in model updates. Vendor maintains backwards compatibility or provides migration guidance when retiring models.

Audit & Compliance: Vendor grants you audit rights to verify security practices and data handling. For regulated industries (e.g., banking, insurance, healthcare), vendor complies with the Privacy Act 1988 (Cth), APRA CPS 234 where you are an APRA-regulated entity, and other sector-specific requirements. Vendor commits to documenting and providing evidence of compliance.

Data Confidentiality & Portability: Vendor commits not to use your data to train other models or for purposes outside the contracted scope. Vendor allows you to export your data and models in a usable format if the relationship ends, with reasonable transition periods.

Incident Response & Liability: Vendor maintains cyber liability insurance and commits to incident response SLAs. Liability caps are set appropriately, but the vendor indemnifies you for losses arising from its negligence or breach of security obligations.

FAQ

Q: Should we prefer open-source models over commercial models?
A: Open-source models offer transparency and community scrutiny, which is valuable. However, transparency doesn’t guarantee security—community-reviewed code can still contain vulnerabilities. Commercial models often come with vendor support, SLAs, and indemnification. The choice depends on your risk tolerance, technical capability, and regulatory environment. A balanced approach: use open-source for non-critical workflows, and commercial or in-house models for regulated or sensitive use cases.

Q: What’s the difference between model versioning and model provenance?
A: Versioning tracks which model version is in production. Provenance documents how, and from what data, a specific version was created: the lineage of data sources, training parameters, and approval steps. Two versions of a model (say v1.2 and v1.3) can have quite different provenance if v1.3 was trained on a different dataset or by a different team, and the version number alone won’t tell you that. Strong provenance tracking gives you confidence in the model’s integrity.
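To make the distinction concrete, provenance can be captured as a structured record per version and fingerprinted, so two versions trained on different data or by different teams are distinguishable even when their version labels look routine. The record below is a hypothetical schema (every field name and value is an illustrative assumption), not a standard format.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Optional

@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical lineage record for one model version (field names are illustrative)."""
    model_name: str
    version: str
    weights_sha256: str
    dataset_sha256s: tuple   # hashes of each training-data snapshot used
    training_params: tuple   # (name, value) pairs, e.g. (("epochs", 3),)
    trained_by: str
    approved_by: str
    parent_version: Optional[str] = None

    def fingerprint(self):
        """Stable SHA-256 over every field: any change to the lineage changes the result."""
        canonical = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(canonical.encode()).hexdigest()

v12 = ProvenanceRecord("fraud-model", "1.2", "aa" * 32, ("d1",), (("epochs", 3),), "team-a", "risk-committee")
v13 = ProvenanceRecord("fraud-model", "1.3", "bb" * 32, ("d2",), (("epochs", 3),), "team-b", "risk-committee",
                       parent_version="1.2")
print(v12.fingerprint() == v13.fingerprint())  # False
```

Storing these records alongside the model registry means an auditor can ask not just "which version is live?" but "which data and which approval produced it?".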

Q: How do we monitor our AI supply chain after deployment?
A: Establish a model monitoring programme that tracks: (1) model performance metrics on production data; (2) data drift (does the production data distribution match training data?); (3) dependency vulnerability scans (monthly or continuous); (4) model version tracking and approval logs; (5) vendor security advisories and incident reports. Implement automated alerts for anomalies. Review model behaviour quarterly or after any vendor update.
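Item (2), data drift, is often measured with the Population Stability Index (PSI), which compares the binned distribution of a feature at training time (proportions e) with the same feature in production (proportions a): PSI = sum over bins of (a - e) * ln(a / e). The sketch below is one common way to compute it, not the only drift metric; the equal-width bins, the small epsilon for empty bins and the commonly quoted rule of thumb (below 0.1 stable, 0.1 to 0.25 worth watching, above 0.25 significant shift) are conventions, not requirements.

```python
import math

def population_stability_index(expected, actual, bins=10, eps=1e-6):
    """PSI between a training-time sample (`expected`) and a production sample (`actual`)."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Clamp so production values outside the training range land in the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        return [max(c / len(values), eps) for c in counts]  # eps avoids log(0)

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

training = list(range(100))
production = [v + 50 for v in training]  # the same feature, shifted upwards
print(population_stability_index(training, training))  # 0.0
print(population_stability_index(training, production) > 0.25)  # True
```

In a monitoring programme you would compute this per feature on a schedule and raise an alert when it crosses whatever threshold your team has agreed.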

Securing Your AI Supply Chain

AI supply chain security is foundational to responsible AI governance. By understanding the provenance of your models, managing dependencies, securing APIs, and establishing contractual protections with vendors, you reduce the risk of compromise and gain confidence that your AI systems behave as intended. Australian organisations subject to the Privacy Act 1988 (Cth) and to sector-specific requirements from regulators such as APRA and ASIC must be able to demonstrate these controls as part of their risk management obligations.

Anitech helps Australian enterprises assess AI supply chain risk, design secure procurement practices, and implement ongoing monitoring to ensure the models and data you depend on maintain integrity and performance.

Contact us to review your AI supply chain risk profile and strengthen your model governance.
