Personal AI NDAs: How You’ll Share Data Safely

Drafting NDAs for Sharing Personal Data with AI Models

Protect personal data when sharing with AI: practical NDA terms, controls, and operational steps to reduce privacy risk and stay compliant. Use this checklist now.

Organizations increasingly must share personal data with vendors building or using AI models. A focused NDA tailored to AI risks reduces legal exposure, preserves trust, and enables safe collaboration while meeting regulatory requirements.

  • TL;DR: Use a narrow, purpose-limited NDA with explicit definitions, strict training and redisclosure limits, encryption and logging, retention schedules, and audit rights.
  • Attach a data map and allowed-use matrix; control third-party/API access and require breach notification.
  • Operationalize with exhibits, regular tests, monitoring, and review with legal/privacy counsel for GDPR/CCPA alignment.

Define use case, parties, and data flow

Start by describing the business purpose, scope, and duration of data sharing in plain language. Identify the disclosing party (data controller or steward), the receiving party (processor/vendor), and any subprocessors or affiliates.

  • Use case example: “Vendor will analyze customer transaction logs to surface fraud patterns for product X from Jan–Dec 2026.”
  • Data flow diagram: embed a simple visual in documents (e.g., ASCII or linked diagram) showing sources, preprocessing, model training, inference, storage, and deletion points.
  • Include a contact list for security, privacy, and legal points of contact.
Source -> Ingest -> Pseudonymize -> Train/Score -> Store (encrypted) -> Delete/Archive
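The pseudonymization step in the flow above can be sketched as a keyed hash: the same input always maps to the same token, but the receiving party cannot reverse it or recompute it without the key. A minimal sketch, assuming the disclosing party holds the HMAC key (the key value shown is a placeholder):

```python
import hashlib
import hmac

def pseudonymize(value: str, secret_key: bytes) -> str:
    """Replace a direct identifier with a stable, non-reversible token.

    HMAC-SHA256 keyed by the disclosing party means the receiving party
    can join records on tokens but cannot recover or recompute identifiers.
    """
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()

# Hypothetical key for illustration; in practice it lives in a KMS/HSM.
key = b"example-key-held-by-data-controller"
token = pseudonymize("alice@example.com", key)
assert token == pseudonymize("alice@example.com", key)  # deterministic
```

Determinism is what keeps joins and analytics possible downstream; when linkage across records is not needed, random tokenization or outright suppression is safer still.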

Quick answer: Use a narrow, purpose-limited NDA that explicitly defines what counts as personal data and model outputs, prohibits or tightly conditions model training and redisclosure, and sets retention/deletion timelines; require encryption, role-based access, logging/audit trails, breach notification, and clear remedies. Attach a data map and allowed-use matrix, control third-party/API access, and review with legal/privacy counsel to ensure compliance (GDPR/CCPA) before sharing.


Identify and classify personal data and derived outputs

Enumerate all data elements to be shared and classify them by sensitivity and legal status. Include derived outputs and embeddings in the scope.

  • Classification tiers: Public, Internal, Confidential, Sensitive Personal Data (e.g., SSNs, health), Highly Sensitive (e.g., minors, biometric identifiers).
  • Derived outputs to include: model weights, embeddings, synthetic data, inference logs, attention maps, and feature importances.
Example data classification

  • Email address (Confidential): delete within 90 days after project end.
  • Transaction amounts (Internal): aggregate and purge raw records after 180 days.
  • Health condition (Highly Sensitive): prohibit sharing unless explicit consent; store only encrypted.

Explicitly state that embeddings or model outputs that can be reversed or linked to an individual are treated as personal data for NDA purposes.
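One way to make the classification exercise enforceable is to keep the data map as structured data and fail closed on anything unlisted. A sketch; the tier names and retention values mirror the example classification above and are illustrative, not normative:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Classification:
    tier: str            # e.g. "Internal", "Confidential", "Highly Sensitive"
    retention_days: int  # days after project end before mandatory deletion
    shareable: bool      # may this element leave the disclosing party at all?

# Mirrors the example classification; values are illustrative.
DATA_MAP = {
    "email_address":       Classification("Confidential", 90, True),
    "transaction_amounts": Classification("Internal", 180, True),
    "health_condition":    Classification("Highly Sensitive", 0, False),
    # Derived outputs that can be linked back to an individual count as
    # personal data for NDA purposes, per the clause above.
    "embeddings":          Classification("Confidential", 90, True),
}

def may_share(field: str) -> bool:
    """Fail closed: unknown fields and non-shareable tiers are refused."""
    cls = DATA_MAP.get(field)
    return cls is not None and cls.shareable
```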


Narrow permitted uses and forbid or condition model training

Limit use to the stated purpose and timeframe. Model training is the riskiest operation: forbid it by default, or permit it only under strict conditions.

  • Permitted uses: inference for a named product feature, analytics for a defined report, or algorithm tuning for specified metrics.
  • Training conditions (if allowed): explicit written authorization, delineated datasets, differential privacy/noising requirements, no retention of raw sensitive inputs in training artifacts, and independent verification.
  • Prohibit transfer into general-purpose model repositories or use as pretraining corpora unless contractually controlled and auditable.

Include consequences for noncompliance: injunctive relief, financial remedies, obligation to remediate, and return/destruction of copies.
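The default-deny stance on training can also be enforced mechanically: represent the allowed-use matrix as explicit (dataset, operation) pairs and refuse anything not listed. A sketch under assumed names (the dataset and operation labels are hypothetical):

```python
# Only (dataset, operation) pairs authorized in writing appear here;
# training is never implied by inference or analytics rights.
ALLOWED_USES = {
    ("transaction_logs_2026", "inference"),
    ("transaction_logs_2026", "analytics"),
    # ("transaction_logs_2026", "training") would be added only after a
    # signed authorization exhibit meeting the conditions listed above.
}

def authorize(dataset: str, operation: str) -> bool:
    """Fail closed: only explicitly listed pairs pass."""
    return (dataset, operation) in ALLOWED_USES

assert authorize("transaction_logs_2026", "inference")
assert not authorize("transaction_logs_2026", "training")  # denied by default
```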


Set access controls, retention, deletion, and data minimization rules

Design rules that minimize exposure, limit retention, and make deletion verifiable.

  • Access controls: least privilege, role-based access (RBAC), multifactor authentication (MFA), and documented approvals for privileged accounts.
  • Retention: specify retention period per data class and events that trigger deletion (project end, contract termination, revocation of consent).
  • Deletion: require cryptographic erasure where possible, and provide attestations with hash manifests and timestamps.
  • Minimization: share only fields needed for the purpose; apply pseudonymization or hashing before transfer.

Example clause: “Provider will retain personal data only for 90 days post-project and will provide a signed deletion certificate within 14 days of request.”
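A deletion attestation with a hash manifest, as the clause above requires, can be generated from the artifact set just before erasure. A minimal sketch; the certificate fields are illustrative, and a production version would add a cryptographic signature over the JSON:

```python
import hashlib
import json
from datetime import datetime, timezone

def hash_manifest(files: dict) -> dict:
    """SHA-256 each artifact so the disclosing party can later verify
    exactly which copies existed at deletion time."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}

def deletion_certificate(files: dict, signer: str) -> str:
    """Assemble the attestation as JSON: who deleted what, and when."""
    cert = {
        "attested_by": signer,
        "deleted_at": datetime.now(timezone.utc).isoformat(),
        "manifest": hash_manifest(files),
    }
    return json.dumps(cert, indent=2, sort_keys=True)
```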


Require security measures, logging, audits, and breach response

Spell out specific security controls, monitoring, and incident processes rather than generic promises.

  • Encryption: at-rest and in-transit using modern ciphers (e.g., AES-256, TLS 1.2+).
  • Key management: customer-managed keys where feasible, rotation policies, and separation of duties.
  • Logging and monitoring: immutable logs for access and data exports, 90–365 day retention of audit logs, and periodic review.
  • Audits: right to audit (on-site or remote), SOC 2/ISO 27001 evidence, and third-party penetration test reports.
  • Breach response: explicit notification timelines (e.g., within 72 hours), required content of notices, and remediation responsibilities.
Security & audit quick reference

  • Encryption: AES-256 at rest; TLS 1.2+ in transit.
  • Audit logs: immutable logs, retained 180 days.
  • Pen tests: annual external pentests; remediation within 60 days.
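True immutability needs WORM storage or an append-only logging service, but the "immutable logs" requirement can be approximated at the application layer with a hash chain, where each entry commits to its predecessor so after-the-fact edits are detectable. An illustrative sketch, not a substitute for dedicated infrastructure:

```python
import hashlib
import json

def append_entry(log: list, event: dict) -> None:
    """Chain each entry to the previous entry's hash; editing any earlier
    entry invalidates every hash after it."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    body = json.dumps({"event": event, "prev": prev_hash}, sort_keys=True)
    log.append({"event": event, "prev": prev_hash,
                "hash": hashlib.sha256(body.encode()).hexdigest()})

def verify(log: list) -> bool:
    """Recompute the whole chain; any tampering yields False."""
    prev_hash = "0" * 64
    for entry in log:
        body = json.dumps({"event": entry["event"], "prev": prev_hash},
                          sort_keys=True)
        if entry["prev"] != prev_hash or \
           entry["hash"] != hashlib.sha256(body.encode()).hexdigest():
            return False
        prev_hash = entry["hash"]
    return True
```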

Manage third parties, APIs, and sublicensing

Control downstream risk by requiring the receiving party to treat subprocessors as contract-bound extensions of themselves.

  • List approved subprocessors in an exhibit; require notice and approval for additions or an objection window.
  • Restrict API-based access: enforce IP allowlisting, rate limits, and output filtering; prohibit exposing raw personal data through public or shared endpoints.
  • Sublicensing: prohibit absent explicit written consent; require flow-down of all core data protections to any subcontractor.
  • Require contractual indemnities for subcontractor breaches and prompt termination rights for non-compliant subprocessors.
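The API restrictions above (IP allowlisting and rate limits) curb bulk exfiltration and can be sketched as a gateway check; the network range and limits below are placeholders, not recommendations:

```python
import ipaddress
import time
from collections import deque
from typing import Optional

# Placeholder range; the real allowlist comes from the subprocessor exhibit.
ALLOWED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]

def ip_allowed(client_ip: str) -> bool:
    """Check the caller against the contractual IP allowlist."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in ALLOWED_NETWORKS)

class RateLimiter:
    """Sliding-window limiter: at most `limit` calls per `window` seconds."""
    def __init__(self, limit: int, window: float):
        self.limit, self.window = limit, window
        self.calls: deque = deque()

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        while self.calls and now - self.calls[0] > self.window:
            self.calls.popleft()
        if len(self.calls) < self.limit:
            self.calls.append(now)
            return True
        return False
```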

Common pitfalls and how to avoid them

  • Vague definitions — Remedy: define “personal data,” “derived outputs,” and “training” precisely in the NDA.
  • Allowing broad training rights — Remedy: forbid training by default or require granular approvals and technical safeguards (DP, no retention).
  • No attestations of deletion — Remedy: require signed deletion certificates and cryptographic erasure where possible.
  • No audit rights — Remedy: include remote/on-site audit rights and mandatory evidence (SOC reports, pentest summaries).
  • Ignoring APIs/subprocessors — Remedy: include exhibits listing approved subprocessors and API access rules; require flow-down clauses.
  • Insufficient breach timelines — Remedy: specify short notification windows, remediation steps, and public communication obligations.

Operationalize: attach exhibits, test, monitor, and update regularly

Turn contract language into operational artifacts and governance routines.

  • Exhibits to attach: data map, allowed-use matrix, list of authorized personnel, retained subprocessors, retention schedule, sample deletion certificate, and security checklist.
  • Testing: run dry-runs for ingest, pseudonymization, and deletion; validate with hash manifests and end-to-end logging.
  • Monitoring: automate alerts on anomalous access, regular access reviews, and scheduled compliance reviews (quarterly).
  • Updates: review NDA and exhibits annually or when regulations/technology change; include a change-control process.

Example exhibit entry: “Data Map: Field ‘customer_phone’ — Source: CRM; Classification: Confidential; Permitted use: inference only; Retention: delete 30 days after project.”
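Exhibit entries like the one above are easiest to keep current as structured records validated before any transfer. A sketch; the field names mirror the example entry, and the schema itself is an assumption:

```python
REQUIRED_KEYS = {"field", "source", "classification", "permitted_use", "retention"}

# Mirrors the example exhibit entry above.
data_map_entry = {
    "field": "customer_phone",
    "source": "CRM",
    "classification": "Confidential",
    "permitted_use": "inference only",
    "retention": "delete 30 days after project",
}

def validate_entry(entry: dict) -> list:
    """Return a list of problems; an empty list means the entry is complete."""
    problems = [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - entry.keys())]
    problems += [f"empty value: {k}" for k in sorted(entry.keys() & REQUIRED_KEYS)
                 if not str(entry[k]).strip()]
    return problems

assert validate_entry(data_map_entry) == []
```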


Implementation checklist

  • Define use case, parties, and data flow diagram.
  • Prepare data map and classify data elements/derived outputs.
  • Draft narrow NDA clauses: definitions, permitted uses, training limits, and remedies.
  • Specify access controls, encryption, retention, deletion attestations.
  • Attach exhibits: subprocessors, allowed-use matrix, deletion certificate template.
  • Require audits, logging, breach timelines, and indemnities.
  • Run tests, monitor access, and schedule reviews with legal/privacy counsel.

FAQ

  • Q: Should embeddings be treated as personal data?
    A: Yes, if they can be linked back to individuals or enable re-identification; to be safe, treat them as personal data in the NDA.
  • Q: Can I allow a vendor to train models on pseudonymized data?
    A: Only with strict controls: documented pseudonymization, a prohibition on re-identification, differential privacy/noising, limited retention, and contractual audits.
  • Q: What audit evidence should I request?
    A: SOC 2/ISO 27001 reports, recent pentest summaries, audit logs, deletion certificates, and pentest remediation reports.
  • Q: How do NDAs interact with GDPR/CCPA?
    A: NDAs are contractual safeguards; ensure they reflect controller/processor roles, legal bases for processing, DSAR handling, and cross-border transfer mechanisms.
  • Q: What if a vendor refuses deletion attestations?
    A: Escalate to contractual remedies: require escrowed encryption keys so data can be cryptographically erased, or decline the engagement unless attestation is provided.