GDPR Compliance in 2024: How AI and LLMs Impact European User Rights
An in-depth analysis of how large language models and AI systems challenge core GDPR principles including the right to erasure, right to explanation, and lawful basis for training data, with practical guidance on DPIAs for AI deployments.
The intersection of generative AI and European data protection law is one of the most consequential compliance challenges we see emerging at Agency. Large language models do not fit neatly into the framework that GDPR's architects envisioned in 2016 — they absorb vast quantities of text during training, encode patterns in billions of parameters, and produce outputs that can reproduce or recombine personal data in ways that are difficult to predict or control. For companies deploying LLMs in products that serve European users, the compliance questions are urgent and the regulatory answers are still crystallizing.
The General Data Protection Regulation was designed around a model where personal data flows through identifiable pipelines: it is collected, stored in databases, processed for defined purposes, and eventually deleted. AI systems — particularly large language models like GPT-4, Claude, Gemini, and Llama — break this model fundamentally. Personal data used in training becomes distributed across model weights in a way that cannot be individually located, retrieved, or removed. This creates tension with several core GDPR rights and obligations that every company building or deploying AI in the EU must understand.
This analysis covers the specific GDPR provisions that AI systems challenge, the evolving regulatory landscape across EU member states, and the practical steps organizations should take to deploy AI responsibly under European data protection law. For a broader overview of GDPR fundamentals, see our GDPR compliance guide.
The Right to Erasure and the Problem of Model Weights
GDPR Article 17 grants data subjects the right to have their personal data erased when it is no longer necessary for the purpose it was collected, when they withdraw consent, or when the data was unlawfully processed. This right is straightforward when data lives in a database: you locate the record and delete it. With LLMs, it is anything but straightforward.
When an LLM is trained on a dataset containing personal data — names, email addresses, biographical details, opinions expressed in public forums — that information becomes encoded across the model's parameters. There is no "row" to delete. The model has learned statistical patterns from the data, and those patterns are entangled with everything else the model learned during training.
Current Technical Approaches
Several techniques have emerged to address this gap, though none fully satisfy the Article 17 standard as traditionally interpreted:
| Technique | How It Works | GDPR Adequacy |
|---|---|---|
| Output filtering | Post-processing layer blocks the model from generating specific personal data in responses | Partial — does not remove data from model weights |
| Fine-tuning / RLHF | Additional training to reduce the probability of generating specific personal information | Partial — reduces but does not eliminate the presence of data in weights |
| Machine unlearning | Algorithmic techniques to approximately reverse the effect of specific training data points | Promising but immature — computational cost is high and guarantees are approximate |
| Retraining from scratch | Remove data from training set and retrain the entire model | Technically complete but economically impractical for frontier models costing tens of millions to train |
| Retrieval-augmented generation (RAG) | Keep personal data in a separate, deletable database rather than in model weights | Strong for new deployments — data can be deleted from the retrieval store without touching the model |
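The output-filtering row above can be sketched in a few lines. This is a minimal illustration, not a production control: the patterns, function name, and redaction tokens are all hypothetical, and a real filter would combine curated blocklists and NER models with rules like these.

```python
import re

# Illustrative patterns only; a production filter would combine NER models
# and curated blocklists with simple rules like these.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def filter_output(text: str, blocked_names: set[str]) -> str:
    """Redact personal data from model output before it reaches the user."""
    text = EMAIL_RE.sub("[REDACTED EMAIL]", text)
    text = PHONE_RE.sub("[REDACTED PHONE]", text)
    for name in blocked_names:  # names covered by erasure requests
        text = re.sub(re.escape(name), "[REDACTED]", text, flags=re.IGNORECASE)
    return text

print(filter_output("Contact Jane Doe at jane@example.com or +39 06 1234 5678.",
                    {"Jane Doe"}))
```

Note the limitation the table flags: this blocks data at the output layer but removes nothing from the model's weights, which is why it earns only a "partial" adequacy rating.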
What we tell clients building AI products is that the most defensible architecture right now is one that minimizes the personal data baked into model weights and instead uses retrieval-augmented approaches where personal data lives in databases that support traditional CRUD operations. This does not solve the problem for foundation model providers who have already trained on web-scale data, but it gives application developers a compliance-friendly path.
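To make the retrieval-augmented pattern concrete, here is a toy sketch of why it is compliance-friendly: the personal data sits in an ordinary store that supports delete, so an Article 17 request never touches the model. Class and method names are illustrative; real deployments use a vector database rather than keyword matching.

```python
# Minimal sketch of the RAG erasure path: personal data lives in a
# deletable retrieval store, not in model weights.
class RetrievalStore:
    def __init__(self):
        self._docs: dict[str, str] = {}  # doc_id -> text

    def add(self, doc_id: str, text: str) -> None:
        self._docs[doc_id] = text

    def search(self, query: str) -> list[str]:
        # Toy keyword match; real systems use vector similarity search.
        return [t for t in self._docs.values() if query.lower() in t.lower()]

    def erase(self, doc_id: str) -> None:
        # Honoring an Article 17 request is an ordinary delete:
        # no retraining, no machine unlearning required.
        self._docs.pop(doc_id, None)

store = RetrievalStore()
store.add("u42", "Jane Doe's account was opened in 2021.")
print(store.search("Jane Doe"))  # context the model would receive
store.erase("u42")               # erasure request honored
print(store.search("Jane Doe"))  # -> []
```

Because the model only ever sees retrieved context at inference time, deleting the record removes the data from every future response.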
The EDPB Position
The European Data Protection Board issued preliminary guidance in 2024 indicating that if personal data was unlawfully used during training, the resulting model may itself be considered to contain personal data — even if the data cannot be individually extracted. This has significant implications: it means a regulator could theoretically require the deletion or restriction of an entire model, not just specific outputs. The Italian Garante's enforcement action against ChatGPT, which we discuss below, hinted at this possibility.
The Right to Explanation and Automated Decision-Making
GDPR Article 22 gives individuals the right not to be subject to decisions based solely on automated processing — including profiling — that produce legal effects or similarly significant effects. Where such processing occurs, Article 13(2)(f) and Article 14(2)(g) require that organizations provide "meaningful information about the logic involved, as well as the significance and the envisaged consequences" of such processing.
For AI systems, this creates a fundamental challenge: modern deep learning models, particularly large language models, are not inherently explainable. They process inputs through billions of parameters in ways that resist simple causal explanation.
Where Article 22 Applies to AI
Not every AI deployment triggers Article 22. The provision applies only when three conditions are met cumulatively:
- The decision is based solely on automated processing. If a human meaningfully reviews and can override the AI output before it affects the individual, Article 22 may not apply.
- The processing includes profiling or automated decision-making. This covers AI systems that evaluate personal aspects such as work performance, creditworthiness, reliability, or behavior.
- The decision produces legal or similarly significant effects. Denying a loan, rejecting a job application, setting an insurance premium, or determining eligibility for a public service qualify. Personalizing a news feed generally does not.
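The three cumulative conditions above can be captured in a simple screening helper, useful as a first-pass triage before legal review. This is an illustrative sketch, not legal advice; the field names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical triage helper encoding the three cumulative
# Article 22 conditions described above. Not legal advice.
@dataclass
class Processing:
    solely_automated: bool            # no meaningful human review before effect
    profiling_or_adm: bool            # profiling or automated decision-making
    legal_or_significant_effect: bool # e.g. loan denial, job rejection

def article_22_applies(p: Processing) -> bool:
    return (p.solely_automated
            and p.profiling_or_adm
            and p.legal_or_significant_effect)

loan_denial = Processing(True, True, True)
feed_ranking = Processing(True, True, False)  # personalization, no significant effect
print(article_22_applies(loan_denial))   # True
print(article_22_applies(feed_ranking))  # False
```

A tool like this only flags candidates for review; whether an effect is "similarly significant" is ultimately a legal judgment.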
Practical Compliance Approaches
In our experience working with companies that deploy AI in regulated contexts, the following approaches help satisfy Article 22 requirements:
- Human-in-the-loop design. Structure your process so that the AI generates a recommendation, but a qualified human makes the final decision. Document the human's authority to override the AI.
- Explainability layers. Use techniques like SHAP values, attention visualization, or feature importance rankings to generate explanations alongside AI outputs. These do not explain the full model, but they provide meaningful information about which factors influenced a specific decision.
- Pre-deployment documentation. Before deploying an AI system that will affect individuals, document its purpose, the logic it uses, the types of data it processes, and the potential consequences of its decisions. This documentation feeds directly into your DPIA and your privacy notices.
- Opt-out mechanisms. Give data subjects the ability to request human review of any automated decision, as required by Article 22(3).
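The human-in-the-loop approach above is easiest to defend when the override authority is built into the code path and logged. The sketch below shows one way to structure that; all names, the scoring rule, and the log format are illustrative assumptions.

```python
# Sketch of human-in-the-loop design: the model only recommends, a
# qualified reviewer decides, and the override authority is logged.
# All names, thresholds, and fields are illustrative.
import datetime
from typing import Optional

audit_log: list[dict] = []

def ai_recommend(application: dict) -> str:
    # Stand-in for a model call; a trivial rule substitutes here.
    return "approve" if application["score"] >= 700 else "deny"

def final_decision(application: dict, reviewer: str,
                   override: Optional[str] = None) -> str:
    recommendation = ai_recommend(application)
    decision = override or recommendation
    audit_log.append({
        "applicant": application["id"],
        "ai_recommendation": recommendation,
        "decision": decision,
        "reviewer": reviewer,  # documents who held override authority
        "overridden": decision != recommendation,
        "at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    })
    return decision

print(final_decision({"id": "A-1", "score": 640},
                     reviewer="j.smith", override="approve"))  # approve
```

The audit log matters as much as the override itself: it is the evidence that human review was meaningful rather than a rubber stamp, which is what keeps the process outside Article 22's "solely automated" trigger.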
The Italian Garante's ChatGPT Enforcement Action
The most significant regulatory action involving AI and GDPR to date is the temporary ban on ChatGPT imposed in March 2023 by the Italian data protection authority (the Garante per la protezione dei dati personali). Understanding this case is essential because it established the template that other EU regulators are following.
The Garante's Concerns
The Garante identified four specific GDPR violations:
- Lack of lawful basis. OpenAI had no valid legal basis under Article 6 for the mass collection and processing of personal data used to train ChatGPT.
- Transparency failures. Users and data subjects whose data was used in training were not informed about the processing, violating Articles 13 and 14.
- Inaccuracy. ChatGPT could generate factually incorrect information about individuals, violating the accuracy principle in Article 5(1)(d).
- No age verification. The service lacked mechanisms to prevent minors under 13 from accessing it, despite the risks of exposing children to inappropriate content.
The Resolution
OpenAI addressed these concerns through a series of measures implemented within roughly one month:
- Added a privacy notice explaining how personal data is used for training
- Implemented an age verification gate requiring users to confirm they are at least 18 (or at least 13 with parental consent)
- Provided an opt-out mechanism allowing users to prevent their conversations from being used for model training
- Published information about how EU residents can exercise their data subject rights, including requesting data deletion
- Committed to exploring legitimate interest as a lawful basis and to conducting further work on age verification
The ban was lifted in late April 2023. The Garante nonetheless continued its investigation, notified OpenAI of GDPR violations in early 2024, and later that year imposed a fine of fifteen million euros. This trajectory of initial ban, remediation, and then formal penalty signals how other regulators are likely to approach AI companies.
Ripple Effects Across the EU
Following the Italian action, several other EU data protection authorities opened their own investigations into ChatGPT and similar AI services:
- France (CNIL): Opened formal proceedings, issued guidance on AI and personal data, and established an AI department within the authority
- Spain (AEPD): Launched an investigation into ChatGPT and began developing sector-specific AI guidance
- Germany: Multiple state-level data protection authorities examined ChatGPT, with differing conclusions about lawful basis
- Poland (UODO): Investigated after a complaint about inaccurate personal data generated by ChatGPT
- European Data Protection Board: Established a ChatGPT Task Force to coordinate approaches across member states and promote consistency
The Lawful Basis Debate: Legitimate Interest vs. Consent
One of the most contested questions in AI compliance is which lawful basis under GDPR Article 6 can justify processing personal data for model training. The two candidates that receive the most attention are consent (Article 6(1)(a)) and legitimate interest (Article 6(1)(f)).
Why Consent Is Problematic for Training Data
Consent under GDPR must be freely given, specific, informed, and unambiguous. For AI training data, consent presents several practical challenges:
- Scale. Foundation models are trained on billions of text documents. Obtaining individual consent from every person whose data appears in the training set is operationally impossible for most web-scraped datasets.
- Specificity. Consent must be specific to the purpose. Training an AI model that could be used for countless downstream applications is difficult to describe with the specificity GDPR requires.
- Withdrawal. If consent is the lawful basis, individuals can withdraw it at any time under Article 7(3). As discussed above, honoring withdrawal by removing data from trained model weights is technically problematic.
The Case for Legitimate Interest
Legitimate interest requires a three-part balancing test: (1) the controller must have a legitimate interest, (2) the processing must be necessary for that interest, and (3) the interest must not be overridden by the data subject's fundamental rights and freedoms.
Several EU regulators have indicated that legitimate interest may be acceptable for AI training, provided that:
- A thorough balancing test is documented
- The data was obtained from publicly accessible sources
- Appropriate safeguards are in place (output filters, opt-out mechanisms, data minimization during training)
- A DPIA has been conducted
- Transparent information is provided to data subjects
What we advise clients is that legitimate interest is the more practical basis for most AI training scenarios, but it requires significantly more documentation work than many organizations anticipate. The balancing test must be genuine, detailed, and regularly reviewed — not a box-checking exercise.
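Because the balancing test must be documented and regularly reviewed, it helps to keep each legitimate interest assessment (LIA) as a structured record rather than prose buried in a memo. The sketch below shows one possible shape; every field name and value is an illustrative assumption, not a regulatory template.

```python
# Illustrative structure for documenting a legitimate interest assessment
# (LIA). Fields mirror the three-part test described above; all values
# and identifiers are hypothetical.
from datetime import date

lia_record = {
    "purpose": "Train a support-ticket summarization model",
    "legitimate_interest": "Improve customer service quality",   # part 1
    "necessity": ("No less intrusive means achieves comparable accuracy; "
                  "data is pseudonymized before training"),      # part 2
    "balancing": {                                               # part 3
        "data_subject_impact": "low",  # assessed, not assumed
        "safeguards": ["output filtering", "opt-out mechanism",
                       "data minimization"],
        "outcome": "interest not overridden",
    },
    "dpia_reference": "DPIA-2024-007",  # hypothetical identifier
    "next_review": date(2025, 6, 1).isoformat(),  # scheduled re-review
}
print(lia_record["balancing"]["outcome"])
```

A machine-readable record like this also makes the "regularly reviewed" obligation auditable: a scheduled job can flag every LIA whose review date has passed.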
Data Protection Impact Assessments for AI Deployments
GDPR Article 35 requires a DPIA when processing is "likely to result in a high risk to the rights and freedoms of natural persons." AI systems that process personal data almost always meet this threshold, particularly when they involve:
- Systematic evaluation of personal aspects (profiling)
- Automated decision-making with legal or significant effects
- Large-scale processing of personal data
- Use of new technologies (a criterion that AI systems generally satisfy)
What a DPIA for an AI System Should Cover
Based on our experience helping companies prepare DPIAs for AI deployments, we recommend the following structure:
1. Description of processing operations
- What data does the AI system process?
- Where does the training data come from?
- What is the model architecture and how does it use personal data?
- What outputs does the system generate, and who receives them?
2. Assessment of necessity and proportionality
- Is AI processing necessary to achieve the stated purpose, or could a less intrusive approach work?
- Is the volume of personal data processed proportionate to the purpose?
- Have you applied data minimization techniques (anonymization, pseudonymization, differential privacy)?
3. Risk assessment
- What are the risks to individuals? Consider discrimination, inaccuracy, loss of autonomy, chilling effects on expression, and data breaches.
- What is the likelihood and severity of each risk?
- Are there risks specific to vulnerable groups (children, elderly, minorities)?
4. Mitigation measures
- Technical controls: output filtering, access controls, audit logging, encryption
- Organizational controls: human oversight, regular model auditing, bias testing, incident response procedures
- Data subject controls: opt-out mechanisms, correction procedures, human review processes
5. Consultation requirements
- If residual risk remains high after mitigation, Article 36 requires prior consultation with the supervisory authority
- Document whether consultation was required and its outcome
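Step 3 above, assessing likelihood and severity, is often easiest to operationalize as a simple risk matrix. The sketch below uses 1 to 3 scales and thresholds we chose for illustration; real DPIAs combine such scoring with qualitative judgment, and the risk names here are examples only.

```python
# Toy risk matrix for the DPIA risk-assessment step: score each risk as
# likelihood x severity on 1-3 scales. Thresholds are illustrative.
RISKS = {
    "discriminatory output": (2, 3),    # (likelihood, severity)
    "inaccurate personal data": (3, 2),
    "training-data breach": (1, 3),
}

def residual_risk(likelihood: int, severity: int) -> str:
    score = likelihood * severity
    if score >= 6:
        return "high"    # may trigger Article 36 prior consultation
    if score >= 3:
        return "medium"
    return "low"

for name, (likelihood, severity) in RISKS.items():
    print(f"{name}: {residual_risk(likelihood, severity)}")
```

Re-scoring the same matrix after mitigation measures are applied gives you the residual risk that determines whether Article 36 consultation is required.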
Connecting DPIAs to Your ROPA
Your DPIA should reference and be consistent with your Record of Processing Activities (ROPA). The ROPA documents what processing activities your organization conducts, while the DPIA evaluates the risk of specific high-risk activities. In practice, the AI system should appear as a processing activity in your ROPA, and the DPIA should elaborate on the risks and safeguards specific to that activity.
The EU AI Act and Its Intersection with GDPR
The EU AI Act, which entered into force in August 2024 with phased implementation through 2027, creates an additional regulatory layer for AI systems operating in the EU. While it is a distinct regulation from GDPR, the two interact in several important ways.
Key Intersections
| Area | GDPR Requirement | AI Act Requirement | Combined Obligation |
|---|---|---|---|
| Transparency | Privacy notices, information about automated decisions | Disclosure that content is AI-generated; transparency about system capabilities | Both sets of disclosures required |
| Risk assessment | DPIA for high-risk processing | Conformity assessment for high-risk AI systems | Assessments may be combined but must meet both standards |
| Data governance | Lawful basis, data minimization, accuracy | Training data quality requirements, bias monitoring | AI Act adds specific data quality obligations beyond GDPR minimization |
| Record-keeping | ROPA under Article 30 | Technical documentation and logging for high-risk systems | AI Act logging requirements are more detailed than ROPA |
| Human oversight | Right to human intervention in automated decisions | Mandatory human oversight for high-risk AI systems | AI Act formalizes and extends the Article 22 principle |
Organizations deploying AI in Europe should prepare for compliance with both frameworks simultaneously. In our experience, companies that build a unified compliance program covering GDPR and the AI Act from the start avoid significant rework later.
Practical Steps for Organizations Deploying AI in the EU
Based on our work with companies navigating AI compliance under GDPR, we recommend the following action plan:
- Audit your training data. Understand what personal data is included in your training datasets, where it came from, and what lawful basis supports its use.
- Conduct a DPIA before deployment. Do not treat the DPIA as a post-hoc exercise. Complete it before your AI system goes live.
- Implement output controls. Deploy filtering and monitoring to prevent the model from generating personal data inappropriately.
- Build human oversight into your process. For any AI system that affects individuals, ensure a qualified human can review and override AI decisions.
- Update your privacy notices. Clearly explain to users how AI is used in your products and what personal data it processes.
- Create opt-out mechanisms. Allow users to prevent their data from being used for model training and to request human alternatives to AI-driven decisions.
- Update your ROPA. Ensure your AI processing activities are documented in your Record of Processing Activities with appropriate detail.
- Monitor regulatory developments. EU guidance on AI and GDPR is evolving rapidly. Assign someone on your team to track developments from the EDPB, national regulators, and the EU AI Office.
For companies managing GDPR alongside other frameworks like SOC 2, see our guide on managing SOC 2 and GDPR together — many of the technical controls required for AI compliance overlap with both frameworks.
Looking Ahead
The relationship between AI and GDPR will continue to evolve as regulators gain experience with enforcement and as technical solutions for challenges like machine unlearning mature. What we consistently tell clients is that the companies in the strongest position are those that treat GDPR compliance as a design constraint from the beginning of their AI development process — not as a legal problem to solve after the model is trained and deployed.
The regulatory direction is clear: the EU expects AI developers and deployers to respect the data protection rights that GDPR establishes, even when the technology makes those rights more difficult to implement. Organizations that invest in privacy-by-design for their AI systems now will have a significant competitive and legal advantage as enforcement intensifies.
Agency Team
Agency Insights
Expert guidance on cybersecurity compliance from Agency's advisory team.