GDPR Compliance in 2024: How AI and LLMs Impact European User Rights
An in-depth analysis of how large language models and AI systems challenge core GDPR principles including the right to erasure, right to explanation, and lawful basis for training data, with practical guidance on DPIAs for AI deployments.
The intersection of generative AI and European data protection law is one of the most consequential compliance challenges we see emerging at Agency. Large language models do not fit neatly into the framework that GDPR's architects envisioned in 2016 — they absorb vast quantities of text during training, encode patterns in billions of parameters, and produce outputs that can reproduce or recombine personal data in ways that are difficult to predict or control. For companies deploying LLMs in products that serve European users, the compliance questions are urgent and the regulatory answers are still crystallizing.
The General Data Protection Regulation was designed around a model where personal data flows through identifiable pipelines: it is collected, stored in databases, processed for defined purposes, and eventually deleted. AI systems — particularly large language models like GPT-4, Claude, Gemini, and Llama — break this model fundamentally. Personal data used in training becomes distributed across model weights in a way that cannot be individually located, retrieved, or removed. This creates tension with several core GDPR rights and obligations that every company building or deploying AI in the EU must understand.
This analysis covers the specific GDPR provisions that AI systems challenge, the evolving regulatory landscape across EU member states, and the practical steps organizations should take to deploy AI responsibly under European data protection law. For a broader overview of GDPR fundamentals, see our GDPR compliance guide.
The Right to Erasure and the Problem of Model Weights
GDPR Article 17 grants data subjects the right to have their personal data erased when it is no longer necessary for the purpose it was collected, when they withdraw consent, or when the data was unlawfully processed. This right is straightforward when data lives in a database: you locate the record and delete it. With LLMs, it is anything but straightforward.
When an LLM is trained on a dataset containing personal data — names, email addresses, biographical details, opinions expressed in public forums — that information becomes encoded across the model's parameters. There is no "row" to delete. The model has learned statistical patterns from the data, and those patterns are entangled with everything else the model learned during training.
Current Technical Approaches
Several techniques have emerged to address this gap, though none fully satisfy the Article 17 standard as traditionally interpreted:
| Technique | How It Works | GDPR Adequacy |
|---|---|---|
| Output filtering | Post-processing layer blocks the model from generating specific personal data in responses | Partial — does not remove data from model weights |
| Fine-tuning / RLHF | Additional training to reduce the probability of generating specific personal information | Partial — reduces but does not eliminate the presence of data in weights |
| Machine unlearning | Algorithmic techniques to approximately reverse the effect of specific training data points | Promising but immature — computational cost is high and guarantees are approximate |
| Retraining from scratch | Remove data from training set and retrain the entire model | Technically complete but economically impractical for frontier models costing tens of millions to train |
| Retrieval-augmented generation (RAG) | Keep personal data in a separate, deletable database rather than in model weights | Strong for new deployments — data can be deleted from the retrieval store without touching the model |
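The output-filtering row above can be sketched in a few lines. This is a minimal illustration, not a production control: the patterns, function name, and redaction tokens are all hypothetical, and a real filter would combine curated blocklists and NER models with rules like these.

```python
import re

# Illustrative patterns only; a production filter would combine NER models
# and curated blocklists with simple rules like these.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def filter_output(text: str, blocked_names: set[str]) -> str:
    """Redact personal data from model output before it reaches the user."""
    text = EMAIL_RE.sub("[REDACTED EMAIL]", text)
    text = PHONE_RE.sub("[REDACTED PHONE]", text)
    for name in blocked_names:  # names covered by erasure requests
        text = re.sub(re.escape(name), "[REDACTED]", text, flags=re.IGNORECASE)
    return text

print(filter_output("Contact Jane Doe at jane@example.com or +39 06 1234 5678.",
                    {"Jane Doe"}))
```

Note the limitation the table flags: this blocks data at the output layer but removes nothing from the model's weights, which is why it earns only a "partial" adequacy rating.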
What we tell clients building AI products is that the most defensible architecture right now is one that minimizes the personal data baked into model weights and instead uses retrieval-augmented approaches where personal data lives in databases that support traditional CRUD operations. This does not solve the problem for foundation model providers who have already trained on web-scale data, but it gives application developers a compliance-friendly path.
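To make the retrieval-augmented pattern concrete, here is a toy sketch of why it is compliance-friendly: the personal data sits in an ordinary store that supports delete, so an Article 17 request never touches the model. Class and method names are illustrative; real deployments use a vector database rather than keyword matching.

```python
# Minimal sketch of the RAG erasure path: personal data lives in a
# deletable retrieval store, not in model weights.
class RetrievalStore:
    def __init__(self):
        self._docs: dict[str, str] = {}  # doc_id -> text

    def add(self, doc_id: str, text: str) -> None:
        self._docs[doc_id] = text

    def search(self, query: str) -> list[str]:
        # Toy keyword match; real systems use vector similarity search.
        return [t for t in self._docs.values() if query.lower() in t.lower()]

    def erase(self, doc_id: str) -> None:
        # Honoring an Article 17 request is an ordinary delete:
        # no retraining, no machine unlearning required.
        self._docs.pop(doc_id, None)

store = RetrievalStore()
store.add("u42", "Jane Doe's account was opened in 2021.")
print(store.search("Jane Doe"))  # context the model would receive
store.erase("u42")               # erasure request honored
print(store.search("Jane Doe"))  # -> []
```

Because the model only ever sees retrieved context at inference time, deleting the record removes the data from every future response.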
The EDPB Position
The European Data Protection Board issued preliminary guidance in 2024 indicating that if personal data was unlawfully used during training, the resulting model may itself be considered to contain personal data — even if the data cannot be individually extracted. This has significant implications: it means a regulator could theoretically require the deletion or restriction of an entire model, not just specific outputs. The Italian Garante's enforcement action against ChatGPT, which we discuss below, hinted at this possibility.
The Right to Explanation and Automated Decision-Making
GDPR Article 22 gives individuals the right not to be subject to decisions based solely on automated processing — including profiling — that produce legal effects or similarly significant effects. Where such processing occurs, Article 13(2)(f) and Article 14(2)(g) require that organizations provide "meaningful information about the logic involved, as well as the significance and the envisaged consequences" of such processing.
For AI systems, this creates a fundamental challenge: modern deep learning models, particularly large language models, are not inherently explainable. They process inputs through billions of parameters in ways that resist simple causal explanation.
Where Article 22 Applies to AI
Not every AI deployment triggers Article 22. The provision applies only when three conditions are met cumulatively:
- The decision is based solely on automated processing. If a human meaningfully reviews and can override the AI output before it affects the individual, Article 22 may not apply.
- The processing includes profiling or automated decision-making. This covers AI systems that evaluate personal aspects such as work performance, creditworthiness, reliability, or behavior.
- The decision produces legal or similarly significant effects. Denying a loan, rejecting a job application, setting an insurance premium, or determining eligibility for a public service qualify. Personalizing a news feed generally does not.
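The three cumulative conditions above can be captured in a simple screening helper, useful as a first-pass triage before legal review. This is an illustrative sketch, not legal advice; the field names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical triage helper encoding the three cumulative
# Article 22 conditions described above. Not legal advice.
@dataclass
class Processing:
    solely_automated: bool            # no meaningful human review before effect
    profiling_or_adm: bool            # profiling or automated decision-making
    legal_or_significant_effect: bool # e.g. loan denial, job rejection

def article_22_applies(p: Processing) -> bool:
    return (p.solely_automated
            and p.profiling_or_adm
            and p.legal_or_significant_effect)

loan_denial = Processing(True, True, True)
feed_ranking = Processing(True, True, False)  # personalization, no significant effect
print(article_22_applies(loan_denial))   # True
print(article_22_applies(feed_ranking))  # False
```

A tool like this only flags candidates for review; whether an effect is "similarly significant" is ultimately a legal judgment.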
Practical Compliance Approaches
In our experience working with companies that deploy AI in regulated contexts, the following approaches help satisfy Article 22 requirements:
- Human-in-the-loop design. Structure your process so that the AI generates a recommendation, but a qualified human makes the final decision. Document the human's authority to override the AI.
- Explainability layers. Use techniques like SHAP values, attention visualization, or feature importance rankings to generate explanations alongside AI outputs. These do not explain the full model, but they provide meaningful information about which factors influenced a specific decision.
- Pre-deployment documentation. Before deploying an AI system that will affect individuals, document its purpose, the logic it uses, the types of data it processes, and the potential consequences of its decisions. This documentation feeds directly into your DPIA and your privacy notices.
- Opt-out mechanisms. Give data subjects the ability to request human review of any automated decision, as required by Article 22(3).
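The human-in-the-loop approach above is easiest to defend when the override authority is built into the code path and logged. The sketch below shows one way to structure that; all names, the scoring rule, and the log format are illustrative assumptions.

```python
# Sketch of human-in-the-loop design: the model only recommends, a
# qualified reviewer decides, and the override authority is logged.
# All names, thresholds, and fields are illustrative.
import datetime
from typing import Optional

audit_log: list[dict] = []

def ai_recommend(application: dict) -> str:
    # Stand-in for a model call; a trivial rule substitutes here.
    return "approve" if application["score"] >= 700 else "deny"

def final_decision(application: dict, reviewer: str,
                   override: Optional[str] = None) -> str:
    recommendation = ai_recommend(application)
    decision = override or recommendation
    audit_log.append({
        "applicant": application["id"],
        "ai_recommendation": recommendation,
        "decision": decision,
        "reviewer": reviewer,  # documents who held override authority
        "overridden": decision != recommendation,
        "at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    })
    return decision

print(final_decision({"id": "A-1", "score": 640},
                     reviewer="j.smith", override="approve"))  # approve
```

The audit log matters as much as the override itself: it is the evidence that human review was meaningful rather than a rubber stamp, which is what keeps the process outside Article 22's "solely automated" trigger.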
The Italian Garante's ChatGPT Enforcement Action
The most significant regulatory action involving AI and GDPR to date is the temporary ban on ChatGPT imposed in March 2023 by the Italian data protection authority (the Garante per la protezione dei dati personali). Understanding this case is essential because it established the template that other EU regulators are following.
The Garante's Concerns
The Garante identified four specific GDPR violations:
- Lack of lawful basis. OpenAI had no valid legal basis under Article 6 for the mass collection and processing of personal data used to train ChatGPT.
- Transparency failures. Users and data subjects whose data was used in training were not informed about the processing, violating Articles 13 and 14.
- Inaccuracy. ChatGPT could generate factually incorrect information about individuals, violating the accuracy principle in Article 5(1)(d).
- No age verification. The service lacked mechanisms to prevent minors under 13 from accessing it, despite the risks of exposing children to inappropriate content.
The Resolution
OpenAI addressed these concerns through a series of measures implemented within roughly one month:
- Added a privacy notice explaining how personal data is used for training
- Implemented an age verification gate requiring users to confirm they are at least 18 (or at least 13 with parental consent)
- Provided an opt-out mechanism allowing users to prevent their conversations from being used for model training
- Published information about how EU residents can exercise their data subject rights, including requesting data deletion
- Committed to exploring legitimate interest as a lawful basis and to conducting further work on age verification
The ban was lifted in late April 2023. The Garante nonetheless continued its investigation, notified OpenAI of GDPR violations in early 2024, and later that year imposed a fine of fifteen million euros. This trajectory of initial ban, remediation, and then formal penalty signals how other regulators are likely to approach AI companies.
Ripple Effects Across the EU
Following the Italian action, several other EU data protection authorities opened their own investigations into ChatGPT and similar AI services:
- France (CNIL): Opened formal proceedings, issued guidance on AI and personal data, and established an AI department within the authority
- Spain (AEPD): Launched an investigation into ChatGPT and began developing sector-specific AI guidance
- Germany: Multiple state-level data protection authorities examined ChatGPT, with differing conclusions about lawful basis
- Poland (UODO): Investigated after a complaint about inaccurate personal data generated by ChatGPT
- European Data Protection Board: Established a ChatGPT Task Force to coordinate approaches across member states and promote consistency
The Lawful Basis Debate: Legitimate Interest vs. Consent
One of the most contested questions in AI compliance is which lawful basis under GDPR Article 6 can justify processing personal data for model training. The two candidates that receive the most attention are consent (Article 6(1)(a)) and legitimate interest (Article 6(1)(f)).
Why Consent Is Problematic for Training Data
Consent under GDPR must be freely given, specific, informed, and unambiguous. For AI training data, consent presents several practical challenges:
- Scale. Foundation models are trained on billions of text documents. Obtaining individual consent from every person whose data appears in the training set is operationally impossible for most web-scraped datasets.
- Specificity. Consent must be specific to the purpose. Training an AI model that could be used for countless downstream applications is difficult to describe with the specificity GDPR requires.
- Withdrawal. If consent is the lawful basis, individuals can withdraw it at any time under Article 7(3). As discussed above, honoring withdrawal by removing data from trained model weights is technically problematic.
The Case for Legitimate Interest
Legitimate interest requires a three-part balancing test: (1) the controller must have a legitimate interest, (2) the processing must be necessary for that interest, and (3) the interest must not be overridden by the data subject's fundamental rights and freedoms.
Several EU regulators have indicated that legitimate interest may be acceptable for AI training, provided that:
- A thorough balancing test is documented
- The data was obtained from publicly accessible sources
- Appropriate safeguards are in place (output filters, opt-out mechanisms, data minimization during training)
- A DPIA has been conducted
- Transparent information is provided to data subjects
What we advise clients is that legitimate interest is the more practical basis for most AI training scenarios, but it requires significantly more documentation work than many organizations anticipate. The balancing test must be genuine, detailed, and regularly reviewed — not a box-checking exercise.
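Because the balancing test must be documented and regularly reviewed, it helps to keep each legitimate interest assessment (LIA) as a structured record rather than prose buried in a memo. The sketch below shows one possible shape; every field name and value is an illustrative assumption, not a regulatory template.

```python
# Illustrative structure for documenting a legitimate interest assessment
# (LIA). Fields mirror the three-part test described above; all values
# and identifiers are hypothetical.
from datetime import date

lia_record = {
    "purpose": "Train a support-ticket summarization model",
    "legitimate_interest": "Improve customer service quality",   # part 1
    "necessity": ("No less intrusive means achieves comparable accuracy; "
                  "data is pseudonymized before training"),      # part 2
    "balancing": {                                               # part 3
        "data_subject_impact": "low",  # assessed, not assumed
        "safeguards": ["output filtering", "opt-out mechanism",
                       "data minimization"],
        "outcome": "interest not overridden",
    },
    "dpia_reference": "DPIA-2024-007",  # hypothetical identifier
    "next_review": date(2025, 6, 1).isoformat(),  # scheduled re-review
}
print(lia_record["balancing"]["outcome"])
```

A machine-readable record like this also makes the "regularly reviewed" obligation auditable: a scheduled job can flag every LIA whose review date has passed.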
Data Protection Impact Assessments for AI Deployments
GDPR Article 35 requires a DPIA when processing is "likely to result in a high risk to the rights and freedoms of natural persons." AI systems that process personal data almost always meet this threshold, particularly when they involve:
- Systematic evaluation of personal aspects (profiling)
- Automated decision-making with legal or significant effects
- Large-scale processing of personal data
- Use of new technologies (a criterion that AI systems generally satisfy)
What a DPIA for an AI System Should Cover
Based on our experience helping companies prepare DPIAs for AI deployments, we recommend the following structure:
1. Description of processing operations
- What data does the AI system process?
- Where does the training data come from?
- What is the model architecture and how does it use personal data?
- What outputs does the system generate, and who receives them?
2. Assessment of necessity and proportionality
- Is AI processing necessary to achieve the stated purpose, or could a less intrusive approach work?
- Is the volume of personal data processed proportionate to the purpose?
- Have you applied data minimization techniques (anonymization, pseudonymization, differential privacy)?
3. Risk assessment
- What are the risks to individuals? Consider discrimination, inaccuracy, loss of autonomy, chilling effects on expression, and data breaches.
- What is the likelihood and severity of each risk?
- Are there risks specific to vulnerable groups (children, elderly, minorities)?
4. Mitigation measures
- Technical controls: output filtering, access controls, audit logging, encryption
- Organizational controls: human oversight, regular model auditing, bias testing, incident response procedures
- Data subject controls: opt-out mechanisms, correction procedures, human review processes
5. Consultation requirements
- If residual risk remains high after mitigation, Article 36 requires prior consultation with the supervisory authority
- Document whether consultation was required and its outcome
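Step 3 above, assessing likelihood and severity, is often easiest to operationalize as a simple risk matrix. The sketch below uses 1 to 3 scales and thresholds we chose for illustration; real DPIAs combine such scoring with qualitative judgment, and the risk names here are examples only.

```python
# Toy risk matrix for the DPIA risk-assessment step: score each risk as
# likelihood x severity on 1-3 scales. Thresholds are illustrative.
RISKS = {
    "discriminatory output": (2, 3),    # (likelihood, severity)
    "inaccurate personal data": (3, 2),
    "training-data breach": (1, 3),
}

def residual_risk(likelihood: int, severity: int) -> str:
    score = likelihood * severity
    if score >= 6:
        return "high"    # may trigger Article 36 prior consultation
    if score >= 3:
        return "medium"
    return "low"

for name, (likelihood, severity) in RISKS.items():
    print(f"{name}: {residual_risk(likelihood, severity)}")
```

Re-scoring the same matrix after mitigation measures are applied gives you the residual risk that determines whether Article 36 consultation is required.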
Connecting DPIAs to Your ROPA
Your DPIA should reference and be consistent with your Record of Processing Activities (ROPA). The ROPA documents what processing activities your organization conducts, while the DPIA evaluates the risk of specific high-risk activities. In practice, the AI system should appear as a processing activity in your ROPA, and the DPIA should elaborate on the risks and safeguards specific to that activity.
The EU AI Act and Its Intersection with GDPR
The EU AI Act, which entered into force in August 2024 with phased implementation through 2027, creates an additional regulatory layer for AI systems operating in the EU. While it is a distinct regulation from GDPR, the two interact in several important ways.
Key Intersections
| Area | GDPR Requirement | AI Act Requirement | Combined Obligation |
|---|---|---|---|
| Transparency | Privacy notices, information about automated decisions | Disclosure that content is AI-generated; transparency about system capabilities | Both sets of disclosures required |
| Risk assessment | DPIA for high-risk processing | Conformity assessment for high-risk AI systems | Assessments may be combined but must meet both standards |
| Data governance | Lawful basis, data minimization, accuracy | Training data quality requirements, bias monitoring | AI Act adds specific data quality obligations beyond GDPR minimization |
| Record-keeping | ROPA under Article 30 | Technical documentation and logging for high-risk systems | AI Act logging requirements are more detailed than ROPA |
| Human oversight | Right to human intervention in automated decisions | Mandatory human oversight for high-risk AI systems | AI Act formalizes and extends the Article 22 principle |
Organizations deploying AI in Europe should prepare for compliance with both frameworks simultaneously. In our experience, companies that build a unified compliance program covering GDPR and the AI Act from the start avoid significant rework later.
Practical Steps for Organizations Deploying AI in the EU
Based on our work with companies navigating AI compliance under GDPR, we recommend the following action plan:
- Audit your training data. Understand what personal data is included in your training datasets, where it came from, and what lawful basis supports its use.
- Conduct a DPIA before deployment. Do not treat the DPIA as a post-hoc exercise. Complete it before your AI system goes live.
- Implement output controls. Deploy filtering and monitoring to prevent the model from generating personal data inappropriately.
- Build human oversight into your process. For any AI system that affects individuals, ensure a qualified human can review and override AI decisions.
- Update your privacy notices. Clearly explain to users how AI is used in your products and what personal data it processes.
- Create opt-out mechanisms. Allow users to prevent their data from being used for model training and to request human alternatives to AI-driven decisions.
- Update your ROPA. Ensure your AI processing activities are documented in your Record of Processing Activities with appropriate detail.
- Monitor regulatory developments. EU guidance on AI and GDPR is evolving rapidly. Assign someone on your team to track developments from the EDPB, national regulators, and the EU AI Office.
For companies managing GDPR alongside other frameworks like SOC 2, see our guide on managing SOC 2 and GDPR together — many of the technical controls required for AI compliance overlap with both frameworks.
Looking Ahead
The relationship between AI and GDPR will continue to evolve as regulators gain experience with enforcement and as technical solutions for challenges like machine unlearning mature. What we consistently tell clients is that the companies in the strongest position are those that treat GDPR compliance as a design constraint from the beginning of their AI development process — not as a legal problem to solve after the model is trained and deployed.
The regulatory direction is clear: the EU expects AI developers and deployers to respect the data protection rights that GDPR establishes, even when the technology makes those rights more difficult to implement. Organizations that invest in privacy-by-design for their AI systems now will have a significant competitive and legal advantage as enforcement intensifies.
Agency Team
Agency Insights
Expert guidance on cybersecurity compliance from Agency's advisory team.