From e57e9b7d168065bfbef3a6de88b316921a575cd2 Mon Sep 17 00:00:00 2001 From: Mikael Krief Date: Mon, 30 Mar 2026 22:01:08 +0200 Subject: [PATCH 1/4] feat: add GDPR-compliant engineering practices skill documentation --- docs/README.skills.md | 1 + skills/gdpr-compliant/SKILL.md | 730 +++++++++++++++++++++++++++++++++ 2 files changed, 731 insertions(+) create mode 100644 skills/gdpr-compliant/SKILL.md diff --git a/docs/README.skills.md b/docs/README.skills.md index 7d12e8b75..829ce7924 100644 --- a/docs/README.skills.md +++ b/docs/README.skills.md @@ -133,6 +133,7 @@ See [CONTRIBUTING.md](../CONTRIBUTING.md#adding-skills) for guidelines on how to | [fluentui-blazor](../skills/fluentui-blazor/SKILL.md) | Guide for using the Microsoft Fluent UI Blazor component library (Microsoft.FluentUI.AspNetCore.Components NuGet package) in Blazor applications. Use this when the user is building a Blazor app with Fluent UI components, setting up the library, using FluentUI components like FluentButton, FluentDataGrid, FluentDialog, FluentToast, FluentNavMenu, FluentTextField, FluentSelect, FluentAutocomplete, FluentDesignTheme, or any component prefixed with "Fluent". Also use when troubleshooting missing providers, JS interop issues, or theming. | `references/DATAGRID.md`
`references/LAYOUT-AND-NAVIGATION.md`
`references/SETUP.md`
`references/THEMING.md` | | [folder-structure-blueprint-generator](../skills/folder-structure-blueprint-generator/SKILL.md) | Comprehensive technology-agnostic prompt for analyzing and documenting project folder structures. Auto-detects project types (.NET, Java, React, Angular, Python, Node.js, Flutter), generates detailed blueprints with visualization options, naming conventions, file placement patterns, and extension templates for maintaining consistent code organization across diverse technology stacks. | None | | [game-engine](../skills/game-engine/SKILL.md) | Expert skill for building web-based game engines and games using HTML5, Canvas, WebGL, and JavaScript. Use when asked to create games, build game engines, implement game physics, handle collision detection, set up game loops, manage sprites, add game controls, or work with 2D/3D rendering. Covers techniques for platformers, breakout-style games, maze games, tilemaps, audio, multiplayer via WebRTC, and publishing games. | `assets/2d-maze-game.md`
`assets/2d-platform-game.md`
`assets/gameBase-template-repo.md`
`assets/paddle-game-template.md`
`assets/simple-2d-engine.md`
`references/3d-web-games.md`
`references/algorithms.md`
`references/basics.md`
`references/game-control-mechanisms.md`
`references/game-engine-core-principles.md`
`references/game-publishing.md`
`references/techniques.md`
`references/terminology.md`
`references/web-apis.md` | +| [gdpr-compliant](../skills/gdpr-compliant/SKILL.md) | Apply GDPR-compliant engineering practices across your codebase. Use this skill whenever you are designing APIs, writing data models, building authentication flows, implementing logging, handling user data, writing retention/deletion jobs, designing cloud infrastructure, or reviewing pull requests for privacy compliance. Trigger this skill for any task involving personal data, user accounts, cookies, analytics, emails, audit logs, encryption, pseudonymization, anonymization, data exports, breach response, CI/CD pipelines that process real data, or any question framed as "is this GDPR-compliant?". Inspired by CNIL developer guidance and GDPR Articles 5, 25, 32, 33, 35. | None | | [gen-specs-as-issues](../skills/gen-specs-as-issues/SKILL.md) | This workflow guides you through a systematic approach to identify missing features, prioritize them, and create detailed specifications for implementation. | None | | [generate-custom-instructions-from-codebase](../skills/generate-custom-instructions-from-codebase/SKILL.md) | Migration and code evolution instructions generator for GitHub Copilot. Analyzes differences between two project versions (branches, commits, or releases) to create precise instructions allowing Copilot to maintain consistency during technology migrations, major refactoring, or framework version upgrades. | None | | [geofeed-tuner](../skills/geofeed-tuner/SKILL.md) | Use this skill whenever the user mentions IP geolocation feeds, RFC 8805, geofeeds, or wants help creating, tuning, validating, or publishing a self-published IP geolocation feed in CSV format. Intended user audience is a network operator, ISP, mobile carrier, cloud provider, hosting company, IXP, or satellite provider asking about IP geolocation accuracy, or geofeed authoring best practices. 
Helps create, refine, and improve CSV-format IP geolocation feeds with opinionated recommendations beyond RFC 8805 compliance. Do NOT use for private or internal IP address management — applies only to publicly routable IP addresses. | `assets/example`
`assets/iso3166-1.json`
`assets/iso3166-2.json`
`assets/small-territories.json`
`references/rfc8805.txt`
`references/snippets-python3.md`
`scripts/templates` | diff --git a/skills/gdpr-compliant/SKILL.md b/skills/gdpr-compliant/SKILL.md new file mode 100644 index 000000000..ff6d9b7fd --- /dev/null +++ b/skills/gdpr-compliant/SKILL.md @@ -0,0 +1,730 @@ +--- +name: gdpr-compliant +description: 'Apply GDPR-compliant engineering practices across your codebase. Use this skill + whenever you are designing APIs, writing data models, building authentication flows, + implementing logging, handling user data, writing retention/deletion jobs, designing + cloud infrastructure, or reviewing pull requests for privacy compliance. + Trigger this skill for any task involving personal data, user accounts, cookies, + analytics, emails, audit logs, encryption, pseudonymization, anonymization, + data exports, breach response, CI/CD pipelines that process real data, or any + question framed as "is this GDPR-compliant?". Inspired by CNIL developer guidance + and GDPR Articles 5, 25, 32, 33, 35.' +--- + +# GDPR Engineering Skill + +A comprehensive, actionable reference for engineers, architects, DevOps engineers, +and tech leads building GDPR-compliant software in the EU/EEA or handling data of +EU residents. + +> **Golden Rule — commit this to memory:** +> **Collect less. Store less. Expose less. Retain less.** + +--- + +## Table of Contents + +1. [Core GDPR Principles](#1-core-gdpr-principles) +2. [Privacy by Design & by Default](#2-privacy-by-design--by-default) +3. [Data Minimization](#3-data-minimization) +4. [Purpose Limitation](#4-purpose-limitation) +5. [Storage Limitation & Retention Policies](#5-storage-limitation--retention-policies) +6. [Integrity & Confidentiality](#6-integrity--confidentiality) +7. [Accountability & Records of Processing](#7-accountability--records-of-processing) +8. [User Rights Implementation](#8-user-rights-implementation) +9. [API Design Rules](#9-api-design-rules) +10. [Logging Rules](#10-logging-rules) +11. [Error Handling](#11-error-handling) +12. [Encryption](#12-encryption) +13. 
[Password Hashing](#13-password-hashing) +14. [Secrets Management](#14-secrets-management) +15. [Anonymization & Pseudonymization](#15-anonymization--pseudonymization) +16. [Testing with Fake Data](#16-testing-with-fake-data) +17. [Incident & Breach Handling](#17-incident--breach-handling) +18. [Cloud & DevOps Practices](#18-cloud--devops-practices) +19. [CI/CD Controls](#19-cicd-controls) +20. [Architecture Patterns](#20-architecture-patterns) +21. [Anti-Patterns](#21-anti-patterns) +22. [PR Review Checklist](#22-pr-review-checklist) + +--- + +## 1. Core GDPR Principles + +These seven principles (Article 5 GDPR) are the foundation of every engineering decision. + +| Principle | Engineering meaning | +|---|---| +| **Lawfulness, fairness, transparency** | Have a documented legal basis for every processing activity. Expose privacy notices in the UI. | +| **Purpose limitation** | Data collected for purpose A MUST NOT be silently reused for purpose B without a new legal basis. | +| **Data minimization** | Collect only the fields you actually need today. Delete the rest. | +| **Accuracy** | Provide update endpoints. Propagate corrections to downstream stores. | +| **Storage limitation** | Define a TTL at the moment you design the schema, not after. | +| **Integrity & confidentiality** | Encrypt at rest and in transit. Restrict access. Audit access to sensitive data. | +| **Accountability** | Maintain documented evidence that you comply. Be audit-ready for a supervisory authority at any time. | + +--- + +## 2. Privacy by Design & by Default + +**Privacy by Design** means privacy is an architectural requirement, not a retrofit. +**Privacy by Default** means the most privacy-preserving option is always the default. + +### MUST + +- Design data models with retention in mind from day one — add `CreatedAt`, `DeletedAt`, `RetentionExpiresAt` columns when the entity is first created. +- Default all optional data collection to **off**. Users opt in; they do not opt out.
+- Make the least-privileged access path the default API behavior. +- Conduct a **Data Protection Impact Assessment (DPIA)** before building any high-risk processing (biometrics, large-scale profiling, health data, systematic monitoring). +- Document processing activities in a **Record of Processing Activities (RoPA)** — update it with every new feature. + +### SHOULD + +- Use feature flags to allow disabling data collection without a deployment. +- Apply column-level encryption for sensitive fields (health, financial, SSN, biometrics) rather than relying on disk encryption alone. +- Design for soft-delete + scheduled hard-delete, not immediate hard-delete, to allow data subject request windows. + +### MUST NOT + +- MUST NOT ship a new data collection feature without a documented legal basis. +- MUST NOT enable analytics, tracking, or telemetry by default without explicit consent. +- MUST NOT store personal data in a system not listed in the RoPA. + +--- + +## 3. Data Minimization + +### MUST + +- Map every field in every DTO/model to a concrete business need. Remove fields with no documented use. +- In API responses, return only what the client actually needs. Never return full entity objects when a projection suffices. +- Truncate or mask data at the edge — e.g., return `****1234` for card numbers, not the full PAN. +- In search/list endpoints, exclude sensitive fields (date of birth, national ID, health data) from default projections. + +### SHOULD + +- Use separate DTOs for create, read, and update operations — never reuse the same object and accidentally expose fields. +- Add automated tests that assert sensitive fields are absent from API responses where they should not appear. + +### MUST NOT + +- MUST NOT log full request/response bodies if they may contain personal data. +- MUST NOT include personal data in URL path segments or query parameters (they end up in access logs, CDN logs, and browser history). 
+- MUST NOT collect `dateOfBirth`, national ID, or health data unless there is an explicit, documented business requirement and a legal basis. + +--- + +## 4. Purpose Limitation + +### MUST + +- Document the purpose of every processing activity in code comments and in the RoPA. +- Tag database columns with their purpose in migration scripts or schema documentation. +- When reusing data for a secondary purpose (e.g., fraud detection reusing transactional data), obtain a new legal basis or confirm compatibility analysis. + +### SHOULD + +- Implement **data purpose tags** as metadata in your data warehouse/lake so downstream pipelines cannot silently extend usage. +- Build separate data stores for separate purposes (e.g., marketing analytics must not read from production operational data directly). + +### MUST NOT + +- MUST NOT share personal data collected for service delivery with third-party advertising networks without explicit consent. +- MUST NOT use support ticket content to train ML models without a separate legal basis and user notice. + +--- + +## 5. Storage Limitation & Retention Policies + +### MUST + +- Every table or store that holds personal data MUST have a defined retention period. +- Implement a scheduled job (e.g., Hangfire, cron) that enforces retention — not a manual process. +- Distinguish between **anonymization** (data may remain) and **deletion** (data is gone). Choose deliberately. +- Archive or anonymize data when retention expires — never leave expired data silently in production. +- Document retention periods in a **Retention Policy** document linked from the RoPA. 
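The retention rules above can be enforced by a scheduled job rather than a manual process. A minimal Python sketch, assuming hypothetical table names, a `retention_expires_at` column, and illustrative periods (the real values come from your Retention Policy):

```python
from datetime import datetime, timedelta, timezone

# Illustrative retention map; real periods come from the Retention Policy.
RETENTION_PERIODS = {
    "auth_logs": timedelta(days=365),
    "email_logs": timedelta(days=180),
    "session_tokens": timedelta(days=90),
}

def retention_expires_at(table: str, created_at: datetime) -> datetime:
    """Compute RetentionExpiresAt at insert time, not as an afterthought."""
    return created_at + RETENTION_PERIODS[table]

def expired_rows(rows: list[dict], now: datetime) -> list[dict]:
    """Select the rows a scheduled job must delete or anonymize,
    so expired data never sits silently in production."""
    return [r for r in rows if r["retention_expires_at"] <= now]
```

A Hangfire or cron job would run `expired_rows` nightly and route each hit to deletion or anonymization, as chosen deliberately per data type.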
+ +### Recommended Retention Defaults + +| Data type | Suggested maximum retention | +|---|---| +| Authentication logs | 12 months | +| Audit logs | 12–24 months (legal requirements may extend this) | +| Session tokens / refresh tokens | 30–90 days | +| Email / notification logs | 6 months | +| User accounts (inactive) | 12 months after last login, then notify + delete | +| Payment records | As required by tax law (typically 7–10 years), but minimized | +| Support tickets | 3 years after closure | +| Analytics events | 13 months (standard GA-style) | + +### SHOULD + +- Add a `RetentionExpiresAt` column to every sensitive table — compute it at insert time. +- Use soft-delete (`DeletedAt`) with a scheduled hard-delete job after the GDPR erasure request window (30 days). + +### MUST NOT + +- MUST NOT retain personal data indefinitely "in case it becomes useful later." +- MUST NOT use production data as a long-term data lake without a retention enforcement mechanism. + +--- + +## 6. Integrity & Confidentiality + +### MUST + +- Enforce **TLS 1.2+** on all connections. Reject older protocols. +- Encrypt personal data **at rest** using AES-256 or equivalent. +- Use **column-level encryption** for highly sensitive fields (health, biometric, financial, national ID). +- Restrict database access by role — application user MUST NOT have DDL rights on the production database. +- Enforce the **principle of least privilege** on all IAM roles, service accounts, and API keys. +- Enable access logging on databases and object storage. Retain access logs per retention policy. + +### SHOULD + +- Use **envelope encryption**: data encrypted with a data encryption key (DEK) which is itself encrypted by a key encryption key (KEK) stored in a KMS (Azure Key Vault, AWS KMS, GCP Cloud KMS). +- Enable **automatic key rotation** (annually minimum). +- Use **network segmentation**: databases must not be publicly accessible. Use private endpoints / VPC peering. 
+- Enable **audit logging** at the database level for SELECT on sensitive tables. + +### MUST NOT + +- MUST NOT store secrets (API keys, connection strings, passwords) in source code, configuration files committed to Git, or environment variable defaults. +- MUST NOT use self-signed certificates in production. +- MUST NOT transmit personal data over HTTP. + +--- + +## 7. Accountability & Records of Processing + +### MUST + +- Maintain a **Record of Processing Activities (RoPA)** — a living document updated with every new feature. Minimum fields per activity: + - Name and purpose + - Legal basis (contract / legitimate interest / consent / legal obligation / vital interest / public task) + - Categories of data subjects + - Categories of personal data + - Recipients (third parties, sub-processors) + - Transfers outside EEA and safeguards + - Retention period + - Security measures + +- Maintain a list of all **sub-processors** (cloud providers, SaaS tools, analytics, email providers). Review annually. +- Sign **Data Processing Agreements (DPAs)** with every sub-processor before data flows to them. + +### SHOULD + +- Generate a machine-readable RoPA (YAML/JSON) alongside the human-readable version, so it can be version-controlled. +- Automate a quarterly reminder to review the RoPA and sub-processor list. + +### MUST NOT + +- MUST NOT onboard a new SaaS tool that processes personal data without a signed DPA and RoPA entry. + +--- + +## 8. User Rights Implementation + +GDPR grants data subjects the following rights. Each must have a technical implementation path. + +| Right | Engineering implementation | +|---|---| +| **Right of access (Art. 15)** | `GET /api/v1/me/data-export` — returns all personal data in a machine-readable format (JSON or CSV). Respond within 30 days. | +| **Right to rectification (Art. 16)** | `PUT /api/v1/me/profile` — allow users to update their data. Propagate changes to downstream stores (search index, data warehouse). 
| +| **Right to erasure / right to be forgotten (Art. 17)** | `DELETE /api/v1/me` — anonymize or delete all personal data. Implement a checklist of all stores to scrub. | +| **Right to restriction of processing (Art. 18)** | Add a `ProcessingRestricted` flag on the user record. Gate all non-essential processing behind this flag. | +| **Right to data portability (Art. 20)** | Same as access endpoint, but ensure the format is structured, commonly used, and machine-readable (JSON preferred). | +| **Right to object (Art. 21)** | Provide an opt-out mechanism for processing based on legitimate interest. Honor it immediately — do not defer. | +| **Rights related to automated decision-making (Art. 22)** | If automated decisions produce legal or significant effects, provide a human review path. Expose an explanation of the logic. | + +### MUST + +- Every right MUST have a tested API endpoint (or admin back-office process) before the system goes live. +- Erasure MUST be comprehensive — document every store where a user's data lives (DB, S3, search index, cache, email logs, CDN logs, analytics). +- Respond to verified data subject requests within **30 calendar days**. +- Provide a machine-readable data export — not a PDF screenshot. + +### SHOULD + +- Build a **Data Subject Request (DSR) tracker** — a back-office tool to manage incoming requests, deadlines, and completion status. +- Automate the erasure pipeline for primary stores; document the manual steps for third-party stores. +- Test erasure with integration tests that assert the user's data is absent from all stores after deletion. + +### MUST NOT + +- MUST NOT require users to contact support via phone or letter to exercise their rights if the product is digital. +- MUST NOT charge a fee for data access requests unless clearly abusive and excessive. + +--- + +## 9. API Design Rules + +### MUST + +- MUST NOT include personal data in URL path or query parameters. 
+ - ❌ `GET /users/john.doe@example.com` + - ✅ `GET /users/{userId}` +- Authenticate all endpoints that return or accept personal data. +- Enforce **RBAC or ABAC** — users MUST NOT be able to access another user's data by guessing IDs (IDOR prevention). + - Always extract the acting user's identity from the JWT/session, never from the request body. + - Validate ownership: `if (resource.OwnerId != currentUserId) return 403`. +- Version your API — breaking privacy changes require a new version with migration guidance. +- Return **only the fields the caller is authorized to see**. Use response projections. + +### SHOULD + +- Implement **rate limiting** on sensitive endpoints (login, data export, password reset) to prevent enumeration and abuse. +- Add a `Content-Security-Policy` header on all responses. +- Use `Referrer-Policy: no-referrer` or `strict-origin` to prevent personal data leaking in referer headers. +- Implement **CORS** with an explicit allowlist. Never use `Access-Control-Allow-Origin: *` on authenticated APIs. + +### MUST NOT + +- MUST NOT return stack traces, internal paths, or database error messages in API error responses. +- MUST NOT use predictable sequential integer IDs as public resource identifiers — use UUIDs or opaque identifiers. +- MUST NOT expose bulk export endpoints without authentication and rate limiting. + +--- + +## 10. Logging Rules + +### MUST + +- **Anonymize IPs** in application logs — mask the last octet (IPv4) or the last 80 bits (IPv6). + - ❌ `192.168.1.42` + - ✅ `192.168.1.xxx` +- MUST NOT log passwords, tokens, session IDs, or authentication credentials. +- MUST NOT log full request or response bodies if they may contain personal data (forms, profile updates, health data). +- MUST NOT log national identification numbers, payment card numbers, or health data. +- Apply log retention — purge logs automatically after the defined retention period. 
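The IP-masking rule above can be implemented with the standard `ipaddress` module. A minimal sketch; it zeroes the masked bits instead of printing the `xxx` placeholder, so the result stays a parseable address:

```python
import ipaddress

def anonymize_ip(raw: str) -> str:
    """Mask the last octet of an IPv4 address, or the last 80 bits of an
    IPv6 address, before the value reaches any log sink."""
    addr = ipaddress.ip_address(raw)
    if addr.version == 4:
        # 192.168.1.42 -> 192.168.1.0
        return str(ipaddress.IPv4Address(int(addr) & ~0xFF))
    # Keep only the leading 48 bits of the 128-bit address.
    return str(ipaddress.IPv6Address(int(addr) & ~((1 << 80) - 1)))
```

Call this in the logging layer itself (e.g., a log enricher or middleware), so no code path can emit a raw client IP by accident.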
+ +### SHOULD + +- Log **events** rather than data: `"User {UserId} updated email"` not `"Email changed from a@b.com to c@d.com"`. +- Hash or pseudonymize user identifiers in logs used for analytics or debugging (use a one-way HMAC). +- Separate **audit logs** (access to sensitive data, configuration changes, admin actions) from **application logs** (errors, performance). Different retention, different access controls. +- Implement **structured logging** (JSON) with a `userId` field that uses an internal identifier, not the email address. + +### Log fields — MUST NOT include + +- `password`, `passwordHash`, `secret`, `token`, `refreshToken`, `resetToken` +- `cardNumber`, `cvv`, `iban`, `bic` +- `ssn`, `nationalId`, `passportNumber` +- `dateOfBirth` (in logs where it is not strictly necessary) +- Full `email` in high-volume access logs (use a hash or user ID) + +--- + +## 11. Error Handling + +### MUST + +- Return **generic error messages** to clients — never expose internal state, stack traces, or database errors. + - ❌ `"Column 'email' violates unique constraint on table 'users'"` + - ✅ `"A user with this email address already exists."` +- Use **Problem Details (RFC 7807)** format for all error responses — structured, consistent, no internal leakage. +- Log the full error **server-side** with correlation ID. Return only the correlation ID to the client. + +### SHOULD + +- Implement a global exception handler/middleware that catches unhandled exceptions before they reach the response serializer. +- Differentiate between **operational errors** (user errors, 4xx) and **programmer errors** (bugs, 5xx) in your logging strategy. + +### MUST NOT + +- MUST NOT include file paths, class names, method names, or line numbers in error responses. +- MUST NOT include personal data in error messages (e.g., "User john@example.com not found"). + +--- + +## 12. 
Encryption + +### At-Rest Encryption + +| Sensitivity | Minimum standard | +|---|---| +| Standard personal data (name, address, email) | AES-256 disk/volume encryption (cloud provider default) | +| Sensitive personal data (health, biometric, financial) | AES-256 **column-level** encryption + envelope encryption via KMS | +| Encryption keys | HSM-backed KMS (Azure Key Vault Premium, AWS KMS with CMK) | + +### In-Transit Encryption + +- **MUST** enforce TLS 1.2 minimum; prefer TLS 1.3. +- **MUST** use HSTS (`Strict-Transport-Security: max-age=31536000; includeSubDomains; preload`). +- **MUST** pin certificates or use certificate transparency monitoring for critical services. +- **MUST NOT** allow TLS 1.0 or TLS 1.1. +- **MUST NOT** use null cipher suites or export-grade ciphers. + +### Key Management + +- **MUST** store encryption keys in a dedicated KMS — never hardcoded, never in environment variables in plain text. +- **MUST** rotate data encryption keys (DEKs) annually, or immediately upon suspected compromise. +- **SHOULD** use separate keys per environment (dev, staging, prod). +- **SHOULD** log all key access events in the KMS audit trail. + +--- + +## 13. Password Hashing + +### MUST + +- Use **bcrypt** (cost ≥ 12), **Argon2id** (recommended), or **scrypt** for password hashing. +- Never use MD5, SHA-1, SHA-256, or any non-password-specific hash function for passwords. +- Use a **unique salt per password** — never a global salt. +- Store only the hash — never the plaintext password, never a reversible encoding. + +### SHOULD + +- Implement **pepper** (a secret server-side value added before hashing) stored in the KMS, not in the database. +- Enforce a minimum password length of 12 characters. +- Check passwords against known breach lists (HaveIBeenPwned API) at registration and login. +- Re-hash on login if the stored hash uses an outdated algorithm — upgrade transparently. 
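The MUST rules above can be sketched with the stdlib's `hashlib.scrypt` (scrypt is one of the accepted algorithms; Argon2id would need a third-party package such as `argon2-cffi`). Cost parameters here are illustrative:

```python
import hashlib
import hmac
import os

def hash_password(password: str) -> bytes:
    """Unique random salt per password; only salt + hash are stored."""
    salt = os.urandom(16)
    digest = hashlib.scrypt(password.encode(), salt=salt,
                            n=2**14, r=8, p=1, dklen=32)
    return salt + digest

def verify_password(password: str, stored: bytes) -> bool:
    salt, digest = stored[:16], stored[16:]
    candidate = hashlib.scrypt(password.encode(), salt=salt,
                               n=2**14, r=8, p=1, dklen=32)
    return hmac.compare_digest(candidate, digest)  # constant-time comparison
```

Note the `hmac.compare_digest` call: a plain `==` on hashes can leak timing information.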
+### MUST NOT + +- MUST NOT log passwords in any form — not during registration, not during failed login. +- MUST NOT transmit passwords in URLs or query strings. +- MUST NOT store password reset tokens in plaintext — hash them before storage. + +--- + +## 14. Secrets Management + +### MUST + +- Store all secrets in a dedicated secret manager: **Azure Key Vault**, **AWS Secrets Manager**, **GCP Secret Manager**, or **HashiCorp Vault**. +- MUST NOT commit secrets to source code repositories — use pre-commit hooks (`detect-secrets`, `gitleaks`) to prevent this. +- MUST NOT store secrets in environment variable defaults in code. +- MUST NOT pass secrets as plain-text command-line arguments (they appear in process lists). +- Rotate secrets immediately upon: + - Developer offboarding + - Suspected compromise + - Annual rotation schedule + +### SHOULD + +- Use short-lived credentials (OIDC-based GitHub Actions → cloud OIDC federation instead of long-lived API keys). +- Audit all secret access in the KMS — alert on anomalous access patterns. +- Maintain a **secrets inventory** document updated with every new secret. +- Use separate secret namespaces per environment. + +### In `.gitignore` — MUST include + +``` +.env +.env.* +*.pem +*.key +*.pfx +*.p12 +# appsettings.Development.json may contain connection strings +appsettings.Development.json +secrets/ +``` + +--- + +## 15. Anonymization & Pseudonymization + +### Definitions + +- **Anonymization**: Irreversible. The individual can no longer be identified. Anonymized data falls outside GDPR scope. +- **Pseudonymization**: Reversible with a key. The individual can be re-identified. Pseudonymized data is still personal data under GDPR, but carries reduced risk.
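To make the distinction concrete, here is a minimal keyed-HMAC pseudonymizer. The hardcoded key is a placeholder for illustration only; a real key would be fetched from the KMS, never stored beside the data:

```python
import hashlib
import hmac

# Placeholder for illustration only -- in production, fetch from the KMS.
PSEUDONYMIZATION_KEY = b"demo-key-do-not-use-in-production"

def pseudonymize(user_id: str) -> str:
    """Consistent, one-way, keyed mapping. The output is still personal
    data under GDPR: the key holder can re-link records to the user."""
    return hmac.new(PSEUDONYMIZATION_KEY, user_id.encode(),
                    hashlib.sha256).hexdigest()
```

Because the mapping is consistent, the same user always maps to the same token, which keeps analytics joins working without exposing the raw identifier.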
+ +### Anonymization Techniques + +| Technique | When to use | +|---|---| +| **Generalization** | Replace exact value with a range (age 34 → "30–40") | +| **Suppression** | Remove the field entirely | +| **Data masking** | Replace with a fixed placeholder (name → "ANONYMIZED_USER") | +| **Noise addition** | Add statistical noise to numerical values for analytics | +| **Aggregation** | Report group statistics, never individual values | +| **K-anonymity / l-diversity** | For analytics datasets — ensure each record is indistinguishable from k-1 others | + +### Pseudonymization Techniques + +- **HMAC-SHA256 with a secret key**: Consistent, one-way, keyed. Use for user identifiers in analytics. +- **Tokenization**: Replace value with an opaque token; mapping stored separately in a secure vault. +- **Encryption with a separate key**: Decrypt only with explicit authorization. + +### MUST + +- When a user exercises the right to erasure, anonymize all records that must be retained (e.g., financial records, audit logs) rather than deleting them — replace identifying fields with anonymized values. +- Store the pseudonymization key in the KMS — never in the database alongside pseudonymized data. +- Test anonymization routines with assertions that the original value cannot be recovered from the output. + +### MUST NOT + +- MUST NOT call data "anonymized" if re-identification is possible through linkage attacks with other datasets. +- MUST NOT apply pseudonymization and then store the mapping key in the same table as the pseudonymized data. + +--- + +## 16. Testing with Fake Data + +### MUST + +- MUST NOT use production personal data in development, staging, or test environments. +- MUST NOT restore production database backups to non-production environments without scrubbing personal data first. +- Use **synthetic data generators** for test fixtures: `Bogus` (.NET), `Faker` (JS/Python/Ruby), `factory_boy` (Python). 
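In place of a full generator library like Bogus or Faker, a fixture factory can be sketched with the stdlib alone. Names, domains, and number ranges below are deliberately fictional:

```python
import random
import string

FIRST_NAMES = ["Alex", "Sam", "Robin", "Kim", "Noa"]    # fictional
LAST_NAMES = ["Example", "Sample", "Specimen", "Mock"]  # fictional

def fake_user(seed: int) -> dict:
    """Deterministic synthetic user record, safe for fixtures and CI."""
    rng = random.Random(seed)  # seeded: reproducible test data
    return {
        "name": f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}",
        "email": f"test.user+{seed}@example.com",  # reserved example domain
        # 555-01xx is reserved for fictional use in North American numbering.
        "phone": "+1-555-01" + "".join(rng.choices(string.digits, k=2)),
    }
```

Seeding makes the data reproducible across test runs, so failing tests can be replayed without ever touching production records.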
+ +### SHOULD + +- Build a **data anonymization pipeline** for production → staging refreshes: replace all PII fields with generated fakes before the restore completes. +- Add CI checks that fail if test fixtures contain real-looking email domains, real names, or real phone number patterns. +- Use realistic but fictional datasets (fake names, fake emails at `@example.com`, fake addresses) so UI tests are meaningful. + +### Test Data Rules + +``` +# MUST use for test emails +user@example.com +test.user+{n}@example.com + +# MUST NOT use in tests +Real customer emails +Real names from production +Real phone numbers +Real national ID numbers +``` + +--- + +## 17. Incident & Breach Handling + +### Regulatory Timeline + +- **72 hours**: Notify the supervisory authority (e.g., CNIL, APD, ICO) from the moment of awareness of a personal data breach — unless the breach is unlikely to result in a risk to individuals. +- **Without undue delay**: Notify affected data subjects if the breach is likely to result in a high risk to their rights and freedoms. + +### MUST + +- Maintain a **breach response runbook** with: + 1. Detection criteria (what triggers an incident) + 2. Severity classification (low / medium / high / critical) + 3. Containment steps per scenario (credential leak, DB dump exposed, ransomware) + 4. Evidence preservation steps + 5. DPA notification template + 6. Data subject notification template + 7. Post-incident review process +- Log all personal data breaches internally — even those that do not require DPA notification. +- Test the breach response process at least annually (tabletop exercise). + +### SHOULD + +- Implement automated alerts for: + - Unusual volume of data exports + - Access to sensitive tables outside business hours + - Bulk deletion events + - Failed authentication spikes + - New credentials appearing in public breach databases (`haveibeenpwned` monitoring) +- Store breach records (internal) for at least 5 years. 
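The first alert above ("unusual volume of data exports") can be sketched as a sliding-window counter. The threshold and window are illustrative and should be tuned to your observed baseline:

```python
from datetime import datetime, timedelta, timezone

class ExportVolumeMonitor:
    """Fires when exports within the window exceed the threshold."""

    def __init__(self, threshold: int = 50,
                 window: timedelta = timedelta(hours=1)):
        self.threshold = threshold
        self.window = window
        self._events: list[datetime] = []

    def record_export(self, at: datetime) -> bool:
        """Record one export event; return True if an alert should fire."""
        self._events.append(at)
        cutoff = at - self.window
        self._events = [t for t in self._events if t > cutoff]
        return len(self._events) > self.threshold
```

In practice the counter would live in the monitoring stack (e.g., a metrics alert rule) rather than application memory, but the logic is the same.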
+ +### MUST NOT + +- MUST NOT delete evidence upon discovery of a breach — preserve logs, snapshots, and access records. +- MUST NOT notify the press or users before notifying the DPA, unless lives are at immediate risk. + +--- + +## 18. Cloud & DevOps Practices + +### MUST + +- Enable **encryption at rest** for all cloud storage: blob/object storage, managed databases, queues, caches. +- Use **private endpoints** for databases — they MUST NOT be publicly accessible. +- Apply **network security groups / firewall rules** to restrict database access to application layers only. +- Enable **cloud-native audit logging**: Azure Monitor / AWS CloudTrail / GCP Cloud Audit Logs. +- Store personal data only in **approved geographic regions** consistent with GDPR data residency requirements (EEA, or adequacy decision / SCCs for transfers outside EEA). +- Tag all cloud resources that process personal data with a `DataClassification` tag. + +### SHOULD + +- Enable **Microsoft Defender for Cloud / AWS Security Hub / GCP Security Command Center** and review recommendations regularly. +- Use **managed identities** (Azure) or **IAM roles** (AWS/GCP) instead of long-lived access keys for service-to-service authentication. +- Enable **soft delete and versioning** on object storage — accidental deletion should be recoverable within the retention window. +- Apply **DLP (Data Loss Prevention)** policies on cloud storage to detect PII being written to unprotected buckets. + +### MUST NOT + +- MUST NOT store personal data in public cloud storage buckets (S3, Azure Blob, GCS) without access controls. +- MUST NOT deploy databases with public IPs in production. +- MUST NOT use the same cloud account / subscription for production and non-production if production data could bleed across. + +--- + +## 19. CI/CD Controls + +### MUST + +- Run **secret scanning** on every commit: `gitleaks`, `detect-secrets`, GitHub secret scanning (native). 
+- Run **dependency vulnerability scanning** on every build: `npm audit`, `dotnet list package --vulnerable`, `trivy`, `snyk`. +- MUST NOT use real personal data in CI test jobs. +- MUST NOT log environment variables in CI pipelines — mask all secrets. + +### SHOULD + +- Add a **GDPR compliance gate** to the pipeline: + - No new columns without a documented retention period (enforced via migration linting). + - No new log statements containing fields flagged as PII. + - Dependency license check (avoid GPL/AGPL for closed-source SaaS). +- Run **SAST (Static Application Security Testing)**: `SonarQube`, `Semgrep`, `CodeQL`. +- Run **container image scanning**: `trivy`, `Snyk Container`, `AWS ECR scanning`. +- Rotate all CI secrets annually and upon personnel changes. + +### Pipeline Secret Rules + +```yaml +# MUST: mask secrets in logs +- name: Set secret + run: echo "::add-mask::${{ secrets.MY_SECRET }}" + +# MUST NOT: echo secrets to console +- name: Debug # ❌ Never do this + run: echo "API Key is $API_KEY" + +# SHOULD: use OIDC federation instead of long-lived keys +- name: Authenticate + uses: azure/login@v1 + with: + client-id: ${{ vars.AZURE_CLIENT_ID }} + tenant-id: ${{ vars.AZURE_TENANT_ID }} + subscription-id: ${{ vars.AZURE_SUBSCRIPTION_ID }} +``` + +--- + +## 20. Architecture Patterns + +### Recommended Patterns + +**Data Store Separation** +Separate operational data (transactional DB) from analytical data (data warehouse). Apply different retention and access controls to each. + +**Event Sourcing with PII Scrubbing** +When using event sourcing, implement a **crypto-shredding** pattern: encrypt personal data in events with a per-user key. Deleting the key effectively anonymizes all events for that user. + +**Audit Log Segregation** +Store audit logs in a separate, append-only store with restricted write access. The application service account MUST NOT be able to delete audit log entries. 
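The crypto-shredding pattern described above can be sketched as follows. The hash-derived XOR keystream is a toy stand-in for real authenticated encryption (use AES-GCM via a proper library in production); the in-memory key map stands in for a KMS:

```python
import hashlib
import os

class CryptoShredder:
    """Per-user keys encrypt PII inside immutable events; deleting the
    key makes every event for that user unreadable (effective erasure).
    Toy cipher for illustration only -- use AES-GCM in production."""

    def __init__(self):
        self._keys: dict[str, bytes] = {}  # stand-in for a KMS

    @staticmethod
    def _keystream(key: bytes, nonce: bytes, n: int) -> bytes:
        out = b""
        counter = 0
        while len(out) < n:
            out += hashlib.sha256(
                key + nonce + counter.to_bytes(4, "big")).digest()
            counter += 1
        return out[:n]

    def encrypt(self, user_id: str, plaintext: bytes) -> bytes:
        key = self._keys.setdefault(user_id, os.urandom(32))
        nonce = os.urandom(12)  # fresh nonce per event
        ks = self._keystream(key, nonce, len(plaintext))
        return nonce + bytes(a ^ b for a, b in zip(plaintext, ks))

    def decrypt(self, user_id: str, blob: bytes) -> bytes:
        key = self._keys[user_id]  # raises KeyError once shredded
        nonce, ciphertext = blob[:12], blob[12:]
        ks = self._keystream(key, nonce, len(ciphertext))
        return bytes(a ^ b for a, b in zip(ciphertext, ks))

    def shred(self, user_id: str) -> None:
        """Erasure request: drop the key; events remain, but as noise."""
        self._keys.pop(user_id, None)
```

The event store itself never changes, which is what makes this pattern attractive for append-only architectures: erasure is a single key deletion, not a rewrite of history.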
+ +**Consent Store** +Implement a dedicated consent service that tracks: +- What the user consented to +- When they consented +- Which version of the privacy policy they accepted +- The mechanism of consent (checkbox, API, paper) + +**Data Subject Request Queue** +Implement DSRs as an asynchronous workflow (queue + worker) to handle the complexity of scrubbing data across multiple stores reliably. + +**Pseudonymization Gateway** +For analytics pipelines, implement a pseudonymization service at the boundary between operational and analytical systems. The mapping key never leaves the operational zone. + +--- + +## 21. Anti-Patterns + +These are common mistakes that create GDPR liability. Avoid them. + +| Anti-pattern | Risk | Correct approach | +|---|---|---| +| Storing emails in URLs | Logged in CDN/server logs, browser history | Use opaque user IDs in URLs | +| Logging full request bodies | Captures passwords, health data, PII | Log only structured event metadata | +| "Keep forever" database design | Retention violations | Define TTL at schema design time | +| Using production data in dev | Data breach, no legal basis | Synthetic data generators + scrubbing pipeline | +| Shared credentials across teams | Cannot attribute access, cannot rotate safely | Individual accounts + RBAC | +| Hard-coded secrets | Compromise of all environments at once | KMS + secret manager | +| Sequential integer user IDs in URLs | IDOR vulnerabilities, enumeration | UUIDs or opaque identifiers | +| Global `*` CORS header on authenticated API | Cross-origin data theft | Explicit CORS allowlist | +| Storing consent in the same table as profile | Cannot prove consent without the profile | Separate consent store | +| Sending PII in GET query params | Server logs, referrer headers, browser history | POST body or authenticated session | +| Re-using analytics SDK across all users without consent | PECR/ePrivacy violation | Conditional loading behind consent gate | +| Mixing backup and live data 
residency regions | GDPR data residency violation | Explicit region lockdown on backup jobs | +| "Anonymized" data that includes quasi-identifiers | Re-identification risk | Apply k-anonymity, test linkage resistance | + +--- + +## 22. PR Review Checklist + +Use this checklist on every pull request that touches personal data, authentication, logging, or infrastructure. + +### Data Model Changes +- [ ] Every new column that holds personal data has a documented purpose. +- [ ] Every new table with personal data has a retention period defined (column or policy). +- [ ] Sensitive fields (health, financial, national ID) use column-level encryption. +- [ ] No sequential integer PKs used as public-facing identifiers. + +### API Changes +- [ ] No personal data in URL path or query parameters. +- [ ] All endpoints returning personal data are authenticated. +- [ ] Ownership checks are in place (user cannot access another user's resource). +- [ ] Response projections exclude fields the caller is not authorized to see. +- [ ] Rate limiting applied to sensitive endpoints. + +### Logging Changes +- [ ] No passwords, tokens, or credentials logged. +- [ ] No full email addresses or national IDs in high-volume logs. +- [ ] IPs are anonymized (last octet masked). +- [ ] No full request/response bodies logged where PII may be present. + +### Infrastructure Changes +- [ ] No storage buckets are public. +- [ ] Databases use private endpoints only. +- [ ] New cloud resources are tagged with `DataClassification`. +- [ ] Encryption at rest enabled for new storage resources. +- [ ] New geographic regions for data storage are approved and compliant with GDPR. + +### Secrets & Configuration +- [ ] No secrets in source code or committed config files. +- [ ] New secrets are added to KMS and to the secrets inventory document. +- [ ] CI/CD secrets are masked in pipeline logs. + +### Retention & Deletion +- [ ] New data flows have a retention enforcement job or policy. 
+- [ ] Erasure pipeline covers this new data store or field. +- [ ] Soft-delete is used where hard-delete would be premature. + +### User Rights +- [ ] If a new personal data field is introduced, the data export endpoint includes it. +- [ ] If a new data store is introduced, the erasure runbook is updated. + +### Third Parties & Sub-processors +- [ ] No new third-party service receives personal data without a signed DPA. +- [ ] New sub-processors are added to the RoPA. + +### General +- [ ] RoPA updated if a new processing activity is introduced. +- [ ] No production personal data used in tests. +- [ ] DPIA triggered if the change involves high-risk processing (profiling, health, biometrics, large scale). + +--- + +## Quick Reference — MUST / MUST NOT Summary + +| Topic | MUST | MUST NOT | +|---|---|---| +| **Passwords** | bcrypt/Argon2id, cost ≥ 12 | MD5, SHA-1, SHA-256, plaintext storage | +| **Secrets** | KMS / secret manager | Commit to Git, hardcode in source | +| **Encryption** | TLS 1.2+, AES-256 at rest | HTTP, TLS 1.0/1.1 | +| **URLs** | Opaque UUIDs | Emails, names, national IDs in paths | +| **Logs** | Anonymized IPs, event-based | Passwords, tokens, full bodies with PII | +| **Error responses** | Generic messages, correlation ID | Stack traces, DB errors, user data | +| **Test data** | Synthetic / Faker-generated | Real production PII | +| **Retention** | TTL defined at design time | "Keep forever" | +| **Erasure** | Cover all stores, test it | Partial deletion leaving PII in logs/cache | +| **Third parties** | Signed DPA before data flows | Onboard without DPA | +| **IDs** | UUIDs as public identifiers | Sequential integers in public URLs | +| **CORS** | Explicit allowlist | `Access-Control-Allow-Origin: *` on auth APIs | + +--- + +> **Golden Rule — repeated for emphasis:** +> **Collect less. Store less. Expose less. 
Retain less.** +> +> Every byte of personal data you do not collect is a byte you cannot lose, +> cannot breach, and cannot be held liable for. + +--- + +*Inspired by CNIL developer GDPR guidance, GDPR Articles 5, 25, 32, 33, 35, +and engineering best practices from ENISA, OWASP, and NIST.* From 75fd4d3684654e5be358bec41691184b83239a31 Mon Sep 17 00:00:00 2001 From: Mikael Krief Date: Mon, 30 Mar 2026 22:16:34 +0200 Subject: [PATCH 2/4] Add GDPR compliance references for Security and Data Rights - Introduced a comprehensive Security.md file detailing encryption, password hashing, secrets management, anonymization, cloud practices, CI/CD controls, and incident response protocols. - Created a Data Rights.md file outlining user rights implementation, Record of Processing Activities (RoPA), consent management, sub-processor management, and DPIA triggers. --- docs/README.skills.md | 2 +- skills/gdpr-compliant/SKILL.md | 765 ++++-------------- skills/gdpr-compliant/references/Security.md | 266 ++++++ .../gdpr-compliant/references/data-rights.md | 177 ++++ 4 files changed, 607 insertions(+), 603 deletions(-) create mode 100644 skills/gdpr-compliant/references/Security.md create mode 100644 skills/gdpr-compliant/references/data-rights.md diff --git a/docs/README.skills.md b/docs/README.skills.md index 829ce7924..c6cfc3229 100644 --- a/docs/README.skills.md +++ b/docs/README.skills.md @@ -133,7 +133,7 @@ See [CONTRIBUTING.md](../CONTRIBUTING.md#adding-skills) for guidelines on how to | [fluentui-blazor](../skills/fluentui-blazor/SKILL.md) | Guide for using the Microsoft Fluent UI Blazor component library (Microsoft.FluentUI.AspNetCore.Components NuGet package) in Blazor applications. 
Use this when the user is building a Blazor app with Fluent UI components, setting up the library, using FluentUI components like FluentButton, FluentDataGrid, FluentDialog, FluentToast, FluentNavMenu, FluentTextField, FluentSelect, FluentAutocomplete, FluentDesignTheme, or any component prefixed with "Fluent". Also use when troubleshooting missing providers, JS interop issues, or theming. | `references/DATAGRID.md`
`references/LAYOUT-AND-NAVIGATION.md`
`references/SETUP.md`
`references/THEMING.md` | | [folder-structure-blueprint-generator](../skills/folder-structure-blueprint-generator/SKILL.md) | Comprehensive technology-agnostic prompt for analyzing and documenting project folder structures. Auto-detects project types (.NET, Java, React, Angular, Python, Node.js, Flutter), generates detailed blueprints with visualization options, naming conventions, file placement patterns, and extension templates for maintaining consistent code organization across diverse technology stacks. | None | | [game-engine](../skills/game-engine/SKILL.md) | Expert skill for building web-based game engines and games using HTML5, Canvas, WebGL, and JavaScript. Use when asked to create games, build game engines, implement game physics, handle collision detection, set up game loops, manage sprites, add game controls, or work with 2D/3D rendering. Covers techniques for platformers, breakout-style games, maze games, tilemaps, audio, multiplayer via WebRTC, and publishing games. | `assets/2d-maze-game.md`
`assets/2d-platform-game.md`
`assets/gameBase-template-repo.md`
`assets/paddle-game-template.md`
`assets/simple-2d-engine.md`
`references/3d-web-games.md`
`references/algorithms.md`
`references/basics.md`
`references/game-control-mechanisms.md`
`references/game-engine-core-principles.md`
`references/game-publishing.md`
`references/techniques.md`
`references/terminology.md`
`references/web-apis.md` | -| [gdpr-compliant](../skills/gdpr-compliant/SKILL.md) | Apply GDPR-compliant engineering practices across your codebase. Use this skill whenever you are designing APIs, writing data models, building authentication flows, implementing logging, handling user data, writing retention/deletion jobs, designing cloud infrastructure, or reviewing pull requests for privacy compliance. Trigger this skill for any task involving personal data, user accounts, cookies, analytics, emails, audit logs, encryption, pseudonymization, anonymization, data exports, breach response, CI/CD pipelines that process real data, or any question framed as "is this GDPR-compliant?". Inspired by CNIL developer guidance and GDPR Articles 5, 25, 32, 33, 35. | None | +| [gdpr-compliant](../skills/gdpr-compliant/SKILL.md) | Apply GDPR-compliant engineering practices across your codebase. Use this skill whenever you are designing APIs, writing data models, building authentication flows, implementing logging, handling user data, writing retention/deletion jobs, designing cloud infrastructure, or reviewing pull requests for privacy compliance. Trigger this skill for any task involving personal data, user accounts, cookies, analytics, emails, audit logs, encryption, pseudonymization, anonymization, data exports, breach response, CI/CD pipelines that process real data, or any question framed as "is this GDPR-compliant?". Inspired by CNIL developer guidance and GDPR Articles 5, 25, 32, 33, 35. | `references/Security.md`
`references/data-rights.md` | | [gen-specs-as-issues](../skills/gen-specs-as-issues/SKILL.md) | This workflow guides you through a systematic approach to identify missing features, prioritize them, and create detailed specifications for implementation. | None | | [generate-custom-instructions-from-codebase](../skills/generate-custom-instructions-from-codebase/SKILL.md) | Migration and code evolution instructions generator for GitHub Copilot. Analyzes differences between two project versions (branches, commits, or releases) to create precise instructions allowing Copilot to maintain consistency during technology migrations, major refactoring, or framework version upgrades. | None | | [geofeed-tuner](../skills/geofeed-tuner/SKILL.md) | Use this skill whenever the user mentions IP geolocation feeds, RFC 8805, geofeeds, or wants help creating, tuning, validating, or publishing a self-published IP geolocation feed in CSV format. Intended user audience is a network operator, ISP, mobile carrier, cloud provider, hosting company, IXP, or satellite provider asking about IP geolocation accuracy, or geofeed authoring best practices. Helps create, refine, and improve CSV-format IP geolocation feeds with opinionated recommendations beyond RFC 8805 compliance. Do NOT use for private or internal IP address management — applies only to publicly routable IP addresses. | `assets/example`
`assets/iso3166-1.json`
`assets/iso3166-2.json`
`assets/small-territories.json`
`references/rfc8805.txt`
`references/snippets-python3.md`
`scripts/templates` | diff --git a/skills/gdpr-compliant/SKILL.md b/skills/gdpr-compliant/SKILL.md index ff6d9b7fd..f3c2823b4 100644 --- a/skills/gdpr-compliant/SKILL.md +++ b/skills/gdpr-compliant/SKILL.md @@ -13,713 +13,274 @@ description: 'Apply GDPR-compliant engineering practices across your codebase. U # GDPR Engineering Skill -A comprehensive, actionable reference for engineers, architects, DevOps engineers, -and tech leads building GDPR-compliant software in the EU/EEA or handling data of -EU residents. +Actionable GDPR reference for engineers, architects, DevOps, and tech leads. +Inspired by CNIL developer guidance and GDPR Articles 5, 25, 32, 33, 35. -> **Golden Rule — commit this to memory:** -> **Collect less. Store less. Expose less. Retain less.** +> **Golden Rule:** Collect less. Store less. Expose less. Retain less. ---- - -## Table of Contents - -1. [Core GDPR Principles](#1-core-gdpr-principles) -2. [Privacy by Design & by Default](#2-privacy-by-design--by-default) -3. [Data Minimization](#3-data-minimization) -4. [Purpose Limitation](#4-purpose-limitation) -5. [Storage Limitation & Retention Policies](#5-storage-limitation--retention-policies) -6. [Integrity & Confidentiality](#6-integrity--confidentiality) -7. [Accountability & Records of Processing](#7-accountability--records-of-processing) -8. [User Rights Implementation](#8-user-rights-implementation) -9. [API Design Rules](#9-api-design-rules) -10. [Logging Rules](#10-logging-rules) -11. [Error Handling](#11-error-handling) -12. [Encryption](#12-encryption) -13. [Password Hashing](#13-password-hashing) -14. [Secrets Management](#14-secrets-management) -15. [Anonymization & Pseudonymization](#15-anonymization--pseudonymization) -16. [Testing with Fake Data](#16-testing-with-fake-data) -17. [Incident & Breach Handling](#17-incident--breach-handling) -18. [Cloud & DevOps Practices](#18-cloud--devops-practices) -19. [CI/CD Controls](#19-cicd-controls) -20. 
[Architecture Patterns](#20-architecture-patterns)
-21. [Anti-Patterns](#21-anti-patterns)
-22. [PR Review Checklist](#22-pr-review-checklist)
+For deep dives, read the reference files in `references/`:
+- `references/data-rights.md` — user rights endpoints, DSR workflow, RoPA, consent
+- `references/Security.md` — encryption, hashing, secrets, anonymization, cloud, CI/CD, incident response

---

-## 1. Core GDPR Principles
-
-These seven principles (Article 5 GDPR) are the foundation of every engineering decision.
+## 1. Core GDPR Principles (Article 5)

-| Principle | Engineering meaning |
+| Principle | Engineering obligation |
|---|---|
-| **Lawfulness, fairness, transparency** | Have a documented legal basis for every processing activity. Expose privacy notices in the UI. |
-| **Purpose limitation** | Data collected for purpose A MUST NOT be silently reused for purpose B without a new legal basis. |
-| **Data minimization** | Collect only the fields you actually need today. Delete the rest. |
-| **Accuracy** | Provide update endpoints. Propagate corrections to downstream stores. |
-| **Storage limitation** | Define a TTL at the moment you design the schema, not after. |
-| **Integrity & confidentiality** | Encrypt at rest and in transit. Restrict access. Audit access to sensitive data. |
-| **Accountability** | Maintain documented evidence that you comply. DPA-ready at any time.
| +| Lawfulness, fairness, transparency | Document legal basis for every processing activity in the RoPA | +| Purpose limitation | Data collected for purpose A **MUST NOT** be reused for purpose B without a new legal basis | +| Data minimization | Collect only fields with a documented business need today | +| Accuracy | Provide update endpoints; propagate corrections to downstream stores | +| Storage limitation | Define TTL at schema design time — never after | +| Integrity & confidentiality | Encrypt at rest and in transit; restrict and audit access | +| Accountability | Maintain evidence of compliance; RoPA ready for DPA inspection at any time | --- ## 2. Privacy by Design & by Default -**Privacy by Design** means privacy is an architectural requirement, not a retrofit. -**Privacy by Default** means the most privacy-preserving option is always the default. - -### MUST - -- Design data models with retention in mind from day one — add `CreatedAt`, `DeletedAt`, `RetentionExpiresAt` columns when the entity is first created. -- Default all optional data collection to **off**. Users opt in; they do not opt out. -- Make the least-privileged access path the default API behavior. -- Conduct a **Data Protection Impact Assessment (DPIA)** before building any high-risk processing (biometrics, large-scale profiling, health data, systematic monitoring). -- Document processing activities in a **Record of Processing Activities (RoPA)** — update it with every new feature. +**MUST** +- Add `CreatedAt`, `RetentionExpiresAt` to every table holding personal data at creation time. +- Default all optional data collection to **off**. Users opt in; they never opt out of a default-on setting. +- Conduct a **DPIA** before building high-risk processing (biometrics, health data, large-scale profiling, systematic monitoring). +- Update the **RoPA** with every new feature that introduces a processing activity. +- Sign a **DPA** with every sub-processor before data flows to them. 
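The first two MUST items above can be sketched as a table created with retention columns and opt-in defaults from day one. A minimal in-memory SQLite illustration; table and column names are hypothetical and the retention period is an assumed 12 months.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE user_profile (
        user_id              TEXT PRIMARY KEY,
        email                TEXT NOT NULL,
        created_at           TEXT NOT NULL,
        retention_expires_at TEXT NOT NULL,              -- defined at design time
        analytics_opt_in     INTEGER NOT NULL DEFAULT 0  -- off until the user opts in
    )
""")

RETENTION = timedelta(days=365)  # assumed policy


def create_user(user_id, email):
    # Retention is computed at insert time, not bolted on later.
    now = datetime.now(timezone.utc)
    conn.execute(
        "INSERT INTO user_profile (user_id, email, created_at, retention_expires_at)"
        " VALUES (?, ?, ?, ?)",
        (user_id, email, now.isoformat(), (now + RETENTION).isoformat()),
    )
```

The point of the sketch is that the schema cannot exist without a retention column, and no optional collection is on by default.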
-### SHOULD - -- Use feature flags to allow disabling data collection without a deployment. -- Apply column-level encryption for sensitive fields (health, financial, SSN, biometrics) rather than relying on disk encryption alone. -- Design for soft-delete + scheduled hard-delete, not immediate hard-delete, to allow data subject request windows. - -### MUST NOT - -- MUST NOT ship a new data collection feature without a documented legal basis. -- MUST NOT enable analytics, tracking, or telemetry by default without explicit consent. -- MUST NOT store personal data in a system not listed in the RoPA. +**MUST NOT** +- Ship a new data collection feature without a documented legal basis. +- Enable analytics, tracking, or telemetry by default without explicit consent. +- Store personal data in a system not listed in the RoPA. --- ## 3. Data Minimization -### MUST - -- Map every field in every DTO/model to a concrete business need. Remove fields with no documented use. -- In API responses, return only what the client actually needs. Never return full entity objects when a projection suffices. -- Truncate or mask data at the edge — e.g., return `****1234` for card numbers, not the full PAN. -- In search/list endpoints, exclude sensitive fields (date of birth, national ID, health data) from default projections. +**MUST** +- Map every DTO/model field to a concrete business need. Remove undocumented fields. +- Use **separate DTOs** for create, read, and update — never reuse the same object. +- Return only what the caller is authorized to see — use response projections. +- Mask sensitive values at the edge: return `****1234` for card numbers, never the full value. +- Exclude sensitive fields (DOB, national ID, health) from default list/search projections. -### SHOULD - -- Use separate DTOs for create, read, and update operations — never reuse the same object and accidentally expose fields. 
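The separate-DTO and masking rules above can be sketched as follows (field names are hypothetical): the read projection simply has no slot for sensitive fields, so they cannot leak by accident.

```python
from dataclasses import dataclass


def mask_pan(card_number):
    """Keep only the last four digits, e.g. '****1234'."""
    return "****" + card_number[-4:]


@dataclass(frozen=True)
class UserEntity:
    """Internal model -- never serialized to clients directly."""
    user_id: str
    email: str
    date_of_birth: str
    card_number: str


@dataclass(frozen=True)
class UserListItem:
    """Default list/search projection: no DOB, card masked at the edge."""
    user_id: str
    email: str
    card_last4: str


def to_list_item(u):
    return UserListItem(user_id=u.user_id, email=u.email,
                        card_last4=mask_pan(u.card_number))
```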
-- Add automated tests that assert sensitive fields are absent from API responses where they should not appear. - -### MUST NOT - -- MUST NOT log full request/response bodies if they may contain personal data. -- MUST NOT include personal data in URL path segments or query parameters (they end up in access logs, CDN logs, and browser history). -- MUST NOT collect `dateOfBirth`, national ID, or health data unless there is an explicit, documented business requirement and a legal basis. +**MUST NOT** +- Log full request/response bodies if they may contain personal data. +- Include personal data in URL path segments or query parameters (CDN logs, browser history). +- Collect `dateOfBirth`, national ID, or health data without an explicit legal basis. --- ## 4. Purpose Limitation -### MUST - +**MUST** - Document the purpose of every processing activity in code comments and in the RoPA. -- Tag database columns with their purpose in migration scripts or schema documentation. -- When reusing data for a secondary purpose (e.g., fraud detection reusing transactional data), obtain a new legal basis or confirm compatibility analysis. +- Obtain a new legal basis or perform a compatibility analysis before reusing data for a secondary purpose. -### SHOULD - -- Implement **data purpose tags** as metadata in your data warehouse/lake so downstream pipelines cannot silently extend usage. -- Build separate data stores for separate purposes (e.g., marketing analytics must not read from production operational data directly). - -### MUST NOT - -- MUST NOT share personal data collected for service delivery with third-party advertising networks without explicit consent. -- MUST NOT use support ticket content to train ML models without a separate legal basis and user notice. +**MUST NOT** +- Share personal data collected for service delivery with advertising networks without explicit consent. +- Use support ticket content to train ML models without a separate legal basis and user notice. 
--- -## 5. Storage Limitation & Retention Policies - -### MUST +## 5. Storage Limitation & Retention -- Every table or store that holds personal data MUST have a defined retention period. -- Implement a scheduled job (e.g., Hangfire, cron) that enforces retention — not a manual process. -- Distinguish between **anonymization** (data may remain) and **deletion** (data is gone). Choose deliberately. -- Archive or anonymize data when retention expires — never leave expired data silently in production. -- Document retention periods in a **Retention Policy** document linked from the RoPA. +**MUST** +- Every table holding personal data **MUST** have a defined retention period. +- Enforce retention automatically via a scheduled job (Hangfire, cron) — never a manual process. +- Anonymize or delete data when retention expires — never leave expired data silently in production. -### Recommended Retention Defaults +**Recommended defaults** -| Data type | Suggested maximum retention | +| Data type | Max retention | |---|---| -| Authentication logs | 12 months | -| Audit logs | 12–24 months (legal requirements may extend this) | -| Session tokens / refresh tokens | 30–90 days | +| Auth / audit logs | 12–24 months | +| Session / refresh tokens | 30–90 days | | Email / notification logs | 6 months | -| User accounts (inactive) | 12 months after last login, then notify + delete | -| Payment records | As required by tax law (typically 7–10 years), but minimized | -| Support tickets | 3 years after closure | -| Analytics events | 13 months (standard GA-style) | +| Inactive user accounts | 12 months after last login → notify → delete | +| Payment records | As required by tax law (7–10 years), minimized | +| Analytics events | 13 months | -### SHOULD +**SHOULD** +- Add `RetentionExpiresAt` column — compute at insert time. +- Use soft-delete (`DeletedAt`) with a scheduled hard-delete after the erasure request window (30 days). 
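The two SHOULD items above can be sketched as the body of a single scheduled job: rows are soft-deleted when retention expires, then hard-deleted once the 30-day erasure window has passed. In-memory dicts stand in for the real store.

```python
from datetime import datetime, timedelta, timezone

ERASURE_WINDOW = timedelta(days=30)  # grace period before hard delete


def purge_expired(rows, now=None):
    """Scheduled retention job: soft-delete at expiry, hard-delete after the window."""
    now = now or datetime.now(timezone.utc)
    kept = []
    for row in rows:
        if row.get("deleted_at") and now - row["deleted_at"] >= ERASURE_WINDOW:
            continue                  # hard delete: drop the row entirely
        if row["retention_expires_at"] <= now and not row.get("deleted_at"):
            row["deleted_at"] = now   # soft delete at expiry
        kept.append(row)
    return kept
```

Run on a schedule (Hangfire, cron), the job makes retention an automatic property of the store rather than a manual clean-up.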
-- Add a `RetentionExpiresAt` column to every sensitive table — compute it at insert time. -- Use soft-delete (`DeletedAt`) with a scheduled hard-delete job after the GDPR erasure request window (30 days). - -### MUST NOT - -- MUST NOT retain personal data indefinitely "in case it becomes useful later." -- MUST NOT use production data as a long-term data lake without a retention enforcement mechanism. +**MUST NOT** +- Retain personal data indefinitely "in case it becomes useful later." --- -## 6. Integrity & Confidentiality - -### MUST - -- Enforce **TLS 1.2+** on all connections. Reject older protocols. -- Encrypt personal data **at rest** using AES-256 or equivalent. -- Use **column-level encryption** for highly sensitive fields (health, biometric, financial, national ID). -- Restrict database access by role — application user MUST NOT have DDL rights on the production database. -- Enforce the **principle of least privilege** on all IAM roles, service accounts, and API keys. -- Enable access logging on databases and object storage. Retain access logs per retention policy. - -### SHOULD - -- Use **envelope encryption**: data encrypted with a data encryption key (DEK) which is itself encrypted by a key encryption key (KEK) stored in a KMS (Azure Key Vault, AWS KMS, GCP Cloud KMS). -- Enable **automatic key rotation** (annually minimum). -- Use **network segmentation**: databases must not be publicly accessible. Use private endpoints / VPC peering. -- Enable **audit logging** at the database level for SELECT on sensitive tables. - -### MUST NOT - -- MUST NOT store secrets (API keys, connection strings, passwords) in source code, configuration files committed to Git, or environment variable defaults. -- MUST NOT use self-signed certificates in production. -- MUST NOT transmit personal data over HTTP. - ---- - -## 7. 
Accountability & Records of Processing - -### MUST - -- Maintain a **Record of Processing Activities (RoPA)** — a living document updated with every new feature. Minimum fields per activity: - - Name and purpose - - Legal basis (contract / legitimate interest / consent / legal obligation / vital interest / public task) - - Categories of data subjects - - Categories of personal data - - Recipients (third parties, sub-processors) - - Transfers outside EEA and safeguards - - Retention period - - Security measures - -- Maintain a list of all **sub-processors** (cloud providers, SaaS tools, analytics, email providers). Review annually. -- Sign **Data Processing Agreements (DPAs)** with every sub-processor before data flows to them. - -### SHOULD - -- Generate a machine-readable RoPA (YAML/JSON) alongside the human-readable version, so it can be version-controlled. -- Automate a quarterly reminder to review the RoPA and sub-processor list. - -### MUST NOT +## 6. API Design Rules -- MUST NOT onboard a new SaaS tool that processes personal data without a signed DPA and RoPA entry. - ---- - -## 8. User Rights Implementation - -GDPR grants data subjects the following rights. Each must have a technical implementation path. - -| Right | Engineering implementation | -|---|---| -| **Right of access (Art. 15)** | `GET /api/v1/me/data-export` — returns all personal data in a machine-readable format (JSON or CSV). Respond within 30 days. | -| **Right to rectification (Art. 16)** | `PUT /api/v1/me/profile` — allow users to update their data. Propagate changes to downstream stores (search index, data warehouse). | -| **Right to erasure / right to be forgotten (Art. 17)** | `DELETE /api/v1/me` — anonymize or delete all personal data. Implement a checklist of all stores to scrub. | -| **Right to restriction of processing (Art. 18)** | Add a `ProcessingRestricted` flag on the user record. Gate all non-essential processing behind this flag. | -| **Right to data portability (Art. 
20)** | Same as access endpoint, but ensure the format is structured, commonly used, and machine-readable (JSON preferred). | -| **Right to object (Art. 21)** | Provide an opt-out mechanism for processing based on legitimate interest. Honor it immediately — do not defer. | -| **Rights related to automated decision-making (Art. 22)** | If automated decisions produce legal or significant effects, provide a human review path. Expose an explanation of the logic. | - -### MUST - -- Every right MUST have a tested API endpoint (or admin back-office process) before the system goes live. -- Erasure MUST be comprehensive — document every store where a user's data lives (DB, S3, search index, cache, email logs, CDN logs, analytics). -- Respond to verified data subject requests within **30 calendar days**. -- Provide a machine-readable data export — not a PDF screenshot. - -### SHOULD - -- Build a **Data Subject Request (DSR) tracker** — a back-office tool to manage incoming requests, deadlines, and completion status. -- Automate the erasure pipeline for primary stores; document the manual steps for third-party stores. -- Test erasure with integration tests that assert the user's data is absent from all stores after deletion. - -### MUST NOT - -- MUST NOT require users to contact support via phone or letter to exercise their rights if the product is digital. -- MUST NOT charge a fee for data access requests unless clearly abusive and excessive. - ---- - -## 9. API Design Rules - -### MUST - -- MUST NOT include personal data in URL path or query parameters. - - ❌ `GET /users/john.doe@example.com` - - ✅ `GET /users/{userId}` +**MUST** +- MUST NOT include personal data in URL paths or query parameters. + - ❌ `GET /users/john.doe@example.com` ✅ `GET /users/{userId}` - Authenticate all endpoints that return or accept personal data. -- Enforce **RBAC or ABAC** — users MUST NOT be able to access another user's data by guessing IDs (IDOR prevention). 
- - Always extract the acting user's identity from the JWT/session, never from the request body. - - Validate ownership: `if (resource.OwnerId != currentUserId) return 403`. -- Version your API — breaking privacy changes require a new version with migration guidance. -- Return **only the fields the caller is authorized to see**. Use response projections. +- Extract the acting user's identity from the JWT — never from the request body. +- Validate ownership on every resource: `if (resource.OwnerId != currentUserId) return 403`. +- Use UUIDs or opaque identifiers — never sequential integers as public resource IDs. -### SHOULD +**SHOULD** +- Rate-limit sensitive endpoints (login, data export, password reset). +- Set `Referrer-Policy: no-referrer` and an explicit `CORS` allowlist. -- Implement **rate limiting** on sensitive endpoints (login, data export, password reset) to prevent enumeration and abuse. -- Add a `Content-Security-Policy` header on all responses. -- Use `Referrer-Policy: no-referrer` or `strict-origin` to prevent personal data leaking in referer headers. -- Implement **CORS** with an explicit allowlist. Never use `Access-Control-Allow-Origin: *` on authenticated APIs. - -### MUST NOT - -- MUST NOT return stack traces, internal paths, or database error messages in API error responses. -- MUST NOT use predictable sequential integer IDs as public resource identifiers — use UUIDs or opaque identifiers. -- MUST NOT expose bulk export endpoints without authentication and rate limiting. +**MUST NOT** +- Return stack traces, internal paths, or database errors in API responses. +- Use `Access-Control-Allow-Origin: *` on authenticated APIs. --- -## 10. Logging Rules - -### MUST - -- **Anonymize IPs** in application logs — mask the last octet (IPv4) or the last 80 bits (IPv6). - - ❌ `192.168.1.42` - - ✅ `192.168.1.xxx` -- MUST NOT log passwords, tokens, session IDs, or authentication credentials. 
-- MUST NOT log full request or response bodies if they may contain personal data (forms, profile updates, health data). -- MUST NOT log national identification numbers, payment card numbers, or health data. -- Apply log retention — purge logs automatically after the defined retention period. +## 7. Logging Rules -### SHOULD +**MUST** +- Anonymize IPs in application logs — mask last octet (IPv4) or last 80 bits (IPv6). + - ❌ `192.168.1.42` ✅ `192.168.1.xxx` +- MUST NOT log: passwords, tokens, session IDs, credentials, card numbers, national IDs, health data. +- MUST NOT log full request/response bodies where PII may be present. +- Enforce log retention — purge automatically after the defined period. -- Log **events** rather than data: `"User {UserId} updated email"` not `"Email changed from a@b.com to c@d.com"`. -- Hash or pseudonymize user identifiers in logs used for analytics or debugging (use a one-way HMAC). -- Separate **audit logs** (access to sensitive data, configuration changes, admin actions) from **application logs** (errors, performance). Different retention, different access controls. -- Implement **structured logging** (JSON) with a `userId` field that uses an internal identifier, not the email address. - -### Log fields — MUST NOT include - -- `password`, `passwordHash`, `secret`, `token`, `refreshToken`, `resetToken` -- `cardNumber`, `cvv`, `iban`, `bic` -- `ssn`, `nationalId`, `passportNumber` -- `dateOfBirth` (in logs where it is not strictly necessary) -- Full `email` in high-volume access logs (use a hash or user ID) +**SHOULD** +- Log **events** not data: `"User {UserId} updated email"` not `"Email changed from a@b.com to c@d.com"`. +- Use structured logging (JSON) with `userId` as an internal identifier, not the email address. +- Separate audit logs (sensitive access, admin actions) from application logs — different retention and ACLs. --- -## 11. Error Handling +## 8. 
Error Handling -### MUST - -- Return **generic error messages** to clients — never expose internal state, stack traces, or database errors. +**MUST** +- Return generic error messages — never expose stack traces, internal paths, or DB errors. - ❌ `"Column 'email' violates unique constraint on table 'users'"` - ✅ `"A user with this email address already exists."` -- Use **Problem Details (RFC 7807)** format for all error responses — structured, consistent, no internal leakage. -- Log the full error **server-side** with correlation ID. Return only the correlation ID to the client. - -### SHOULD - -- Implement a global exception handler/middleware that catches unhandled exceptions before they reach the response serializer. -- Differentiate between **operational errors** (user errors, 4xx) and **programmer errors** (bugs, 5xx) in your logging strategy. - -### MUST NOT - -- MUST NOT include file paths, class names, method names, or line numbers in error responses. -- MUST NOT include personal data in error messages (e.g., "User john@example.com not found"). - ---- - -## 12. Encryption - -### At-Rest Encryption - -| Sensitivity | Minimum standard | -|---|---| -| Standard personal data (name, address, email) | AES-256 disk/volume encryption (cloud provider default) | -| Sensitive personal data (health, biometric, financial) | AES-256 **column-level** encryption + envelope encryption via KMS | -| Encryption keys | HSM-backed KMS (Azure Key Vault Premium, AWS KMS with CMK) | - -### In-Transit Encryption - -- **MUST** enforce TLS 1.2 minimum; prefer TLS 1.3. -- **MUST** use HSTS (`Strict-Transport-Security: max-age=31536000; includeSubDomains; preload`). -- **MUST** pin certificates or use certificate transparency monitoring for critical services. -- **MUST NOT** allow TLS 1.0 or TLS 1.1. -- **MUST NOT** use null cipher suites or export-grade ciphers. 
- -### Key Management - -- **MUST** store encryption keys in a dedicated KMS — never hardcoded, never in environment variables in plain text. -- **MUST** rotate data encryption keys (DEKs) annually, or immediately upon suspected compromise. -- **SHOULD** use separate keys per environment (dev, staging, prod). -- **SHOULD** log all key access events in the KMS audit trail. - ---- - -## 13. Password Hashing - -### MUST - -- Use **bcrypt** (cost ≥ 12), **Argon2id** (recommended), or **scrypt** for password hashing. -- Never use MD5, SHA-1, SHA-256, or any non-password-specific hash function for passwords. -- Use a **unique salt per password** — never a global salt. -- Store only the hash — never the plaintext password, never a reversible encoding. - -### SHOULD - -- Implement **pepper** (a secret server-side value added before hashing) stored in the KMS, not in the database. -- Enforce a minimum password length of 12 characters. -- Check passwords against known breach lists (HaveIBeenPwned API) at registration and login. -- Re-hash on login if the stored hash uses an outdated algorithm — upgrade transparently. - -### MUST NOT - -- MUST NOT log passwords in any form — not during registration, not during failed login. -- MUST NOT transmit passwords in URLs or query strings. -- MUST NOT store password reset tokens in plaintext — hash them before storage. - ---- - -## 14. Secrets Management +- Use **Problem Details (RFC 7807)** for all error responses. +- Log the full error server-side with a correlation ID; return only the correlation ID to the client. -### MUST - -- Store all secrets in a dedicated secret manager: **Azure Key Vault**, **AWS Secrets Manager**, **GCP Secret Manager**, or **HashiCorp Vault**. -- MUST NOT commit secrets to source code repositories — use pre-commit hooks (`detect-secrets`, `gitleaks`) to prevent this. -- MUST NOT store secrets in environment variable defaults in code. 
-- MUST NOT pass secrets as plain-text command-line arguments (they appear in process lists). -- Rotate secrets immediately upon: - - Developer offboarding - - Suspected compromise - - Annual rotation schedule - -### SHOULD - -- Use short-lived credentials (OIDC-based GitHub Actions → cloud OIDC federation instead of long-lived API keys). -- Audit all secret access in the KMS — alert on anomalous access patterns. -- Maintain a **secrets inventory** document updated with every new secret. -- Use separate secret namespaces per environment. - -### In `.gitignore` — MUST include - -``` -.env -.env.* -*.pem -*.key -*.pfx -*.p12 -appsettings.Development.json # if it may contain connection strings -secrets/ -``` +**MUST NOT** +- Include file paths, class names, or line numbers in error responses. +- Include personal data in error messages (e.g., "User john@example.com not found"). --- -## 15. Anonymization & Pseudonymization +## 9. Encryption (summary — see `references/security.md` for full detail) -### Definitions - -- **Anonymization**: Irreversible. The individual can no longer be identified. Anonymized data falls outside GDPR scope. -- **Pseudonymization**: Reversible with a key. The individual can be re-identified. Pseudonymized data is still personal data under GDPR, but carries reduced risk. 
- -### Anonymization Techniques - -| Technique | When to use | +| Scope | Minimum standard | |---|---| -| **Generalization** | Replace exact value with a range (age 34 → "30–40") | -| **Suppression** | Remove the field entirely | -| **Data masking** | Replace with a fixed placeholder (name → "ANONYMIZED_USER") | -| **Noise addition** | Add statistical noise to numerical values for analytics | -| **Aggregation** | Report group statistics, never individual values | -| **K-anonymity / l-diversity** | For analytics datasets — ensure each record is indistinguishable from k-1 others | - -### Pseudonymization Techniques +| Standard personal data | AES-256 disk/volume encryption | +| Sensitive data (health, financial, biometric) | AES-256 **column-level** + envelope encryption via KMS | +| In transit | TLS 1.2+ (prefer 1.3); HSTS enforced | +| Keys | HSM-backed KMS; rotate DEKs annually | -- **HMAC-SHA256 with a secret key**: Consistent, one-way, keyed. Use for user identifiers in analytics. -- **Tokenization**: Replace value with an opaque token; mapping stored separately in a secure vault. -- **Encryption with a separate key**: Decrypt only with explicit authorization. - -### MUST - -- When a user exercises the right to erasure, anonymize all records that must be retained (e.g., financial records, audit logs) rather than deleting them — replace identifying fields with anonymized values. -- Store the pseudonymization key in the KMS — never in the database alongside pseudonymized data. -- Test anonymization routines with assertions that the original value cannot be recovered from the output. - -### MUST NOT - -- MUST NOT call data "anonymized" if re-identification is possible through linkage attacks with other datasets. -- MUST NOT apply pseudonymization and then store the mapping key in the same table as the pseudonymized data. +**MUST NOT** allow TLS 1.0/1.1, null cipher suites, or hardcoded encryption keys. --- -## 16. 
Testing with Fake Data - -### MUST - -- MUST NOT use production personal data in development, staging, or test environments. -- MUST NOT restore production database backups to non-production environments without scrubbing personal data first. -- Use **synthetic data generators** for test fixtures: `Bogus` (.NET), `Faker` (JS/Python/Ruby), `factory_boy` (Python). - -### SHOULD - -- Build a **data anonymization pipeline** for production → staging refreshes: replace all PII fields with generated fakes before the restore completes. -- Add CI checks that fail if test fixtures contain real-looking email domains, real names, or real phone number patterns. -- Use realistic but fictional datasets (fake names, fake emails at `@example.com`, fake addresses) so UI tests are meaningful. +## 10. Password Hashing -### Test Data Rules +**MUST** +- Use **Argon2id** (recommended) or **bcrypt** (cost ≥ 12). Never MD5, SHA-1, or SHA-256. +- Use a unique salt per password. Store only the hash. -``` -# MUST use for test emails -user@example.com -test.user+{n}@example.com - -# MUST NOT use in tests -Real customer emails -Real names from production -Real phone numbers -Real national ID numbers -``` +**MUST NOT** +- Log passwords in any form. Transmit passwords in URLs. Store reset tokens in plaintext. --- -## 17. Incident & Breach Handling - -### Regulatory Timeline - -- **72 hours**: Notify the supervisory authority (e.g., CNIL, APD, ICO) from the moment of awareness of a personal data breach — unless the breach is unlikely to result in a risk to individuals. -- **Without undue delay**: Notify affected data subjects if the breach is likely to result in a high risk to their rights and freedoms. - -### MUST - -- Maintain a **breach response runbook** with: - 1. Detection criteria (what triggers an incident) - 2. Severity classification (low / medium / high / critical) - 3. Containment steps per scenario (credential leak, DB dump exposed, ransomware) - 4. Evidence preservation steps - 5. 
DPA notification template - 6. Data subject notification template - 7. Post-incident review process -- Log all personal data breaches internally — even those that do not require DPA notification. -- Test the breach response process at least annually (tabletop exercise). +## 11. Secrets Management -### SHOULD +**MUST** +- Store all secrets in a KMS: Azure Key Vault, AWS Secrets Manager, GCP Secret Manager, or HashiCorp Vault. +- Use pre-commit hooks (`gitleaks`, `detect-secrets`) to prevent secret commits. +- Rotate secrets on developer offboarding, annual schedule, or suspected compromise. -- Implement automated alerts for: - - Unusual volume of data exports - - Access to sensitive tables outside business hours - - Bulk deletion events - - Failed authentication spikes - - New credentials appearing in public breach databases (`haveibeenpwned` monitoring) -- Store breach records (internal) for at least 5 years. +**`.gitignore` MUST include:** `.env`, `.env.*`, `*.pem`, `*.key`, `*.pfx`, `*.p12`, `secrets/` -### MUST NOT - -- MUST NOT delete evidence upon discovery of a breach — preserve logs, snapshots, and access records. -- MUST NOT notify the press or users before notifying the DPA, unless lives are at immediate risk. +**MUST NOT** +- Commit secrets to source code. Store secrets as plain-text environment variable defaults. --- -## 18. Cloud & DevOps Practices - -### MUST - -- Enable **encryption at rest** for all cloud storage: blob/object storage, managed databases, queues, caches. -- Use **private endpoints** for databases — they MUST NOT be publicly accessible. -- Apply **network security groups / firewall rules** to restrict database access to application layers only. -- Enable **cloud-native audit logging**: Azure Monitor / AWS CloudTrail / GCP Cloud Audit Logs. -- Store personal data only in **approved geographic regions** consistent with GDPR data residency requirements (EEA, or adequacy decision / SCCs for transfers outside EEA). 
-- Tag all cloud resources that process personal data with a `DataClassification` tag. - -### SHOULD +## 12. Anonymization & Pseudonymization (summary — see `references/security.md`) -- Enable **Microsoft Defender for Cloud / AWS Security Hub / GCP Security Command Center** and review recommendations regularly. -- Use **managed identities** (Azure) or **IAM roles** (AWS/GCP) instead of long-lived access keys for service-to-service authentication. -- Enable **soft delete and versioning** on object storage — accidental deletion should be recoverable within the retention window. -- Apply **DLP (Data Loss Prevention)** policies on cloud storage to detect PII being written to unprotected buckets. +- **Anonymization** = irreversible → falls outside GDPR scope. Use for retained records after erasure. +- **Pseudonymization** = reversible with a key → still personal data, reduced risk. +- When erasing a user, anonymize records that must be retained (financial, audit) rather than deleting them. +- Store the pseudonymization key in the KMS — never in the same database as the pseudonymized data. -### MUST NOT - -- MUST NOT store personal data in public cloud storage buckets (S3, Azure Blob, GCS) without access controls. -- MUST NOT deploy databases with public IPs in production. -- MUST NOT use the same cloud account / subscription for production and non-production if production data could bleed across. +**MUST NOT** call data "anonymized" if re-identification is possible through linkage attacks. --- -## 19. CI/CD Controls - -### MUST - -- Run **secret scanning** on every commit: `gitleaks`, `detect-secrets`, GitHub secret scanning (native). -- Run **dependency vulnerability scanning** on every build: `npm audit`, `dotnet list package --vulnerable`, `trivy`, `snyk`. -- MUST NOT use real personal data in CI test jobs. -- MUST NOT log environment variables in CI pipelines — mask all secrets. 
- -### SHOULD - -- Add a **GDPR compliance gate** to the pipeline: - - No new columns without a documented retention period (enforced via migration linting). - - No new log statements containing fields flagged as PII. - - Dependency license check (avoid GPL/AGPL for closed-source SaaS). -- Run **SAST (Static Application Security Testing)**: `SonarQube`, `Semgrep`, `CodeQL`. -- Run **container image scanning**: `trivy`, `Snyk Container`, `AWS ECR scanning`. -- Rotate all CI secrets annually and upon personnel changes. - -### Pipeline Secret Rules - -```yaml -# MUST: mask secrets in logs -- name: Set secret - run: echo "::add-mask::${{ secrets.MY_SECRET }}" - -# MUST NOT: echo secrets to console -- name: Debug # ❌ Never do this - run: echo "API Key is $API_KEY" +## 13. Testing with Fake Data -# SHOULD: use OIDC federation instead of long-lived keys -- name: Authenticate - uses: azure/login@v1 - with: - client-id: ${{ vars.AZURE_CLIENT_ID }} - tenant-id: ${{ vars.AZURE_TENANT_ID }} - subscription-id: ${{ vars.AZURE_SUBSCRIPTION_ID }} -``` +**MUST** +- MUST NOT use production personal data in dev, staging, or CI environments. +- MUST NOT restore production DB backups to non-production without scrubbing PII first. +- Use synthetic data generators: `Bogus` (.NET), `Faker` (JS/Python/Ruby). +- Use `@example.com` for all test email addresses. --- -## 20. Architecture Patterns +## 14. Anti-Patterns -### Recommended Patterns - -**Data Store Separation** -Separate operational data (transactional DB) from analytical data (data warehouse). Apply different retention and access controls to each. - -**Event Sourcing with PII Scrubbing** -When using event sourcing, implement a **crypto-shredding** pattern: encrypt personal data in events with a per-user key. Deleting the key effectively anonymizes all events for that user. - -**Audit Log Segregation** -Store audit logs in a separate, append-only store with restricted write access. 
The application service account MUST NOT be able to delete audit log entries. - -**Consent Store** -Implement a dedicated consent service that tracks: -- What the user consented to -- When they consented -- Which version of the privacy policy they accepted -- The mechanism of consent (checkbox, API, paper) - -**Data Subject Request Queue** -Implement DSRs as an asynchronous workflow (queue + worker) to handle the complexity of scrubbing data across multiple stores reliably. - -**Pseudonymization Gateway** -For analytics pipelines, implement a pseudonymization service at the boundary between operational and analytical systems. The mapping key never leaves the operational zone. - ---- - -## 21. Anti-Patterns - -These are common mistakes that create GDPR liability. Avoid them. - -| Anti-pattern | Risk | Correct approach | -|---|---|---| -| Storing emails in URLs | Logged in CDN/server logs, browser history | Use opaque user IDs in URLs | -| Logging full request bodies | Captures passwords, health data, PII | Log only structured event metadata | -| "Keep forever" database design | Retention violations | Define TTL at schema design time | -| Using production data in dev | Data breach, no legal basis | Synthetic data generators + scrubbing pipeline | -| Shared credentials across teams | Cannot attribute access, cannot rotate safely | Individual accounts + RBAC | -| Hard-coded secrets | Compromise of all environments at once | KMS + secret manager | -| Sequential integer user IDs in URLs | IDOR vulnerabilities, enumeration | UUIDs or opaque identifiers | -| Global `*` CORS header on authenticated API | Cross-origin data theft | Explicit CORS allowlist | -| Storing consent in the same table as profile | Cannot prove consent without the profile | Separate consent store | -| Sending PII in GET query params | Server logs, referrer headers, browser history | POST body or authenticated session | -| Re-using analytics SDK across all users without consent | PECR/ePrivacy 
violation | Conditional loading behind consent gate | -| Mixing backup and live data residency regions | GDPR data residency violation | Explicit region lockdown on backup jobs | -| "Anonymized" data that includes quasi-identifiers | Re-identification risk | Apply k-anonymity, test linkage resistance | +| Anti-pattern | Correct approach | +|---|---| +| PII in URLs | Opaque UUIDs as public identifiers | +| Logging full request bodies | Log structured event metadata only | +| "Keep forever" schema | TTL defined at design time | +| Production data in dev/test | Synthetic data + scrubbing pipeline | +| Shared credentials across teams | Individual accounts + RBAC | +| Hardcoded secrets | KMS + secret manager | +| `Access-Control-Allow-Origin: *` on auth APIs | Explicit CORS allowlist | +| Storing consent with profile data | Dedicated consent store | +| PII in GET query params | POST body or authenticated session | +| Sequential integer IDs in public URLs | UUIDs | +| "Anonymized" data with quasi-identifiers | Apply k-anonymity, test linkage resistance | +| Mixing backup regions outside EEA | Explicit region lockdown on backup jobs | --- -## 22. PR Review Checklist +## 15. PR Review Checklist -Use this checklist on every pull request that touches personal data, authentication, logging, or infrastructure. - -### Data Model Changes -- [ ] Every new column that holds personal data has a documented purpose. -- [ ] Every new table with personal data has a retention period defined (column or policy). +### Data model +- [ ] Every new PII column has a documented purpose and retention period. - [ ] Sensitive fields (health, financial, national ID) use column-level encryption. -- [ ] No sequential integer PKs used as public-facing identifiers. +- [ ] No sequential integer PKs as public-facing identifiers. -### API Changes -- [ ] No personal data in URL path or query parameters. +### API +- [ ] No PII in URL paths or query parameters. 
- [ ] All endpoints returning personal data are authenticated. -- [ ] Ownership checks are in place (user cannot access another user's resource). -- [ ] Response projections exclude fields the caller is not authorized to see. +- [ ] Ownership checks present — user cannot access another user's resource. - [ ] Rate limiting applied to sensitive endpoints. -### Logging Changes +### Logging - [ ] No passwords, tokens, or credentials logged. -- [ ] No full email addresses or national IDs in high-volume logs. -- [ ] IPs are anonymized (last octet masked). +- [ ] IPs anonymized (last octet masked). - [ ] No full request/response bodies logged where PII may be present. -### Infrastructure Changes -- [ ] No storage buckets are public. -- [ ] Databases use private endpoints only. -- [ ] New cloud resources are tagged with `DataClassification`. +### Infrastructure +- [ ] No public storage buckets or public-IP databases. +- [ ] New cloud resources tagged with `DataClassification`. - [ ] Encryption at rest enabled for new storage resources. -- [ ] New geographic regions for data storage are approved and compliant with GDPR. +- [ ] New geographic regions for data storage are EEA-compliant or covered by SCCs. -### Secrets & Configuration +### Secrets & CI/CD - [ ] No secrets in source code or committed config files. -- [ ] New secrets are added to KMS and to the secrets inventory document. -- [ ] CI/CD secrets are masked in pipeline logs. - -### Retention & Deletion -- [ ] New data flows have a retention enforcement job or policy. -- [ ] Erasure pipeline covers this new data store or field. -- [ ] Soft-delete is used where hard-delete would be premature. +- [ ] New secrets added to KMS and secrets inventory document. +- [ ] CI/CD secrets masked in pipeline logs. -### User Rights -- [ ] If a new personal data field is introduced, the data export endpoint includes it. -- [ ] If a new data store is introduced, the erasure runbook is updated. 
+### Retention & erasure
+- [ ] Retention enforcement job or policy covers new data store or field.
+- [ ] Erasure pipeline updated to cover new data store.

-### Third Parties & Sub-processors
-- [ ] No new third-party service receives personal data without a signed DPA.
-- [ ] New sub-processors are added to the RoPA.
-
-### General
+### User rights & governance
+- [ ] Data export endpoint includes any new personal data field.
 - [ ] RoPA updated if a new processing activity is introduced.
-- [ ] No production personal data used in tests.
-- [ ] DPIA triggered if the change involves high-risk processing (profiling, health, biometrics, large scale).
-
----
-
-## Quick Reference — MUST / MUST NOT Summary
-
-| Topic | MUST | MUST NOT |
-|---|---|---|
-| **Passwords** | bcrypt/Argon2id, cost ≥ 12 | MD5, SHA-1, SHA-256, plaintext storage |
-| **Secrets** | KMS / secret manager | Commit to Git, hardcode in source |
-| **Encryption** | TLS 1.2+, AES-256 at rest | HTTP, TLS 1.0/1.1 |
-| **URLs** | Opaque UUIDs | Emails, names, national IDs in paths |
-| **Logs** | Anonymized IPs, event-based | Passwords, tokens, full bodies with PII |
-| **Error responses** | Generic messages, correlation ID | Stack traces, DB errors, user data |
-| **Test data** | Synthetic / Faker-generated | Real production PII |
-| **Retention** | TTL defined at design time | "Keep forever" |
-| **Erasure** | Cover all stores, test it | Partial deletion leaving PII in logs/cache |
-| **Third parties** | Signed DPA before data flows | Onboard without DPA |
-| **IDs** | UUIDs as public identifiers | Sequential integers in public URLs |
-| **CORS** | Explicit allowlist | `Access-Control-Allow-Origin: *` on auth APIs |
+- [ ] New sub-processors have a signed DPA and a RoPA entry.
+- [ ] DPIA triggered if the change involves high-risk processing.
 
 ---
 
-> **Golden Rule — repeated for emphasis:**
-> **Collect less. Store less. Expose less. Retain less.**
+> **Golden Rule:** Collect less. Store less. Expose less. Retain less.
 >
 > Every byte of personal data you do not collect is a byte you cannot lose,
 > cannot breach, and cannot be held liable for.
@@ -727,4 +288,4 @@ Use this checklist on every pull request that touches personal data, authenticat
 ---
 
 *Inspired by CNIL developer GDPR guidance, GDPR Articles 5, 25, 32, 33, 35,
-and engineering best practices from ENISA, OWASP, and NIST.*
+ENISA, OWASP, and NIST engineering best practices.*
diff --git a/skills/gdpr-compliant/references/security.md b/skills/gdpr-compliant/references/security.md
new file mode 100644
index 000000000..bc4972fbc
--- /dev/null
+++ b/skills/gdpr-compliant/references/security.md
@@ -0,0 +1,266 @@
+# GDPR Reference — Security, Operations & Architecture
+
+Load this file when you need implementation detail on:
+encryption, password hashing, secrets management, anonymization/pseudonymization,
+cloud/DevOps practices, CI/CD controls, incident response, architecture patterns.
+
+---
+
+## Encryption
+
+### At-Rest Encryption
+
+| Data sensitivity | Minimum standard |
+|---|---|
+| Standard personal data (name, address, email) | AES-256 disk/volume encryption (cloud provider default) |
+| Sensitive personal data (health, biometric, financial, national ID) | AES-256 **column-level** encryption + envelope encryption via KMS |
+| Encryption keys | HSM-backed KMS (Azure Key Vault Premium / AWS KMS CMK / GCP Cloud KMS) |
+
+**Envelope encryption pattern:**
+1. Encrypt data with a **Data Encryption Key (DEK)** (AES-256, generated per record or per table).
+2. Encrypt the DEK with a **Key Encryption Key (KEK)** stored in the KMS.
+3. Store the encrypted DEK alongside the encrypted data.
+4. Deleting the KEK = effective crypto-shredding of all data encrypted with it.
+
+### In-Transit Encryption
+
+- **MUST** enforce TLS 1.2 minimum; prefer TLS 1.3.
+- **MUST** set `Strict-Transport-Security: max-age=31536000; includeSubDomains; preload`. 
+- **MUST NOT** allow TLS 1.0, TLS 1.1, null cipher suites, or export-grade ciphers. +- **MUST NOT** use self-signed certificates in production. + +### Key Management + +- Rotate DEKs annually minimum; rotate immediately upon suspected compromise. +- Use separate key namespaces per environment (dev / staging / prod). +- Log all KMS key access events — alert on anomalous access patterns. +- MUST NOT hardcode encryption keys in source code or configuration files. + +--- + +## Password Hashing + +| Algorithm | Parameters | Notes | +|---|---|---| +| **Argon2id** ✅ recommended | memory ≥ 64 MB, iterations ≥ 3, parallelism ≥ 4 | OWASP and NIST recommended | +| **bcrypt** ✅ acceptable | cost factor ≥ 12 | Widely supported; use if Argon2id unavailable | +| **scrypt** ✅ acceptable | N=32768, r=8, p=1 | Good alternative | +| MD5 ❌ | — | Never — trivially broken | +| SHA-1 / SHA-256 ❌ | — | Never for passwords — not designed for this purpose | + +**MUST** +- Use a unique salt per password (built into all three algorithms above). +- Store only the hash — never the plaintext, never a reversible encoding. +- Re-hash on login if the stored hash uses an outdated algorithm — upgrade transparently. + +**SHOULD** +- Add a **pepper** (server-side secret added before hashing) stored in the KMS, not in the DB. +- Check passwords against known breach lists at registration (`haveibeenpwned` API, k-anonymity mode). +- Enforce minimum password length of 12 characters. + +**MUST NOT** +- Log passwords in any form — not during registration, not during failed login. +- Transmit passwords in URLs or query strings. +- Store password reset tokens in plaintext — hash them before storage. + +--- + +## Secrets Management + +**MUST** +- Store all secrets in a dedicated secret manager: Azure Key Vault, AWS Secrets Manager, + GCP Secret Manager, or HashiCorp Vault. +- Use pre-commit hooks to prevent secret commits: `gitleaks`, `detect-secrets`, GitHub native secret scanning. 
+- Rotate secrets immediately upon: developer offboarding, suspected compromise, annual schedule.
+- Maintain a **secrets inventory document** — every secret listed with its purpose and rotation date.
+
+**SHOULD**
+- Use **short-lived credentials** via OIDC federation (GitHub Actions → Azure/AWS/GCP) instead of long-lived API keys.
+- Audit all KMS secret access — alert on access outside business hours or from unexpected sources.
+- Use separate secret namespaces per environment.
+
+**`.gitignore` MUST include:**
+```
+.env
+.env.*
+*.pem
+*.key
+*.pfx
+*.p12
+secrets/
+# appsettings variants may contain connection strings
+appsettings.*.json
+```
+
+**MUST NOT**
+- Commit secrets to source code repositories.
+- Pass secrets as plain-text CLI arguments (they appear in process lists and shell history).
+- Store secrets as unencrypted environment variable defaults in code.
+
+---
+
+## Anonymization & Pseudonymization
+
+### Definitions
+
+| Term | Reversible? | GDPR scope? | Use case |
+|---|---|---|---|
+| **Anonymization** | No | Outside GDPR scope | Retained records after erasure, analytics datasets |
+| **Pseudonymization** | Yes (with key) | Still personal data | Analytics pipelines, audit logs, reduced-risk processing |
+
+### Anonymization Techniques
+
+| Technique | How | When |
+|---|---|---|
+| Suppression | Remove the field entirely | Fields with no analytical value |
+| Masking | Replace with fixed placeholder (`"ANONYMIZED_USER"`) | Audit log identifiers after erasure |
+| Generalization | Replace exact value with a range (age 34 → "30–40") | Analytics |
+| Noise addition | Add statistical noise to numerical values | Aggregate analytics |
+| Aggregation | Report group statistics, never individual values | Reporting |
+| K-anonymity | Ensure each record is indistinguishable from k-1 others | Analytics datasets |
+
+### Pseudonymization Techniques
+
+| Technique | How |
+|---|---|
+| HMAC-SHA256 with secret key | Consistent, one-way, keyed. Use for user IDs in analytics. Key in KMS. |
+| Tokenization | Replace value with opaque token; mapping in separate secure vault. |
+| Encryption with separate key | Decrypt only with explicit KMS authorization. |
+
+**MUST**
+- When erasing a user, **anonymize** records that must be retained (financial, audit logs) — replace identifying fields with `"ANONYMIZED"` or a hashed placeholder.
+- Store the pseudonymization key in the KMS — never in the same database as the pseudonymized data.
+- Test anonymization routines with assertions: the original value MUST NOT be recoverable from the output.
+
+**Crypto-shredding pattern (event sourcing):**
+Encrypt personal data in events with a per-user DEK. Store the DEK in the KMS.
+On erasure: delete the DEK from the KMS → all events for that user are effectively anonymized.
+
+**MUST NOT**
+- Call data "anonymized" if re-identification is possible through linkage with other datasets.
+- Apply pseudonymization and store the mapping key in the same table as the pseudonymized data.
+
+---
+
+## Cloud & DevOps Practices
+
+**MUST**
+- Enable encryption at rest for all cloud storage: blobs, managed databases, queues, caches.
+- Use **private endpoints** — databases MUST NOT be publicly accessible.
+- Apply network security groups / firewall rules: restrict DB access to application layers only.
+- Enable cloud-native audit logging: Azure Monitor / AWS CloudTrail / GCP Cloud Audit Logs.
+- Store personal data only in **approved geographic regions** (EEA, or adequacy decision / SCCs).
+- Tag all cloud resources processing personal data with a `DataClassification` tag.
+
+**SHOULD**
+- Enable Microsoft Defender for Cloud / AWS Security Hub / GCP Security Command Center — review recommendations weekly.
+- Use **managed identities** (Azure) or **IAM roles** (AWS/GCP) instead of long-lived access keys.
+- Enable soft delete and versioning on object storage.
+- Apply DLP policies on cloud storage to detect PII written to unprotected buckets.
+- Enable database-level audit logging for SELECT on sensitive tables. + +**MUST NOT** +- Store personal data in public storage buckets without access controls. +- Deploy databases with public IPs in production. +- Use the same cloud account/subscription for production and non-production if data could bleed across. + +--- + +## CI/CD Controls + +**MUST** +- Run **secret scanning** on every commit: `gitleaks`, `detect-secrets`, GitHub secret scanning. +- Run **dependency vulnerability scanning** on every build: `npm audit`, `dotnet list package --vulnerable`, `trivy`, `snyk`. +- MUST NOT use real personal data in CI test jobs. +- MUST NOT log environment variables in CI pipelines — mask all secrets. + +**SHOULD** +- Run **SAST**: SonarQube, Semgrep, or CodeQL on every PR. +- Run **container image scanning**: `trivy`, Snyk Container, or AWS ECR scanning. +- Add a **GDPR compliance gate** to the pipeline: + - New migrations without a documented retention period → fail. + - Log statements containing known PII field names → warn. + +**Pipeline secret rules:** +```yaml +# MUST: mask secrets before use +- name: Mask secret + run: echo "::add-mask::${{ secrets.MY_SECRET }}" + +# MUST NOT: echo secrets to console +- run: echo "Key=$API_KEY" # ❌ Never + +# SHOULD: use OIDC federation (no long-lived keys) +- uses: azure/login@v1 + with: + client-id: ${{ vars.AZURE_CLIENT_ID }} + tenant-id: ${{ vars.AZURE_TENANT_ID }} + subscription-id: ${{ vars.AZURE_SUBSCRIPTION_ID }} +``` + +--- + +## Incident & Breach Handling + +### Regulatory Timeline + +| Window | Obligation | +|---|---| +| **72 hours** from awareness | Notify the supervisory authority (CNIL, APD, ICO…) — unless breach is unlikely to risk individuals | +| **Without undue delay** | Notify affected data subjects if breach is likely to result in **high risk** to their rights | + +Log **all** personal data breaches internally — even those that do not require DPA notification. + +### Breach Response Runbook (template) + +1. 
**Detection** — Define criteria: what triggers an incident (credential leak, DB dump exposed, ransomware, accidental public bucket). +2. **Severity classification** — Low / Medium / High / Critical based on data sensitivity and volume. +3. **Containment** — Revoke compromised credentials; isolate affected systems; preserve evidence (do NOT delete logs). +4. **Assessment** — What data was exposed? How many subjects? What is the risk level? +5. **DPA notification** — Use the supervisory authority's online portal; include: nature of breach, categories and approximate number of data subjects, categories and approximate number of records, contact point, likely consequences, measures taken. +6. **Data subject notification** — If high risk: clear language, nature of breach, likely consequences, measures taken, DPO contact. +7. **Post-incident review** — Root cause analysis; corrective measures; update runbook. + +### Automated Breach Detection Alerts + +Configure alerts for: +- Unusual volume of data exports (threshold per hour) +- Access to sensitive tables outside business hours +- Bulk deletion events +- Failed authentication spikes +- New credentials appearing in public breach databases (HaveIBeenPwned monitoring) + +Store breach records internally for at least **5 years**. + +--- + +## Architecture Patterns + +### Data Store Separation +Separate operational data (transactional DB) from analytical data (data warehouse). +Apply different retention periods and access controls to each. +The analytics store MUST NOT read directly from production operational tables. + +### Dedicated Consent Store +Track consent as an immutable event log in a separate store, not a boolean column on the user table. +This enables: auditable consent history, version tracking, easy withdrawal without data loss. + +### Audit Log Segregation +Store audit logs in a separate, append-only store. +The application service account MUST NOT be able to delete audit log entries. 
+Use a separate DB user with INSERT-only rights on the audit table. + +### DSR Queue Pattern +Implement Data Subject Requests as an asynchronous workflow: +`POST /api/v1/me/erasure-request` → enqueue a job → worker scrubs all stores → notify user on completion. +This handles the complexity of multi-store scrubbing reliably and provides a retry mechanism. + +### Pseudonymization Gateway +For analytics pipelines, implement a pseudonymization service at the boundary between +operational and analytical systems. +The mapping key (HMAC secret or tokenization vault) never leaves the operational zone. +The analytics zone receives only pseudonymized identifiers. + +### Crypto-Shredding (Event Sourcing) +Encrypt personal data in events with a per-user DEK stored in the KMS. +On user erasure: delete the DEK → all historical events for that user are effectively anonymized +without modifying the event log. diff --git a/skills/gdpr-compliant/references/data-rights.md b/skills/gdpr-compliant/references/data-rights.md new file mode 100644 index 000000000..2ace756c2 --- /dev/null +++ b/skills/gdpr-compliant/references/data-rights.md @@ -0,0 +1,177 @@ +# GDPR Reference — Data Rights, Accountability & Governance + +Load this file when you need implementation detail on: +user rights endpoints, Data Subject Request (DSR) workflow, +Record of Processing Activities (RoPA), consent management. + +--- + +## User Rights Implementation (Articles 15–22) + +Every right MUST have a tested API endpoint or documented back-office process +before the system goes live. Respond to verified requests within **30 calendar days**. 
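
The 30-day response window above can be enforced mechanically in a DSR tracker; a minimal sketch of the deadline bookkeeping (the helper names and the fixed 30-day window are illustrative assumptions, not part of this skill):

```python
from datetime import date, timedelta

def dsr_deadline(received_at: date) -> date:
    # Response deadline: received date + 30 calendar days, per the rule above.
    return received_at + timedelta(days=30)

def days_remaining(received_at: date, today: date) -> int:
    # Negative means the request is overdue and must be escalated.
    return (dsr_deadline(received_at) - today).days

# A request received on 1 March 2025 must be answered by 31 March 2025.
print(dsr_deadline(date(2025, 3, 1)))                        # 2025-03-31
print(days_remaining(date(2025, 3, 1), date(2025, 3, 25)))   # 6
```

Wiring `days_remaining` into a daily job that alerts the assigned handler a few days before the deadline keeps requests from silently expiring.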
+ +| Right | Article | Engineering implementation | +|---|---|---| +| Right of access | 15 | `GET /api/v1/me/data-export` — all personal data, JSON or CSV | +| Right to rectification | 16 | `PUT /api/v1/me/profile` — propagate to all downstream stores | +| Right to erasure | 17 | `DELETE /api/v1/me` — scrub all stores per erasure checklist | +| Right to restriction | 18 | `ProcessingRestricted` flag on user record; gate non-essential processing | +| Right to portability | 20 | Same as access endpoint; structured, machine-readable (JSON) | +| Right to object | 21 | Opt-out endpoint for legitimate-interest processing; honor immediately | +| Automated decision-making | 22 | Expose a human review path + explanation of the logic | + +### Erasure Checklist — MUST cover all stores + +When `DELETE /api/v1/me` is called, the erasure pipeline MUST scrub: + +- [ ] Primary relational database (anonymize or delete rows) +- [ ] Read replicas +- [ ] Search index (Elasticsearch, Azure Cognitive Search, etc.) +- [ ] In-memory cache (Redis, IMemoryCache) +- [ ] Object storage (S3, Azure Blob — profile pictures, documents) +- [ ] Email service logs (Brevo, SendGrid — delivery logs) +- [ ] Analytics platform (Mixpanel, Amplitude, GA4 — user deletion API) +- [ ] Audit logs (anonymize identifying fields — do not delete the event) +- [ ] Backups (document the backup TTL; accept that backups expire naturally) +- [ ] CDN edge cache (purge if personal data may be cached) +- [ ] Third-party sub-processors (trigger their deletion API or document the manual step) + +### Data Export Format (`GET /api/v1/me/data-export`) + +```json +{ + "exportedAt": "2025-03-30T10:00:00Z", + "subject": { + "id": "uuid", + "email": "user@example.com", + "createdAt": "2024-01-15T08:30:00Z" + }, + "profile": { ... }, + "orders": [ ... ], + "consents": [ ... ], + "auditEvents": [ ... ] +} +``` + +- MUST be machine-readable (JSON preferred, CSV acceptable). +- MUST NOT be a PDF screenshot or HTML page. 
+- MUST include all stores listed in the RoPA for this user. + +### DSR Tracker (back-office) + +Implement a **Data Subject Request tracker** with: +- Incoming request date +- Request type (access / rectification / erasure / portability / restriction / objection) +- Verification status (identity confirmed y/n) +- Deadline (received date + 30 days) +- Assigned handler +- Completion date and outcome +- Notes + +Automate the primary store scrubbing; document manual steps for third-party stores. + +--- + +## Record of Processing Activities (RoPA) + +Maintain as a living document (Markdown, YAML, or JSON) version-controlled in the repo. +Update with **every** new feature that introduces a processing activity. + +### Minimum fields per processing activity + +```yaml +- name: "User account management" + purpose: "Create and manage user accounts for service access" + legalBasis: "Contract (Art. 6(1)(b))" + dataSubjects: ["Registered users"] + personalDataCategories: ["Name", "Email", "Password hash", "IP address"] + recipients: ["Internal engineering team", "Brevo (email delivery)"] + retentionPeriod: "Account lifetime + 12 months" + transfers: + outside_eea: true + safeguard: "Brevo — Standard Contractual Clauses (SCCs)" + securityMeasures: ["TLS 1.3", "AES-256 at rest", "bcrypt password hashing"] + dpia_required: false +``` + +### Legal basis options (Art. 6) + +| Basis | When to use | +|---|---| +| `Contract (6(1)(b))` | Processing necessary to fulfill the service contract | +| `Legitimate interest (6(1)(f))` | Fraud prevention, security, analytics (requires balancing test) | +| `Consent (6(1)(a))` | Marketing, non-essential cookies, optional profiling | +| `Legal obligation (6(1)(c))` | Tax records, anti-money-laundering | +| `Vital interest (6(1)(d))` | Emergency situations only | +| `Public task (6(1)(e))` | Public authorities | + +--- + +## Consent Management + +### MUST + +- Store consent as an **immutable event log**, not a mutable boolean flag. 
+- Record: what was consented to, when, which version of the privacy policy, the mechanism. +- Load analytics / marketing SDKs **conditionally** — only after consent is granted. +- Provide a consent withdrawal mechanism as easy to use as the consent grant. + +### Consent store schema (minimum) + +```sql +CREATE TABLE ConsentRecords ( + Id UUID PRIMARY KEY, + UserId UUID NOT NULL, + Purpose VARCHAR(100) NOT NULL, -- e.g. "marketing_emails", "analytics" + Granted BOOLEAN NOT NULL, + PolicyVersion VARCHAR(20) NOT NULL, + ConsentedAt TIMESTAMPTZ NOT NULL, + IpAddressHash VARCHAR(64), -- HMAC-SHA256 of anonymized IP + UserAgent VARCHAR(500) +); +``` + +### MUST NOT + +- MUST NOT pre-tick consent checkboxes. +- MUST NOT bundle consent for marketing with consent for service delivery. +- MUST NOT make service access conditional on marketing consent. +- MUST NOT use dark patterns (e.g., "Accept all" prominent, "Reject" buried). + +--- + +## Sub-processor Management + +Maintain a **sub-processor list** updated with every new SaaS tool or cloud service +that touches personal data. + +Minimum fields per sub-processor: + +| Field | Example | +|---|---| +| Name | Brevo | +| Service | Transactional email | +| Data categories transferred | Email address, name, email content | +| Processing location | EU (Paris) | +| DPA signed | ✅ 2024-01-10 | +| DPA URL / reference | [link] | +| SCCs applicable | N/A (EU-based) | + +**MUST** review the sub-processor list annually and upon any change. +**MUST NOT** allow data to flow to a new sub-processor before a DPA is signed. + +--- + +## DPIA Triggers (Article 35) + +A DPIA is **mandatory** before processing that is likely to result in a high risk. 
Triggers include: + +- Systematic and extensive profiling with significant effects on individuals +- Large-scale processing of special category data (health, biometric, racial origin, sexual orientation, religion) +- Systematic monitoring of publicly accessible areas (CCTV, location tracking) +- Processing of children's data at scale +- Innovative technology with unknown privacy implications +- Matching or combining datasets from multiple sources + +When in doubt: conduct the DPIA anyway. Document the outcome. From 1abe8d9085a67375832a862067beb7488e2a9c8b Mon Sep 17 00:00:00 2001 From: Mikael Krief Date: Mon, 30 Mar 2026 22:28:23 +0200 Subject: [PATCH 3/4] Refine GDPR compliance documentation by removing unnecessary symbols and ensuring clarity in security and data rights references --- skills/gdpr-compliant/SKILL.md | 54 +++++++++---------- skills/gdpr-compliant/references/Security.md | 12 ++--- .../gdpr-compliant/references/data-rights.md | 24 ++++----- 3 files changed, 45 insertions(+), 45 deletions(-) diff --git a/skills/gdpr-compliant/SKILL.md b/skills/gdpr-compliant/SKILL.md index f3c2823b4..5aa1ac010 100644 --- a/skills/gdpr-compliant/SKILL.md +++ b/skills/gdpr-compliant/SKILL.md @@ -114,7 +114,7 @@ For deep dives, read the reference files in `references/`: **MUST** - MUST NOT include personal data in URL paths or query parameters. - - ❌ `GET /users/john.doe@example.com` ✅ `GET /users/{userId}` + - `GET /users/{userId}` - Authenticate all endpoints that return or accept personal data. - Extract the acting user's identity from the JWT — never from the request body. - Validate ownership on every resource: `if (resource.OwnerId != currentUserId) return 403`. @@ -134,7 +134,7 @@ For deep dives, read the reference files in `references/`: **MUST** - Anonymize IPs in application logs — mask last octet (IPv4) or last 80 bits (IPv6). 
- - ❌ `192.168.1.42` ✅ `192.168.1.xxx` + - `192.168.1.xxx` - MUST NOT log: passwords, tokens, session IDs, credentials, card numbers, national IDs, health data. - MUST NOT log full request/response bodies where PII may be present. - Enforce log retention — purge automatically after the defined period. @@ -150,8 +150,8 @@ For deep dives, read the reference files in `references/`: **MUST** - Return generic error messages — never expose stack traces, internal paths, or DB errors. - - ❌ `"Column 'email' violates unique constraint on table 'users'"` - - ✅ `"A user with this email address already exists."` + - `"Column 'email' violates unique constraint on table 'users'"` + - `"A user with this email address already exists."` - Use **Problem Details (RFC 7807)** for all error responses. - Log the full error server-side with a correlation ID; return only the correlation ID to the client. @@ -242,41 +242,41 @@ For deep dives, read the reference files in `references/`: ## 15. PR Review Checklist ### Data model -- [ ] Every new PII column has a documented purpose and retention period. -- [ ] Sensitive fields (health, financial, national ID) use column-level encryption. -- [ ] No sequential integer PKs as public-facing identifiers. +- Every new PII column has a documented purpose and retention period. +- Sensitive fields (health, financial, national ID) use column-level encryption. +- No sequential integer PKs as public-facing identifiers. ### API -- [ ] No PII in URL paths or query parameters. -- [ ] All endpoints returning personal data are authenticated. -- [ ] Ownership checks present — user cannot access another user's resource. -- [ ] Rate limiting applied to sensitive endpoints. +- No PII in URL paths or query parameters. +- All endpoints returning personal data are authenticated. +- Ownership checks present — user cannot access another user's resource. +- Rate limiting applied to sensitive endpoints. ### Logging -- [ ] No passwords, tokens, or credentials logged. 
-- [ ] IPs anonymized (last octet masked). -- [ ] No full request/response bodies logged where PII may be present. +- No passwords, tokens, or credentials logged. +- IPs anonymized (last octet masked). +- No full request/response bodies logged where PII may be present. ### Infrastructure -- [ ] No public storage buckets or public-IP databases. -- [ ] New cloud resources tagged with `DataClassification`. -- [ ] Encryption at rest enabled for new storage resources. -- [ ] New geographic regions for data storage are EEA-compliant or covered by SCCs. +- No public storage buckets or public-IP databases. +- New cloud resources tagged with `DataClassification`. +- Encryption at rest enabled for new storage resources. +- New geographic regions for data storage are EEA-compliant or covered by SCCs. ### Secrets & CI/CD -- [ ] No secrets in source code or committed config files. -- [ ] New secrets added to KMS and secrets inventory document. -- [ ] CI/CD secrets masked in pipeline logs. +- No secrets in source code or committed config files. +- New secrets added to KMS and secrets inventory document. +- CI/CD secrets masked in pipeline logs. ### Retention & erasure -- [ ] Retention enforcement job or policy covers new data store or field. -- [ ] Erasure pipeline updated to cover new data store. +- Retention enforcement job or policy covers new data store or field. +- Erasure pipeline updated to cover new data store. ### User rights & governance -- [ ] Data export endpoint includes any new personal data field. -- [ ] RoPA updated if a new processing activity is introduced. -- [ ] New sub-processors have a signed DPA and a RoPA entry. -- [ ] DPIA triggered if the change involves high-risk processing. +- Data export endpoint includes any new personal data field. +- RoPA updated if a new processing activity is introduced. +- New sub-processors have a signed DPA and a RoPA entry. +- DPIA triggered if the change involves high-risk processing. 
--- diff --git a/skills/gdpr-compliant/references/Security.md b/skills/gdpr-compliant/references/Security.md index bc4972fbc..ca762f86d 100644 --- a/skills/gdpr-compliant/references/Security.md +++ b/skills/gdpr-compliant/references/Security.md @@ -42,11 +42,11 @@ cloud/DevOps practices, CI/CD controls, incident response, architecture patterns | Algorithm | Parameters | Notes | |---|---|---| -| **Argon2id** ✅ recommended | memory ≥ 64 MB, iterations ≥ 3, parallelism ≥ 4 | OWASP and NIST recommended | -| **bcrypt** ✅ acceptable | cost factor ≥ 12 | Widely supported; use if Argon2id unavailable | -| **scrypt** ✅ acceptable | N=32768, r=8, p=1 | Good alternative | -| MD5 ❌ | — | Never — trivially broken | -| SHA-1 / SHA-256 ❌ | — | Never for passwords — not designed for this purpose | +| **Argon2id** recommended | memory ≥ 64 MB, iterations ≥ 3, parallelism ≥ 4 | OWASP and NIST recommended | +| **bcrypt** acceptable | cost factor ≥ 12 | Widely supported; use if Argon2id unavailable | +| **scrypt** acceptable | N=32768, r=8, p=1 | Good alternative | +| MD5 | — | Never — trivially broken | +| SHA-1 / SHA-256 | — | Never for passwords — not designed for this purpose | **MUST** - Use a unique salt per password (built into all three algorithms above). @@ -187,7 +187,7 @@ On erasure: delete the DEK from the KMS → all events for that user are effecti run: echo "::add-mask::${{ secrets.MY_SECRET }}" # MUST NOT: echo secrets to console -- run: echo "Key=$API_KEY" # ❌ Never +- run: echo "Key=$API_KEY" # Never # SHOULD: use OIDC federation (no long-lived keys) - uses: azure/login@v1 diff --git a/skills/gdpr-compliant/references/data-rights.md b/skills/gdpr-compliant/references/data-rights.md index 2ace756c2..81cfb192a 100644 --- a/skills/gdpr-compliant/references/data-rights.md +++ b/skills/gdpr-compliant/references/data-rights.md @@ -25,17 +25,17 @@ before the system goes live. 
Respond to verified requests within **30 calendar d When `DELETE /api/v1/me` is called, the erasure pipeline MUST scrub: -- [ ] Primary relational database (anonymize or delete rows) -- [ ] Read replicas -- [ ] Search index (Elasticsearch, Azure Cognitive Search, etc.) -- [ ] In-memory cache (Redis, IMemoryCache) -- [ ] Object storage (S3, Azure Blob — profile pictures, documents) -- [ ] Email service logs (Brevo, SendGrid — delivery logs) -- [ ] Analytics platform (Mixpanel, Amplitude, GA4 — user deletion API) -- [ ] Audit logs (anonymize identifying fields — do not delete the event) -- [ ] Backups (document the backup TTL; accept that backups expire naturally) -- [ ] CDN edge cache (purge if personal data may be cached) -- [ ] Third-party sub-processors (trigger their deletion API or document the manual step) +- Primary relational database (anonymize or delete rows) +- Read replicas +- Search index (Elasticsearch, Azure Cognitive Search, etc.) +- In-memory cache (Redis, IMemoryCache) +- Object storage (S3, Azure Blob — profile pictures, documents) +- Email service logs (Brevo, SendGrid — delivery logs) +- Analytics platform (Mixpanel, Amplitude, GA4 — user deletion API) +- Audit logs (anonymize identifying fields — do not delete the event) +- Backups (document the backup TTL; accept that backups expire naturally) +- CDN edge cache (purge if personal data may be cached) +- Third-party sub-processors (trigger their deletion API or document the manual step) ### Data Export Format (`GET /api/v1/me/data-export`) @@ -154,7 +154,7 @@ Minimum fields per sub-processor: | Service | Transactional email | | Data categories transferred | Email address, name, email content | | Processing location | EU (Paris) | -| DPA signed | ✅ 2024-01-10 | +| DPA signed | 2024-01-10 | | DPA URL / reference | [link] | | SCCs applicable | N/A (EU-based) | From aa54538613894f623756bef05cb204b367da89bd Mon Sep 17 00:00:00 2001 From: Mikael Krief Date: Mon, 30 Mar 2026 22:46:33 +0200 Subject: 
[PATCH 4/4] refactor: streamline description formatting in GDPR compliance skill documentation --- skills/gdpr-compliant/SKILL.md | 10 +--------- 1 file changed, 1 insertion(+), 9 deletions(-) diff --git a/skills/gdpr-compliant/SKILL.md b/skills/gdpr-compliant/SKILL.md index 5aa1ac010..a2fde3150 100644 --- a/skills/gdpr-compliant/SKILL.md +++ b/skills/gdpr-compliant/SKILL.md @@ -1,14 +1,6 @@ --- name: gdpr-compliant -description: 'Apply GDPR-compliant engineering practices across your codebase. Use this skill - whenever you are designing APIs, writing data models, building authentication flows, - implementing logging, handling user data, writing retention/deletion jobs, designing - cloud infrastructure, or reviewing pull requests for privacy compliance. - Trigger this skill for any task involving personal data, user accounts, cookies, - analytics, emails, audit logs, encryption, pseudonymization, anonymization, - data exports, breach response, CI/CD pipelines that process real data, or any - question framed as "is this GDPR-compliant?". Inspired by CNIL developer guidance - and GDPR Articles 5, 25, 32, 33, 35.' +description: 'Apply GDPR-compliant engineering practices across your codebase. Use this skill whenever you are designing APIs, writing data models, building authentication flows, implementing logging, handling user data, writing retention/deletion jobs, designing cloud infrastructure, or reviewing pull requests for privacy compliance. Trigger this skill for any task involving personal data, user accounts, cookies, analytics, emails, audit logs, encryption, pseudonymization, anonymization, data exports, breach response, CI/CD pipelines that process real data, or any question framed as "is this GDPR-compliant?". Inspired by CNIL developer guidance and GDPR Articles 5, 25, 32, 33, 35.' --- # GDPR Engineering Skill