Background
WellnessPharmaFakeOne.com (referred to below as WellnessPharma), a global wellness and pharmaceutical products company, had been operating for over two decades. Over that time, it accumulated a long-running customer purchase history database.
With customer consent, the company stored the following personal data in its Master Purchase DB:
- Full name
- Residential address
- Postal code
- Product purchased
- Age
- Purchase timestamp
To support research, product planning, and targeted marketing, WellnessPharma launched an internal analytics platform designed to study purchasing patterns across regions, age groups, and product categories.
The Analytics Architecture
The analytics dashboard queried the Master Purchase DB through APIs and fetched bulk datasets. Before rendering results, the UI layer masked the customer name and address fields so that analysts would not see direct identifiers.
Key technical conditions:
- There was no encryption at the database level for PII fields.
- The internal Bulk API could fetch up to 20,000 records in one request and returned data exactly as stored in the database.
- Data masking was applied only at the UI layer.
The API itself returned raw personal data, including full names, full addresses, age, postal code, and product purchase history.
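One way to close this gap is to reduce each record on the server before it leaves the API, so the masking no longer depends on the UI. The following is a minimal sketch, not WellnessPharma's implementation: the PurchaseRecord type, the age_band and to_analytics_row helpers, the ten-year age bands, and the two-character postal prefix used as a region are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class PurchaseRecord:
    """Hypothetical shape of one row in the Master Purchase DB."""
    full_name: str
    address: str
    postal_code: str
    product: str
    age: int
    purchased_at: datetime


def age_band(age: int, width: int = 10) -> str:
    """Generalize an exact age into a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def to_analytics_row(record: PurchaseRecord) -> dict:
    """Build the only view the analytics API would return.

    Name and address are never copied into the response, and the
    remaining fields are coarsened (assumed granularity: postal
    prefix, age band, purchase month).
    """
    return {
        "region": record.postal_code[:2],
        "age_band": age_band(record.age),
        "product": record.product,
        "purchase_month": record.purchased_at.strftime("%Y-%m"),
    }
```

With a projection like this in the API layer, a caller holding a valid token still receives data suitable for regional, age-group, and product analysis, but not names or addresses.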
Threat Analysis
A third-party cyber threat intelligence provider informed WellnessPharma that a dataset titled "EU Wellness Customers – 20 Years Historical – Full PII" was being sold on a dark web marketplace. The listing included sample records containing real names, addresses, and purchase histories.
The data schema in the sample matched WellnessPharma's database structure exactly. The company immediately logged a critical incident, and log preservation began across API gateways, database systems, and infrastructure layers.
Forensic Investigation
Initial checks showed no ransomware event, no database corruption, no unauthorized database login, and no internal credential misuse. The infrastructure appeared stable.
Log analysis revealed repeated high-volume GET requests to the Bulk Customer API endpoint over several days. The calls were authenticated using a valid service token that had been exposed in a misconfigured deployment artifact.
The attacker did not exploit the database directly. They used the API exactly as designed. Each request returned large volumes of structured PII. Pagination parameters were used to extract data in segments until the dataset was complete.
Because the endpoint was built for bulk analytics export, large response sizes were treated as normal. Logging existed, but alerting thresholds were not configured to detect abnormal extraction patterns.
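The missing alerting can be illustrated with a per-token sliding-window counter over the number of records returned. This is a sketch under stated assumptions: the ExtractionMonitor class, the one-hour window, and the 50,000-record threshold are hypothetical, and a real threshold would have to be derived from the platform's normal analytics traffic.

```python
from collections import defaultdict, deque


class ExtractionMonitor:
    """Tracks records returned per token inside a sliding time window."""

    def __init__(self, max_records: int, window_seconds: float):
        self.max_records = max_records
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # token_id -> deque of (timestamp, count)
        self._totals = defaultdict(int)    # token_id -> records inside the window

    def record(self, token_id: str, records_returned: int, now: float) -> bool:
        """Register one API response; return True if an alert should fire."""
        events = self._events[token_id]
        events.append((now, records_returned))
        self._totals[token_id] += records_returned

        # Drop responses that have fallen out of the window.
        cutoff = now - self.window_seconds
        while events and events[0][0] <= cutoff:
            _, old_count = events.popleft()
            self._totals[token_id] -= old_count

        return self._totals[token_id] > self.max_records


# Assumed policy: alert when one token pulls more than 50,000 records per hour.
monitor = ExtractionMonitor(max_records=50_000, window_seconds=3600)
for minute in range(3):
    alert = monitor.record("service-token", 20_000, now=minute * 60.0)
print(alert)  # True: three full-size pages within three minutes
```

A check of this kind would not have stopped the first requests, but repeated full-size pages from a single service token over several days would have crossed any reasonable threshold early in the extraction.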
Scope of Exposure
The extracted dataset included full names, complete residential addresses, postal codes, age, product purchase history, and historical records dating back twenty years. The breach was not limited to recent customers. It included individuals who had not interacted with the company in years.
The Data Remanence Issue
Over time, the company retained customer records dating back nearly twenty years. Inactive customers were never purged. Closed accounts were never anonymized. Historical marketing datasets were never deleted. The analytics system had direct access to this entire historical dataset.
Under the General Data Protection Regulation (GDPR), this posed a clear compliance issue. GDPR allows retention while customer relationships are active, for legally mandated record-keeping periods, and for properly anonymized statistical research. It does not permit indefinite retention of fully identifiable personal data for commercial convenience.
WellnessPharma was legally allowed to retain active operational records and mandatory regulatory documentation. It was not allowed to keep decades of identifiable purchase histories for marketing analytics without a renewed lawful basis. The bulk API made all retained data accessible in raw form, and the excess retention amplified the eventual breach impact.
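A retention job that strips identifiers from stale records would have limited what the bulk API could expose. The sketch below is illustrative only: the three-year RETENTION_PERIOD, the dictionary field names, and the apply_retention helper are assumptions, and the actual period would have to come from legal review. Whether the reduced record counts as anonymized under GDPR also depends on a re-identification risk assessment, which this code does not perform.

```python
from datetime import datetime, timedelta

# Assumed retention period after the last customer activity.
RETENTION_PERIOD = timedelta(days=3 * 365)

# Fields treated as direct identifiers (hypothetical field names).
DIRECT_IDENTIFIERS = ("full_name", "address")


def apply_retention(record: dict, last_activity: datetime, now: datetime) -> dict:
    """Return the record unchanged if still within retention,
    otherwise a reduced copy with identifiers removed and the
    remaining fields coarsened."""
    if now - last_activity <= RETENTION_PERIOD:
        return dict(record)

    reduced = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    reduced["postal_code"] = reduced["postal_code"][:2]         # keep region prefix only
    decade = (reduced.pop("age") // 10) * 10
    reduced["age_band"] = f"{decade}-{decade + 9}"              # e.g. 34 -> "30-39"
    reduced["purchased_at"] = reduced["purchased_at"].strftime("%Y")  # keep year only
    return reduced
```

Run periodically against the Master Purchase DB, a job like this would leave long-inactive customers represented only by coarse statistical attributes rather than by name and address.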
Why was the data remanence principle not followed, despite the compliance obligation?
Why was there no data minimization?
Why was data at rest not encrypted?
Why were secure design principles not followed, and why was the data not masked at the backend?
Why was data that the analytics did not need still being fetched at the backend level?