Medical records for all 500,000 UK Biobank volunteers – including ICD-coded diagnoses with cancer dates, proteomics, and metabolomics – were listed for sale on Alibaba in April 2025.
Key Takeaways
Data scope is extreme: gender, age, birth date, socioeconomic status, mental health, cognitive function, haematology, biochemistry, metabolomics, proteomics, and ICD-coded disease outcomes.
Three separate Alibaba listings surfaced 20 April; at least one contained the full 500,000-participant dataset.
Listings also offered “support for applying for” access – suggesting operational infrastructure, not just a data dump.
UK Biobank confirmed the listings to The BMJ but stated there was no evidence of participant re-identification.
Hacker News Comment Review
Governance red flag: a commenter reviewed UK Biobank’s public board and found no visible cybersecurity representation – only scientists, doctors, finance, and ML/IT delivery roles – making the CEO’s reassurances harder to accept.
This is the third related thread in days; prior discussions covered UK Biobank data appearing on GitHub, suggesting a systemic access-control failure rather than a one-off incident.
Policy debate split: one side argues data shared with 20,000 global researchers is functionally already public and prefers open publication with criminalized misuse; the other points to Palantir’s NHS deal as a state-level parallel with no accountability.
Notable Comments
@cs02rm0: Scanned the board page – no cybersecurity background visible; CEO’s re-identification denial reads as a non-answer given the data’s depth.
@thom: Asks whether Ben Goldacre’s Trusted Research Environment model would have prevented this – a concrete architectural counterfactual worth tracking.
@Canada: Notes the public rejected this data collection repeatedly; “nobody will be punished” is the predicted outcome.