Bluesky’s open API means anyone can scrape your data for AI training
Bluesky's open API allows third parties to scrape user data for AI without enforceable consent.
Bluesky, a fast-growing social network, offers an open API that lets third parties scrape public data for purposes such as AI training. Although Bluesky does not engage in this practice itself, its open design leaves a significant gap in user data privacy that third parties can exploit.
A recent 404 Media report revealed that an employee of AI firm Hugging Face collected 1 million public posts from Bluesky using the Firehose API. The resulting dataset, intended for machine learning research, was later withdrawn after a public outcry, underscoring the privacy issues surrounding public posts on the platform.
Bluesky has announced efforts to let users express consent preferences for how their data is used. However, the company acknowledges that it cannot enforce these preferences outside its own platform, leaving it to developers to honor user settings. As Bluesky gains popularity, it faces growing scrutiny comparable to that of other major social platforms.