Bluesky’s open API means anyone can scrape your data for AI training

Bluesky's open API allows third parties to scrape user data for AI without enforceable consent.

Bluesky's open API permits third parties to gather user data for AI training, sparking privacy concerns. A Hugging Face employee extracted 1 million posts using the Firehose API, highlighting data scraping risks. Bluesky is exploring consent preference settings but lacks enforcement capability. The platform's growing popularity brings increased scrutiny.

Bluesky, a burgeoning social network, offers an open API that permits third parties to scrape data for purposes such as AI training. Although the platform does not engage in this practice itself, its open-access policy leaves a significant loophole in user data privacy that third parties can exploit.

A recent 404 Media report revealed that an employee of AI firm Hugging Face collected 1 million public posts from Bluesky using the Firehose API. The resulting dataset, intended for machine learning research, was later withdrawn after a public outcry, underscoring the privacy issues surrounding public posts on the platform.

Bluesky has announced efforts to let users express consent preferences for how their data is used. However, the company acknowledges that it cannot enforce these preferences on third parties, leaving it to developers to honor user settings. As Bluesky gains popularity, it faces increasing scrutiny comparable to that of other major social platforms.