Amazon investigating Perplexity AI after accusations it scrapes websites without consent

Amazon Web Services is probing Perplexity AI for possibly using a crawler that ignores the Robots Exclusion Protocol.

Amazon Web Services is investigating Perplexity AI for allegedly using a web crawler that disregards the Robots Exclusion Protocol, according to Wired. Perplexity AI denies the claims, stating that its crawler respects robots.txt directives. Wired reports that it detected the crawler on several major websites.

Amazon Web Services has opened an investigation to determine whether Perplexity AI is violating AWS rules by operating a web crawler that ignores the Robots Exclusion Protocol. Wired discovered a virtual machine, hosted on AWS and operated by Perplexity, that ignored robots.txt files and scraped content over the past three months from sites including Condé Nast properties, The Guardian, Forbes, and The New York Times.
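For context, robots.txt is a plain-text file that a site publishes at its root to tell crawlers, identified by user agent, which paths they may fetch. Compliance is voluntary: a crawler honors the file only if it checks the rules before each request. The sketch below shows such a check using Python's standard `urllib.robotparser` module. The robots.txt contents, the `ExampleBot` user agent, and the `is_allowed` helper are hypothetical illustrations, not Perplexity's or any publisher's actual configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks the (made-up) ExampleBot from the whole site
# and blocks every other crawler only from /private/.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file as a list of lines; a live crawler would instead
    # call set_url("https://<site>/robots.txt") followed by read().
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    story = "https://example.com/news/story.html"
    print(is_allowed(ROBOTS_TXT, "ExampleBot", story))    # False: whole site disallowed
    print(is_allowed(ROBOTS_TXT, "SomeOtherBot", story))  # True: only /private/ is blocked
```

A crawler that skips this check, or that fetches pages under a user agent the site has not listed, faces no technical barrier, which is why disputes like this one turn on hosting terms and publisher complaints rather than on the protocol itself.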

To verify these activities, Wired queried Perplexity's chatbot and found that it returned paraphrased versions of that content with minimal attribution. Responding to Wired's allegations, Perplexity spokesperson Sara Platnick and CEO Aravind Srinivas both denied that the company had breached the protocol, but acknowledged that it uses third-party crawlers and that it bypasses robots.txt when users include specific URLs directly in their queries.

Amazon emphasizes that its customers must adhere to robots.txt directives and to the AWS Terms of Service, which prohibit illegal activities. While Perplexity maintains that it complies with these rules, its admitted use of third-party crawlers and partial bypassing of robots.txt leave significant questions unanswered. The case is part of broader scrutiny of how AI companies gather data to train large language models, and it suggests further regulatory challenges ahead.