Instapundit » Blog Archive » AI, THE NEXT GENERATION: OpenAI to Unleash New Web Crawler to Devour More of the Open Web. The we

August 9, 2023

AI, THE NEXT GENERATION: OpenAI to Unleash New Web Crawler to Devour More of the Open Web.

The web crawler will collect publicly available data from websites, while avoiding paywalled, sensitive, and prohibited content, according to OpenAI. Similar to other search engines like Google, Bing, and Yandex, however, the system is opt out—by default, GPTBot will assume accessible information is fair game. In order to prevent the OpenAI web crawler from ingesting a website, its owner must add a “disallow” rule to a standard file on the server.

OpenAI also says that GPTBot will preemptively scan scraped data to remove personally identifiable information (PII) and text that violates its policies.

Until the AI grows powerful enough to ignore those restrictions.

InstaPundit is a participant in the Amazon Services LLC Associates Program, an affiliate advertising program designed to provide a means for sites to earn advertising fees by advertising and linking to Amazon.com.