CULTURAL CONCEPTS OF DECENCY AND RESPONSIBILITY: IBM’s photo-scraping scandal shows what a weird bubble AI researchers live in.
Really, for industry insiders, IBM did nothing out of the ordinary. AI researchers hoover up data from various corners of the internet all the time to feed the ever-hungry machine-learning algorithms that require massive amounts of it to train. Instagram photos, for example, are a common source of image data; the hashtags often conveniently correspond to the content of the photos, making it extra easy to generate labeled data. New York Times and Wall Street Journal articles are also a common source of well-written, copy-edited sentences. Even better, they are categorized by topic: technology, business, sports.
In fact, scraping data from publicly available sources is so standard in the industry that it’s taught as a foundational skill (sans ethics) in most data science and machine-learning training. Meanwhile, most tech platforms are designed to invite such scraping by offering APIs with direct access to their data. Until recently, this was done without a second thought. (Hello, Facebook.)
To more and more people, it seems that AI researchers see ordinary people as experimental subjects. That isn’t a good place to be.