SKYNET FROWNS: What if AI doesn’t just keep getting better forever?
On Monday, OpenAI co-founder Ilya Sutskever, who left the company earlier this year, added to concerns that LLMs are hitting a plateau in what can be gained from traditional pre-training. Sutskever told Reuters that "the 2010s were the age of scaling," a period when throwing additional computing resources and training data at the same basic training methods could lead to impressive improvements in subsequent models.
“Now we’re back in the age of wonder and discovery once again,” Sutskever told Reuters. “Everyone is looking for the next thing. Scaling the right thing matters more now than ever.”
What’s next?
A large part of the training problem, according to experts and insiders cited in these and other pieces, is a lack of new, quality textual data for new LLMs to train on. At this point, model makers may have already picked the lowest-hanging fruit from the vast troves of text available on the public Internet and in published books.
Research outfit Epoch AI tried to quantify this problem in a paper earlier this year, measuring the rate of increase in LLM training data sets against the “estimated stock of human-generated public text.” After analyzing those trends, the researchers estimate that “language models will fully utilize this stock [of human-generated public text] between 2026 and 2032,” leaving precious little runway for just throwing more training data at the problem.
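The shape of that estimate is easy to see with a back-of-the-envelope extrapolation: if training sets grow exponentially while the stock of public text is roughly fixed, the crossover year falls out of a single logarithm. The numbers below are purely illustrative assumptions for the sketch, not Epoch AI's actual figures.

```python
import math

# All three figures are hypothetical, chosen only to illustrate the method;
# they are not taken from the Epoch AI paper.
stock_tokens = 3e14      # assumed total stock of human-generated public text
dataset_tokens = 1.5e13  # assumed size of a current frontier training set
growth_per_year = 2.2    # assumed annual growth factor of training-set size
start_year = 2024

# Solve dataset_tokens * growth_per_year**n >= stock_tokens for n,
# i.e., years until training sets would exhaust the stock.
years = math.log(stock_tokens / dataset_tokens) / math.log(growth_per_year)
exhaustion_year = start_year + math.ceil(years)
print(exhaustion_year)  # → 2028 with these made-up inputs
```

With these invented inputs the crossover lands in 2028, inside the paper's 2026–2032 window; the real estimate, of course, rests on measured growth rates and a careful accounting of the text stock rather than round numbers.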
Folks, you need to get busy creating more real content for AI to learn how to generate more fake content.