Meta’s New AI Tool for Science, Galactica, Fails to Impress Scientists
Meta, the company formerly known as Facebook, launched a new artificial intelligence tool called Galactica that was supposed to help scientists with a range of tasks. But the tool met with harsh criticism from the scientific community, and after just three days Meta pulled the public demo it had invited everyone to try.
Meta’s mistake, and its arrogance, show once again that Big Tech does not understand the serious limitations of large language models. These are machine-learning models trained on vast amounts of text, which they use to generate fluent prose (and even plausible-looking code) in response to a simple prompt. Many studies have highlighted the technology’s flaws, including its tendency to reproduce bias and to present falsehoods as fact. But Meta and other companies working on large language models, Google among them, have ignored these findings.
Galactica is a large language model for science, trained on 48 million examples of scientific articles, websites, textbooks, lecture notes, and encyclopedias. Meta claimed its model could help researchers and students by doing things like “summarizing academic papers, solving math problems, generating Wiki articles, writing scientific code, annotating molecules and proteins, and more.” But the tool quickly proved unreliable and inaccurate. Like all language models, Galactica is a mindless bot that cannot tell fact from fiction. Within hours of its launch, scientists were posting its faulty and biased output on social media.
“I am both amazed and unsurprised by this new attempt,” says Chirag Shah at the University of Washington, who studies search technologies. “When they show these things, they look so amazing, magical, and smart. But people still don’t seem to realize that in principle these things can’t work the way we hype them up to.”

Meta did not explain why it had taken down the demo but referred MIT Technology Review to a tweet that says: “Thank you everyone for trying the Galactica model demo. We appreciate the feedback we have received so far from the community, and have paused the demo for now. Our models are available for researchers who want to learn more about the work and reproduce results in the paper.”
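Those models can indeed still be examined directly. Below is a minimal sketch of how a researcher might prompt one of the published checkpoints through the Hugging Face transformers library; the checkpoint name (facebook/galactica-125m) and the [START_REF] citation token follow Meta’s release, but hosting and interfaces can change, so treat the details as assumptions rather than a recipe.

    from transformers import AutoTokenizer, OPTForCausalLM

    # Assumption: the smallest released checkpoint is still hosted on
    # Hugging Face; larger variants (1.3b, 6.7b, 30b, 120b) follow the
    # same pattern but need far more memory.
    tokenizer = AutoTokenizer.from_pretrained("facebook/galactica-125m")
    model = OPTForCausalLM.from_pretrained("facebook/galactica-125m")

    # [START_REF] is one of Galactica's special tokens: it asks the
    # model to continue the text with a citation.
    prompt = "The Transformer architecture [START_REF]"
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids

    outputs = model.generate(input_ids, max_new_tokens=60)
    print(tokenizer.decode(outputs[0]))

Note what is missing: nothing in this pipeline checks the generated citation against any real bibliography. The model simply continues the token sequence, which is exactly the failure mode critics seized on.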
A fundamental problem with Galactica is that it cannot distinguish truth from falsehood, a basic requirement for a language model designed to generate scientific text. People found that it invented fake papers (sometimes attributing them to real authors) and generated wiki articles about bears in space as readily as ones about protein complexes and the speed of light. It’s easy to spot nonsense when it involves space bears, but harder on a topic users may not know much about.

Many scientists rejected the tool outright. Michael Black, director at the Max Planck Institute for Intelligent Systems in Germany, who works on deep learning, tweeted: “In all cases, it was wrong or biased but sounded right and authoritative. I think it’s dangerous.”

Even those with a more positive take attached clear warnings. “Excited to see where this is going!” tweeted Miles Cranmer, an astrophysicist at Princeton. “You should never keep the output verbatim or trust it. Basically, treat it like an advanced Google search of (sketchy) secondary sources!”

Galactica also has problematic gaps in what it will handle. When asked to generate text on certain topics, such as “racism” and “AIDS,” the model responded with: “Sorry, your query didn’t pass our content filters. Try again and keep in mind this is a scientific language model.”
The Meta team behind Galactica argues that language models are better than search engines. “We believe this will be the next interface for how humans access scientific knowledge,” the researchers write. This is because language models can “potentially store, combine, and reason about” information. But that “potentially” is crucial. It’s a coded admission that language models cannot yet do all these things. And they may never be able to.
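It helps to be concrete about what these models verifiably do today. At each step, a language model assigns a probability to every token in its vocabulary and emits a likely one; generation is that step repeated. The sketch below makes this visible, reusing the same checkpoint as in the earlier example (any autoregressive language model would illustrate the point equally well).

    import torch
    from transformers import AutoTokenizer, OPTForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("facebook/galactica-125m")
    model = OPTForCausalLM.from_pretrained("facebook/galactica-125m")

    prompt = "The speed of light in a vacuum is"
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids

    # One forward pass scores every vocabulary token for the next
    # position; softmax turns those scores into probabilities.
    with torch.no_grad():
        next_token_logits = model(input_ids).logits[0, -1]
    probs = torch.softmax(next_token_logits, dim=-1)

    # The five most likely continuations: pattern statistics, not physics.
    top = torch.topk(probs, k=5)
    for p, idx in zip(top.values, top.indices):
        print(f"{tokenizer.decode(idx.item())!r}: {p.item():.3f}")

A distribution over plausible next words has no slot for checking whether the finished sentence is true, which is the gap the critics quoted below are pointing at.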
Experts like Shah doubt that language models hold any real knowledge at all. “Language models are not really knowledgeable beyond their ability to capture patterns of strings of words and spit them out in a probabilistic manner,” he says. Gary Marcus, a cognitive scientist at New York University and a vocal critic of deep learning, made the same point in a Substack post titled “A Few Words About Bullshit”: the text-generating ability of large language models is just “a superlative feat of statistics.”

But Meta is not alone in believing that language models could be the future of information retrieval. Google has also been developing and promoting its own, such as LaMDA, as a new way to search for information. The idea is appealing. But to imply that the text such models generate will always be reliable and accurate, as Meta seemed to do when it launched Galactica, is dangerous and misleading. It was a mistake, and not only by Meta’s marketing team.

Yann LeCun, a Turing Award winner and Meta’s chief AI scientist, backed Galactica to the end. On the day of its release he tweeted: “Type a text and Galactica will generate a paper with relevant references, formulas, and everything.” Three days later, he tweeted: “Galactica demo is off line for now. It’s no longer possible to have some fun by casually misusing it. Happy?”

This is not as bad as Microsoft’s 2016 debacle with Tay, a chatbot it launched on Twitter and shut down 16 hours later after users turned it into a racist, homophobic sexbot. But Meta’s handling of Galactica betrays a similar lack of foresight.

“Big tech companies keep doing this—and mark my words, they will not stop—because they can,” says Shah. “And they feel like they must—otherwise someone else might. They think that this is the future of information access, even if nobody asked for that future.”