GPT-4 is bigger and better than ChatGPT
OpenAI’s last surprise hit, ChatGPT, was always going to be a hard act to follow, but the San Francisco-based company has made GPT-4 even bigger and better.
GPT-4 is a large multimodal language model that can respond to both text and images. Give it a photo of your fridge contents and tell it what you could make, and GPT-4 will try to come up with recipes that use the pictured ingredients.
OpenAI won’t reveal how much bigger the model is or why it is better. GPT-4 is the company’s most secretive release yet, and it marks OpenAI’s complete transition from a nonprofit research lab to a for-profit tech company.
“That’s something we can’t really comment on right now,” said OpenAI’s chief scientist, Ilya Sutskever, in a video call with the GPT-4 team an hour after the announcement. “There’s a lot of competition out there.”
Limited, text-only access to GPT-4 will be available to users who sign up for the waitlist and to subscribers of the premium paid-for ChatGPT Plus service.
“The continued improvements across many dimensions are remarkable,” says Oren Etzioni at the Allen Institute for AI. “GPT-4 is now the benchmark against which all foundation models will be measured.”
“For the past couple of years, a good multimodal model has been the holy grail of many big tech labs,” says Thomas Wolf, cofounder of Hugging Face, the AI startup behind the open-source large language model BLOOM. “However, it has remained elusive.”
Combining text and images, in theory, could help multimodal models better understand the world. “It might be able to address traditional language model weaknesses, such as spatial reasoning,” Wolf says.
It’s uncertain whether this applies to GPT-4. On certain simple reasoning tasks, such as summarizing blocks of text in words that begin with the same letter, OpenAI’s new model does appear to outperform ChatGPT. In a demo, I watched GPT-4 summarize the announcement blurb from OpenAI’s website using words beginning with g: “GPT-4, a game-changing generational progression, receives higher marks. Guardrails, direction, and gains obtained. Significant, game-changing, and internationally talented.” In another demo, GPT-4 took in a tax document, answered questions about it, and gave reasons for its answers.
It also outperforms ChatGPT on human tests such as the Uniform Bar Exam (where GPT-4 scores in the 90th percentile and ChatGPT in the 10th) and the Biology Olympiad (where GPT-4 scores in the 99th percentile and ChatGPT in the 31st). “It’s exciting to see how evaluation is now being conducted using the same benchmarks that humans use for themselves,” Wolf says. However, he adds that without seeing the technical details, it’s difficult to judge how impressive these results really are.
According to OpenAI, GPT-4 outperforms ChatGPT, which was based on a version of the company’s previous technology, GPT-3, because it is a larger model with more parameters (the values in a neural network that get tweaked during training). This follows a trend the company discovered with its previous models: GPT-3 outperformed GPT-2 because it had 175 billion parameters, more than 100 times GPT-2’s 1.5 billion.
However, OpenAI has chosen not to reveal the size of GPT-4. In contrast to previous releases, the company is not disclosing any information about how GPT-4 was built—not the data, the amount of computing power, or the training techniques. “OpenAI is now a fully closed company with scientific communication akin to product press releases,” Wolf says.
According to OpenAI, it spent six months making GPT-4 safer and more accurate. The company claims GPT-4 is 82% less likely than GPT-3.5 to respond to requests for content that OpenAI does not allow, and 60% less likely to make things up.
OpenAI achieved these results using the same approach it took with ChatGPT: reinforcement learning from human feedback. This involves asking human raters to score different responses from the model and using those scores to improve future output.
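The feedback loop described above can be sketched in a toy form. This is a hypothetical illustration, not OpenAI’s actual pipeline: in real reinforcement learning from human feedback, rater preferences train a neural reward model, which then guides fine-tuning of the language model with a reinforcement learning algorithm such as PPO. All function names here are invented for the sketch.

```python
def score_responses(responses, rater):
    """Collect a (simulated) human rater's score for each candidate response."""
    return {resp: rater(resp) for resp in responses}

def preferred_response(responses, rater):
    """Return the highest-scoring response. In real RLHF, preferences like
    this train a reward model rather than being used directly."""
    scores = score_responses(responses, rater)
    return max(scores, key=scores.get)

def toy_rater(resp):
    """A stand-in for a human rater that rewards helpful, detailed answers."""
    score = len(resp.split())               # reward detail
    if "sorry" in resp.lower():
        score -= 5                          # penalize unhelpful deflection
    return score

candidates = [
    "Sorry, I can't help with that.",
    "Here is a step-by-step answer to your question.",
]
print(preferred_response(candidates, toy_rater))
# Prints the second, more helpful candidate.
```

The key design point is that the model never sees an explicit rule for what makes a good answer; it only sees which outputs raters preferred, and that preference signal shapes future generations.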
The team even used GPT-4 to improve itself, instructing it to generate inputs that resulted in biased, inaccurate, or offensive responses and then modifying the model to refuse such inputs in the future.
GPT-4 could be the best multimodal large language model ever created. However, it is not in a class by itself, as GPT-3 was when it first appeared in 2020. In the last three years, a lot has happened. GPT-4 now coexists with other multimodal models, such as DeepMind’s Flamingo. According to Wolf, Hugging Face is working on an open-source multimodal model that will be free for others to use and adapt.
With so much competition, OpenAI views this release as more of a product tease than a research update. Early versions of GPT-4 were shared with some of OpenAI’s partners, including Microsoft, which confirmed today that a version of GPT-4 was used to build Bing Chat. OpenAI is now collaborating with companies such as Stripe, Duolingo, Morgan Stanley, and the Icelandic government (which is using GPT-4 to help preserve the Icelandic language), among others.
“The costs to bootstrap a model of this scale are out of reach for most companies, but the approach taken by OpenAI has made large language models very accessible to startups,” says Sheila Gulati, cofounder of the investment firm Tola Capital. “On top of GPT-4, this will catalyze tremendous innovation.”
Large language models, however, remain fundamentally flawed. GPT-4 can still produce biased, false, and toxic content, and it can still be jailbroken to bypass its safeguards. Though OpenAI has improved this technology, it is far from perfect. The company claims its safety testing has been sufficient for GPT-4 to be used in third-party apps, but it is also braced for surprises.
“Safety is not a binary thing; it is a process,” adds Sutskever. “As you attain a new level of capability, things get more difficult. Many of these capabilities are already widely recognized, but I’m sure some will continue to surprise.”
Sutskever even believes that going slower with releases may be beneficial in some cases: “It would be really desirable to end up in a world where firms come up with some type of procedure that allows for delayed releases of models with these utterly unparalleled capabilities.”