Meta's LLaMA-13B reportedly outperforms GPT-3, the model behind ChatGPT, despite being 10x smaller.
Despite being “10x smaller,” Meta says its new LLaMA-13B language model can outperform OpenAI’s GPT-3. Smaller AI models of this kind could eventually allow ChatGPT-style language assistants to run locally on computers and smartphones. LLaMA-13B belongs to a brand-new family of language models called “Large Language Model Meta AI,” or LLaMA.
The LLaMA collection of language models ranges from 7 billion to 65 billion parameters in size. By comparison, OpenAI’s GPT-3 model—the foundational model behind ChatGPT—has 175 billion parameters.
Because Meta’s LLaMA models were trained on publicly available datasets such as Common Crawl, Wikipedia, and C4, the company may be able to release the model and its weights as open source. That would represent a significant shift in a field where Big Tech competitors in the AI race have traditionally kept their most potent AI technology to themselves.
While most existing models rely on data that is either not publicly available or undocumented, “we solely use datasets publicly available, making our work compatible with open-sourcing and replicable, unlike Chinchilla, PaLM, or GPT-3,” tweeted team member Guillaume Lample.
Meta calls its LLaMA models “foundational models,” meaning the company intends them to form the basis of future, more refined AI models built on the technology, much as OpenAI built ChatGPT on a foundation of GPT-3. The company hopes that LLaMA will be useful in natural language research and potentially power applications such as “question answering, natural language understanding or reading comprehension, understanding capabilities and limitations of current language models.”
While the top-of-the-line LLaMA model (LLaMA-65B, with 65 billion parameters) competes head-to-head with comparable offerings from rival AI labs DeepMind, Google, and OpenAI, the most intriguing development is arguably LLaMA-13B, which can reportedly outperform GPT-3 while running on a single GPU. Unlike GPT-3 derivatives, which require data-center hardware, LLaMA-13B opens the door to ChatGPT-like performance on consumer-level hardware in the near future.
In AI, parameter count is crucial. A parameter is a value that a machine-learning model learns from its training data and uses to make predictions or classify input. The size of a language model’s parameter set significantly affects how well it performs, with larger models typically able to handle more challenging tasks and generate more coherent output. However, more parameters require more memory and more computational power to run. If a model can achieve the same results as another model with fewer parameters, it is significantly more efficient.
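To make that arithmetic concrete, here is a minimal PyTorch sketch. The toy model is purely illustrative (it is not LLaMA’s architecture), and the memory figures are rough weight-only estimates at 16-bit precision:

```python
# Counting parameters and estimating weight memory (illustrative only).
import torch.nn as nn

# A toy two-layer network; real LLMs stack many transformer blocks.
model = nn.Sequential(
    nn.Linear(1024, 4096),
    nn.ReLU(),
    nn.Linear(4096, 1024),
)
n_params = sum(p.numel() for p in model.parameters())
print(f"Toy model parameters: {n_params:,}")  # ~8.4 million

# Back-of-the-envelope weight memory at fp16 (2 bytes per parameter),
# ignoring activations, caches, and framework overhead:
for name, params in [("LLaMA-13B", 13e9), ("GPT-3", 175e9)]:
    print(f"{name}: ~{params * 2 / 1024**3:.0f} GiB of weights at fp16")
```

By this rough measure, 13 billion parameters come to about 24 GiB of weights at fp16, which is why a single high-memory GPU can plausibly host LLaMA-13B, while GPT-3’s roughly 326 GiB of weights demand a multi-GPU data-center setup.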
“I’m now thinking that we will be running language models with a sizable portion of the capabilities of ChatGPT on our own (top of the range) mobile phones and laptops within a year or two,” wrote independent AI researcher Simon Willison in a Mastodon thread analyzing the impact of Meta’s new AI models.
A simplified version of LLaMA is now accessible on GitHub. The full code and weights (the “learned” training data in a neural network) can be obtained by filling out a form provided by Meta. Meta has not yet announced a wider release of the model and weights.
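For researchers who do obtain the weights, a first sanity check might look like the sketch below. It assumes the checkpoint is an ordinary PyTorch state dict, and the file path is hypothetical, since the exact distribution format is up to Meta:

```python
# Hedged sketch: inspect a downloaded checkpoint (path is hypothetical).
import torch

# Load tensors onto the CPU so no GPU is needed just to inspect them.
state_dict = torch.load("llama-13b/consolidated.00.pth", map_location="cpu")

total = sum(t.numel() for t in state_dict.values())
print(f"Tensors: {len(state_dict)}, total parameters: {total:,}")

# Peek at a few layer names, shapes, and dtypes.
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape), tensor.dtype)
```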
Does Meta use artificial intelligence?
Meta’s software is designed for the AI workload called inference, in which machine-learning models that have already been trained on huge amounts of data are called on to make quick judgments, such as deciding whether a photograph shows a cat or a dog.
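As a generic illustration of inference (not Meta’s own stack), the sketch below loads an image classifier that was trained elsewhere and asks it for a single quick judgment; the input file name is hypothetical:

```python
# Inference only: a pretrained classifier makes one prediction.
import torch
from PIL import Image
from torchvision import models

weights = models.ResNet18_Weights.DEFAULT
model = models.resnet18(weights=weights)
model.eval()  # switch to inference mode (no training behavior)

preprocess = weights.transforms()  # preprocessing the model expects
image = Image.open("photo.jpg").convert("RGB")  # hypothetical input

with torch.no_grad():  # no gradients are needed to make a prediction
    logits = model(preprocess(image).unsqueeze(0))

idx = logits.argmax(dim=1).item()
print(f"Predicted class: {weights.meta['categories'][idx]}")
```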
Does Facebook use AI?
According to Meta, artificial intelligence (AI) technology is central to its content review process: AI can detect and remove content that goes against its Community Standards before anyone reports it, and in other cases its technology sends content to human review teams to take a closer look and make a decision.
What is LLaMA Meta?
Meta’s LLaMA, short for Large Language Model Meta AI, will be available under a noncommercial license to researchers and entities affiliated with government, civil society, and academia, the company said in a blog post. Large language models mine vast amounts of text in order to summarize information and generate content.