Top Leading Language Models (LLMs): The language modeling field has made significant progress since Google published the "Attention Is All You Need" paper in 2017. This paper introduced the transformer architecture (the "T" in GPT), which took the natural language processing world by storm and has been the foundation for virtually every advancement in NLP since. As of this writing, this one paper has an impressive 68,147 citations, a testament to the volume of work building on it!
The landscape of large language models is evolving rapidly, with companies racing to release bigger, better, and faster models. Investors are pouring billions of dollars into NLP companies; OpenAI alone has raised $11B. In this article, we focus primarily on instruction-following LLMs, sometimes grouped with foundation models: general-purpose models that perform tasks based on your instructions. They differ from task-specific LLMs, which are fine-tuned to perform a single task, such as summarization or translation. To learn more about task-specific models, please read our article on the use cases and real-world applications of LLMs.
We have listed the best large language models below and suggest when to use each based on specific needs, such as whether you require an API, the ability to fine-tune, or a fully hosted solution.
- GPT-4
- ChatGPT
- GPT-3
- BLOOM
- LaMDA
- MT-NLG
- LLaMA
- Stanford Alpaca
- Flan-UL2
- Gato
- Pathways Language Model (PaLM)
- Claude
- ChatGLM
GPT-4: OpenAI, unspecified parameter count, not open source, API access only.
We recommend GPT-4 as our top choice for a fully hosted LLM that is accessible via an API and requires a subscription to ChatGPT Plus. OpenAI announced this model on March 14, 2023, and it boasts impressive performance on a variety of tasks, including professional medical and law exams. GPT-4 has a larger maximum input length of 32,768 tokens, which is approximately 50 pages of text. However, details about the model architecture and training datasets are still unknown.
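Since GPT-4 is reachable only through OpenAI's API, programmatic use means sending a JSON request to the chat completions endpoint. The sketch below assembles such a request body; the field names follow the public API, but the prompt text is invented for illustration, and actually sending the request would require an API key and an HTTP client.

```python
import json

def build_gpt4_request(prompt, max_tokens=256):
    """Assemble the JSON body for a POST to /v1/chat/completions.

    The "messages" list holds the conversation so far; a single
    user-role message is the minimal case.
    """
    return json.dumps({
        "model": "gpt-4",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    })

body = build_gpt4_request("Summarize the transformer architecture in one sentence.")
print(body)
```

The `max_tokens` field caps the length of the reply, which matters given GPT-4's context window of up to 32,768 tokens shared between input and output.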
ChatGPT: OpenAI, parameter count undisclosed (often reported as 20 billion, though unconfirmed), not open source, accessible through an API.
ChatGPT is a text-based language model released by OpenAI in November 2022. While GPT-4 outperforms it, ChatGPT is still capable of handling a wide variety of text-based tasks. Its counterpart, InstructGPT, was designed to respond to single, specific instructions, while ChatGPT is optimized for engaging in natural language conversations. OpenAI regularly introduces updates and new features, such as ChatGPT plugins. Basic access to ChatGPT is free, while a ChatGPT Plus subscription guarantees access even during peak times.
GPT-3: OpenAI, 175 billion parameters, API access only, not open source.
GPT-3, announced in June 2020, is pre-trained on a vast amount of text data and can be fine-tuned for specific tasks to complete text in natural language, showing remarkable few-shot and zero-shot performance in various NLP tasks, including translation, question-answering, and text completion.
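GPT-3's few-shot ability means that, rather than fine-tuning, you can prepend a handful of worked examples to your query and let the model infer the task. The snippet below builds such a prompt for English-to-French translation; the example pairs and the prompt layout are invented for illustration, not a prescribed format.

```python
def few_shot_prompt(examples, query):
    """Prepend labeled example pairs so the model can infer the task."""
    lines = [f"English: {en}\nFrench: {fr}" for en, fr in examples]
    # End with the unanswered query; the model's completion is the answer.
    lines.append(f"English: {query}\nFrench:")
    return "\n\n".join(lines)

examples = [("cheese", "fromage"), ("dog", "chien")]
prompt = few_shot_prompt(examples, "bread")
print(prompt)
```

Zero-shot prompting is the same idea with an empty examples list: just an instruction and the query, relying entirely on what the model learned during pre-training.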
BLOOM: BigScience, 176 billion parameters, available for download, hosted API available.
In November 2022, BLOOM (BigScience Large Open-Science Open-Access Multilingual Language Model) was released as a multilingual LLM developed through a collaboration of 1,000 researchers from 70+ countries and 250+ institutions. It generates text in 46 natural languages and 13 programming languages. While similar in scope to GPT-3, BLOOM emphasizes interpretability and transparency and can perform general text tasks as an instruction-following model.
LaMDA: Google, 137 billion parameters, not open source, no API or downloads.
LaMDA (Language Model for Dialogue Applications) is a conversational model developed by Google and announced in May 2021. It aims to hold more natural and engaging conversations with users by training on dialogue and learning to pick up on its subtleties. LaMDA can power applications in fields such as customer service, chatbots, and personal assistants. It builds on an earlier Google chatbot, Meena, and the conversational service powered by LaMDA is called Bard, which is expected to become available via an API.
MT-NLG: Nvidia and Microsoft, 530 billion parameters, API access only, limited to specific applications.
In October 2021, Nvidia and Microsoft announced MT-NLG (Megatron-Turing Natural Language Generation), which uses the transformer-based architecture of Megatron to generate contextually relevant and coherent text for various tasks, including reading comprehension, completion prediction, natural language inference, commonsense reasoning, and word sense disambiguation.
LLaMA: Meta AI, 7 to 65 billion parameters, downloadable by application.
Meta AI announced the LLaMA model in February 2023, which comes in various sizes, ranging from 7 billion to 65 billion parameters. Meta AI believes that LLaMA will help make the field more accessible by overcoming the computing power challenges associated with training large models. The model follows a similar approach to other LLMs, where it predicts the next word in a sequence to generate text. However, only researchers, government affiliates, and academics can apply for access to the model, and it is not available to the general public.
Alpaca: Stanford, 7 billion parameters, downloadable.
In March 2023, Stanford announced the Alpaca model, which is based on Meta's LLaMA 7B model and fine-tuned on over 52,000 instruction-following demonstrations. The model aims to provide an open-source alternative to OpenAI's GPT-3.5 models for the academic community, and it is designed to be small and inexpensive to reproduce. Alpaca's license prohibits commercial use, so it is best suited to research or personal projects. With techniques like LoRA, the model can be fine-tuned on consumer-grade GPUs and even run on a Raspberry Pi, albeit slowly.
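The reason LoRA makes consumer-GPU fine-tuning feasible is that it freezes the pretrained weights and trains only a low-rank update. The sketch below shows the core idea with toy NumPy matrices (the dimensions and rank are made-up values, not Alpaca's actual configuration):

```python
import numpy as np

# LoRA's core idea: instead of updating a full d x d weight matrix W,
# train two small matrices B (d x r) and A (r x d) with rank r << d,
# and use W + B @ A at inference time.
d, r = 512, 8
rng = np.random.default_rng(0)

W = rng.standard_normal((d, d))      # frozen pretrained weight
A = rng.standard_normal((r, d)) * 0.01
B = np.zeros((d, r))                 # B starts at zero, so training begins from W

W_adapted = W + B @ A                # identical to W until B is trained

full_params = d * d
lora_params = d * r + r * d
print(f"trainable params: {lora_params} vs {full_params} "
      f"({lora_params / full_params:.1%} of full fine-tuning)")
```

Only `A` and `B` receive gradients, so the optimizer state and checkpoint sizes shrink by the same large factor, which is what puts fine-tuning within reach of a single consumer GPU.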
Flan-UL2: Google, 20 billion parameters, downloadable from HuggingFace.
Flan-UL2 is an encoder-decoder model based on the T5 architecture that has been instruction-tuned using the Flan collection. It performs better than the earlier Flan-T5 models and carries an Apache-2.0 license. Because the usage and training details have been released, the model can be self-hosted or fine-tuned. If Flan-UL2's 20 billion parameters are excessive for your needs, consider the previous Flan-T5 models, which come in five sizes and may be more appropriate.
Gato: DeepMind, 1.2 billion parameters, unavailable for use.
In May 2022, DeepMind announced Gato, a multimodal model that can perform multiple tasks, including image captioning and controlling a robotic arm. Like GPT-4, it is a generalist model capable of working on text and other modalities such as images and Atari games. However, the model itself has not been released, but an open-source project aims to replicate its capabilities.
Pathways Language Model (PaLM)
Google, 540 billion parameters, available via API
Google announced PaLM (Pathways Language Model) in April 2022, which is built on the Pathways AI architecture, a framework designed to create models that can adapt to a variety of tasks and learn new ones efficiently. With 540 billion parameters, PaLM can perform numerous language-related tasks and has achieved state-of-the-art performance on several of them. One of its unique capabilities is generating explanations for complex scenarios involving multiple logical steps, such as explaining jokes.
Claude: Anthropic, unknown size, API access after application.
Anthropic announced Claude in March 2023 as an advanced AI assistant that can perform various natural language processing (NLP) tasks such as summarization, coding, writing, and question-answering. Claude comes in two modes – the full, high-performance model, and the faster but lower-quality Claude Instant. However, there is limited information available about the training process and model architecture of Claude.
ChatGLM: Tsinghua University, 6 billion parameters, downloadable.
In March 2023, Tsinghua University's Knowledge Engineering Group (KEG) announced ChatGLM, a bilingual Chinese-English language model that can be downloaded from HuggingFace. Despite its size, ChatGLM can run on consumer-grade GPUs with quantization. It claims to be similar to ChatGPT but optimized for the Chinese language. ChatGLM is one of the few LLMs with an Apache-2.0 license that allows commercial use.
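The quantization that lets a 6-billion-parameter model fit on a consumer GPU boils down to storing weights at lower precision. Here is a minimal sketch of symmetric int8 quantization with NumPy; it illustrates the general technique, not ChatGLM's specific scheme, and the weights are random toy data:

```python
import numpy as np

def quantize_int8(w):
    """Map float weights onto int8 plus a per-tensor scale factor."""
    scale = np.abs(w).max() / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights for use in computation."""
    return q.astype(np.float32) * scale

w = np.random.default_rng(1).standard_normal(1000).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print("max abs error:", np.abs(w - w_hat).max())
```

Storing int8 instead of float32 cuts weight memory roughly 4x; practical schemes refine this with per-channel or per-group scales to keep the rounding error small.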
The field of LLMs is advancing rapidly, as evidenced by the frequent announcements of new models with increasing numbers of parameters. However, the true value of these models lies in their applications.
Vectara, for example, is using LLMs, together with prompt-based techniques, to help users analyze large volumes of their own business data, enabling them to search, find, and uncover insights that were previously hidden.
FAQs on Top Leading Language Models (LLMs)
What are LLMs?
LLM stands for Large Language Model. LLMs are powerful AI models that can process and understand natural language input. They are trained on massive amounts of text data and can generate text, translate languages, summarize text, answer questions, and perform many other language-related tasks.
Which are the top leading LLMs?
Some of the top leading LLMs include GPT-4, ChatGPT, GPT-3, BLOOM, LaMDA, MT-NLG, and LLaMA. These models have billions of parameters and have achieved state-of-the-art performance on various language tasks.
What can LLMs be used for?
LLMs can be used for a wide range of language tasks, including text generation, language translation, summarization, question-answering, sentiment analysis, and more.
Can I access LLMs?
Yes, many LLMs are available for download or can be accessed through cloud-based APIs. However, some of the largest models may require specialized hardware to run efficiently.