Top Three LLMs Compared: GPT-4 Turbo vs. Claude 3 Opus vs. Gemini 1.5 Pro

GPT-4 Turbo, Claude 3 Opus, and Gemini 1.5 Pro are three of the best generative AI large language models (LLMs) Silicon Valley and the tech community have to offer. With a new AI model launched every week or two, Spiceworks News & Insights examines the three LLMs ranked highest by over half a million techies and what makes them the best.

April 18, 2024


In the generative AI (GenAI) era, large language models (LLMs) have emerged as the soul of AI development and deployment. Back in November 2022, OpenAI kicked off the GenAI race with the launch of ChatGPT, its free GenAI chatbot based on Generative Pre-Trained Transformer 3.5 (GPT-3.5). Lots of acronyms, we know.

However, the transformer, the deep learning architecture at the heart of LLMs, is a Google creation from 2017. So when OpenAI, with financial and computing backing from Microsoft, released ChatGPT and took the world by storm, Google was caught off-guard by what hit it. ChatGPT’s human-like responses earned it:

  • One million users in the first five days
  • 100 million monthly users, making it the fastest-growing app/online service
  • Two million developers in a year
  • 180.5 million users in March 2024, i.e., 16 months after launch
  • 1.8 billion monthly visits within five months of launch (it had 1.67 billion visits as of February 2024)

Google and other AI companies have since made it clear that GenAI is top of mind, taking strategic measures and dedicating valuable capital. The search and cloud giant released Bard in March 2023 and rebranded it as Gemini just under a year later, in February 2024. While Google has managed to capture a portion of the market with the Gemini series of LLMs, it has never quite been able to usurp the best of what OpenAI has to offer: currently, GPT-4 Turbo.

Claude 3 Opus, an LLM from Anthropic, a company backed by Amazon and Google and founded by former OpenAI VPs, did manage to depose GPT-4 Turbo on the Chatbot Arena Leaderboard. But its reign was short, lasting just over two weeks. OpenAI reclaimed the crown with improved math, logical reasoning, coding, and writing skills in GPT-4 Turbo, which it made available to paid ChatGPT users.

The Chatbot Arena Leaderboard is a crowdsourced evaluation platform that uses inputs, discussions, and votes from 672,236 users (as of April 13, 2024) to compare and rank 82 LLMs. The platform has become the go-to resource for LLM comparison, given that quantifying LLM output quality remains ambiguous and open to question.
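For a sense of how crowdsourced pairwise votes can become a ranking, here is a minimal sketch of an Elo-style rating update. The model names, votes, and K-factor below are hypothetical, and the actual leaderboard fits a statistical (Bradley-Terry) model over all votes rather than replaying them one by one; this illustrates the idea, not the platform’s implementation.

```python
# A minimal sketch of Elo-style rating updates from pairwise votes, meant
# only to illustrate how crowdsourced "A beats B" judgments can become a
# ranking. The model names, starting ratings, votes, and K-factor are
# hypothetical; the real leaderboard fits a statistical model over all
# votes rather than applying sequential updates like this.

K = 32  # assumed update step size


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def record_vote(ratings: dict, winner: str, loser: str) -> None:
    """Apply one vote: the winner gains rating, the loser loses the same amount."""
    surprise = 1.0 - expected_score(ratings[winner], ratings[loser])
    ratings[winner] += K * surprise
    ratings[loser] -= K * surprise


ratings = {"gpt-4-turbo": 1000.0, "claude-3-opus": 1000.0, "gemini-1.5-pro": 1000.0}
votes = [
    ("claude-3-opus", "gemini-1.5-pro"),  # (winner, loser) pairs from users
    ("gpt-4-turbo", "claude-3-opus"),
    ("gpt-4-turbo", "gemini-1.5-pro"),
]
for winner, loser in votes:
    record_vote(ratings, winner, loser)

print(sorted(ratings.items(), key=lambda item: item[1], reverse=True))
```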

Google’s Gemini Pro is one of the top five LLMs, with iterations of GPT-4 Turbo occupying the #3 and #4 positions.

The LLM arena has become somewhat cluttered, with a new model launched every week or two. Spiceworks News & Insights examines the three LLMs ranked highest by over half a million techies and what makes them the best.


Head-To-Head: GPT-4 Turbo vs. Claude 3 Opus vs. Google Gemini 1.5 Pro

The comparison of the top three LLMs isn’t as straightforward as comparing operating systems or hardware. Here’s how GPT-4 Turbo compares with Claude 3 Opus and Google Gemini 1.5 Pro.

AI benchmarks

AI benchmarks offer an interesting perspective on LLM output…on paper.

Each benchmark is a litmus test designed to assess specific functions, such as reasoning or coding. The problem is that companies can manipulate data or lean on prompt engineering techniques to hit target benchmark scores.

This illustrates Goodhart’s Law: “When a measure becomes a target, it ceases to be a good measure.”

That said, check out where GPT-4 Turbo, Claude 3 Opus, and Gemini 1.5 Pro stand when put through some popular benchmarking tests.

  • MBPP (coding): GPT-4 Turbo #1, Claude 3 Opus #3, Gemini 1.5 Pro NA (Gemini 1.0 Pro at #23)
  • SWE-bench (debugging): GPT-4 Turbo #1, Claude 3 Opus #2, Gemini 1.5 Pro NA
  • GPQA (graduate-level scientific questioning): GPT-4 Turbo #2, Claude 3 Opus #1, Gemini 1.5 Pro NA
  • HumanEval (code generation): GPT-4 Turbo #1, Claude 3 Opus #3, Gemini 1.5 Pro #6
  • GSM8K (grade-school arithmetic reasoning): GPT-4 Turbo #1, Claude 3 Opus #9, Gemini 1.5 Pro #37 (Gemini Ultra at #10)
  • MMLU (knowledge test spanning elementary mathematics, U.S. history, computer science, law, etc.): GPT-4 Turbo #5, Claude 3 Opus #2, Gemini 1.5 Pro #9 (Gemini Ultra at #1)
  • WinoGrande (common sense reasoning): GPT-4 Turbo #8, Claude 3 Opus #7, Gemini 1.5 Pro NA
  • ARC (common sense reasoning): GPT-4 Turbo #1, Claude 3 Opus NA (Claude 2 at #3), Gemini 1.5 Pro NA
  • HellaSwag (sentence completion): GPT-4 Turbo #4, Claude 3 Opus NA, Gemini 1.5 Pro NA
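To make the “litmus test” idea above concrete, here is a minimal sketch of the kind of harness such benchmarks rely on: pose a fixed set of questions, compare the model’s answers against references, and report accuracy. The ask_model function is a hypothetical stand-in for a real LLM API call, and the two sample items are illustrative rather than drawn from any actual benchmark.

```python
# A minimal sketch of a benchmark harness in the spirit of GSM8K-style
# question answering. `ask_model` is a hypothetical stand-in for a call
# to whichever LLM API is being evaluated; the two sample items below
# are illustrative, not taken from any real benchmark.

def ask_model(question: str) -> str:
    # Placeholder: in practice this would call the model's API and
    # extract the final answer from its response.
    return "42"


BENCHMARK = [
    {"question": "A box holds 6 eggs. How many eggs are in 7 boxes?", "answer": "42"},
    {"question": "Tom has 15 apples and gives away 4. How many are left?", "answer": "11"},
]


def evaluate(items: list[dict]) -> float:
    """Return the fraction of items the model answers exactly right."""
    correct = sum(ask_model(item["question"]).strip() == item["answer"] for item in items)
    return correct / len(items)


print(f"accuracy: {evaluate(BENCHMARK):.0%}")  # 50% with the placeholder model
```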

Launch

OpenAI launched GPT-4 Turbo in November 2023, Google rolled out Gemini 1.5 Pro in February 2024, and Anthropic released Claude 3 Opus in March 2024.


Knowledge cutoff

The knowledge cutoff dates for the top three LLMs, i.e., the dates up to which they have access to information, are as follows:

  • GPT-4 Turbo: December 2023
  • Claude 3 Opus: August 2023
  • Gemini 1.5 Pro: 

Context window

One of GenAI’s defining factors is the context window, or the amount of information an LLM can take as input and hold in its retrievable short-term memory, which is what enables richer needle-in-a-haystack (NIAH) evaluations over long inputs. The context window is an important specification for enterprise use, where large documents and datasets are fed in for summarization and analysis. Moreover, a larger context window lets a model work over more material at once, making it more capable and efficient on long, complex tasks.

GPT-4 Turbo features a context window of 128,000 tokens, four times bigger than GPT-4’s 32K. This means it can accept and process inputs of approximately 450 book pages. Claude 3 Opus boasts an impressive 200K context window, allowing it to accept inputs of roughly 300 pages or 150,000 words. However, both OpenAI and Anthropic have instituted rate limits for their respective LLMs.

Meanwhile, Google’s Gemini 1.5 Pro has a 128K context window. Interestingly, Google has allowed a limited group of developers and enterprise customers to try out a context window of up to a whopping one million tokens via AI Studio and Vertex AI in private preview.

Google plans to introduce pricing tiers based on the context window. With the sheer computing power at its disposal, is this Google’s way of challenging Microsoft and OpenAI?

Anthropic also offers developers the option of a one-million-token context window for Claude 3 Opus in specific use cases.
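As a practical illustration of these limits, the sketch below estimates whether a document fits within each model’s context window. It uses OpenAI’s tiktoken tokenizer, so the count is only a rough proxy for Claude and Gemini, which use their own tokenizers; the file name is a hypothetical placeholder, and the window sizes are the figures discussed above.

```python
# A minimal sketch for checking whether a document fits in each model's
# context window. Token counts come from OpenAI's tiktoken library
# (pip install tiktoken); Anthropic and Google use their own tokenizers,
# so this is only a rough proxy for Claude and Gemini.
import tiktoken

CONTEXT_WINDOWS = {  # tokens, per the figures discussed above
    "gpt-4-turbo": 128_000,
    "claude-3-opus": 200_000,
    "gemini-1.5-pro": 128_000,  # up to 1M tokens in private preview
}


def fits_in_context(text: str) -> dict:
    """Estimate the token count of `text` and compare it with each window."""
    encoding = tiktoken.get_encoding("cl100k_base")  # GPT-4-family encoding
    n_tokens = len(encoding.encode(text))
    return {model: n_tokens <= window for model, window in CONTEXT_WINDOWS.items()}


# "annual_report.txt" is a hypothetical large document.
with open("annual_report.txt", encoding="utf-8") as f:
    print(fits_in_context(f.read()))
```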

Parameters

AI companies have ceased disclosing parameter counts, the fundamental building blocks of LLMs that get adjusted and readjusted as the models are trained. Parameters define the relationships between words, enabling GenAI tools to generate text.

LLMs can contain trillions of parameters. For instance, GPT-4 Turbo is rumored to contain 1.76 trillion parameters, while Claude 3 Opus is believed to have 2 trillion parameters.
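To make “parameter count” concrete, here is a minimal sketch (using PyTorch, purely as an illustration) that builds a toy network and counts its trainable parameters. The same one-liner applied to a frontier LLM would return a figure in the hundreds of billions or, if the rumors above are right, trillions.

```python
# A minimal sketch of what "parameter count" means: every weight and bias
# in the network is a parameter that training adjusts. This toy model has
# fewer than 100,000 parameters; frontier LLMs reportedly have hundreds of
# billions to trillions.
import torch.nn as nn

toy_model = nn.Sequential(
    nn.Embedding(1000, 32),  # token embeddings: 1000 * 32 = 32,000 params
    nn.Linear(32, 64),       # 32 * 64 weights + 64 biases = 2,112 params
    nn.ReLU(),               # no parameters
    nn.Linear(64, 1000),     # 64 * 1000 weights + 1000 biases = 65,000 params
)

n_params = sum(p.numel() for p in toy_model.parameters())
print(f"{n_params:,} trainable parameters")  # 99,112
```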


GenAI IQ tests

While GenAI systems and their neural networks are designed to mimic humans, they have usually fallen short of humans on the intellectual abilities prescribed and measured by IQ tests. It is also important to note that intelligence cannot be viewed as the same construct for humans and LLMs, given that LLMs still cannot reason as well as humans.

Until machine intelligence tests for LLMs become mainstream, the application of human IQ tests to GenAI suggests that Claude 3 Opus has topped the average human IQ. Recent IQ tests performed on multiple models revealed that Claude 3 Opus has the highest IQ, at 101, among the top three ranked LLMs. Comparatively, GPT-4 Turbo has an IQ score of 85, bettering Gemini Advanced’s 76 by nine points.

Language support

Gemini 1.5 Pro’s support for 38 languages in more than 180 countries eclipses the dozen languages that Claude 3 Opus supports and the 26 that GPT-4 Turbo supports.

Pricing

GPT-4 Turbo is available for $10 per one million input tokens and $30 per one million output tokens.

Claude 3 Opus costs $15 per one million input tokens (1.5x GPT-4 Turbo’s rate) and more than twice as much as GPT-4 Turbo for output tokens, at $75 per one million output tokens. For personal use, it costs $20 per month.

Google is going even further than OpenAI, offering access to Gemini 1.5 Pro at $7 per one million input tokens and $21 per one million output tokens (preview pricing) starting May 2, 2024.
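To put these per-million-token rates in perspective, here is a minimal sketch of a cost comparison at the prices quoted above. The request sizes are hypothetical, and API prices change frequently, so the providers’ pricing pages remain the source of truth.

```python
# A minimal sketch comparing API costs at the per-million-token prices
# quoted above (Gemini 1.5 Pro's figure is preview pricing effective
# May 2, 2024). Prices change often; check each provider's pricing page.

PRICING = {  # model: (input $/1M tokens, output $/1M tokens)
    "gpt-4-turbo": (10.00, 30.00),
    "claude-3-opus": (15.00, 75.00),
    "gemini-1.5-pro": (7.00, 21.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single request for the given token counts."""
    input_rate, output_rate = PRICING[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# Hypothetical example: a 50,000-token document summarized into 2,000 tokens.
for model in PRICING:
    print(f"{model}: ${request_cost(model, 50_000, 2_000):.2f}")
# gpt-4-turbo: $0.56, claude-3-opus: $0.90, gemini-1.5-pro: $0.39
```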

Which is your preferred LLM? Share with us on LinkedIn, X, or Facebook. We’d love to hear from you!



Sumeet Wadhwani

Asst. Editor, Spiceworks Ziff Davis
