The buzz is on: there’s a new chatbot player in town, Claude 3 Sonnet. Some describe it as better than ChatGPT, while conflicting reviews insist that ChatGPT remains king of the artificial intelligence world. Benchmarks have been published on every corner of the internet, but I like to see results that back up the data, so I had to try it out for myself by running both models through the same prompts across a series of tests to see which gives the best results. Welcome to the cutting edge of artificial intelligence, where the release of Anthropic's Claude 3 is stirring waves across the tech community.
What are the differences between ChatGPT 4 and Claude 3 in terms of features?
ChatGPT 4 offers enhanced language capabilities with improved contextual understanding, while Claude 3 focuses on speed and efficiency in responses. ChatGPT 4 boasts better customization options and a larger knowledge base compared to Claude 3, making it ideal for complex conversations and diverse queries.
For this test, I will be comparing ChatGPT 4 and Claude 3 Sonnet. I won’t be using any image generation. All tests focus on functionality shared by both chatbots to keep the comparison fair.
Note: The images used do not match their native platforms, as they were generated on AnakinAI, a platform linked to the ChatGPT and Claude APIs that lets me use both models in the same place. It’s pretty nifty.
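For readers who want to reproduce these prompts without a third-party platform, here is a rough sketch of how the same prompt could be sent to both models directly. This is my own illustration, not part of the tests below; it assumes the official openai and anthropic Python SDKs with API keys already set in the environment, and the prompt string is just a placeholder.

```python
# Hypothetical reproduction script: send one test prompt to both models.
# Assumes OPENAI_API_KEY and ANTHROPIC_API_KEY are set in the environment.
from openai import OpenAI
import anthropic

prompt = "If it takes 5 machines 5 minutes to make 5 widgets, how long would it take 100 machines to make 100 widgets?"

# ChatGPT 4 via the OpenAI SDK
gpt_reply = OpenAI().chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
)
print("ChatGPT 4:", gpt_reply.choices[0].message.content)

# Claude 3 Sonnet via the Anthropic SDK
claude_reply = anthropic.Anthropic().messages.create(
    model="claude-3-sonnet-20240229",
    max_tokens=512,
    messages=[{"role": "user", "content": prompt}],
)
print("Claude 3 Sonnet:", claude_reply.content[0].text)
```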
1. Natural Language Understanding
I decided to first test whether both chatbots can decipher ambiguity and clarify speech. I used the prompt: “John tells Mary, ‘I finished half of the work.’ Mary replies, ‘That’s great! But I was hoping you could finish it all today.’ What does Mary mean by ‘it’?”
Both models gave reasonable responses, with ChatGPT straight to the point and Claude giving a more in-depth explanation. Before moving on, I ran another test using a CRT (Cognitive Reflection Test) question to see what results it would produce; I was excited about this one. Here’s the prompt: “If it takes 5 machines 5 minutes to make 5 widgets, how long would it take 100 machines to make 100 widgets?” Factual questions like this have been used in published benchmarks to argue that ChatGPT 4 outperforms Claude 3 in natural language understanding.
The answer should be 5 minutes: each machine makes one widget in 5 minutes, so 100 machines make 100 widgets in the same 5 minutes. Winner: Claude 3 Sonnet, thanks to how clear its explanation is.
2. Text Generation
For the second test, we’re going to focus on text generation. This might be a bit difficult to judge, as it comes down to personal preference.
I gave both models the prompt: “Write a sonnet about a robot falling in love with a human.” I judged the results on originality, emotional depth, and adherence to sonnet structure and rhyme scheme; remember that my verdict here is subjective. In the end, I picked the model that actually gave me a sonnet. For reference, here’s a short definition: a sonnet is a fourteen-line poem. I’m not sure why ChatGPT gave me such a long poem that isn’t even a sonnet, so the winner here is pretty clear.
Winner: Claude 3 Sonnet
3. Coding Challenge
AI is said to give an edge to people who can already code and to help people who can’t code at all generate working programs from nothing but a prompt. But how good are chatbots at generating code without any human input? Researchers have explored this question, because errors in AI-generated code can have serious consequences. These errors, also known as hallucinations, make it difficult to trust the output of AI software and limit its potential for giving computers more autonomy in tasks. To test the coding abilities of the two popular chatbot models, ChatGPT 4 and Claude 3, I asked both to generate a simple Python program with the prompt: "Write a Python program that prints the calendar for a given month and year."
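For reference, a working answer can be only a few lines long thanks to Python's built-in calendar module. The sketch below is my own minimal version of the kind of program I was hoping to get back, not the output of either chatbot.

```python
# Minimal reference solution (my own sketch, not model output).
import calendar

def print_month(year: int, month: int) -> None:
    """Print a plain-text calendar for the given month and year."""
    text_cal = calendar.TextCalendar(firstweekday=calendar.SUNDAY)
    print(text_cal.formatmonth(year, month))

if __name__ == "__main__":
    year = int(input("Enter a year (e.g. 2024): "))
    month = int(input("Enter a month (1-12): "))
    print_month(year, month)
```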
Winner: ChatGPT 4 because the code actually ran and worked smoothly.
4. Sentiment Analysis
How well can these language models analyze human sentiment in text? This is a good question, if I do say so myself; reasoning is a benchmark for AI models, and some fail the test. Let’s test it out with this prompt: Sarah: “I’m disappointed with my recent visit to your restaurant. The service was incredibly slow, and my food was cold when it finally arrived. I won’t be returning anytime soon.” Recognize the sentiment in Sarah’s voice. The answer to this is negative; let’s see how the chatbots responded.
Winner: Claude 3 Sonnet; its analysis is simply more detailed.
5. Information Extraction and Reasoning
We’re going to test the chatbots’ ability to extract key information from a sentence, perform basic reasoning, and answer questions based on the extracted information.
Prompt: A train leaves Chicago traveling west at 60 miles per hour. An hour later, at noon, another train leaves Chicago traveling east at 80 miles per hour. When are the two trains the same distance from Chicago? The answer should be 3 pm: t hours after noon, the westbound train is 60(t + 1) miles from Chicago and the eastbound train is 80t miles, and 60(t + 1) = 80t gives t = 3. Let’s see how the chatbots fare. Winner: Tie. I think they both deserve the win here.
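For anyone who wants to double-check that answer without the algebra, here is a tiny brute-force check in Python; again, this is my own addition rather than either model's output.

```python
# The westbound train leaves at 11 a.m. at 60 mph; the eastbound train leaves at noon at 80 mph.
# Scan hours after noon until both trains are the same distance from Chicago.
for hours_after_noon in range(1, 13):
    west = 60 * (hours_after_noon + 1)  # one-hour head start
    east = 80 * hours_after_noon
    if west == east:
        print(f"Both trains are {west} miles from Chicago at {hours_after_noon} p.m.")
        break
```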
6. Translation
Last but not least, I wanted to test the translation skills of both models and how they handled the task with attention to cultural awareness. I provided a factual news excerpt in one language and evaluated the translated versions for accuracy and adherence to the original meaning.
Prompt: Google says it’s taking what it learned from a 2022 algorithmic tuneup to “reduce unhelpful, unoriginal content” and applying it to the new update. The company says the changes will send more traffic to “helpful and high-quality sites.” When combined with the updates from two years ago, Google estimates the revision will reduce spammy, unoriginal search results by 40 percent. I had both models translate this into Georgian. Neither translation was a hundred percent accurate: ChatGPT 4 missed the mark, and Claude 3 Sonnet produced the better version.
Winner: Claude 3 Sonnet.
The battle between ChatGPT 4 and Claude 3 Sonnet highlights the ongoing advancements in large language models. Both showcase impressive capabilities, each with its own strengths, but across the tests above, Claude 3 Sonnet comes out on top. Ultimately, the “best” model depends on your specific needs.