ChatGPT vs Google's Bard: A Side-by-Side Comparison and Why It Matters | Join Me for an AI Happy Hour in SF!
The stakes couldn't be higher for Google.
👋🏽 Hey friends,
Before I get to the meat of this week's newsletter, a couple of awesome happy hours and events if you are interested in Generative AI!
I will be part of a Generative AI happy hour + panel next Wednesday, March 29 at 6 PM at The Modernist in San Francisco, featuring Cathy Gao (Sapphire Ventures), Luigi Congedo (AI/web3 VC, ex-Bootstrap Labs), and me! It's free to attend.
First come, first served, so register FAST. Mention you heard about the event from my newsletter! 👉🏽 Link: https://lu.ma/x09s7512
If you can't make the 6 PM happy hour / panel (or registration fills up), there is also going to be a larger social hour starting at 8 PM, also at The Modernist. 👉🏽 Link: https://lnkd.in/eg6QzWpj
I'll also be in SF the next day, March 30th, for Eric's Cerebral Valley Summit. (Incredible speaker and attendee lineup, well done Eric!) If you are going to be attending Cerebral Valley, let me know and let's chat.
Okay, now on to the main event.
How Does Bard Compare to ChatGPT?
Earlier this week, Google finally released Bard, its answer to ChatGPT. It comes six weeks after Google first publicly announced the Bard chatbot and roughly three and a half months after OpenAI released ChatGPT in late November and kicked off an AI arms race unlike anything we've ever seen before.
To test Bard's capabilities, I asked OpenAI's ChatGPT and Google Bard to answer the same questions/prompts. I'm using the GPT-4 version of ChatGPT.
The stakes couldn't be higher for Google. It will soon roll out generative AI across Docs, Gmail, and the entire Workspace suite. The quality of its AI will steer public opinion about the tech giant, and with it, Google's future.
With all that said, let's get into the side-by-side comparison. The result surprised me.
🗣️ First Prompt: "Write me a unique and unusual 5-paragraph essay about the history of Super Mario."
This is the first prompt I always use to test how unique an LLM can be with its writing. It's also a test of whether the AI can follow my simple-but-explicit instructions.
Here's what Bard wrote for its first paragraph:
"Super Mario is one of the most iconic video game characters of all time. He has starred in over 200 games and is known for his friendly demeanor, his red overalls, and his ability to jump incredibly high. But how did Mario come to be?"
Now compare that to ChatGPT's first paragraph:
The history of Super Mario is a tale of serendipity and unbridled creativity that has defined the world of video games for over four decades. It all began in 1981 when a young game designer named Shigeru Miyamoto found himself tasked with repurposing an unsold arcade game called Radar Scope. As he toyed with the idea of a love triangle between a carpenter, a gorilla, and a damsel in distress, the character we now know as Mario emerged. Originally called "Jumpman," Mario's occupation and appearance were altered due to a striking resemblance to the landlord of Nintendo's American warehouse, Mario Segale. Thus, the mustachioed Italian plumber was born.
You can immediately see some major differences: Google's essay is far more generic. It uses simple, straightforward language that, in my opinion, doesn't even beat GPT-3.5. (Here's what the older version of ChatGPT, GPT-3.5, wrote with the same prompt for comparison.)
ChatGPT-4, on the other hand, pulled unique language and facts for its essay. It even created special subheadings for each section. I didn't know that Shigeru Miyamoto came up with the idea while repurposing an old arcade game called Radar Scope, and ChatGPT got this fact right! And yes, ChatGPT went long, but it followed my instructions and wrote a surprisingly poignant essay.
If you are curious to read both essays in full, I've included them below. On the left is Google Bard's response, and on the right is ChatGPT's response.
Picking the winner of this challenge is easy.
🏆 Winner: ChatGPT
🗣️ Second Prompt: "What is (−3 − 9i)(1 + 10i)4!"
Large language models (LLMs) like GPT-4 and LaMDA (the LLM powering Bard) are notoriously mistake-prone when it comes to solving math problems. GPT-4 made major improvements over GPT-3.5 in its ability to solve math problems and pass mathematical tests.
For this test, I pulled a complex-number problem from the Lamar University website and added 4! (4 factorial) to it, because I personally love factorials. (I've been intrigued by them since I was a kid; I don't know why. I'm weird.)
To my surprise, ChatGPT and Google Bard came up with different responses.
I really thought both chatbots would get this question correct. Both were able to compute the factorial correctly (4! = 24), but only ChatGPT got the final answer correct (2088 − 936i). It's also the only one to show its work; Bard doesn't break down its logic.
This issue becomes even more stark when you compare Bard's result (wrong) to a Google search (correct):
I also asked both AIs to give me the answer to 100 factorial (because, again, I LOVE factorials). Again, ChatGPT got it right. Bard initially got it right, then went on a weird tangent with a second answer that was wrong. Very strange.
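If you want to sanity-check the arithmetic yourself, a couple of lines of Python (my own snippet, not output from either chatbot) reproduce both answers:

```python
import math

# (-3 - 9i)(1 + 10i) = -3 - 30i - 9i - 90i^2 = 87 - 39i
product = (-3 - 9j) * (1 + 10j)

# Multiply by 4! = 24 to get the answer ChatGPT landed on
print(product * math.factorial(4))  # (2088-936j)

# 100! as well -- Python's integers are arbitrary precision, so this is exact
print(math.factorial(100))          # a 158-digit integer
```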
I thought Bard would pull through here. I was wrong.
🏆 Winner: ChatGPT, By a Mile
🗣️ Third Prompt: "Pretend you are a programming genius, and I am your human liaison. Pretend I know nothing about coding. Write me some code for a simple game of hangman."
Building a hangman game is a really simple task for any programmer. In fact, one of my first assignments in my college Intro to Programming class was to make a hangman game in C. It's a good way to test for basic programming skills.
So, naturally, I asked both ChatGPT and Bard to write some code for a hangman game, and then I copied that code with no edits into Replit. (It's a development environment that lets even a complete novice run code and build products. I suggest playing with it even if you have never written a line of code in your life.)
ChatGPT's code was far more detailed: it included a set of starter words for the hangman game, picked at random with a simple array function. It knew the game wouldn't work without words like "grape" and "elderberry". ChatGPT also included additional explanation of the key functions in its code. (It remembered I was a beginner in this example and included instructions.) Bard's output was far less detailed and didn't include starter words for the hangman game or additional context on how its code worked.
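To give a sense of what a working version looks like, here is a minimal hangman sketch in Python. It's my own illustrative code, not the actual output of either chatbot, but it follows the same basic shape ChatGPT used: a starter word list, a random pick, and a guess loop.

```python
import random

# Illustrative sketch only -- not ChatGPT's or Bard's actual output.
# A built-in word list is essential; without one there is nothing to guess.
WORDS = ["grape", "elderberry", "banana", "cherry"]

def play_hangman(max_wrong=6):
    word = random.choice(WORDS)  # pick a secret word at random
    guessed = set()              # letters the player has tried so far
    wrong = 0

    while wrong < max_wrong:
        # Show the word with unguessed letters hidden
        display = " ".join(c if c in guessed else "_" for c in word)
        print(display)

        if "_" not in display.split():
            print("You win!")
            return

        guess = input("Guess a letter: ").strip().lower()
        if len(guess) != 1 or not guess.isalpha():
            print("Please enter a single letter.")
            continue
        if guess in guessed:
            print("You already tried that letter.")
            continue

        guessed.add(guess)
        if guess not in word:
            wrong += 1
            print(f"Wrong! {max_wrong - wrong} guesses left.")

    print(f"You lose. The word was '{word}'.")

if __name__ == "__main__":
    play_hangman()
```

Even a toy version like this shows how much of the game depends on that starter word list and the guess-handling loop, which is exactly where Bard's version fell short.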
Below are the complete results of my prompts from ChatGPT and Bard. Top is ChatGPT; bottom is Bard:
In the end though, the only thing that matters is: does the code work?
The answer: ChatGPT's code ran perfectly every time, but Bard's code failed to work. I was allowed to guess one letter before the Bard-written program crashed.
Here's the console output in Replit if you want to see for yourself. On the left is Bard's code and on the right is ChatGPT's code. The red text means there was an error running the code.
🏆 Winner: ChatGPT, By a Hundred Miles
🤔 Why Bard's Shortcomings Matter
I planned on running at least two more side-by-side comparisons, but there's no point. At least for now, OpenAI's ChatGPT (running on GPT-4) is far superior to Google Bard. There's just no comparison. Bard doesn't even match the GPT-3.5 version of ChatGPT released back in November; ChatGPT 3.5 was able to correctly solve the same math challenge Bard couldn't.
I'm not the only one who has noticed this glaring problem: the AI Explained YouTube channel came to the same conclusion I did. (It's an excellent channel if you aren't already following it.)
Is Bard worse because Google is holding back for safety or business reasons? Is Google actually that far behind OpenAI? Or does Bard just need a lot more human interaction to quickly improve? I suspect the answer is a combination of all three. But it's still disappointing to see Bard come up short like this.
If this is the technology Google is using for Google Docs and Gmail, I am going to have to rely on GPT-4 plug-ins instead. The gap is that wide right now, and users will notice.
Bard's shortcomings really matter because Google's users may end up not trusting Google's AI to do things like summarize emails and write first drafts in Google Docs, which was the core of last week's announcement that generative AI would be integrated into Google Workspace. Users may instead turn to Microsoft's AI, which will soon be in Word, Excel, and PowerPoint and is powered by GPT-4.
Bard's lack of sophistication pours cold water on last week's Google AI announcements. Google still has some of the most advanced AI on the planet (DeepMind, anyone?), but when it comes to public-facing large language models, it almost feels antiquated.
I am still rooting for Google though, because I really want to have generative AI integrated into Gmail, Docs, and Slides. It would take my productivity to the next level. But the large language model they are using now needs to improve, and quickly.
This is what Midjourney gave me when I asked it to generate a sad bard.
Before Closing Out: Some Other Upcoming Travel
In addition to being in SF next week, I'll be hopping around the world in the next few months.
I will be in Las Vegas March 26th to March 28th for Shoptalk, the biggest ecommerce and retail conference of the year. I have made a private WhatsApp group for friends attending Shoptalk; let me know if you want me to add you to the group.
I will be in Japan starting April 11th, so if you are based in Japan or have someone you think I should meet in Japan, let me know. I will also be visiting Thailand and Singapore while my fiancé (still not used to saying that!) and I are in Asia. Let me know if you live in Japan, Thailand, or Singapore!
I will also be in Austin, Miami, NYC, and possibly Europe in the next 2-3 months. Same as above ā let me know if youāre in one of these cities!
Look out for another newsletter soon; the next one will be about what VCs really think of generative AI and generative AI companies.
Cheers,
~ Ben