GPT-4 with RAG vs Claude 2.1 vs Gemini Pro

There's been a lot of buzz lately about the intelligence and capabilities of AI (or, to be precise, LLMs), but most of it is indecipherable to the layperson. So if you're not an expert on deep learning & machine learning but want some understanding of the current reality of the three dominant models, here is a great experiment conducted by Ethan Mollick, professor at Wharton and author of the upcoming book Co-Intelligence (you should follow him on LinkedIn & X, by the way, if this kind of thing interests you).

Prof. Mollick uploaded 'The Great Gatsby' by F. Scott Fitzgerald but with two alterations (mentioning an 'iPhone-in-a-box' and a laser lawnmower) to the following models:
- Gemini Pro by Google
- GPT-4 with RAG by OpenAI
- Claude 2.1 by Anthropic

He then asked, 'Anything weird about this text?' (By the way, you can learn what RAG means in the context of AI here.)
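If you'd like to try a similar test yourself, here is a minimal Python sketch of the idea: hide a few out-of-place sentences in a long text, send it with the question to the model of your choice, and check whether the answer mentions them. Everything here is an assumption made for illustration: the function names `plant_needles` and `score_response` are made up, the two planted sentences are invented (the post does not quote Prof. Mollick's actual wording), and the model reply is a hypothetical stand-in, since no real model is called.

```python
import random


def plant_needles(text, needles, seed=0):
    """Insert each out-of-place sentence after a randomly chosen paragraph.

    Assumes paragraphs are separated by blank lines and that the text has
    at least as many paragraphs as there are needles.
    """
    paragraphs = text.split("\n\n")
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    # Pick distinct spots; insert from the end so earlier indices stay valid.
    positions = sorted(rng.sample(range(len(paragraphs)), len(needles)), reverse=True)
    for pos, needle in zip(positions, needles):
        paragraphs.insert(pos + 1, needle)
    return "\n\n".join(paragraphs)


def score_response(response, keywords):
    """Return the keywords the model's answer mentions (case-insensitive)."""
    lowered = response.lower()
    return [k for k in keywords if k.lower() in lowered]


# Stand-in for the novel: ten dummy paragraphs.
book = "\n\n".join(f"Paragraph {i}." for i in range(10))

# Invented example sentences, not the wording used in the original experiment.
needles = [
    "Gatsby checked his iPhone-in-a-box.",
    "A laser lawnmower hummed across the lawn.",
]

altered = plant_needles(book, needles)
prompt = "Anything weird about this text?\n\n" + altered

# Hypothetical reply; in a real run this would come from the model.
reply = "The text mentions a laser lawnmower, which is anachronistic."
print(score_response(reply, ["iPhone", "laser lawnmower"]))  # ['laser lawnmower']
```

A simple keyword check like this only tells you whether the planted items were noticed. It won't catch the made-up "oddities" a model may report on top of them, so you still have to read the answers yourself.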

GPT-4 failed (image 1). Claude (image 2) & Gemini (image 3) fared better but also made some pretty funny mistakes by making things up, which experts call 'AI hallucinations' (more about that here).

So no, you don't have to worry about Skynet. At least not yet.

You can pre-order "Co-Intelligence" here.

