I’ve been around technology for long enough that very little excites me, and even less surprises me. But shortly after OpenAI’s ChatGPT was released, I asked it to write a WordPress plugin for my wife’s e-commerce site. When it did, and the plugin worked, I was indeed surprised.
That was the beginning of my deep exploration into chatbots and AI-assisted programming. Since then, I’ve subjected 10 large language models (LLMs) to four real-world tests.
Unfortunately, not all chatbots can code alike. It’s been 18 months since that first test, and even now, five of the 10 LLMs I tested can’t create working plugins. Had I chosen one of them instead of ChatGPT, I might have assumed AIs couldn’t code and might have lost interest in AI-enabled programming helpers.
In this article, I’ll show you how each LLM performed against my tests. There are two chatbots I recommend you use, but they cost $20/month. The free versions of the same chatbots do well enough that you could probably get by without paying. But the rest, whether free or paid, are not so great. I won’t risk my programming projects with them or recommend that you do until their performance improves.
I’ve written a lot about using AIs to help with programming. Unless it’s a small, simple project, like my wife’s plugin, AIs can’t write entire apps or programs. But they excel at writing a few lines and are not bad at fixing code.
Rather than repeat everything I’ve written, go ahead and read this article: How to use ChatGPT to write code: What it can and can’t do for you.
If you want to understand my coding tests, why I’ve chosen them, and why they’re relevant to this review of the 10 LLMs, read this article: How I test an AI chatbot’s coding ability – and you can too.
Once you’ve read those two articles and you’re fully caught up, we can dive into the AIs themselves. Let’s start with a comparative look at how the chatbots performed:
Next, let’s look at each chatbot individually. I’ll discuss nine chatbots, even though the above chart shows 10 LLMs. The results for GPT-4 and GPT-4o are both included in ChatGPT Plus. Ready? Let’s go.
- Passed all tests
- Solid coding results
- Mac app
- Hallucinations
- No Windows app yet
- Sometimes uncooperative
- Price: $20/mo
- LLM: GPT-4o, GPT-4, GPT-3.5
- Desktop browser interface: Yes
- Dedicated Mac app: Yes
- Dedicated Windows app: No
- Multi-factor authentication: Yes
- Tests passed: 4 of 4
ChatGPT Plus with GPT-4 and GPT-4o passed all my tests. One of my favorite features is the availability of a dedicated app. When I test web programming, I have my browser open to the page I’m testing, my IDE open, and the ChatGPT Mac app running on a separate screen.
In addition, Logitech’s Prompt Builder, which pops up at the press of a mouse button, can be set up to use the upgraded GPT-4o and connect to your OpenAI account. That makes running a prompt a simple thumb-tap, which is very convenient.
The only thing I didn’t like was that one of my GPT-4o tests resulted in a dual-choice answer, and one of those answers was wrong. I’d rather it just gave me the correct answer. Even so, a quick test confirmed which answer would work. But that was a bit annoying. I didn’t have that issue in GPT-4, so for now, that’s the LLM setting I use with ChatGPT when coding.
- Multiple LLMs
- Search criteria displayed
- Good sourcing
- Email-only login
- No desktop app
- Price: $20/mo
- LLM: GPT-4o, Claude 3.5 Sonnet, Sonar Large, Claude 3 Opus, Llama 3.1 405B
- Desktop browser interface: Yes
- Dedicated Mac app: No
- Dedicated Windows app: No
- Multi-factor authentication: No
- Tests passed: 4 of 4
I seriously considered listing Perplexity Pro as the best overall AI chatbot for coding, but one failing kept it out of the top slot: how you log in. Perplexity doesn’t support username/password or passkey logins, and it doesn’t have multi-factor authentication. All it does is email you a login PIN. The AI also doesn’t have a separate desktop app, as ChatGPT does for Macs.
What sets Perplexity apart from other tools is that it can run multiple LLMs. While you can’t set an LLM for a given session, you can easily go into the settings and choose the active model.
For programming, you’ll probably want to stick to GPT-4o, because that aced all my tests. But it might be interesting to cross-check code across the different LLMs. For example, if you have GPT-4o write some regular expression code, you might consider switching to a different LLM to see what that LLM thinks of the generated code.
As we’ll see below, most LLMs are unreliable, so don’t take the results as gospel. However, you can use the results to give you more things to check in your original code. It’s sort of like an AI-driven code review.
Just don’t forget to switch back to GPT-4o.
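Whatever a second LLM says about generated code, a small test of your own settles the question faster than a debate between chatbots. Here is a minimal Python sketch of that idea for regular expressions. The pattern, the test cases, and the `check_pattern` helper are all hypothetical stand-ins for illustration, not code produced by any of the chatbots reviewed here.

```python
import re

# Hypothetical pattern, standing in for whatever regex a chatbot generated.
# It is meant to accept dollar amounts such as "12", "12.5", or "12.50".
CANDIDATE_PATTERN = r"\d+(\.\d{1,2})?"

# Hand-written expectations: input string -> should the pattern accept it?
CASES = {
    "12": True,
    "12.5": True,
    "12.50": True,
    "0.99": True,
    "12.505": False,
    "12.": False,
    ".50": False,
    "abc": False,
    "": False,
}


def check_pattern(pattern, cases):
    """Return the inputs where the pattern disagrees with the expectations."""
    compiled = re.compile(pattern)
    failures = []
    for text, expected in cases.items():
        # fullmatch requires the whole string to match, not just a prefix.
        accepted = compiled.fullmatch(text) is not None
        if accepted != expected:
            failures.append(text)
    return failures


if __name__ == "__main__":
    failures = check_pattern(CANDIDATE_PATTERN, CASES)
    if failures:
        print("Pattern disagrees with expectations for:", failures)
    else:
        print("Pattern matches every expectation.")
```

If a second LLM proposes a different pattern, you can run it through the same cases and compare the failure lists, rather than taking either chatbot’s word for it.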
- Free
- Passed most tests
- Prompt throttling
- Could cut you off in the middle of whatever you’re working on
- Price: Free
- LLM: GPT-4o, GPT-3.5
- Desktop browser interface: Yes
- Dedicated Mac app: Yes
- Dedicated Windows app: No
- Multi-factor authentication: Yes
- Tests passed: 3 of 4 in GPT-3.5 mode
ChatGPT is available to anyone for free. While both the Plus and free versions support GPT-4o, which passed all my programming tests, there are limitations when using the free app.
OpenAI treats free ChatGPT users as if they’re in the cheap seats. If traffic is high or the servers are busy, free users only get GPT-3.5. The tool also allows only a certain number of queries before it downgrades you or shuts you off.
I’ve had several occasions when the free version of ChatGPT effectively told me I’d asked too many questions.
ChatGPT is a great tool, as long as you don’t mind getting shut down sometimes. Even GPT-3.5 did better on the tests than all the other chatbots, and the test it failed was for a fairly obscure programming tool produced by a lone programmer in Australia.
So, if budget is important to you and you can wait when cut off, go for ChatGPT free.
- Free
- Passed most tests
- Range of research tools
- Limited to GPT-3.5
- Throttles prompt results
- Price: Free
- LLM: GPT-3.5
- Desktop browser interface: Yes
- Dedicated Mac app: No
- Dedicated Windows app: No
- Multi-factor authentication: No
- Tests passed: 3 of 4
I’m threading a pretty fine needle here, but because Perplexity AI’s free version is based on GPT-3.5, its test results were measurably better than those of the other AI chatbots.
From a programming perspective, that’s pretty much the whole story. But from a research and organization perspective, my ZDNET colleague Steven Vaughan-Nichols prefers Perplexity over the other AIs.
He likes how Perplexity provides more complete sources for research questions, how it cites its sources, how it organizes the replies, and how it suggests follow-up questions for further searches.
So if you’re programming, but also doing other research, consider the free version of Perplexity.
Chatbots to avoid for programming help
I tested nine chatbots, and four passed most of my tests. The other chatbots, including a few pitched as great for programming, each only passed one of my tests – and Microsoft’s Copilot didn’t pass any.
I’m mentioning them here because people will ask, and I did test them thoroughly. Some of them do just fine for other work, so I’ll point you to their more general reviews if you’re just curious about how they function.