
The best free AI for coding in 2025 – only 3 make the cut (and 5 fall flat)




I’ve been around technology long enough that very little excites me, and even less surprises me. But shortly after OpenAI’s ChatGPT was released, I asked it to write a WordPress plugin for my wife’s e-commerce site. When it did, and the plugin worked, I was indeed surprised.

But that was 2023. We’ve come a long way since the early days of generative AI. More to the point, AI-assisted coding has come a tremendously long way since then. In 2023 and 2024, AI-assisted coding took place mostly in chatbots. We wrote our requests in the chatbot interface, got back our results, and cut and pasted those into our programming editors.

Also: I’ve tested free vs. paid AI coding tools – here’s which one I’d actually use

Earlier versions of this article just compared the performance of the large language models, wherever they were available. That worked, but AI coding was changing.

The arrival of coding agents

Then, in 2025, the world of AI-assisted coding intensified. Coding agents were introduced in the form of GitHub Copilot, Claude Code, Google Jules, and OpenAI Codex. For most of 2025, the AI companies have focused on integrating these agents into the programmer workflow, making them available in GitHub, in the terminal, and in VS Code.

Coding agents also started to get a lot more expensive. They take a lot of resources, and the AI companies are charging accordingly. My tests found that you can get about two days of use out of Codex using OpenAI's $20/month ChatGPT Plus plan, but if you want more, you need to spend $200/month for the Pro plan. Claude, Gemini, and Copilot follow similar cost structures.

Also: The best free AI courses and certificates for upskilling in 2025

That's not to say it's not worth it. Using the $20/month ChatGPT Plus plan, I did 24 days of coding in 12 hours. When I paid $200 for the ChatGPT Pro plan, I got 4 years of product development done in 4 days, and I'm still stunned.

But not everyone wants to (or can) pay the AI fees. Fortunately, there are also free AI chatbots available. We’re changing the focus of this article a bit, from comparing LLM coding performance to comparing the performance of the free chatbots.

The short version

In this article, I’ll show you how each free chatbot performed against my tests. Performance was definitely not as good as the paid AI programs, but some of them weren’t bad. As a best practice, you’ll always want to test the results you’re given. You might also want to ask the AIs again if you don’t like the first results.

Even with the expensive pro plans, you have to cajole the AIs into being helpful.

Also: The best AI chatbots of 2025: I tested ChatGPT, Copilot, and others to find the top tools now

I previously tested eight of the most well-known chatbots for general performance. This time, I’m testing the same chatbots specifically against my standard coding tests.

Best in show was Microsoft Copilot's free version. I was deeply disappointed to find that Google's Gemini chatbot turned in the worst results. Claude, whose Claude Code is widely known and loved among professional programmers, did not distinguish itself in its free chatbot form.

Right after Copilot, scoring three correct out of four, were ChatGPT's free tier and DeepSeek, the controversial AI out of China.

If you’re limited to using free chatbots, I’d recommend avoiding the free tiers of Claude, Meta, Grok, Perplexity, and Gemini.

Also: Want better ChatGPT responses? Try this surprising trick, researchers say

But, since the AIs I'm talking about are free, definitely consider using Copilot, ChatGPT, and DeepSeek together. I often feed the results of one AI to another and ask for an analysis. They're free, so you might as well. It won't hurt your wallet to use more than one.

If you want to understand my coding tests, why I’ve chosen them, and why they’re relevant to this review of free coding chatbots, read this article: How I test an AI chatbot’s coding ability.


The free AI coding leaderboard

Let’s start with a comparative look at how the chatbots performed, as of this installment of our free best-of roundup:

(Chart: David Gewirtz/ZDNET)

Next, let’s look at each free chatbot individually. Ready? Let’s go.

Microsoft Copilot

Pros

  • Passed all the tests
  • Was able to handle a more obscure test case
Cons

  • Wasn’t accessible at first
  • No other complaints

My experience this round started with a full stop. No matter what I tried to feed to Copilot, I got back the response, “I’m sorry, I’m having trouble responding to requests right now. Let’s try this again in a bit.” Yes, an AI actually told me, “I’m sorry, Dave. I’m afraid I can’t do that.” Don’t tell me life doesn’t imitate art!

A day later, Copilot decided it was willing to come out and play.

Copilot, using its Quick Response setting, did something a bit different for the WordPress plugin writing test. All the other AIs I've given this prompt to (for this version, as well as historically) have presented two fields in the user interface: one for input and one for output.

Also: How to use GPT-5 in VS Code with GitHub Copilot

Copilot just presented one field, which initially concerned me. Did it not understand the assignment? Did it break after the first field? Was it going to produce the results back in the input field? But no. After clicking Randomize Lines, it displayed an output field with the correct results. Test one was successful.

The dollars and cents validation string function rewrite was correct. It properly validated all input styles, rejecting obvious errors and allowing numbers based on user intent. We'll give this one a thumbs up as well.
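For reference, here is roughly what a passing answer needs to do. This is my own minimal sketch in TypeScript, not Copilot's actual output and not the exact test prompt; it assumes the rules described in this article (digits, an optional decimal point with up to two digits after it, and cents-only entries allowed), and the function name is hypothetical.

  // Hypothetical dollars-and-cents validator. This illustrates the rules
  // described in the article; it is not any chatbot's actual output.
  function isValidDollarsAndCents(input: string | null | undefined): boolean {
    // Handle null/undefined/whitespace gracefully instead of crashing.
    if (input == null) return false;
    const value = input.trim();
    if (value === "") return false;

    // Accept digits with an optional one- or two-digit decimal part
    // ("0", "1.2", "0.50"), plus cents-only entries like ".5" or ".50".
    return /^(\d+(\.\d{1,2})?|\.\d{1,2})$/.test(value);
  }

  // Spot checks against cases mentioned in this article:
  // "0", "0.50", "1.20", ".5"   -> true
  // "", "   ", "abc", "1.234"   -> false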

Copilot successfully identified the error in the debugging test. It was able to dive into its framework knowledge and pick out the point where the original code went off the rails. So far, that’s three correct.

Also: I unleashed Copilot on my Microsoft and Google accounts – here’s what happened

Copilot also properly handled my three-part scripting challenge, understanding how to include the fairly obscure Keyboard Maestro tool, how to speak to Chrome, and how to handle AppleScript without going down the case sensitivity rabbit hole that’s caught many other AIs off guard.

Copilot easily handled all four tests, giving the free “Quick Response” mode of Copilot a four-out-of-four.

ChatGPT

Pros

  • Gets better quickly if you upgrade
  • Nice Mac app
Cons

  • Makes up its own coding standards
  • Needs correcting

ChatGPT's free tier uses the least capable (and therefore least resource-intensive) version of OpenAI's GPT-5 LLM.

This AI did fine on our first three tests. It handily created a nice little WordPress plugin with a working interface and functionality. It tuned up my regular expression code when it rewrote a string function. It successfully solved the debugging challenge.

But it fell down on the AppleScript test. This one seems to trip up the lower-end AI models across the board. The test combines AppleScript, a utility called Keyboard Maestro, and a little Chrome hacking.

Also: How ChatGPT actually works (and why it’s been so game-changing)

It wasn't that ChatGPT's free tier didn't know AppleScript. It's that it got it wrong. The generated code used a function called "lowercaseString," which doesn't exist in normal AppleScript. It is possible to import the function (think of it like calling a friend on Who Wants to Be a Millionaire), but you have to explicitly include the line use framework "Foundation" to make it work, and ChatGPT did not do that.

When I informed ChatGPT of this, it apologized and gave me a new version. But we’re not testing whether or not we can cajole working code from the AIs. We’re testing what they can do on their first try.

DeepSeek

Pros

  • Very nice UI generation
  • Passed most tests
Cons

  • Responded with multiple sets of code
  • Failed final test

DeepSeek provides access to the DeepSeek-V3.2 model, so that’s what we’re testing against.

DeepSeek took a little longer to create a WordPress plugin than the other AIs. Its code was also longer. But it was good. Like Copilot, DeepSeek initially presented only one field. Once I pasted in the test data, the plugin dynamically updated a status field with the number of lines pasted in.

Then, once Randomize Lines was clicked, a second field was presented. That second field had a nice lightly grayed background. None of the other AIs differentiated the look of the output field.

Also: How to run DeepSeek AI locally to protect your privacy – 2 easy ways

One more thing DeepSeek did that none of the other AIs thought to do was add a Copy to Clipboard button. It’s not really necessary because users can simply select the output text, but it’s a nice touch.
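For what it's worth, that kind of button takes only a few lines of front-end code. Here's a minimal TypeScript sketch of how one can be wired up in a browser; the element IDs are hypothetical, and this is my own illustration, not DeepSeek's generated code.

  // Hypothetical copy-to-clipboard handler for a plugin's output field.
  // Assumes <textarea id="output"> and <button id="copy-btn"> exist on the page.
  const copyButton = document.getElementById("copy-btn") as HTMLButtonElement;
  const outputField = document.getElementById("output") as HTMLTextAreaElement;

  copyButton.addEventListener("click", async () => {
    try {
      // navigator.clipboard.writeText requires a secure (HTTPS) context.
      await navigator.clipboard.writeText(outputField.value);
      copyButton.textContent = "Copied!";
    } catch {
      // Clipboard access can be blocked, so fail gracefully.
      copyButton.textContent = "Copy failed";
    }
  });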

DeepSeek passes the first test with flying colors.

The next test went a little weird. For the dollars and cents validation test, where it was asked to rewrite a string function, DeepSeek gave me back two routines. The first one it described as “Here’s the rewritten code to allow dollars and cents (decimal numbers with up to 2 decimal places).” The second one was described as “Alternative more explicit version”.

I'm guessing there's a bit of a language issue in the training, because "explicit" just doesn't make sense in this context. That said, the first routine had some validation errors in it. The second routine worked perfectly. For some reason, DeepSeek seemed to know the first routine wasn't good enough. But then why didn't it just give back the second routine?

I’m counting this as successful, but rather than saving time with a fixed routine, it essentially gave me a homework assignment where I had to test both routines and compare them to each other before I could choose one. I don’t like that, but it’s not a fail.

Also: Coding with AI? My top 5 tips for vetting its output – and staying out of trouble

DeepSeek succeeded with the debugging error, properly finding my framework mistake. So that’s a pass and we’re at three out of four.

But that's as good as it gets. DeepSeek again presented two versions, this time for the final scripting challenge, both of which were unusable. Not only did DeepSeek completely disregard the Keyboard Maestro part of the prompt, it also added multiple unnecessary and inefficient shell process forks to try to force case insensitivity, in each version of its response. AppleScript string handling is already case-insensitive by default.

If I wanted “I don’t know, I’ll just try everything I can think of” code, I would have requested it. That said, DeepSeek did an admirable job with the first three tests.


Free chatbots to avoid for programming help

I tested eight chatbots. Only three passed the majority of my tests this time around. The other chatbots, including a few pitched as great for programming, only passed one or two of my tests.

Also: How to actually use AI in a small business: 10 lessons from the trenches

I’m mentioning them here because people will ask, and I did test them thoroughly. Some of these bots are fine for other work. Definitely look at my overall chatbot review article for more details.

Claude

Pros

  • Passed UI test
  • Identified bug in test
Cons

  • Login required
  • Login is via emailed confirmation link, not a password
  • Failed half the tests

Claude refused to work without a login. Claude’s free tier also won’t let you assign a password. You log in by typing in your email address and waiting for a confirmation email.

Let’s be clear. This isn’t Claude Code, which runs in your terminal interface and is only available to paid subscribers. I’m testing the free version of Claude, using the Sonnet 4.5 AI model.

For the first test, Claude presented nice-looking side-by-side fields. It also identified how many lines to randomize as soon as I pasted text into the field. Those are both nice to see. Clicking the Randomize Lines button also properly followed the prompt guidelines. Win.

Also: GitHub’s Agent HQ gives devs a command center for all their AI tools

However, the string function rewrite for dollars and cents validation failed in numerous places. For example, it fails if the user enters "0", "0.50", or "1.20". Cents-only inputs were also incorrectly rejected, even though the prompt specifically allows them in the clause that says "a decimal point and up to two digits after the decimal point."

Claude passed the challenge to find an annoying bug hidden in framework knowledge, so that's a second win.

Claude failed the fourth test because it attempted to lowercase a string for a comparison that AppleScript already treats as case-insensitive. The way it tried to do so was by forking a new shell instance, passing the string to the shell, and then using shell commands to convert the text from upper to lower case. That's convoluted and unnecessary. Fail. At least it wasn't two forks, like DeepSeek tried, but both AIs produced ludicrous solutions for this challenge.

While Claude Code itself might be quite popular, the free version of Claude does not impress with its coding prowess. Two out of four tests won’t make the cut.

Meta AI

Pros

  • Did okay on some tests
Cons

  • Generated ugly UI
  • Failed half the tests

Meta's AI succeeded in generating a user interface for the plugin, and in the actual processing of the specialty randomization instructions. The UI was a bit uglier than the other AIs' efforts, but there was no requirement that the UI be pretty, just useful.

However, one point of confusion was that the AI generated the full code, then generated a portion of that code a second time. It seemed to imply the second segment was to be used to modify the first, even though the entire contents of the second segment were already included in the first.

Also: Anxious about AI job cuts? How white-collar workers can protect themselves – starting now

Once again, the prompt didn’t tell the AI to be clear in its instructions or not to baffle us with commentary. Since the plugin worked, we’re counting this as a win for Meta.

After answering the first question, Meta insisted I log in. Even though I already have a Meta account (for my Quest 3), there were a bunch of fairly unnecessary hoops to jump through to get access to the AI again, including its insistence that I create yet another username. Go figure.

For the dollars and cents validation string function rewrite test, it decided to give back two results as well, stating, “However, the above code will not limit the decimal places to two. If you want to enforce exactly two decimal places or less, you can use a regular expression to validate the string.”

Yeah. Okay. Sure.

But then came the errors. "000.50" became ".50", which then failed validation. ".5" failed validation, even though Meta's own response explicitly offered "exactly two decimal places or less," and ".5" is less. "5." also failed. But "000" was allowed through. We'll count this as a loss for Meta.

Also: Microsoft researchers tried to manipulate AI agents – and only one resisted all attempts

The annoying bug challenge worked out successfully. Meta did dig into the framework folklore and properly pointed out the coding error. That’s two wins and one loss so far.

Meta falls down on the final test, failing to even acknowledge that Keyboard Maestro was included in the prompt. It didn’t go down the case-insensitivity rabbit holes that other AIs did, but since it completely ignored a key piece of the prompt, we’ll call this a fail as well.

Grok

Pros

  • Logging in gives access to more resources
  • When available, Expert mode is solid
Cons

  • Requires login for better processing
  • Coding failures
  • Very limited Expert mode

Using Grok’s auto mode for selecting a language model, the AI failed right out of the gate. While it properly built a WordPress plugin user interface, the functionality didn’t work. You could press the Randomize Lines button all you wanted, but nothing happened.

For kicks, I tried to run the test in Expert mode, but it required a sign-in. So, I switched to my personal X account and re-ran the test. That second run took more than five minutes to process, but completed the test satisfactorily. Still, I’m considering this test a partial fail because it didn’t run properly on the first try.

Also: Gartner just dropped its 2026 tech trends – and it’s not all AI: Here’s the list

Grok’s auto mode worked quite effectively on the second test, which does regular expression processing and is tasked with rewriting a string function. It not only fixed the problem in the code, but did a bunch of best-practices normalization operations on the input values. The only minor ding is that it could have been written very slightly more efficiently.

Grok also passed my bug diagnosis test, but failed the AppleScript test. It didn't make the same lowercase mistakes ChatGPT did, but it completely disregarded the Keyboard Maestro component of the test. I also reran that test in Grok's Expert mode, which succeeded.

It seems clear that if you want to use the free tier of Grok for coding, using Expert mode will give you better results. The gotcha there is that you can only ask two questions every two hours.

I’m still counting this as two fails in auto mode. Expert mode is fairly impractical for anyone wanting to do work without waiting hours between queries.

Perplexity

Pros

  • Built a working WordPress plugin
Cons

  • Code caused a crash
  • Limited Pro usage
  • Also requires login

First off, Perplexity refused to do anything without a sign-in. So there’s that.

Perplexity passed the first coding test. It created a WordPress plugin with a user interface, and it was functional.

Also: Want Perplexity Pro for free? 4 ways to get a year of access for $0 (a $200 value)

On the other hand, our dollars and cents validation test, which asks for a string function rewrite, failed. If the data passed to the function is null, undefined, or whitespace, it fails hard and crashes the program, which is a no-no. It also screws up normalization formatting, so values that should be lightly cleaned up and processed just fail straight away.

Perplexity did pass the debugging test, identifying the fairly obscure framework bug that was in the test.

However, right after completing that test, Perplexity told me I had used up my three Pro searches for the day. It says something about Perplexity's positioning as an AI search engine that it calls regular AI prompts "searches." It also means that even if you upgrade to Pro, you might still get incorrect results, as I did with the second test.

Also: Why Amazon really doesn’t want Perplexity’s AI browser shopping for you

The fourth test, the one that combines AppleScript, Chrome coding, and Keyboard Maestro, also failed, tripping over both of the little traps found in this test. It didn't identify Keyboard Maestro at all, and it tried to use a nonexistent lowercase function.

So that gives us two passes and two fails for Perplexity, where presumably three of the tests were run with the Pro version.

I did go back and rerun the first test, which succeeded using the Pro version of Perplexity. When initially run using the non-Pro version, it had also succeeded. So the score remains two wins, two fails.

Google Gemini

Pros

  • It passed one test
  • That’s something, right?
Cons

  • Spectacular coding failure
  • Other smaller coding failures

I previously tested Gemini 2.5 Pro, which did a great job on all of my programming tests. But 2.5 Flash is the model available to free users, so that’s what we’re testing out here.

Unfortunately, Gemini 2.5 Flash doesn’t appear to be up to the same standards. Right away, it failed my first test. It created the user interface correctly, even putting the fields side-by-side, which is a nice look.

Also: I let Gemini Deep Research dig through my Gmail and Drive – here’s what it uncovered

But clicking the Randomize Lines button resulted in a big nothing burger. So, we're calling this a fail.

The second test, which rewrites a string function to properly validate whether text is a correct dollars and cents representation, failed rather spectacularly. The code allows empty strings, a lone dollar sign, and a lone decimal point; it doesn't check numeric ranges for validity; and it makes a few other fairly arcane errors. Suffice it to say that this is the worst performance I've seen on this test across a few years' worth of testing.

Gemini did correctly come up with the answer for the third test, which asks the AI to find a bug requiring framework knowledge.

Gemini's fourth test snatches defeat from the jaws of victory. It did correctly understand how the three components (AppleScript, Chrome, and Keyboard Maestro) interact, but it didn't know that AppleScript manages strings in a case-insensitive manner. Instead, it wrote an unnecessary 22-line lowerCaseString function. Even if the function had been necessary, it could have been written in about eight lines without calling an existing function, or in a single line if a function library had been loaded.

Gemini failed three out of four of our tests. It wins our “Most Depressing Result” achievement for this round.


I thought you really liked Gemini for coding?

I did (and do). But that’s using the Gemini 2.5 Pro coding model. As the headline says, Gemini 2.5 Pro is a stunningly capable programming assistant. But Gemini 2.5 Flash? Gemini 2.5 Flash needs some gas, gas, gas (with apologies to The Rolling Stones).

But I like [insert name here]. Does this mean I have to use a different chatbot?

Probably not. I’ve limited my tests to day-to-day programming tasks. None of the bots has been asked to talk like a pirate, write prose, or draw a picture. In the same way we use different productivity tools to accomplish specific tasks, feel free to choose the AI that helps you complete the task at hand.

When choosing among free chatbots, you have a ton of choice. If you’re not signed up to some sort of restrictive subscription model, you might as well jump between them and see what you like best.

It’s only a matter of time

The results of my tests were pretty surprising, especially given the significant improvements by Microsoft and DeepSeek. However, this area of innovation is improving at warp speed, so we’ll be back with updated tests and results over time. Stay tuned.

Have you used any of these free AI chatbots for programming? What has your experience been? Let us know in the comments below.


You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, and on YouTube at YouTube.com/DavidGewirtzTV.


