As part of my AI coding evaluations, I run a standardized series of four programming tests against each AI. These tests are designed to determine how well a given AI can help you program. That's useful to know, especially if you're counting on the AI to help you produce code. The last thing you want is for an AI helper to introduce more bugs into your work output, right?

Also: The best AI for coding (and what not to use)

Some time ago, a reader reached out to me and asked why I keep using the same tests. He reasoned that the AIs might succeed if they were given different challenges. This is a fair question, but my answer is also fair. These are super-simple tests. I'm using PHP and JavaScript, which are not exactly challenging languages, and I'm running some scripting queries through the AIs. By using exactly the same tests, we're able to compare performance directly.

One test is a request to write a simple WordPress plugin, one is to rewrite a string function, one asks for help finding a bug I originally had difficulty finding on my own, and the final one uses a few programming tools to get data back from Chrome.

But it's also like teaching someone to drive. If they can't get out of the driveway, you're not going to set them loose in a fast car on a crowded highway.

To date, only ChatGPT's GPT-4 (and above) LLM has passed them all. Yes, Perplexity Pro also passed all the tests, but that's because Perplexity Pro runs the GPT-4 series LLM. Oddly enough, Microsoft Copilot, which also runs ChatGPT's LLM, failed all the tests.

Also: How I test an AI chatbot's coding ability – and you can, too

Google's Gemini didn't do much better. When I tested Bard (the early name for Gemini), it failed most of the tests (twice). Last year, when I ran the $20-per-month Gemini Advanced through my tests, it failed three of the four tests.

But now, Google is back with Gemini Pro 2.5. What caught our eye here at ZDNET was that Gemini Pro 2.5 is available for free, to everyone. No $20-per-month surcharge.

While Google was clear that the free access was subject to rate limits, I don't think any of us realized it would throttle us after two prompts, which is what happened to me during testing. It's possible that Gemini Pro 2.5 is not counting prompt requests for rate limiting but basing its throttling on the scope of the work being requested. My first two prompts asked Gemini Pro 2.5 to write a full WordPress plugin and fix some code, so I may have used up the limits faster than you would if you asked it a simple question.

Even so, it took me a few days to run these tests. To my considerable surprise, it was very much worth the wait.

Test 1: Write a simple WordPress plugin

Wow. Well, this is certainly a far cry from how Bard failed twice and Gemini Advanced failed back in February 2024. Quite simply, Gemini Pro 2.5 aced this test right out of the gate.

Also: I asked ChatGPT to write a WordPress plugin I needed. It did it in less than 5 minutes

The challenge is to write a simple WordPress plugin that provides a simple user interface. It randomizes the input lines and distributes (not removes) duplicates so they're not next to each other.

Last time, Gemini Advanced did not write a back-end dashboard interface but instead required a shortcode that needed to be placed in the body text of a public-facing page. Gemini Advanced did create a basic user interface, but that time, clicking the button resulted in no action whatsoever. I gave it a few alternative prompts, and it still failed.
But this time, Gemini Pro 2.5 gave me a solid UI, and the code actually ran and did what it was supposed to.
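To make the challenge concrete, here's a minimal PHP sketch of the test's core logic: shuffle a list of lines, then spread duplicates apart so identical lines don't land next to each other. The function name and approach are my own illustration, not the code Gemini Pro 2.5 generated, and a real plugin would also wrap this logic in a WordPress admin page with a form and a button.

```php
<?php
// Illustrative sketch only (not Gemini's output): shuffle lines, then keep
// duplicate lines apart rather than removing them.
function randomize_and_spread( array $lines ): array {
	// Randomize the overall order first, so lines that appear equally
	// often end up in a random relative order.
	shuffle( $lines );

	// Count how many times each line appears, most frequent first.
	$counts = array_count_values( $lines );
	arsort( $counts );

	$total  = count( $lines );
	$result = array_fill( 0, $total, '' );
	$slot   = 0;

	foreach ( $counts as $line => $count ) {
		for ( $i = 0; $i < $count; $i++ ) {
			// Fill even-numbered slots first, then wrap to the odd ones,
			// keeping copies of the same line as far apart as possible
			// (adjacency is unavoidable if one line dominates the list).
			if ( $slot >= $total ) {
				$slot = 1;
			}
			$result[ $slot ] = $line;
			$slot += 2;
		}
	}

	return $result;
}

// Example: the three copies of "apple" come back separated, not adjacent.
print_r( randomize_and_spread( array( 'apple', 'apple', 'banana', 'apple', 'cherry' ) ) );
```

The test, of course, also grades whether the AI builds a usable dashboard interface around that logic, which is exactly where Gemini Advanced stumbled last time.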