I put OpenAI’s o1-preview through my 4 AI coding tests. It surprised me (in a good way)

sankai/Getty Images

Usually, when a software company pushes out a major new release in May, they don’t try to top it with another major new release four months later. But there’s nothing usual about the pace of innovation in the AI business.

OpenAI dropped its new omni-powerful GPT-4o model in mid-May, but the company hasn't been idle since. As far back as last November, Reuters reported a rumor that OpenAI was working on a next-generation language model, then known as Q*. Reuters doubled down on that report in May, stating that Q* was being developed under the code name Strawberry.

Strawberry, as it turns out, is actually a model called o1-preview, which is available now as an option to ChatGPT Plus subscribers. You can choose the model from the selection dropdown:

Screenshot by David Gewirtz/ZDNET

As you might imagine, if there’s a new ChatGPT model available, I’m going to put it through its paces. And that’s what I’m doing here.

The new Strawberry model focuses on reasoning, breaking down prompts and problems into steps. OpenAI showcases this approach through a reasoning summary that can be displayed before each answer.

When o1-preview is asked a question, it does some thinking and then displays how long it took to do that thinking. If you toggle the dropdown, you’ll see some reasoning. Here’s an example from one of my coding tests:

Screenshot by David Gewirtz/ZDNET

It’s good that the AI knew enough to add error handling, but I find it interesting that o1-preview categorizes that step under “Regulatory compliance”.

I also discovered the o1-preview model provides more exposition after the code. In my first test, which created a WordPress plugin, the model provided explanations of the header, class structure, admin menu, admin page, logic, security measures, compatibility, installation instructions, operating instructions, and even test data. That’s a lot more information than was provided by previous models.

But really, the proof is in the pudding. Let’s put this new model through our standard tests and see how well it works.

1. Writing a WordPress plugin

This straightforward coding test requires knowledge of the PHP programming language and the WordPress framework. The challenge asks the AI to write both interface code and functional logic. The twist: instead of removing duplicate entries, the plugin has to separate them so that no two duplicates sit next to each other.
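The "separate, don't remove" requirement is essentially a reordering problem. Below is a minimal sketch of one way to solve it, written in Python for brevity rather than the PHP the test actually calls for. It is not o1-preview's output; the function name `separate_duplicates` is hypothetical, and it assumes the entries arrive as a list of strings (for example, one per line from the input field). It uses a greedy max-heap strategy: always place the most frequent remaining entry that differs from the one just placed.

```python
import heapq
from collections import Counter


def separate_duplicates(entries: list[str]) -> list[str]:
    """Hypothetical helper: reorder entries so identical ones are not adjacent.

    Greedy strategy: repeatedly place the most frequent remaining entry,
    holding it out of the heap for one round so it cannot be placed twice
    in a row. If only one distinct entry is left over, full separation is
    impossible and its remaining copies are appended as-is.
    """
    counts = Counter(entries)
    # Record first appearance so ties are broken in input order.
    first_seen: dict[str, int] = {}
    for i, entry in enumerate(entries):
        first_seen.setdefault(entry, i)

    # Max-heap via negated counts: (negative count, first index, entry).
    heap = [(-n, first_seen[e], e) for e, n in counts.items()]
    heapq.heapify(heap)

    result: list[str] = []
    held = None  # the entry placed last, kept out of the heap for one round
    while heap:
        neg_n, order, entry = heapq.heappop(heap)
        result.append(entry)
        if held is not None:
            heapq.heappush(heap, held)
            held = None
        if neg_n + 1 < 0:  # copies of this entry remain
            held = (neg_n + 1, order, entry)

    if held is not None:
        # Only one distinct entry remained; append its leftover copies.
        result.extend([held[2]] * -held[0])
    return result


if __name__ == "__main__":
    print(separate_duplicates(["apple", "apple", "banana", "cherry", "apple", "banana"]))
```

Every original entry is kept, only the order changes. When one entry makes up more than half the list, no arrangement can fully separate it, so the sketch falls back to appending the leftovers at the end.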

The o1-preview model excelled. It presented the UI first as just the entry field:

Screenshot by David Gewirtz/ZDNET

Screenshot by David Gewirtz/ZDNET
