Sam Altman and deputies at OpenAI discuss the performance of the new o3 model on the ARC-AGI test. OpenAI/ZDNET

The latest large language model from OpenAI isn't yet in the wild, but we already have some ways to tell what it can and cannot do. The "o3" release from OpenAI was unveiled on Dec. 20 in the form of a video infomercial, which means that most people outside the company have no idea what it is really capable of. (Outside safety-testing parties are being given early access.)

Although the video featured a lot of discussion of various benchmark achievements, the message from OpenAI co-founder and CEO Sam Altman was brief and vague. His biggest statement was that o3 "is an incredibly smart model."

ARC-AGI put o3 to the test

OpenAI plans to release the "mini" version of o3 toward the end of January and the full version sometime after that, Altman said. One outsider, however, has had the chance to put o3 to the test, in a sense.

The test in this case is the Abstraction and Reasoning Corpus for Artificial General Intelligence, or ARC-AGI, a new benchmark made up of "challenges for intelligent systems." ARC-AGI is billed as "the only benchmark specifically designed to measure adaptability to novelty," meaning it is meant to test the acquisition of new skills, not just the use of memorized knowledge.

AGI, or artificial general intelligence, is regarded by some in AI as the Holy Grail: the achievement of a level of machine intelligence that could equal or exceed human intelligence.
The idea of ARC-AGI is to guide AI toward "more intelligent and more human-like artificial systems."

The o3 model scored 76% accuracy on ARC-AGI in an evaluation formally coordinated by OpenAI and the author of ARC-AGI, François Chollet, a scientist in Google's artificial intelligence unit.

A shift in AI capabilities

On the ARC-AGI website, Chollet wrote this past week that the 76% score marks the first time AI has beaten a human score on the exam, as represented by the answers of human Mechanical Turk workers who took the test and who, on average, scored just above 75% correct.