Elon Musk was an OpenAI investor when the company was founded in 2015, but he has since not only severed ties with the company but also criticized its approach to political correctness and safety. As a result, Musk launched his own AI chatbot, Grok, which just got a pretty big upgrade.
On Tuesday, xAI, an AI company founded by Musk, announced the release of an early preview of Grok-2, its frontier large language model (LLM) with advanced chat, coding, and reasoning capabilities. The release also included Grok-2 mini, which, as the name implies, is a lightweight version of Grok-2.
Prior to this release, an early version of Grok-2 was tested in the Large Model Systems Organization (LMSYS) Chatbot Arena under the anonymous name “sus-column-r,” a practice many AI companies follow before launching a new model.
On this crowdsourced platform, users evaluate LLMs by chatting with two models side by side and comparing their responses without knowing the models’ names, so the rankings reflect the quality of the responses rather than brand recognition. When pitted against industry-leading models such as OpenAI’s GPT-4o and Google’s Gemini 1.5 Pro, Grok-2 held its own, placing third in the “Overall” category and tying with GPT-4o, as seen below.
If you, like me, visited the Chatbot Arena leaderboard and were surprised not to see the same results, that is because LMSYS posts early results on Twitter (X) first. The organization noted, “The official update for Grok 2 coming soon..!”
Some other noteworthy Chatbot Arena results include Grok-2’s proficiency in the math and coding categories, in which it placed second in both, and Hard Prompts, in which it placed fourth. If you want to test it in the Arena, visit the website, click Arena side-by-side, and enter a sample prompt.
The company also evaluated Grok-2 on popular LLM benchmarks, including Massive Multitask Language Understanding (MMLU) and MATH. Its results were better than those of its predecessor, Grok 1.5, and competitive with industry-leading models, including GPT-4o, Claude 3 Opus, Llama 3, and more.
Beyond its advanced textual performance, Grok-2 allows users to generate high-quality images through a collaboration with Black Forest Labs, using its FLUX.1 image-generating model.
Although many image generators on the market have strict restrictions against creating images of public figures such as celebrities and politicians, Grok-2 does not. Many beta testers have already gone wild on the platform, generating images of politicians in provocative situations. Below, I am including one of the less provocative generations.
The rendered images are high-quality and realistic, yet the platform does not appear to disclose that an image was AI-generated, a labeling practice many social media platforms use to protect users.
Grok-2 and Grok-2 mini are being rolled out in beta on X to X Premium and Premium+ users. These premium X plans are $8 and $16 per month, respectively, and include other perks such as a blue checkmark, limited or no ads, reply prioritization, ID verification, and more. Both models will be released to developers through a new enterprise API platform later this month.