Operator isn’t worth its $200-per-month ChatGPT Pro subscription yet – here’s why
ZDNET

This week, OpenAI is introducing a research preview called Operator. I initially wanted to do a hands-on, but once I found out that you need a Pro account (which costs $200 per month), I decided to watch the various OpenAI demos, share them with you, and then share my thoughts. OpenAI CEO Sam Altman did say that users of the $20-per-month Plus plan would eventually be able to use Operator.

Operator is an AI agent. Fundamentally, it simulates keyboard input and mouse clicks in a browser, reading the screen and performing actions.

I have a fairly long history of building this kind of app, using mostly algorithmic programming along with a little machine learning to identify the location of certain images on the screen.

My most recent project was an auto-posting tool that would make my social media posts for me. Yes, there are a plethora of subscription services that will do that for you, but I decided to see what it would take to build my own.

My code used a combination of the DOM (document object model) for individual social media service pages, along with image recognizers that were able to find buttons (like the + or Post buttons). I used the tool I built for about a year but ran into a very annoying snag.

About every two weeks, one of the six sites I was navigating made a small change to its screen interface, which broke my code. So every two weeks, instead of posting my social media posts normally, I had to spend a few hours fixing whatever had broken.

The fact that the web is constantly changing (for example, a blue “Post” button might turn into a red “Post / Subscribe at 30% off” button during a promotion) might knock the AI off its game.

Computer-using agent

The model OpenAI is using is called CUA, or computer-using agent.
This model dictates how Operator talks to the websites it’s supposed to navigate.

In their introduction video, Sam Altman and OpenAI team members Yash Kumar, Casey Chu, and Reiichiro Nakano explained that Operator doesn’t use APIs and isn’t working off of extracted text pulled from the DOM. Instead, it’s “viewing” an actual web page in a live browser running in the cloud, reading the context directly off the screen.

They were very clear that the control mechanism for the web pages was mouse and keyboard simulation, and that the input the AI reads is the visual representation of the actual web page that we see as humans.

The OpenAI team did say that Operator will work just like a human using a web browser: searching, clicking, and visiting websites. But there is a contradiction that I haven’t fully figured out yet, which is that OpenAI has partnered with a bunch of sites (Instacart, DoorDash, Etsy, OpenTable, Tripadvisor, AP, Priceline, StubHub, Thumbtack, Target, Uber, and more).