On Wednesday, OpenAI unveiled two new artificial intelligence (AI) models, o3 and o4-mini, its latest reasoning-focused models with visual chain-of-thought (CoT) capability. The San Francisco-based AI company says the models can “think” with images and analyze them to answer increasingly complex user queries. Successors to o1 and o3-mini, the new models are currently available to ChatGPT’s paying subscribers. Notably, the company also released the GPT-4.1 family of AI models earlier this week.
OpenAI’s New Reasoning Models Arrive With Improved Performance
OpenAI announced the latest large language models (LLMs) in a post from its official handle on X (formerly known as Twitter). The company described them as its “smartest and most capable models” and noted that they can now reason visually.
In essence, visual reasoning means these AI models are better able to analyze images and extract implicit and contextual information from them. According to OpenAI’s website, these are the company’s first models that can agentically use and combine all of ChatGPT’s tools, which include image analysis, file interpretation, web search, Python, and image generation.
This means the o3 and o4-mini models can search for an image online, manipulate it by flipping, cropping, zooming, and enhancing it, and even run Python code to retrieve data. According to OpenAI, this lets the models extract information even from imperfect images.
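For developers, the same image-handling capability can be reached through OpenAI’s API. Below is a minimal sketch of sending an image to o3 via the Responses API using the official Python SDK; the image URL is a placeholder, and model availability depends on the developer’s API access.

```python
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

# Ask o3 to reason over an image; the URL below is a placeholder.
response = client.responses.create(
    model="o3",
    input=[
        {
            "role": "user",
            "content": [
                {"type": "input_text",
                 "text": "What does the sign in this photo say?"},
                {"type": "input_image",
                 "image_url": "https://example.com/blurry-sign.jpg"},
            ],
        }
    ],
)

print(response.output_text)  # the model's final answer as plain text
```

In ChatGPT itself, the model decides on its own when to crop, zoom, or run Python on an image; over the API, the developer supplies the inputs and any tools explicitly.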
The models can now read handwriting in an upside-down notebook, decipher a distant sign with barely legible lettering, pick out a specific question from a long list, work out a bus schedule from a photo of a bus, solve puzzles, and more.
According to OpenAI, the o3 and o4-mini models outperform GPT-4o and o1 on the MMMU, MathVista, VLMs Are Blind, and CharXiv benchmarks. The company did not share any performance comparisons with third-party AI models.
OpenAI also highlighted several of the models’ limitations. The models can perform unnecessary image manipulation steps and tool calls, resulting in overly long reasoning chains. The o3 and o4-mini are also prone to perception errors and may give inaccurate answers after misinterpreting visual cues. The company additionally noted that the models may suffer from reliability issues.
Both o3 and o4-mini are available to ChatGPT Plus, Pro, and Team users, replacing the o1, o3-mini, and o3-mini-high models in the model selector. Enterprise and Edu users will get access next week. Developers can access the models through the Chat Completions and Responses application programming interfaces (APIs).
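For a plain text query, a Chat Completions call looks like the sketch below. This assumes the API model identifier matches the announced name “o4-mini” and that the account has been granted access to the model.

```python
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

# A simple text query against o4-mini via the Chat Completions API.
completion = client.chat.completions.create(
    model="o4-mini",
    messages=[
        {"role": "user",
         "content": "A train leaves at 14:05 and the ride takes 48 minutes. "
                    "When does it arrive?"},
    ],
)

print(completion.choices[0].message.content)
```

As a reasoning model, o4-mini spends internal “thinking” tokens before answering, so simple calls like this one may take longer and cost more than an equivalent GPT-4o request.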