Today OpenAI released GPT-5. First off, I want to recommend the Prompting Guide that OpenAI wrote for it. Unlike a typical model card, it dives into the technical details of how to use the model effectively, and it targets engineers building products on the API rather than researchers.
After extensive research, experimentation, and coding, my biggest takeaway is that GPT-5 might not be a major model upgrade, but it is a significant product upgrade. Many of its features are specifically designed to help API users build better products.
The most crucial point is that GPT-5 is far more controllable than its competitors and previous OpenAI models. Let me give you a few examples.
First, as I've mentioned before, every model has its own personality. Take o3, for example. It excels at using search engines, often going through three or four iterations with a dozen keywords before answering a user's question, which results in high factual accuracy. However, it lacks depth in its reasoning, frequently just providing a laundry-list-style answer. In contrast, Gemini 2.5 Pro demonstrates much deeper thinking but hates using search engines. No matter how much I prompted it, used few-shot learning, or practically begged it to search, it would at most use a couple of keywords superficially and then respond based on potentially incorrect information. After many experiments, I couldn't change their styles. I ended up creating a hybrid solution: using o3 for searching and then feeding the results into Gemini for analysis, which actually worked quite well.
Against this backdrop, the GPT-5 API introduces two parameters: `reasoning_effort` to control the depth of reasoning, and `verbosity` to control the length of the response. The `reasoning_effort` parameter has been around since o1, but only with GPT-5 can it control tool usage. With the same prompt, setting `reasoning_effort` to `high` might make it search extensively for 5 minutes and process 450k tokens of information before answering. Set it to `low`, and it might do a quick 15-second search, look at 20k tokens, and return. With the new `minimal` setting, it won't search at all. `verbosity` is also highly effective. For the same prompt, a `high` setting can produce a detailed 3000-token analysis, while a `low` setting gives a concise 260-token response. From this perspective, o3 acts like `reasoning_effort=medium` and `verbosity=medium`. Claude 4's terse, list-based style is like a special case of `verbosity=low`, while Gemini 2.5 Pro is like `reasoning_effort=low` with `verbosity=high`. GPT-5 lets you mix and match these, which is incredibly helpful for product development, allowing us to make flexible trade-offs based on the use case. This also seems to fulfill OpenAI's goal of unifying various models.
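To make this concrete, here is a minimal sketch of passing both parameters through the official Python SDK's Responses API. The prompt is a placeholder of my own; the field names below match the SDK at the time of writing, but check the guide for your version:

```python
from openai import OpenAI

client = OpenAI()

# Same prompt; the two knobs trade off depth of reasoning (and tool use)
# against length of the final answer.
response = client.responses.create(
    model="gpt-5",
    input="How have HTTP/2 and HTTP/3 adoption trends differed?",
    reasoning={"effort": "high"},   # "minimal" | "low" | "medium" | "high"
    text={"verbosity": "low"},      # "low" | "medium" | "high"
)
print(response.output_text)
```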
A second example of controllability is in managing output formats. If you've ever tried to pipe an LLM's output directly into another tool, you know how difficult it can be. For instance, say I want the LLM to generate a Python script that I can execute directly. If you just prompt it to output Python, it sometimes works, but other times it wraps the code in a markdown block, which obviously isn't runnable by the Python interpreter. JSON mode doesn't solve this either, as it will produce a valid JSON object that contains the markdown code block. If you ask the LLM to avoid markdown, you might get something like `python -c '(your code)'` instead. There are just so many weird corner cases that are a pain to handle one by one. But with GPT-5, you don't need any special tricks: no prompt engineering, no few-shot learning, no exception handling. You just tell it to output a valid Python program, and it works. It's incredibly reliable and takes controllability to a new level.
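As a sketch of what this looks like in practice (the task and filename are my own illustration; the point is that no fence-stripping or cleanup step is needed):

```python
from openai import OpenAI

client = OpenAI()

resp = client.responses.create(
    model="gpt-5",
    input=(
        "Write a Python script that reads CSV on stdin and prints the row "
        "count. Output a valid Python program and nothing else."
    ),
)

# In my testing the output is directly executable: no markdown fences,
# no `python -c` wrappers, no cleanup pass.
with open("count_rows.py", "w") as f:
    f.write(resp.output_text)
```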
This capability opens up a host of new use cases. As The Art of UNIX Programming states, the fundamental design philosophy of UNIX is pipes and plain text as the protocol, with each tool doing one thing well. An LLM can now slot into that ecosystem as just another composable tool, and building CLI tools around it becomes much easier.
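For example, a pipe-friendly wrapper can be as small as this sketch (the task string and defaults are my own choices):

```python
#!/usr/bin/env python3
"""A tiny LLM filter that composes with UNIX pipes, e.g.:
   cat error.log | ./llmfilter.py "Summarize the errors" | less
"""
import sys

from openai import OpenAI


def main() -> None:
    # Task comes from argv; the text to process comes from stdin.
    task = sys.argv[1] if len(sys.argv) > 1 else "Summarize the input."
    client = OpenAI()
    resp = client.responses.create(
        model="gpt-5",
        input=f"{task}\n\nRespond in plain text only.\n\n{sys.stdin.read()}",
        text={"verbosity": "low"},  # keep pipe output terse
    )
    sys.stdout.write(resp.output_text)


if __name__ == "__main__":
    main()
```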
The third area is state updates. If you've used tools like Cursor or Claude Code, you've probably seen them generate a to-do list and check off items as they work, which creates a great user experience. Implementing this yourself, however, can be quite cumbersome. The GPT-5 API now supports this natively with a feature called `tool_preambles`. If you're curious, check out the Cookbook I linked to at the beginning.
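My reading of the guide is that you elicit these preambles with an instruction and then stream them to the user as progress updates. The instruction wording and the dummy tool below are my own, so treat this as a sketch rather than the canonical recipe:

```python
from openai import OpenAI

client = OpenAI()

# A dummy function tool so the model has something to call; purely illustrative.
tools = [{
    "type": "function",
    "name": "list_files",
    "description": "List files in a directory.",
    "parameters": {
        "type": "object",
        "properties": {"path": {"type": "string"}},
        "required": ["path"],
    },
}]

stream = client.responses.create(
    model="gpt-5",
    instructions=(
        "Before each tool call, write a one-sentence preamble telling the "
        "user what you are about to do and why."
    ),
    input="Clean up the build artifacts under ./dist.",
    tools=tools,
    stream=True,
)

# Surface the preamble text as progress updates while the model
# works through its tool calls.
for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)
```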
These API-level changes might be one of the reasons OpenAI decided to make GPT-5 free for all users. For example, free users might get settings like `reasoning_effort=low` and `verbosity=medium`, so the model's "laziness" isn't obvious. Premium users, on the other hand, could get access to better parameter settings. On the backend, OpenAI can dynamically schedule more or less capable models and allocate reasoning capabilities more efficiently, deciding which model to invoke based on the user's query. All in all, it provides a lot more flexibility.
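The same pattern works for products built on the API. A sketch of a tier-to-parameter mapping (the tiers and values here are pure illustration, not OpenAI's actual configuration):

```python
# Hypothetical tier mapping; the tiers and values are illustrative only.
TIER_SETTINGS = {
    "free": {"reasoning": {"effort": "low"}, "text": {"verbosity": "medium"}},
    "pro":  {"reasoning": {"effort": "high"}, "text": {"verbosity": "high"}},
}


def settings_for(tier: str) -> dict:
    # Fall back to the free tier for unknown plans; the returned dict can
    # be splatted into client.responses.create(**settings_for(tier), ...).
    return TIER_SETTINGS.get(tier, TIER_SETTINGS["free"])
```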
I haven't seen these API- and product-focused improvements discussed in the popular reviews yet, so I wanted to share them here.