This post documents a recent product iteration of a GPT-based voice transcription tool. While developing this voice input tool, I had already explored the boundaries of GPT's capabilities and found its intelligence quite superficial in many areas: the answers or conclusions it provides are often shallow, unlike the insightful responses an experienced person would give. That's why I initially positioned it as a text-processing secretary, required only to restate my views and reorganize them into a structured form.
Recently, however, I began experimenting with deepening its functionality, letting it comment on my ideas rather than only tidy up the text. Specifically, I added a prompt asking it to pose an insightful question about what I just said, check for blind spots in my thinking, and then propose a rebuttal. After a few attempts, I found this approach quite enlightening. We often don't need particularly profound answers or opinions. By the Pareto principle, roughly 20% of our work creates 80% of the value, so most of what we do day to day falls outside that critical 20%. We can't expect every decision or task to be highly challenging and significant. For most of our work, we don't need insightful or profound opinions so much as reminders, and even simple ones can be valuable.
From this perspective, I believe GPT is fully capable of playing that role. In many cases, I can learn something from its responses, so my ideas iterate gradually and become deeper and more comprehensive. Consequently, after further discussions with my collaborators, we changed the product's form. Previously, it was merely a voice recognition and transcription tool: the input was audio, and our involvement ended once that input had been organized into text. The text would be stored in a knowledge management system and only revisited when writing articles.
Now, we've shifted the basic processing unit from audio to a concrete "idea." First, we capture the idea as a transcription. Once we have the GPT-reorganized version, we use prompts to make GPT pose deeper questions and critical opinions, pushing our thinking further. After iterating on the idea, we can reshape it for output: certain prompts improve its readability, make it read as more polished, or present it in a specific style. This completes the full life cycle of an idea, from capture to iteration to output. Compared to merely using GPT as a transcription tool, this product form is a level up. I've already experienced its practical benefits in real-life and work situations.
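The capture, iterate, and output stages described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: the prompt wording, the `Idea` class, and the function names are all assumptions made for the example, and the model is passed in as a plain callable so that no particular API is implied.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical prompt wording; the real prompts are not given in this post.
REORGANIZE_PROMPT = (
    "Restate the following transcript as a clear, structured summary "
    "of the speaker's idea:\n\n{text}"
)
CRITIQUE_PROMPT = (
    "Here is an idea:\n\n{text}\n\n"
    "1. Ask one insightful question about it.\n"
    "2. Point out any blind spots in the reasoning.\n"
    "3. Propose a rebuttal."
)
STYLE_PROMPT = (
    "Rewrite the following idea in a {style} style, "
    "improving readability:\n\n{text}"
)

# Any function that maps a prompt string to a reply string.
Model = Callable[[str], str]


@dataclass
class Idea:
    """The basic processing unit: one idea rather than one audio clip."""
    transcript: str
    summary: str = ""
    critiques: List[str] = field(default_factory=list)


def capture(transcript: str, model: Model) -> Idea:
    """Stage 1: turn a raw transcript into a reorganized idea."""
    idea = Idea(transcript=transcript)
    idea.summary = model(REORGANIZE_PROMPT.format(text=transcript))
    return idea


def iterate(idea: Idea, model: Model) -> Idea:
    """Stage 2: ask for a question, blind spots, and a rebuttal."""
    idea.critiques.append(model(CRITIQUE_PROMPT.format(text=idea.summary)))
    return idea


def output(idea: Idea, model: Model, style: str = "concise") -> str:
    """Stage 3: reshape the idea into a chosen style for publishing."""
    return model(STYLE_PROMPT.format(style=style, text=idea.summary))
```

In practice the `model` argument would wrap whatever GPT client is in use; keeping it as a parameter also makes each stage easy to try out with a stub function.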
As we mentioned earlier, if knowledge remains solely in our minds without being recorded and placed into a persistent system, its utility is greatly limited. That's why I've not only made this tool interact with users through Telegram but also added export functionality. For example, a script now runs every evening and exports all the ideas recorded that day to Notion. Because this tool significantly lowers the barrier to capturing thoughts, I'm more inclined to use it for documentation. Today alone, it recorded over 6,600 words. Revisiting these ideas is valuable in itself, not to mention the greater potential for future search and aggregation. This observation confirms our previous notion: GPT and voice transcription tools (now idea iteration tools) can substantially reduce the barrier to acquiring knowledge, increase the efficiency of knowledge acquisition, and even dismantle old knowledge management systems. Moving forward, I plan to continue iterating on this tool and share new insights after using it for a few more days.
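The nightly export could be structured along these lines. This sketch only covers selecting one day's ideas and rendering them as a single page of text with a word count; the `IdeaRecord` class and function names are hypothetical, and the upload to Notion and the evening scheduling are deliberately left out because the post does not describe how they are done.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import List


@dataclass
class IdeaRecord:
    """One recorded idea (hypothetical shape, for illustration only)."""
    recorded_at: datetime
    text: str


def ideas_for_day(records: List[IdeaRecord], day: date) -> List[IdeaRecord]:
    """Keep only the ideas recorded on the given calendar day."""
    return [r for r in records if r.recorded_at.date() == day]


def render_daily_page(records: List[IdeaRecord], day: date) -> str:
    """Render one day's ideas as plain text, oldest first, with a word count."""
    todays = sorted(ideas_for_day(records, day), key=lambda r: r.recorded_at)
    word_count = sum(len(r.text.split()) for r in todays)
    lines = [
        f"Ideas for {day.isoformat()} "
        f"({len(todays)} entries, {word_count} words)",
        "",
    ]
    for r in todays:
        lines.append(f"- {r.recorded_at:%H:%M} {r.text}")
    return "\n".join(lines)
```

The rendered string is what an evening job would hand to whichever export target is configured.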