I've been using AI heavily for work (and for slacking off) for a long time, and I believe AI as a productivity multiplier is already a reality. Still, a recent slacking-off achievement reached a new level that I feel is worth sharing.
I'm an Applied Scientist. Today, I had a new modeling idea in the shower. While getting dressed, I dictated my thoughts into my phone for three to four minutes, resulting in a rambling, disorganized Chinese prompt of over a thousand words. I then pasted it into Cursor on my computer. After Cursor rephrased what it was going to do and I confirmed it was correct, I let it start. Then, I went to drop my kid off at school.
When I got back, I found that Cursor had been working for about twenty minutes and was just about finished. In that time, it had implemented my modeling idea, run over a hundred experiments with various configurations, identified the most promising combination, and proceeded with further tuning. It then performed data analysis from various angles and wrote up a document, complete with visualizations for me to review. During its analysis, it found a number that didn't add up, which led it to go back, debug, and fix a bug.
This is actually a substantial amount of work. For instance, if a Senior Scientist told me in a stand-up, "Yesterday, I had a new modeling idea and implemented it. I ran experiments on over a hundred parameter combinations, found the best ones, did a deep dive to understand their pros and cons, and created visualizations. Here's my report; I'm still verifying its correctness and there might be updates," I would consider them a very strong scientist. Business value aside, that level of efficiency alone is "Exceeds Expectations." But all I did was have an idea in the shower, mumble into my phone while getting dressed, and press Enter before leaving to take my kid to school. That completed a full day's work for a Senior Scientist and left the rest of my day free for slacking. And this kind of slacking feels incredibly satisfying.
But my point isn't "AI is great, everyone should use it to slack off immediately." If you actually try this, you'll find that AI isn't that easy to use. It can be dumb, lazy, and full of pitfalls. However, that's not to say the story above is nonsense—it's a true story. I later used the AI-generated visualizations for a deep dive in a stand-up, cross-checking the correctness with the team (we found no issues), and made some business-related decisions based on these experimental results. The AI's work had real business value. The catch is that using AI effectively has its own learning curve. As we've mentioned many times before, we need to manage AI like we manage people to avoid common pitfalls.
From the perspective of an AI manager, this article aims to share the key steps and management principles (even secrets) for making AI work this smoothly, and the reasoning behind them.
Hiring
Hiring is the foundation of all other management tasks. A self-driven, experienced team can motivate itself and deliver results quickly without a manager's guidance. On the other hand, even the strongest manager, facing a team with a weak foundation, can only grind away at improving the team's skills over time, without expecting short-term results (assuming they don't manage anyone out). Therefore, accurately identifying talent and assigning the right people to the right roles is crucial. This is even more critical for AI, which is less malleable than humans, making the choice of the right model for a specific task paramount.
There aren't many tricks here; it's mainly about accumulating firsthand experience and developing an intuition by using different AIs to build various products. My sense is that for writing product documents, making technical decisions, and assisting with brainstorming, Gemini 2.5 Pro is the best choice. For quick and straightforward projects, Cursor's Cheetah, rumored to be Grok Coding Fast 2, is ideal. But for complex, multi-step projects like the example above, GPT-5-Codex is still the way to go. Claude 4.5 Sonnet is a decent all-around model with no major weaknesses, making it very comfortable for daily use. However, for particularly complex projects like the one mentioned, I tend to avoid it, because it has an annoying tendency to be lazy. For example, if you ask it to fix a unit test, GPT-5-Codex might wrestle with it and, if it fails, will come back and apologize. Claude, on the other hand, will try for a bit, and if it can't fix the test, it might secretly disable it, or just run echo "All tests fixed! 🎉" in the terminal and report that the job is done. For some difficult tasks, it might decide on its own, without notifying the user, "Oh, this is too hard, let's do a simplified version," skipping the actual dirty work. This makes me hesitant to entrust it with overly complex tasks: who knows what landmines it might be planting? You might think the task is done, only to review it and find it's a simplified version, rendering it useless.
So, if you want to achieve the ultimate level of slacking off as described above, choosing GPT-5-Codex is a primary prerequisite.
Task Delegation
Just like with humans, delegating tasks to an AI is surprisingly difficult. Let me give an example to make this clearer: think about doing chores for your spouse, or asking your spouse to do them for you. A common scenario is that a wife has very specific expectations about which chores her husband should do and how, but because the husband is unaware of these expectations, the result is often a mess. The wife complains that he doesn't understand her, and the husband complains that she never told him. A similar situation is when a gardener comes to trim a tree. We might expect a light trim, but when the job is done, we find the large tree cut down to just a few thick branches. These are all typical scenarios that require precise task delegation. Our failure to communicate in sufficient detail leads us to believe the husband is slacking or the gardener is incompetent. But if we could set aside the "curse of knowledge" and our prior expectations, we might see that their actions were justified and not intentionally sloppy; they just didn't align with our specific expectations.
Whether it's doing chores, giving instructions to housekeepers or gardeners, or delegating tasks to subordinates at work, we must be especially careful to avoid this curse of knowledge. Some decisions have become habits for us, so we subconsciously overlook the fact that there are other ways to do things. This can be as small as how high to trim a tree, or as large as which step to take first in a project. If we don't provide clear instructions, we can't expect our husbands, gardeners, or subordinates to clairvoyantly follow our usual practices. Therefore, the first bad habit we need to overcome, for both human subordinates and AI, is expecting them to read our minds and automatically follow our routines.
A very useful technique here is to put yourself in their shoes: If I knew nothing and was a new employee, how might I interpret these instructions? Are there any parts that could be done differently? What descriptions do I need to add to confine this task within the framework of my expectations? Initially, you might need to do this exercise for every task, but you'll quickly get used to this way of thinking and develop a rapport with the AI. You'll learn which instructions are important and which can be left to its discretion. In this process, we often find that letting it have some freedom in small decisions is fine. It's like with chores: if the dishcloth isn't put back in its usual spot, it might be a bit annoying, but in the grand scheme of things, considering the chores got done, it's something we can live with.
Another particularly useful technique is voice input. Typing is tiring: when you type fast, you often make typos, and you have to constantly check for mistakes, then backspace to delete and retype. This significant mental burden naturally limits the length of our prompts. For example, being willing to type one or two hundred words to delegate a task to an AI already makes you a pretty good boss. But with voice input, perhaps because you can't easily edit, this problem disappears. We speak much faster and more naturally to a microphone than we type, allowing us to easily generate hundreds of words in two or three minutes. Sometimes, I even ramble on to the AI for five or six minutes, generating a prompt of one or two thousand words.
Saving time is a minor benefit. The key is the transformative change in the richness of information. Much of the information we think "I could say this, but it's too much trouble to type, so I'll skip it" comes out naturally in a voice chat. Although the final text may not be as polished as typed text, the AI is fully capable of understanding this seemingly disorganized information. So, I believe a secret to getting good results from AI is to talk to it with your voice, not type to it. And I mean literally using a microphone to "shoot your mouth off." By providing comprehensive instructions through voice, and articulating many detailed expectations during the chat, the AI can effectively complete the task I have in mind.
Therefore, in the age of AI, voice input is not a minor convenience that saves a little typing; it is an efficiency-enhancing tool that can fundamentally transform the AI's performance, and it deserves to be taken seriously.
Onboarding
Let's talk about onboarding. The reason I discuss task delegation before onboarding is that, unlike humans, AI can often accomplish a great deal without any formal training. And task delegation is indeed the most difficult part of using AI.
A key difference between AI and humans is that LLMs have no memory; all information must be provided through the context window. Each time it reasons, it faces a completely blank context window. If something isn't in the context window, it doesn't exist. So, for AI, onboarding isn't a one-time event like it is for a new human employee; it needs to be done every time you start a new conversation.
The most crucial thing to understand is that AI, like human employees, needs onboarding. How you do it is relatively less important. I generally provide training from three angles to give the necessary background information:
First, the prompt itself is the most direct way to provide context. For example, specifying that a particular document should be written in English or that reports should not contain quotation marks. You'll naturally find that some instructions need to be repeated frequently. When that happens, it's a good idea to document them. So, what goes into the prompt versus what goes into a document shouldn't be a rigid rule based on content, but rather determined by frequency, starting with the prompt. Documents are just a means to avoid retyping prompts. Artificially mandating that certain things must appear in a document is a bit of formalism that misses the point. Ultimately, the content of documents must also be read by the LLM through the context window. So, the basic method for onboarding AI is: use prompts for core information and use @documents to save typing time.
Second, don't just say what to do, but also why. This complements task delegation. After all, we can't possibly list every single possible decision for the AI. But AI is smart enough that if we explain why we are doing a task, it can infer how to make many detailed decisions on its own. In other words, it understands us better. I often copy relevant product documents from Confluence into my Cursor repo. This way, when I need to explain the project background, I can just ask it to read the document. With product and business context, its micro-decisions will be more thoughtful, and on a macro level, it will often provide insightful suggestions that are very helpful for moving the project forward and for our own thinking.
Third, I also document key technical decisions and architecture. Some people feel that AI is fine for small coding projects but struggles with large ones, having a "can't see the forest for the trees" problem where it fails to grasp the overall project architecture. This is due to the lack of relevant architectural documents and proper onboarding. The analogy with a new human employee makes this easy to understand. A new employee given a large repo and a prompt will also find it difficult to understand the broader framework and design patterns. But if they have a high-level overview that points them to the right files and the key architectural documents, it can greatly speed up their work and improve its quality. The same is true for AI. Therefore, as we said in this article, sharpening the axe doesn't delay the work of cutting wood. For codebases that are not yet AI-native, it's very beneficial to have the AI read through the code and write a summary document first. For new repos, it's also a good practice to have the AI maintain such a design document as you write code, which will allow it to go further in the future.
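To make this concrete, here is a minimal sketch of how such a summary document could be bootstrapped for a Python repo before the AI takes over maintaining it. The repo layout and output path are my own illustrative assumptions, not something from the story above:

```python
"""Bootstrap an architecture summary skeleton for an existing Python repo.

A minimal sketch: the repo root and the output path are illustrative
assumptions. The goal is a high-level map that the AI (or a new hire)
can flesh out into a real design document and keep maintaining.
"""
from pathlib import Path

REPO = Path(".")                    # assumption: run from the repo root
OUT = Path("docs/ARCHITECTURE.md")  # hypothetical output location

lines = ["# Architecture overview (auto-generated skeleton)", ""]
for py_file in sorted(REPO.rglob("*.py")):
    if ".venv" in py_file.parts:    # skip virtual environments
        continue
    # Use the first line of the module docstring as a one-line summary.
    source = py_file.read_text(encoding="utf-8", errors="ignore").lstrip()
    summary = "(no docstring)"
    if source.startswith(('"""', "'''")):
        first_line = (source[3:].splitlines() or [""])[0].strip()
        summary = first_line or summary
    lines.append(f"- {py_file}: {summary}")

OUT.parent.mkdir(parents=True, exist_ok=True)
OUT.write_text("\n".join(lines), encoding="utf-8")
print(f"Wrote a skeleton with {len(lines) - 2} entries to {OUT}")
```

In practice, I'd treat a skeleton like this only as scaffolding and let the AI fill in the actual architecture narrative as the code evolves.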
Although we frame this as onboarding to emphasize that AI has no memory, the training itself accumulates. Letting the AI build up experience and knowledge assets as it develops is something that compounds in value, so it's best to start early.
Process Guidance
In our specific slacking-off example, the AI knew to conduct deeper data analysis and iterate multiple times. During these iterations, it also knew to cross-validate, which led to finding a bug. This wasn't something it figured out on its own; it was something we told it to do in the prompt. However, our prompt didn't instruct it to do A, B, C, D, and E. Instead, we gave it a methodology. For example:
After you've done the basic model training, find the two best-performing models and conduct a multi-angle, in-depth analysis to understand their strengths and weaknesses. If you see anything worth a deeper look or anything suspicious, perform another round of iteration. During this process, pay special attention to whether the cross-validation data is correct.
This prompt is an example of what a Senior Manager does, as we mentioned in this article. We are not discussing with the AI which specific angles to analyze and then telling it the next steps based on its analysis. Instead, we provide a high-level, abstract workflow. The AI decides which direction to iterate in, but it knows it needs to perform a couple of iterations. This is a very practical and advanced technique in human management as well. A manager's role is not to micromanage but to enable, to teach according to aptitude, to teach them how to fish. For advanced models like GPT-5-Codex, this kind of procedural instruction is often very effective. Through a few rounds of iteration, it can genuinely and significantly improve the quality of the completed task.
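To make the shape of that loop concrete, here is a rough sketch of what such a methodology prompt induces the AI to do. Everything in it is a hypothetical stand-in of my own: the parameter grid, the training function, and the metric are placeholders for whatever the AI actually writes:

```python
"""A rough sketch of the sweep-then-deep-dive loop a methodology prompt induces.

Illustrative only: train_and_evaluate is a stand-in for real training
code, and the grid is hypothetical. The shape of the process is what
matters: sweep a grid, rank the results, then take the top two
candidates into a deeper analysis pass.
"""
import itertools
import random

GRID = {
    "learning_rate": [1e-4, 3e-4, 1e-3],
    "hidden_size": [64, 128, 256],
    "dropout": [0.0, 0.1, 0.3],
}

def train_and_evaluate(config: dict) -> float:
    """Stand-in for real training; returns a validation metric."""
    random.seed(str(sorted(config.items())))  # deterministic fake score
    return random.random()

results = []
for values in itertools.product(*GRID.values()):
    config = dict(zip(GRID.keys(), values))
    results.append((train_and_evaluate(config), config))

results.sort(reverse=True, key=lambda r: r[0])
for score, config in results[:2]:
    print(f"candidate: {config} -> validation metric {score:.3f}")
    # Next pass (left to the AI's judgment): slice metrics by segment,
    # inspect errors, and re-check that cross-validation folds are correct.
```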
Here's another example: I had a meme about Outlook, but I wanted to replace the old Outlook logo with the new one. Similarly, I told Manus, "Help me replace this, but after you're done, use your visual capabilities to look at the generated image, and if it needs adjustments, iterate a couple more times." It dutifully went through multiple rounds of iteration and adjustment. From my perspective, it felt like the task was completed perfectly in one go. Used this way, AI simply delivers better results.
So, assuming the model is good enough, trusting the AI with complex tasks and providing methodological guidance can significantly improve its effectiveness. And if we find ourselves frequently giving similar instructions to the AI, it's a good idea to document them, similar to what we did in Wide Research.
Product Acceptance
Another core task of a manager is to accept the work completed by team members. This is equally important for both AI and humans, as the probability of human hallucination is not necessarily lower than that of AI. However, accepting work from an AI is simpler than from a human, mainly because the cost of execution is significantly reduced. Let me give two examples.
First, you've probably seen videos that use exquisite animations to visualize the internal details of an algorithm or model. Many algorithms that I find difficult to understand just by looking at formulas become instantly clear after seeing their visualizations, especially how their internal states change. Similarly, if we can create a visualization for the delivered work, it will greatly simplify our verification process. For example, business logic can be turned into an interactive state machine that allows us to test it with simple examples, or a machine learning model can display intermediate results in different colors. These visualizations can often help us spot problems quickly.
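To illustrate the first idea, here is a minimal sketch of business logic recast as a steppable state machine. The order states and events are hypothetical; the point is that once logic is in this form, we can verify it by replaying simple scenarios instead of reading production code:

```python
"""A minimal sketch of business logic as a steppable state machine.

The states and transitions below are hypothetical examples; the point
is that in this form, acceptance becomes replaying simple scenarios.
"""
# Allowed transitions: state -> {event: next_state}
TRANSITIONS = {
    "created":   {"pay": "paid", "cancel": "cancelled"},
    "paid":      {"ship": "shipped", "refund": "refunded"},
    "shipped":   {"deliver": "delivered"},
    "delivered": {},
    "cancelled": {},
    "refunded":  {},
}

def step(state: str, event: str) -> str:
    """Apply one event, raising on anything the business logic forbids."""
    try:
        return TRANSITIONS[state][event]
    except KeyError:
        raise ValueError(f"illegal transition: {state} --{event}-->") from None

# Acceptance check: replay a simple happy path by hand.
state = "created"
for event in ["pay", "ship", "deliver"]:
    state = step(state, event)
    print(f"after {event}: {state}")

# This one should fail: you cannot refund an order that was never paid.
try:
    step("created", "refund")
except ValueError as err:
    print(err)
```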
The main issue with visualization, however, is its high barrier to entry. If a human were to write the code for a similar webpage or animation, it could easily take several days. This is why, despite its benefits, it's rarely used in practice. But with the help of AI, creating these visualizations is very simple. We can include such visualizations in the target output right from the start when assigning the task. In other words, we treat verifiability/observability, like documentation, as a first-class deliverable from the beginning, thereby greatly reducing the difficulty of our acceptance process.
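As an example of what I mean, here is a minimal sketch of such a verification visualization, assuming scikit-learn and matplotlib, with toy synthetic data standing in for the real model and dataset:

```python
"""A sketch of a verification visualization for a toy classifier.

Assumes scikit-learn and matplotlib; the data is synthetic. Coloring
the model's intermediate output (predicted probability) over the whole
input plane makes it easy to eyeball whether the decision boundary is
sane before trusting any reported metric.
"""
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (200, 2)), rng.normal(1, 1, (200, 2))])
y = np.array([0] * 200 + [1] * 200)

model = LogisticRegression().fit(X, y)

# Color the plane by predicted probability, then overlay the data.
xx, yy = np.meshgrid(np.linspace(-4, 4, 200), np.linspace(-4, 4, 200))
grid = np.c_[xx.ravel(), yy.ravel()]
proba = model.predict_proba(grid)[:, 1].reshape(xx.shape)

plt.contourf(xx, yy, proba, levels=20, cmap="RdBu_r", alpha=0.7)
plt.colorbar(label="P(class = 1)")
plt.scatter(X[:, 0], X[:, 1], c=y, cmap="RdBu_r", edgecolor="k", s=15)
plt.title("Decision surface sanity check")
plt.show()
```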
Another similar method is A/B testing, or having two teams compete. In a human company, having A and B teams for every project is completely impractical as it would increase labor costs by 100%. But for an AI, it's just a matter of opening a new window. Note that the A and B teams here don't necessarily have to be doing the exact same thing. For example, in my slacking-off instance above, I first had one AI train a model, and then I opened a completely separate AI, gave it the trained model and the test set, and had it independently write the testing and calculation code to verify that the metrics we had previously run were correct. This doesn't solve all problems, but it can also greatly simplify our project acceptance.
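Here is a sketch of what the B team's re-check might look like. The predictions.csv file and the reported number are hypothetical assumptions; the point is that the second session recomputes the metric from raw artifacts, with code the first session never saw:

```python
"""Independent re-check of reported metrics by a 'B team' session.

A sketch under stated assumptions: predictions.csv with columns
'label' and 'prediction' is hypothetical, and REPORTED_ACCURACY is
whatever number the first AI claimed.
"""
import csv

REPORTED_ACCURACY = 0.942  # the A team's claim (illustrative number)

labels, predictions = [], []
with open("predictions.csv", newline="") as f:
    for row in csv.DictReader(f):
        labels.append(int(row["label"]))
        predictions.append(int(row["prediction"]))

correct = sum(a == b for a, b in zip(labels, predictions))
accuracy = correct / len(labels)

print(f"recomputed accuracy: {accuracy:.3f} on {len(labels)} rows")
if abs(accuracy - REPORTED_ACCURACY) > 0.005:
    print("MISMATCH: go back and debug before trusting the report")
```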
Therefore, a key element of using AI is to think about how to accept the project from the very beginning. The AI shouldn't just stop at completing the task itself; it should also think about how to make the project easy for us to accept, and treat this as an important goal.
Summary
Overall, to use AI well, we have a few seemingly scattered "secrets," such as using GPT-5-Codex, using voice input, teaching the AI methodology, and making it treat documentation and acceptance as primary goals. But these secrets all serve the core objective of managing AI. They correspond to the core management tasks a manager performs every day: hiring, task delegation, onboarding, process guidance, and product acceptance. The essence of management is leverage. These five things we do are all about documenting what can be reused and delegating what can be distributed. They allow us to use 5% of our energy to steer the entire project, leveraging the remaining 95% of the execution work. This acts like a lever, amplifying our intelligence and core skills, and ultimately empowering a virtual team with nearly infinite execution capabilities.
This slacking-off is just the most superficial benefit of this leverage. Its deeper meaning is that it gives you the opportunity to think about more important questions that you previously had no time for. When AI can perfectly execute your "how," you will have more time to think about and define the "what" and the "why."