Exploring the Frontier: My Deep Dive into Apple Vision Pro

I've been playing around with the Apple Vision Pro (AVP) for a couple of weeks now. In that time, I've put it through its paces, developed a few apps, and chatted with Apple Store support several times. Along the way I've picked up some insights you might not find in mainstream coverage, and I'm eager to share them. The fundamental question that stands out is what kind of device the AVP actually is: which scenarios it's suited for, and which it's not. In a world where we already have smartphones, tablets, and computers, what niche does the AVP fill? This article aims to explore these questions.

Scenarios Where Apple Vision Pro Doesn't Shine

Before diving into the best use cases for AVP, I want to discuss a few scenarios that seem perfect for the Apple Vision Pro but, in reality, don't live up to the hype. This includes AR gaming, office work, VR movie watching, and PowerPoint presentations.

Augmented Reality

Many media outlets and reviews rave about the cool AR applications for the AVP. For example, turning vacuuming into a game where you collect coins, or swapping out your cyber boyfriend's face in real-time. However, these aren't actually feasible. The reason lies in Apple's decision to prohibit app developers from accessing the camera feed at the system level. This isn't just Apple being Apple; it's a common practice across the virtual reality (VR) industry, including VR glasses from HTC, Facebook, and ByteDance, which also restrict access to the user's camera feed.

Without access to the camera video stream, an app can't detect where your vacuum cleaner currently is, so the coin-collecting game is a non-starter. Likewise, it can't pinpoint your boyfriend's face position, orientation, and lighting conditions, so it can't convincingly superimpose a different face. This one restriction rules out that entire class of applications.

This restriction is both fascinating and crucial, because it means that apps for the Apple Vision Pro, and likewise the Quest 3, simply cannot perform precise interactions with the surrounding environment. For example, an app that helps you find misplaced items based on past recordings, or one that pulls up a person's LinkedIn profile as you walk past them, as HoloLens once advertised, is impossible to build.

However, Apple has left a loophole in ARKit. ARKit allows apps to maintain a certain level of environmental awareness, such as sensing a wall two meters away from you or a floor one and a half meters below. It provides a 3D model and some not-so-accurate semantic information, which can be used to develop some applications. For instance, we could launch a ball that appears to genuinely bounce off the walls in your room. But more advanced applications, such as those involving machine vision and generative models, are still heavily restricted.
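To make that concrete, here is a minimal sketch of the kind of environmental data an app can get, assuming the visionOS flavor of ARKit (ARKitSession plus PlaneDetectionProvider); the function and logging are my own illustration, not code from any shipping app.

```swift
import ARKit

// Minimal sketch, assuming visionOS ARKit: apps can stream plane anchors with
// coarse semantic labels (wall, floor, table, ...) and poses, but never the
// raw camera frames themselves.
let session = ARKitSession()
let planes = PlaneDetectionProvider(alignments: [.horizontal, .vertical])

func observeRoomLayout() async throws {
    // Plane detection falls under "world sensing", which the user must grant.
    _ = await session.requestAuthorization(for: [.worldSensing])
    try await session.run([planes])

    for await update in planes.anchorUpdates {
        if case .removed = update.event { continue }
        let anchor = update.anchor
        // classification is rough semantics; originFromAnchorTransform is the plane's
        // pose in world space: enough to bounce a virtual ball off a wall, not much more.
        print("Plane \(anchor.id): \(anchor.classification) at \(anchor.originFromAnchorTransform)")
    }
}
```

This is roughly the level of awareness the bouncing-ball example needs, and also roughly where it stops.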

The point I'm trying to make is that the AVP is not an AR device but a fairly traditional VR device. In all the reviews and official demo videos, the see-through background, the so-called pass-through, never actually interacts with the UI. The apps behave like a VR display, arranging various 2D or 3D virtual screens around you, screens that can even be placed through walls. This raises a real positioning issue: despite Apple branding the AVP as an XR/MR device, it is, at its core, a VR device. At least in this respect, it's the same type of product as the Quest 3: a VR headset with pass-through.

Office Work

From the perspective of office work, the Apple Vision Pro's 360-degree panoramic view and 4K high resolution seem ideally suited for professional use. However, after hands-on experience, I found its operational experience to be less satisfying than sitting directly in front of a computer. There are three main reasons:

First, even though you're facing an exceptionally large virtual canvas, Apple imposes strict restrictions on text size within windows, most likely because of the limited precision of its eye tracking. Aside from Safari, which allows text resizing, the text in the vast majority of native apps stays large no matter how you zoom or move the window; shrinking the window only shows less content without making the text smaller. This severely limits the density of information you can work with. According to Apple's documentation, the constraint comes from eye-tracking resolution: when text or icons are smaller than about 60 x 60 points, a good gaze-targeting experience can't be guaranteed. So in native apps, the 360-degree panorama doesn't actually let you see more at once.
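To illustrate what that guideline means for developers, here is a minimal SwiftUI sketch; the 60-point figure comes from the discussion above, while the view and modifier arrangement are my own illustration.

```swift
import SwiftUI

// Minimal sketch of the constraint discussed above: gaze targets should keep a
// hit area of at least roughly 60 x 60 points, which is why shrinking text or
// windows doesn't translate into higher information density on the AVP.
struct GazeFriendlyButton: View {
    let title: String
    let action: () -> Void

    var body: some View {
        Button(title, action: action)
            .frame(minWidth: 60, minHeight: 60) // keep the target large enough for eye tracking
    }
}
```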

Second, unlike desktop operating systems, window management in the Apple Vision Pro primarily relies on spatial arrangement. It lacks the concept of minimization or desktop management tools, limiting our ability to open multiple windows simultaneously. For example, on a Mac, despite a smaller screen, we can open many windows and easily switch between them, needing only to click the corresponding icon in the Dock. Windows has a similar taskbar. However, the Apple Vision Pro lacks such a concept; you can only tile or stack windows on the screen. After stacking several windows, accessing the ones in the back becomes difficult without closing the front ones. This design, coupled with an already insufficient amount of screen information, means that when using the Apple Vision Pro for office or productivity applications, we can only open a few windows, significantly limiting efficiency.

Third, let's discuss Mac screen mirroring. This initially seemed like a great idea: mirroring on the Vision Pro bypasses the native-app font-size limitation and can display much smaller text, presumably because control happens through the trackpad rather than eye tracking. In practice, though, there's a serious limitation: resolution. According to various reviews, the mirrored display is around 4K, which falls short of current high-end monitors. I use two 6K screens at home, for instance, but mirroring gives me a single 4K screen (some third-party apps can achieve multi-screen mirroring), so the experience is a clear step down. Moreover, when the virtual screen is very large, you have to turn your head to see content at the far left or right because of the Vision Pro's relatively small FOV (field of view), and frequently moving your head gets tiring. I previously cut back from three physical monitors to two for exactly this reason.

Overall, the office experience with the Apple Vision Pro does not compare to that of a proper multi-monitor setup. When I can sit in front of a computer and monitor, I prefer not to work wearing the AVP.

VR Viewing, Presentations, and Long-term Wear

From a viewing perspective, I had high expectations for this product, especially after being impressed by the display quality of Pico's 8K VR content. The fact that the Pico has to be tethered to a PC, while the AVP lets you move around freely without sitting at a computer, made it even more appealing. However, after digging deeper, I found that the App Store apps claiming to support open VR formats such as VR180 and VR360 all have generally low ratings, with many users complaining about black screens during playback. Further research suggests that the VR video formats that play properly on the AVP are limited to a special format of Apple's own. That format, with its high frame rate, high dynamic range, and wide color gamut, is very advanced; unfortunately, there appears to be no public documentation or library for it. In practice, aside from shooting 1080p VR video with an iPhone or the Apple Vision Pro itself, there is no way to create or watch content in this format. Using my own RED camera with a Canon VR lens, or watching open VR content from the web, is heavily restricted. That essentially rules out recording life's detailed moments in VR or watching VR films anywhere other than Apple TV. Perhaps Apple will gradually open the format up, but for now, VR viewing remains a significant limitation.

Another interesting application area is business productivity, as demonstrated by Apple, Quest, and HoloLens. VR glasses introduce a variety of new interaction methods: interaction becomes three-dimensional, and so does the information on display. That opens up possibilities for many productivity applications, such as visualizing CT scans, exploring high-dimensional data, and even giving PowerPoint presentations. If slides shown on a computer, even with 3D effects, are still fundamentally flat, then with everyone wearing an AVP, both the animations and the 3D materials could look incredibly impressive. However, after trying some visualization work, I ran into an unexpected barrier related to eye tracking. An AVP is effectively bound to one person: lending it to someone else means they have to recalibrate eye tracking, which adds two to three minutes of setup before use. It also can't be worn over regular glasses; you need the special magnetic Zeiss lens inserts, which are fairly expensive, and since I wear glasses, that means buying an extra set of inserts for every AVP I own. Furthermore, if I'm hosting a meeting and want everyone to wear an AVP for an immersive presentation, the cost and setup complexity become prohibitive.

Lastly, in terms of long-term wear, Apple claims that the AVP's balanced weight distribution across the head, along with the forehead and rear head support pads, should minimize discomfort. However, in practice, even with these design considerations, wearing the device for long periods can lead to fatigue and discomfort, especially for those not used to wearing such headgear.

Scenarios Where Apple Vision Pro Excels

Next, let's look at the scenarios where the Apple Vision Pro (AVP) fits best. As discussed above, the AVP doesn't set out to replace the conventional desktop working experience. In a great many use cases, though, it overlaps heavily with the iPad, and in those scenarios the AVP is not just competent but sometimes better. For watching videos, reading books and novels, browsing the web, writing documents with a keyboard, chatting online, and sending email, the AVP delivers a seamless experience, and thanks to its panoramic view, high resolution, high dynamic range, and high-refresh-rate display, it often beats the iPad. After spending some time with the device, I've found that the AVP is exceptionally good in a few specific scenarios.

First, in situations where holding an iPad is awkward, the AVP, which requires neither handheld operation nor controllers, is particularly well suited. A prime example is using it while lying down. Many people will recognize the mishap of an iPad falling onto their face while watching something in bed; that problem simply doesn't exist with the AVP. Paired with a Bluetooth keyboard, the experience feels very natural, and because the headset rests on your head rather than in your hands, weight and comfort are much less of a concern in this position. The AVP thus makes work or entertainment possible in unconventional postures like lying down, free of the constraints of handheld screens, to the point of making one consider upgrading the sofa or investing in a massage chair.

Another scenario is using the AVP while exercising. Holding a phone or tablet during a workout is uncomfortable, but the AVP eliminates the problem entirely. I now frequently use it to read novels or watch movies while exercising, which has genuinely increased my enthusiasm for physical activity. For new parents, there's a similar scenario: feeding, burping, and rocking a newborn to sleep all require both hands, making it impossible to hold a phone or tablet. With the AVP, simple finger gestures or voice commands let you read or binge-watch while caring for the baby, opening up a use case the iPad simply can't cover.

The second scenario where the AVP excels is when a wide field of view is needed, for example when many windows need to be tiled with frequent switching between them. Take reviewing papers: I generally prefer the iPad for its excellent touchscreen reading experience, but its screen is small enough that even a full-screen PDF is a strain to read, never mind constantly switching to other apps. The AVP gives me the whole picture at once: Telegram on the left with an AI bot for summarizing and Q&A on arXiv papers, Safari on the right for background research, a giant PDF in front, and a note-taking app off to the side for review notes. The entire workflow is remarkably smooth; compared with the heavy app switching on the iPad, it feels more natural and efficient. Sitting in front of a computer might still be more efficient, but the touch-style control of the iPad and AVP lets me read papers lying down rather than upright at a desk, which lowers the overall barrier.

Moreover, a similar application scenario is online grocery shopping, where different vendors offer various items with different standards for bundling orders, necessitating frequent app switching on the iPhone or iPad. However, the AVP offers another approach, allowing us to tile multiple windows and switch between them conveniently. For specifics, refer to this video.

The third scenario perfectly suited to the AVP is watching videos. Even though, as mentioned, the AVP's support for open VR standards isn't ideal, ordinary videos on platforms like Bilibili look stunning on it, thanks to the high-spec display: panoramic, high dynamic range, high refresh rate, wide color gamut, and high resolution. Especially after subscribing to Bilibili Premium, watching Dolby World's 4K60P videos on a giant screen is a genuinely astonishing experience. I was watching this video at the time, and it's worth a try if you're interested.

Reflecting on and Identifying Shortcomings

I've since devoted considerable effort to learning AVP app development. Delving deeper into the design principles of the Apple Vision Pro, I stumbled upon an insight that felt counterintuitive at first: running iPad/iPhone apps on the Apple Vision Pro isn't as straightforward as it seems. Apps designed natively for the Apple Vision Pro feel natural largely because they respond to the user's gaze, providing clear feedback on where the user is looking. In native AVP apps, elements such as buttons subtly tilt and their reflections shift as you move your gaze across them, mimicking a three-dimensional interaction. This design philosophy, known as responsive design in Apple's lexicon, aims to convey the system's interpretation of the user's gaze through subtle UI cues, functioning somewhat like a cursor but in a more intuitive manner.

However, iPad/iPhone apps inherently lack this feature: their interaction model is direct touch, with no gaze tracking and no three-dimensional tilting. You might expect users to find the lack of visual feedback disorienting, unsure whether a tap registered or whether they hit the intended target. In reality, though, iOS apps do get a form of responsive feedback through subtle highlight changes, at least when they use system-native controls, which show some highlight to indicate selection (WeChat's controls are an example). Apps that build completely custom controls, however, can lose this feedback entirely, and usability suffers noticeably.
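To see what the fix looks like on the developer side, here is a minimal SwiftUI sketch of a custom control opting back into the system's gaze feedback; the control itself and its styling are invented for illustration, and the key point is that the highlight is rendered by the system rather than computed by the app.

```swift
import SwiftUI

// Minimal sketch: a fully custom control gets no gaze feedback by default on
// visionOS, but opting into a hover effect lets the system draw the highlight
// when the user looks at it (the app never learns where the gaze actually is).
struct CustomChip: View {
    let label: String

    var body: some View {
        Text(label)
            .padding(.horizontal, 16)
            .padding(.vertical, 10)
            .background(.thinMaterial, in: Capsule())
            .contentShape(.hoverEffect, Capsule()) // shape the highlight to the capsule
            .hoverEffect(.highlight)               // system-rendered gaze feedback
    }
}
```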

How does Apple achieve this seamless integration? Further investigation revealed the introduction of a spatial feedback mechanism called "hover" in iOS programming back in 2022. When using an Apple Pencil or a trackpad, hovering over screen elements triggers system controls to react, indicating the detected position of the pencil tip or cursor. In the Apple Vision Pro, this mechanism adapts to respond to eye movement, essentially allowing users to control a virtual Apple Pencil with their gaze. This ingenious design principle ensures that even standard iOS apps adhere to the responsive UI design principles within the Apple Vision Pro ecosystem. Apple's strategic foresight in implementing the hover mechanism, seemingly minor in the iOS app design context, has facilitated a natural transition of iOS apps to the Apple Vision Pro platform.
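For the curious, here is a minimal UIKit sketch of that hover affordance as it exists on iPad, using UIHoverGestureRecognizer; the view and the highlight colors are my own illustration. On iPad the callback fires as a trackpad pointer or a hovering Apple Pencil moves over the view; on the Vision Pro, the equivalent gaze highlight on standard controls is drawn by the system itself, without the app ever receiving the gaze position.

```swift
import UIKit

// Minimal sketch: a view that lights up while a pointer or hovering Apple Pencil
// is over it, via UIHoverGestureRecognizer (the iPad-era "hover" mechanism
// discussed above). The highlight logic here is purely illustrative.
final class HoverHighlightView: UIView {
    override init(frame: CGRect) {
        super.init(frame: frame)
        let hover = UIHoverGestureRecognizer(target: self, action: #selector(handleHover(_:)))
        addGestureRecognizer(hover)
    }

    required init?(coder: NSCoder) { fatalError("init(coder:) has not been implemented") }

    @objc private func handleHover(_ recognizer: UIHoverGestureRecognizer) {
        switch recognizer.state {
        case .began, .changed:
            backgroundColor = .systemGray5 // light up while hovered
        default:
            backgroundColor = .clear       // revert once the hover ends
        }
    }
}
```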

Shifting gears from admiration of Apple's ecosystem, it's essential to acknowledge the AVP's hardware limitations and shortcomings. As mentioned earlier, its ergonomic design and relatively narrow field of view (FOV) haven't posed significant issues for me. However, two main problems have affected my daily use. First, the microphone placement: it sits noticeably farther from the mouth than an iPhone's microphone does when you speak into it, and voice recognition accuracy drops accordingly. Given how much more efficient voice input is than keyboard or gesture input, especially since the AVP currently lacks Chinese input support, voice commands play a crucial role in the ecosystem, and the reduced accuracy detracts from the experience. Potential workarounds include using Bluetooth headphones such as the AirPods Pro, or AI models that improve recognition accuracy in noisy environments.

The second, and perhaps more significant, drawback is screen glare. Initially, the AVP's display quality was impressive, but over time the screen seemed to fog up, as if covered by a veil. Discussions with Apple staff suggested two potential causes: the pancake lenses, which produce glare in high-contrast scenes, and contamination from skin oils or sweat. Cleaning the lenses with an approved solution, such as Zeiss lens spray, noticeably improved the situation. Some glare from the pancake lenses remains unavoidable, though, especially with bright foregrounds against dark backgrounds. Adopting dark mode in apps and lowering background brightness can mitigate the issue, and the latest visionOS updates now support dark mode in apps like Safari.
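On the app side, preferring a dark appearance is the simplest way to act on that observation. A minimal SwiftUI sketch, assuming an app where you control the root view (the view itself is invented for illustration; how much it helps depends on your content):

```swift
import SwiftUI

// Minimal sketch: request the dark appearance so large bright surfaces stay off
// screen, which in practice reduces the pancake-lens glare described above.
struct ReaderRootView: View {
    var body: some View {
        NavigationStack {
            Text("Reading view goes here")
        }
        .preferredColorScheme(.dark) // ask the system to render this scene in dark mode
    }
}
```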

Conclusion

The AVP has not only introduced convenience into my life and work but has also sparked a wealth of inspiration. It offers new tools and perspectives for optimizing workflows and presenting content, particularly intriguing for app developers. The prospect of integrating a third dimension of interaction, based on eye tracking and gestures, opens up innovative opportunities for personal workflow efficiency. This excitement and curiosity, coupled with the practical benefits, are why I continue to use the AVP, even as many opt to return or resell theirs.
