Here’s what it’s going to take for Augmented Reality to take over the world

Form Factor

I’d wear those

One of the key assumptions that we in the AR community can generally agree upon is that consumers will demand AR displays that look as close as possible to ordinary eyeglasses. That means the user is looking through a transparent display, known as an Optical See-Through Head Mounted Display. The more general and more commonly used term, which I will use from here on, is Head Mounted Display (HMD). HMDs also include Virtual Reality displays, which are completely opaque. That does not mean there won’t be significant adoption of AR before we reach that form factor, for example through smartphones, but the AR HMD is arguably the idealized version.

I’ve heard on multiple occasions that even this wouldn’t be good enough. The reasons most commonly cited: you can easily lose or break glasses, they don’t work well while lying down (which is how many people use smartphones), and some people just don’t like wearing glasses.

Wishful thinking

While I generally agree that the eyeglass form factor has constraints, I don’t think contact lenses are where we will see widespread adoption, partly because they are probably impossible to build in the coming decades.

Hopefully it will be clear after reading the following section that it’s hard enough to pack all the required elements into a pair of glasses. Doing the same on a contact lens is not practical in the near future.

The Technical Requirements

Field of View (FOV): The idealized AR display will conform to the combined binocular and peripheral field of view of the human eye, including motion detection at the far edges of the FOV. That is approximately 200 degrees horizontal and 140 degrees vertical. Within this field, only about 140 degrees horizontal are binocular; the remaining 30–35 degrees on each side are monocular peripheral vision.

Human Horizontal FOV

Pixels Per Degree (PPD): The idealized display would match the angular resolution of the human eye at any depth of focus, meaning a person can’t discern individual rendering elements (i.e. pixels) at that resolution. While PPD and human visual acuity are not a one-to-one comparison (we don’t have pixels, though you could argue the point with respect to rods and cones), we can consider 60 PPD the human equivalent necessary for a convincing display. This is the first key element required to eliminate the problem of eye fatigue with digital display systems.
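To get a feel for what the FOV and PPD targets imply together, here is a rough back-of-the-envelope sketch. It assumes uniform pixel density across the entire field (no foveated rendering, which is my simplification and not something discussed above), and the function name is just illustrative.

```python
# Rough sketch: pixel budget for a display that covers a given field of view
# at a given angular resolution. Assumes uniform pixels per degree everywhere.

def pixel_budget(h_fov_deg, v_fov_deg, ppd):
    """Return (horizontal pixels, vertical pixels, total pixels)."""
    h_px = round(h_fov_deg * ppd)
    v_px = round(v_fov_deg * ppd)
    return h_px, v_px, h_px * v_px


# Targets from the text: ~200 x 140 degrees at 60 PPD.
h_px, v_px, total = pixel_budget(200, 140, 60)
print(f"{h_px} x {v_px} = {total / 1e6:.1f} megapixels")
```

Under those assumptions the idealized display lands at roughly 100 megapixels, which helps explain why FOV and PPD are so hard to satisfy at the same time.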

Latency: As with PPD, the latency of the display pipeline should come as close as possible to the limits of human visual perception. Experiments suggest humans can reliably discern differences in visual stimuli at around 77 Frames Per Second (FPS), or 13 milliseconds between being shown a new image and recognizing it as such. To take a page from the VR world, if we want to be conservative we should expect that anything worse than 60 Frames Per Second would be unacceptable.
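The relationship between frame rate and per-frame time is just a reciprocal. A minimal sketch of the conversion, using the two frame rates mentioned above (the function name is my own):

```python
# Convert a frame rate into the time available to produce each frame.

def frame_time_ms(fps):
    """Milliseconds per frame at the given frames-per-second rate."""
    return 1000.0 / fps


for fps in (60, 77):
    print(f"{fps} FPS -> {frame_time_ms(fps):.1f} ms per frame")
```

So the 60 FPS floor leaves about 16.7 ms per frame, while matching the 77 FPS figure leaves about 13 ms.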

Accommodation: When you want to focus on an object close to you, muscles in your eye deform its lens ever so slightly to focus the incoming light (in the form of multiple and varying wavefronts) onto your retina. This process happens in reverse when you want to focus on something far away, and it happens without conscious thought. This process is called accommodation, and it allows us to change focus simply by shifting our gaze. Ideally, a display would give us unlimited focal depths on which to focus. In practice, I expect that an AR system would only need to provide between 5 and 10 different focal depths to be deemed sufficient (though there is too little data to say so with much confidence). This is the second key element required to eliminate the problem of eye fatigue with digital display systems.
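To make "5 to 10 focal depths" a little more concrete, here is one hypothetical way such planes could be laid out. Focus is usually measured in diopters (1 / distance in meters), so this sketch spaces the planes evenly in diopters between a nearest distance and optical infinity. Both the even spacing and the 0.25 m near limit are my assumptions for illustration, not requirements stated above.

```python
# Sketch: place n focal planes evenly in diopters (1 / meters), from an
# assumed nearest focus distance out to optical infinity (0 diopters).

def focal_plane_distances(n_planes, nearest_m=0.25):
    """Return focal plane distances in meters, nearest first, ending at infinity."""
    if n_planes < 2:
        raise ValueError("need at least two planes (nearest and infinity)")
    max_diopters = 1.0 / nearest_m
    step = max_diopters / (n_planes - 1)
    distances = []
    for i in range(n_planes):
        diopters = max_diopters - i * step
        # The last plane sits at 0 diopters, i.e. optical infinity.
        distances.append(float("inf") if i == n_planes - 1 else 1.0 / diopters)
    return distances


print([round(d, 2) for d in focal_plane_distances(5)])
```

With five planes and a 0.25 m near limit, the planes fall at 0.25 m, about 0.33 m, 0.5 m, 1 m, and infinity: most of them cluster close to the viewer, where the eye is most sensitive to changes in focus.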

Depiction of wave-front distortion with distance

“Rendering Black”: One of the more contentious topics in AR is how, and even whether, it is possible to show black in a see-through AR environment. Because “black” is a relative lack of light entering the eye from a specific direction, a transparent display cannot produce it simply by reducing the light it emits; the real-world light behind the display still passes through. The visual cues that humans get from shadows and lighting effects are very powerful and are a key aspect of creating the presence of a virtual object in AR. [Note: There are theoretical ways to do this, but none have been proven or made practical. The most promising technique I know of for rendering black pixels is an optical element (a lens) that selectively blocks all light from a specific direction, regardless of wavelength.]

Power: Ideally this is a device that you can put on in the morning and take off at night without needing a recharge. Matching iPhone 7 specs would put that between 10 and 13 hours of continuous use.
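A runtime target translates directly into an average power budget once you pick a battery size. The sketch below uses a placeholder capacity of 7.5 Wh, which is an assumed, roughly phone-class figure chosen only for illustration; a battery small enough for an eyeglass frame would likely be smaller, which tightens the budget further.

```python
# Sketch: average power draw allowed by a battery over a target runtime.

def average_power_budget_w(battery_wh, hours):
    """Average watts the whole device may draw to last the given hours."""
    return battery_wh / hours


battery_wh = 7.5  # assumed placeholder capacity, not a measured spec
for hours in (10, 13):
    watts = average_power_budget_w(battery_wh, hours)
    print(f"{hours} h on {battery_wh} Wh -> {watts:.2f} W average")
```

Whatever capacity you plug in, the display, tracking, compute and radios all have to share that single average figure.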

There are an ungodly number of ways in which the community is trying to address these issues given the current state of technology.

The primary display method we see in practice uses fixed micro OLEDs or LCDs, whose output is reflected through a series of waveguides that present the image in your line of sight.

Moverio BT-300 waveguide and LCD. Image Courtesy Epson

While nearly ubiquitous, this method has hard limits on FOV and PPD. Further, there is no known way to provide accommodation with this method.

I consider Virtual Retinal Displays (VRD), also called Retinal Scan Displays, the best hope for achieving consumer-level AR display technology. Introduced long ago but becoming increasingly feasible in practice, these displays can project multiple wavefronts with various curvatures directly onto the retina, allowing you to focus at different depths. Done correctly, this can potentially give you a display with nearly unlimited FOV, accommodation, and extremely dense PPD on the order of human acuity limits.

VRD diagram. Courtesy: https://www.google.com/patents/EP0562742A1?cl=en

BOTTOM LINE FORM FACTOR: For AR to be the primary computing device, the form factor needs to be at least as compact as a pair of standard eyeglasses that provides human-equivalent Field of View, Pixels Per Degree resolution, and Accommodation, at a high enough refresh rate and with a day’s worth of battery life. I’ll withhold judgement on whether “rendering black” is required for wide-scale adoption.