The Human Visual System

Krystel R Huxlin. Focal Encyclopedia of Photography: Digital Imaging, Theory and Applications, History and Science. Editor: Michael R Peres. 4th edition. Amsterdam: Elsevier, 2007.

The human visual system has often been compared to a photographic camera, an analogy that fails quickly upon serious examination. Indeed, visual perception, with its colored, emotionally charged, contextually modulated, three-dimensional view of the world, differs markedly from the rapid flow of two-dimensional information captured by the eye or, for that matter, by a photographic camera. This sensory system allows humans to form the cognitive, emotional, and creative insight that drives them to take photographs in the first place. It also provides a biological means to make sense of the results. How is this possible? What makes the human visual system so different from a machine? These questions will be explored by first delving into visual system structure and organization, before examining some of the more important aspects of visual processing.

Basic Structure of the Human Visual System

Input to the visual system occurs through the eyes. The eyes are sensory organs, and their primary role is to detect light energy, code it into fundamental “bits” of visual information, and transmit it to the rest of the brain for analysis. In the higher levels of the brain, the information bits are eventually recombined to form mental images and our conscious visual perception of the world. The visual system is perhaps the most complex of all sensory systems in humans. This is evidenced by the multiplicity of brain areas devoted to vision, as well as the complexity of cellular organization and function in each of these areas.

The human eye is a globe cushioned into place within a bony orbit by extraocular muscles, glands, and fat. The extraocular muscles move the eyes in synchrony to allow optimal capture of interesting visual information. Eye movements can be either involuntary, reflex reactions (e.g., nystagmus, convergence during accommodation), or voluntary actions. Light rays enter the eye through a circular, clear, curved cornea, which converges them into the anterior chamber of the eye. The amount of light allowed to pass through the rest of the eye is controlled by the iris—a contractile, pigmented structure that changes in size as a function of light intensity, distance to the object of interest, and even pain or emotional status. The iris controls not only the amount of light entering the posterior chamber of the eye, but also the depth of focus of the resulting image, and thus the overall quality of the image achievable. After the iris, light passes through the lens, a small, onion-like structure whose different layers vary in refractive index, providing the final focusing power needed to position the visual information precisely on the retina. The retina is the photosensitive component of the human central nervous system and covers most of the inner, back surface of the eye. Its highly regular neuronal structure is normally organized into seven cellular and fiber layers, and light must pass through all these layers, plus several vascular plexuses, to reach the photoreceptors. Photoreceptors are neurons that fall into two classes—rods and cones. Although both rods and cones respond to a range of wavelengths of light, the perception of color, as well as vision at high (photopic) light levels, depends primarily upon the cones. Rods cannot distinguish between colors, signaling differences only in lightness, but their specialty is detecting tiny amounts of light (even a single photon at a time) under low (scotopic) light levels.

The retinal architecture is disrupted at two locations: the optic disc and the fovea. The optic disc, which creates each eye’s “blind spot,” is a region devoid of neurons, in which the axons of retinal ganglion cells exit the eye and enter the optic nerve. These axons are the only means by which visual information is transmitted from the eye to the rest of the brain. Their loss or dysfunction in diseases such as glaucoma or optic neuritis causes blindness. The fovea is a retinal specialization dedicated to achieving high-resolution vision along the visual axis of the eye. In this small region, the many neural and vascular layers of the retina are displaced sideways to allow light rays to reach photoreceptors directly. Unlike the uniformly spaced sensor elements of imaging systems, the photosensitive elements of the eye are not distributed evenly. The fovea contains the highest concentration of photoreceptors in the retina (about 200,000 cones/mm²), and its view of the world is relatively unimpeded by distortions due to overlying neural tissue and blood supply. The concentration of cones decreases rapidly away from the fovea, to a density of about 5,000/mm² at the outer edges of the retina. On the other hand, there are no rods in the fovea; their concentration increases rapidly to a maximum about 20 degrees off the visual axis and then decreases gradually toward the outer edges of the retina.

While the capture and initial processing of visual information occurs in the eye, it is generally accepted that our conscious sense of visual perception occurs because of processing in the brain, and in particular, in the many cortical areas devoted to vision. On its way to the cerebral cortex, visual information that leaves the eye is first partitioned among different subcortical nuclei. These nuclei determine the ultimate use of the visual information. The great majority of visual information is sent to the lateral geniculate nucleus (LGN) of the thalamus. After some limited processing and sorting, LGN neurons transmit this information to the primary visual cortex (V1), located in the occipital lobes of the brain in most mammalian species. This information flow is the primary route that gives rise to conscious visual perception. The rest of the visual information originating from the eye is sent to three clusters of subcortical nuclei for processing. One cluster, the pulvinar/lateral posterior nucleus/superior colliculus group, is thought to play a role in visuomotor processing, visual attention, and other integrative functions in conjunction with the visual cortex. A second cluster (composed of the intergeniculate leaflet, ventrolateral geniculate nucleus, and olivary pretectal nucleus) mediates responsiveness to light, especially the reflex regulation of pupil size. Finally, the suprachiasmatic nucleus and associated structures control circadian pacemaker functions for the entire body. Visual input to the suprachiasmatic nucleus provides signals for photic entrainment of the organism.

The proportion of the cerebral cortex devoted to visual processing in primates is significantly greater than that devoted to any other sensory or motor modality. Most of the visual information generated by the eyes and processed by intermediate, subcortical centers reaches the primary or occipital visual cortex first. It is then distributed to at least ten other visual cortical areas in the human brain (e.g., V2, V4, MT, and the inferior temporal cortex). Each visual cortical area contains a separate map of visual space. In addition, neurons in different visual cortical areas exhibit different electrophysiological responses to visual stimulation, suggesting that individual cortical areas carry out different forms of visual processing. Consistent with this notion is the observation that damage to different visual cortical areas affects different aspects of vision.

Function of the Human Visual System

Ocular Optics

One striking similarity between the optical system of the eye and that of a camera is that both form inverted images, on the retina and on the film or sensor, respectively. Furthermore, retinal diameter is about the same as the 24-mm dimension of the image area on 35mm film. However, the effective focal length of the typical eye is about 17mm, which results in a larger depth of field for the eye than in most common camera systems. This short focal length, combined with a curved rather than flat retina, produces an angle of view of approximately 180 degrees in humans, compared to a normal angle of view of about 50 degrees for the camera. The iris muscle, which controls the diameter of the circular pupil, makes it possible to change the size of its opening from about 2 to 8 mm. These diameters correspond to f-numbers of f/8 and f/2, respectively. In addition to controlling the amount of light that enters the eye, pupil size alters image definition and depth of field. For people with normal vision, the best compromise for image definition (between reducing optical aberrations with a small opening and minimizing diffraction with a large opening) is obtained with a pupil diameter of about 4mm, or an f-number of f/4, which is midway between the maximum opening of f/2 and the minimum opening of f/8. Since depth of field is directly proportional to f-number, the depth of field increases by a factor of four from f/2, the largest opening in dim light, to f/8, the smallest opening in bright light. Basic lens formulae can be used to calculate the scale of reproduction, object distances, and focal length for the eye. However, since the refractive indices of the air in front of the eye and of the fluid inside the eye differ, the principal points and the nodal points do not coincide, as they do with air on both sides of a camera lens.
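
Because the figures in this paragraph reduce to simple ratios, a short sketch can check the arithmetic (assuming only the approximate 17mm focal length and 2 to 8 mm pupil range quoted above; the f-number is focal length divided by aperture diameter):

```python
# Quick check of the f-number and depth-of-field figures quoted above.
EYE_FOCAL_LENGTH_MM = 17.0  # effective focal length of the typical eye

def f_number(focal_length_mm: float, pupil_diameter_mm: float) -> float:
    """f-number = focal length / aperture (pupil) diameter."""
    return focal_length_mm / pupil_diameter_mm

for pupil_mm in (2.0, 4.0, 8.0):
    print(f"pupil {pupil_mm:.0f} mm -> f/{f_number(EYE_FOCAL_LENGTH_MM, pupil_mm):.1f}")
# pupil 2 mm -> f/8.5   (~f/8, the minimum opening in bright light)
# pupil 4 mm -> f/4.2   (~f/4, the best-definition compromise)
# pupil 8 mm -> f/2.1   (~f/2, the maximum opening in dim light)

# Depth of field is roughly proportional to f-number, so stopping down
# from f/2 to f/8 increases it by about a factor of four:
print(f_number(17.0, 2.0) / f_number(17.0, 8.0))  # 4.0
```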

Another difference between the eye and photographic cameras is that while cameras capture images at uniform resolution, the density of photoreceptors and other neuronal elements varies across the retina. This means that visual perception, including resolution, differs between the visual axis and the retinal periphery. Indeed, while we can detect movement and gross shapes and distinguish between light and dark in the peripheral retina, the finest detail can be resolved only over a narrow area close to the fovea. In this area, visual acuity, measured as the ability to resolve detail in a test target, is greatest, reaching as much as 60 cycles per degree of visual angle (one cycle being the critical distance between two barely distinguishable parts of the test target).
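
As a back-of-envelope cross-check (a sketch only; it assumes the ~17mm focal length quoted above, a small-angle approximation, and the ~200,000 cones/mm² foveal density given earlier), one cycle at 60 cycles per degree can be converted into distance on the retina and compared with the spacing of foveal cones:

```python
import math

focal_length_mm = 17.0
mm_per_degree = focal_length_mm * math.tan(math.radians(1.0))  # ~0.30 mm of retina per degree
cycle_um = mm_per_degree * 1000.0 / 60.0                       # ~4.9 um per cycle at 60 cpd

# With ~200,000 cones/mm^2, neighboring foveal cones sit ~2.2 um apart:
cone_spacing_um = 1000.0 / math.sqrt(200_000)

print(round(cycle_um, 1), round(cone_spacing_um, 1))
# One cycle spans roughly two cone spacings, i.e., peak acuity sits near
# the sampling (Nyquist) limit of the foveal cone mosaic.
```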

In addition to the neuronally related loss of visual information, human vision is further degraded by non-neural elements. Even in healthy individuals, light that enters the optical system of the eye is deviated from its optimal, image-forming path because of corneal or lens defects, reflection, and scattering. Light that is reflected and scattered within the optical system will reach the retina as veiling light, with a non-linear reduction of image contrast. Physiological differences in the eyes of various individuals cause corresponding differences in optical quality and thus acuity. Some of the most common optical causes of decreased acuity in humans are problems at the level of the cornea, resulting in defocus (myopia and hyperopia) and astigmatism. In myopia, or nearsightedness, rays of light come to a focus in front of the retina. With hyperopia, or farsightedness, the rays of light focus behind the retina. Astigmatism causes off-axis object points to be imaged as mutually perpendicular lines rather than as points, thus reducing visual acuity. Several of these conditions are also aberrations found in camera lenses.

The crystalline lens is another major source of optical distortions. The relaxed lens of a person with normal vision has reasonably flat surfaces, which focus rays of light from distant objects onto the retina. Accommodation, which makes the lens more convex, enables the person to focus on objects at closer distances. The closest distance that can be focused on is called the near point. The near point for a young child may be as close as 3 inches, but it increases with age and may eventually reach 30 inches or more. Because the distance at which the relaxed eye focuses also tends to increase with age, it is not uncommon for nearsighted individuals to find their acuity improving with age at far distances but deteriorating at near distances. Normal aging can affect the optical system of the eye in many ways. Changes in tension of the ciliary muscles, attached to the zonule fibers at the edge of the lens, control the thickness of the lens and therefore its focal length. These changes in focal length are not large enough to produce an obvious change in image size, as would occur with a zoom camera lens, but they are sufficient to permit the eye to fine-focus on objects at different distances. As a person ages, the lens becomes less flexible. This makes focusing on near objects more difficult, eventually requiring external correction, such as reading glasses. In addition to losing flexibility, the lens becomes yellowish with age, which affects the appearance of some colors, especially blue, the complement of yellow. The lens can also become cloudy with age, a defect known as a cataract, which increases the scattering of light in the eye. This is primarily corrected via surgical removal of the lens and possible replacement with an artificial, intraocular lens.
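
The near-point figures above translate directly into accommodation demand, which in diopters is the reciprocal of the near distance in meters (a minimal sketch; the 3-inch and 30-inch values are those quoted in the text):

```python
INCH_M = 0.0254  # meters per inch

def accommodation_demand_diopters(near_point_inches: float) -> float:
    """Accommodation (diopters) needed to focus at a given near point."""
    return 1.0 / (near_point_inches * INCH_M)

print(round(accommodation_demand_diopters(3), 1))   # ~13.1 D: a young child's near point
print(round(accommodation_demand_diopters(30), 1))  # ~1.3 D: a typical older adult's
# The ~12-diopter difference reflects the lens's loss of flexibility with age.
```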

Advances in wavefront sensing technology, adapted to ocular imaging from astronomy, have allowed us to precisely measure the optical aberrations imposed by our imperfect ocular media. As demonstrated by David Williams, Austin Roorda, and their colleagues, it is now possible to combine wavefront sensing with adaptive optics and correct these optical aberrations to the point where single cells, including photoreceptors, can be imaged in the living human eye. Ongoing research could one day allow us to design spectacles, contact lenses, intraocular lenses, and even refractive surgical procedures that correct not only basic distortions of the eye, such as defocus and astigmatism, but also finer-scale, higher-order monochromatic and chromatic aberrations.

Detection of Light

Light is detected through a chemical reaction that occurs in retinal photoreceptors at the very back of the eye. In humans, as in all primates, there are two major types of photoreceptors—rods and cones. When photons enter photoreceptor cells, they are absorbed by specialized pigment molecules, altering their conformation and rendering them unstable, to the point where they break down into their components: opsin and retinal (the latter a derivative of vitamin A). The light-induced conformational change of pigment molecules starts an intracellular chain reaction that generates an electrical signal in the photoreceptor. This signal is transmitted through a chemical synapse from the photoreceptor to a bipolar neuron, which in turn transmits it, also through a chemical synapse, to a retinal ganglion cell. The ganglion cell sends its electrical signal away from the eye to the brain. Once it reaches the visual cortex, this signal can be consciously perceived. Within the retina, the transfer of electrical information initiated by the absorption of light quanta in photoreceptors is modulated by many lateral interactions (both excitatory and inhibitory, via horizontal cells and amacrine cells) that amplify the light signal and better isolate it from surrounding noise.

Under low light-level, or scotopic, conditions, rods are the main contributors to vision; under high light-level, or photopic, conditions, cones become the main contributors; the overlap region, where rods and cones contribute fairly evenly, is termed mesopic. Rods are specialized for the detection of dim light because of a more photosensitive visual pigment (rhodopsin) and greater amplification of the light signal than in cones. The ability of the rod system to capture sparse photons is further enhanced by the fact that many rods converge onto single retinal bipolar cells. However, while rods are about 20 times more numerous than cones, this convergence of their output signals sacrifices spatial resolution. Nevertheless, rods serve the useful function of detecting motion and lightness differences in the visual periphery, thereby signaling viewers to quickly shift their gaze in the appropriate direction for a more critical examination with their central (foveal) cones.
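
The sensitivity-for-resolution trade-off of rod convergence can be illustrated with a toy simulation (a sketch only, not a retinal model; the pool sizes, signal strength, and noise level are arbitrary assumptions): summing N noisy inputs grows the signal N-fold but the noise only about √N-fold, improving detectability at the cost of a coarser effective "pixel."

```python
import random
import statistics

def pooled_snr(n_inputs: int, signal: float = 0.1, noise_sd: float = 1.0,
               trials: int = 20_000) -> float:
    """Signal-to-noise ratio of a 'bipolar cell' summing n noisy receptors."""
    sums = [sum(signal + random.gauss(0.0, noise_sd) for _ in range(n_inputs))
            for _ in range(trials)]
    return statistics.mean(sums) / statistics.stdev(sums)

for n in (1, 25, 100):  # SNR grows roughly as sqrt(n) with pool size
    print(n, round(pooled_snr(n), 2))
```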

The human retina contains three types of cones, each containing a different cone opsin pigment that is sensitive to light in a different part of the visible spectrum. Broadly defined, human cone opsins fall into three major categories—S cone opsin is most sensitive to short wavelengths of the visible spectrum (peak sensitivity at 420nm), M cone opsin is most sensitive to middle wavelengths (peak sensitivity at 530nm), and L cone opsin is most sensitive to long wavelengths (peak sensitivity at 560nm). The major role of the cone system in humans is high-resolution color vision.
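
To make the three-pigment scheme concrete, the sketch below approximates each cone class's sensitivity as a Gaussian centered on the peak wavelengths quoted above (an illustration only: real cone spectra are not Gaussian, and the ~60nm bandwidth is an assumed value):

```python
import math

PEAKS_NM = {"S": 420.0, "M": 530.0, "L": 560.0}  # peak sensitivities from the text

def cone_response(cone: str, wavelength_nm: float, bandwidth_nm: float = 60.0) -> float:
    """Crude Gaussian approximation of a cone class's relative sensitivity."""
    return math.exp(-(((wavelength_nm - PEAKS_NM[cone]) / bandwidth_nm) ** 2))

for wl in (450, 550, 600):  # relative S/M/L excitation at three wavelengths
    print(wl, {c: round(cone_response(c, wl), 2) for c in PEAKS_NM})
# Any single wavelength excites all three cone types to some degree, so
# color must be computed by comparing the three responses.
```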

Perception of Color

Color greatly enriches our interpretation of the visual world, providing definition and complexity. The earliest investigators of color vision did not think it possible that the visual system could contain a separate mechanism for each of the large number of colors we can distinguish (it is estimated that humans can discriminate two million different colors). One of the original color vision theories, the Young-Helmholtz theory, correctly suggested that there were only three different ocular sensors, responding to red, green, and blue light. While an interesting similarity between the eye and color photographic films is that they both analyze colors by responding to their red, green, and blue components, this similarity breaks down when these components are combined to create the colored percept. Indeed, even at the retinal level, information from the three color channels (the three cone types) is combined into an opponent organization. Without knowing its anatomical substrates, Ewald Hering, in the late nineteenth century, proposed an opponent theory of color vision, with three distinct opponent mechanisms: a red-green system excited by red and inhibited by green (or vice versa); a blue-yellow system; and a light-dark, or achromatic, system. Anatomical and electrophysiological substrates for these three channels were only subsequently discovered. Fundamentally, signals from the three types of cones are combined through the retinal circuitry, so that by the time the information reaches ganglion cells, opponent inputs are clearly evident, with one of the colors having an excitatory and the other an inhibitory effect on the rate of firing of the cells. The opponent theory can account for the results of additive mixtures of different colors of light, including combinations that appear neutral, as well as the visual effects produced by various types of color vision defects. It is also consistent with hue shifts of certain colors with variations in ambient illumination levels. Opponent information is transmitted out of the eye by different classes of retinal ganglion cells, which terminate primarily in the LGN. This nucleus largely preserves opponent information as it transmits it to the visual cortex. However, once in the cortex, color information is further combined to develop a multiplicity of color channels (far exceeding three), each with its own selectivity and sensitivity to a particular, smaller domain of color and lightness. Furthermore, at the cortical level, color is no longer an attribute that can be separated from other object properties such as shape, texture, and movement. Each of these contextual properties can be seen to influence the perceived color of objects at both the cellular and perceptual levels.
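
A minimal sketch of the opponent recoding just described, using one common textbook formulation (the specific channel weights are illustrative assumptions, not values given in this article):

```python
def opponent_channels(L: float, M: float, S: float) -> dict[str, float]:
    """Recode three cone signals into the three opponent channels."""
    return {
        "red_green":   L - M,             # excited by one color, inhibited by the other
        "blue_yellow": S - (L + M) / 2.0, # S cones against a summed L+M ('yellow') signal
        "achromatic":  L + M,             # the light-dark (luminance) channel
    }

# A stimulus driving L cones more than M, with little S input, yields a
# positive red-green signal and a negative (yellowish) blue-yellow signal:
print(opponent_channels(L=0.9, M=0.6, S=0.2))
```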

Human color vision defects are typically associated with the X-chromosome visual pigment genes (the L and M cone opsin genes) and are extraordinarily common, affecting 8-10% of men of European ancestry. Due to the X-linked recessive nature of the trait, only about 0.4% of females have a red/green color vision defect. Color vision defects caused by a loss of L photopigment function (usually through a loss of the L photopigment gene) are termed protan; those caused by a loss of M photopigment function (usually through a loss of the M pigment gene) are termed deutan. Deutan color vision defects are by far the most prevalent, affecting about 6% of men. Within the protan and deutan classes of color vision defects, there are two broad subcategories—dichromacy and anomalous trichromacy. Anomalous trichromats do not have the normal S, M, and L photopigments; rather, they base their vision on an S pigment and either two different M pigments (protanomalous trichromats) or two different L pigments (deuteranomalous trichromats). They are trichromats, in that they have three different pigments, but are different from normal (hence, anomalous). Dichromats base their vision on an S pigment and a single pigment in the L/M region of the spectrum. As might be expected, dichromats (protanopes and deuteranopes), with only one pigment in the L/M region, have very little color discrimination, distinguishing most colors on the basis of saturation and lightness variations alone. Anomalous trichromats have slightly better color discrimination, but in most cases nowhere near that of normal individuals. Monochromats are individuals who are truly “color-blind,” completely lacking functional cone photoreceptors. Though rare, there are also individuals with function of a single cone type (cone monochromats) who can achieve minimal color discrimination under certain lighting conditions.

Perception of Form

Just like color vision, object or “form” vision depends critically on cortical processing, particularly within the ventral visual pathway that stretches from the primary visual cortex, through various cortical areas, to the inferotemporal cortex. Visual cortical neurons beyond the primary visual cortex are especially tuned to respond to contours of objects, both real and illusory. Cells higher up in the ventral visual hierarchy are important for the ability to discriminate patterns and shapes. At the highest levels of the ventral hierarchy, in the inferotemporal cortex, neurons have receptive fields so large that they often encompass most, if not all, of the visual field. This endows them with positional invariance; that is, they are sensitive not to the orientation, size, or precise location of an object in space, but rather to the identity of that object. Furthermore, these neurons respond best when presented with highly complex stimuli, such as particular objects, hands, or faces. Sometimes, what they respond to is not so much the whole face, for example, but a certain spatial relationship between the eyes and the nose/mouth, or a facial expression. This observation led to the early interpretation that the visual system was organized as a hierarchy whose top levels consisted of “grandmother cells,” each responding to a particular object or small range of objects. Recent research has decreased the popularity of this interpretation. However, the inferotemporal cortex is still considered one of the highest levels of the visual cortical system, with its neuronal responses clearly modulated by inputs from other sensory and cognitive centers in the brain, making it a truly integrative area.

Perception of Motion

Motion perception is one of the most important aspects of vision because it provides the information necessary to successfully interpret and navigate our dynamic environment. Our sensitivity to motion is indeed highly developed: the human visual system contains no fewer than three separate motion-sensitive components. First, there are direction-selective retinal ganglion cells, which are best excited by objects that move in a certain range of directions. These ganglion cells cannot resolve much detail about the moving objects and thus do not contribute to the direction selectivity found in the primary visual cortex, which preserves detailed information about the moving objects and arises primarily (via the LGN) from convergence of information from neighboring ganglion cells. However, visual motion processing does not stop with the primary visual cortex. It undergoes further refinement at higher levels of the visual cortical system to which the primary visual cortex projects. One of the most important higher-level areas involved in the processing of visual motion is the middle temporal (MT) visual cortex. Neurons in this area combine their inputs in a complex manner, endowing them with larger receptive fields than in the primary visual cortex, which allows them to integrate motion information over large areas of visual space and to extract motion signals from a noisy environment.
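
A classic computational illustration of how direction selectivity can emerge from converging neighboring inputs is the correlation-type (Reichardt) detector. This article does not describe that particular circuit; the sketch below is a standard model offered only to make the principle concrete: each input is compared with a delayed copy of its neighbor, so motion in one direction yields a positive output and motion in the other a negative one.

```python
def reichardt(left: list[float], right: list[float], delay: int = 1) -> float:
    """Correlation-type motion detector: positive output for motion from
    the 'left' input toward the 'right' input, negative for the reverse."""
    out = 0.0
    for t in range(delay, len(left)):
        out += left[t - delay] * right[t] - right[t - delay] * left[t]
    return out

# A bright spot moving rightward hits the left detector at t=1,
# then the right detector at t=2:
left_in  = [0, 1, 0, 0]
right_in = [0, 0, 1, 0]
print(reichardt(left_in, right_in))  # > 0 : rightward motion
print(reichardt(right_in, left_in))  # < 0 : leftward motion
```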

Perception of Depth

Unlike photographic cameras, the visual system combines input from two eyes to create most of the three-dimensional images we perceive. Binocular vision refers to sight with two eyes, as distinct from monocular vision, or sight with one eye. Because at distances greater than about 100 feet the images formed by an object on the two retinas are almost identical, far-field depth perception relies primarily on monocular cues: familiar size of objects, occlusion, linear and size perspective, motion parallax, and our own familiarity with the natural distribution of light and shadows. At distances shorter than 100 feet, stereoscopic cues come into play and contribute to depth perception. Stereopsis occurs because the two eyes are horizontally separated (by about 6cm in humans), so that each eye has a slightly different view of the world. More specifically, close objects form slightly different images on the two retinas. This slightly different information from the two eyes is first combined onto single neurons in the primary visual cortex. It was in the late 1960s that Horace Barlow, Colin Blakemore, Peter Bishop, and Jack Pettigrew first discovered cortical neurons selective for horizontal disparity. While they play a major role in depth perception, these cells, just like most other visual cortical neurons, also contribute to a variety of visually related functions. For example, disparity information plays a critical role in ocular alignment when the eyes are trying to focus at a particular depth in the visual field: the eyes rotate nasally (convergence) when looking close and temporally (divergence) when looking far. This reflex develops very early in postnatal life, and the ability of the brain to code visual disparity information is critical to it.
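
The geometry behind the 100-foot figure can be sketched directly (assuming the ~6cm interocular separation quoted above and a small-angle approximation): the disparity between two points separated in depth falls off with the square of viewing distance.

```python
import math

INTEROCULAR_M = 0.06  # ~6 cm separation between the eyes

def disparity_arcmin(distance_m: float, depth_step_m: float) -> float:
    """Approximate binocular disparity (arcmin) between two points separated
    in depth by depth_step_m, viewed at distance_m."""
    return math.degrees(INTEROCULAR_M * depth_step_m / distance_m ** 2) * 60.0

for d in (1.0, 10.0, 30.0):  # a 10 cm depth step at increasing distances
    print(f"{d:>4.0f} m: {disparity_arcmin(d, 0.10):.3f} arcmin")
# ~21 arcmin at 1 m, ~0.2 at 10 m, ~0.02 at 30 m (~100 ft): the signal
# shrinks as 1/d^2, which is why monocular cues dominate in the far field.
```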

The Changing Visual Percept

One major difference between the human (or any other living) visual system and a camera is that biological perception of the visual world is never static. It constantly changes based on prior experience, stored visual memories, and planned actions. This plasticity of visual perception is currently a hot topic of research at both the cellular and behavioral levels. It is hoped that the knowledge gained will one day provide a usable substrate for machine learning and the development of “smart,” computerized vision algorithms. At the cellular level, it is becoming increasingly clear that visual learning as a result of experience and intensive practice, both during development and throughout adult life, occurs through molecular and structural changes in visual neurons at every level of the visual system. New, more efficient connections are formed, both within and between different visual centers. These structural and molecular changes alter the electrophysiological properties and visual processing abilities of visual neurons and lead to improved visual performance that is largely restricted to the trained visual function. Practice makes perfect, even in the visual system.

Adaptation

Visual adaptation, a process of adjustment of the visual system to its environment, is one particular form of visual plasticity with great implications for photographers and photography. The dynamics of the visual system are such that it de-emphasizes stable stimuli of long duration to preserve sensitivity to potential changes. This principle applies to a wide variety of stimulus attributes, including lightness, color, size, motion, orientation, pattern, and sharpness. Brightness/lightness adaptation enables a person to see the environment over enormous variations in light level, from sunlight to starlight, an illuminance ratio of about a billion to one. The increase in sensitivity that occurs with decreased light levels is a gradual process, requiring about 40 minutes to reach maximum dark adaptation. Dilation of the pupil can increase the amount of light admitted to the eye by only about 16 times; most of the increase in sensitivity during dark adaptation is the result of changes in the pigments in the retinal receptors and in neural processing. In contrast to dark adaptation, light adaptation occurs within a few minutes. Photographers who need dark adaptation to see clearly in low light-level situations, such as for certain darkroom operations and night photography, can avoid losing it quickly by wearing dark eyeglasses when exposure to higher light levels is unavoidable. A fairly intense red light can be used in darkrooms without affecting dark adaptation because rod photoreceptors (which contribute most to vision at these light levels) are insensitive to red light. Because the sensitivity of the visual system changes during light and dark adaptation, the eye is a poor measuring device for absolute light levels; this is typical of most other human sensory systems as well. In visual environments containing a variety of tones, the adaptation level tends to be adjusted to an intermediate value that depends on the size, luminance, and distribution of the tonal areas. This local adaptation enables a person to see detail over a larger luminance range, but not so much as to interfere with the judgment of lighting ratios on objects such as portrait models, where experienced photographers are able to judge 1:2, 1:3, and 1:4 lighting ratios, for example, with considerable accuracy.
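
The “about 16 times” figure follows from the pupil diameters given earlier, since admitted light scales with pupil area (a quick check, nothing more):

```python
import math

def pupil_area_mm2(diameter_mm: float) -> float:
    """Area of a circular pupil of the given diameter."""
    return math.pi * (diameter_mm / 2.0) ** 2

print(pupil_area_mm2(8.0) / pupil_area_mm2(2.0))  # 16.0
# A sixteenfold gain is negligible against the ~billion-to-one illuminance
# range described above; the remaining factor (~60 million) must come from
# photopigment changes and neural processing.
```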

Chromatic adaptation occurs primarily because of bleaching of cone pigments in the retina. Upon exposure to short-wavelength (blue) light, for example, the pigment in the S cones is bleached, rendering them less capable of absorbing photons of that wavelength. The net effect is that the blue-sensitive cones become less sensitive to blue light, which causes neutral colors viewed immediately after exposure to the blue light to appear yellowish (explained by blue-yellow color opponency; see the section Perception of Color above).
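
This shift can be modeled as a rescaling of each cone signal by its adaptation state, in the spirit of the classic von Kries model (the article describes the pigment bleaching, not this formula; the numbers below are purely illustrative):

```python
def adapt(cone_signals: dict[str, float], adapting_light: dict[str, float]) -> dict[str, float]:
    """Scale each cone class down in proportion to how strongly the
    adapting light drove it (von Kries-style normalization)."""
    return {c: round(cone_signals[c] / adapting_light[c], 2) for c in cone_signals}

neutral = {"L": 1.0, "M": 1.0, "S": 1.0}     # a physically neutral stimulus
blue_light = {"L": 0.8, "M": 0.9, "S": 1.5}  # adapting light driving S cones hardest

# After blue adaptation, the S response to the neutral stimulus is depressed
# relative to L and M, so the stimulus is perceived as yellowish:
print(adapt(neutral, blue_light))  # {'L': 1.25, 'M': 1.11, 'S': 0.67}
```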

Adaptation to orientation and other, more complex spatial and temporal characteristics of our environment has also been reported and is mediated to a large extent by cortical mechanisms. For example, if a person wears prism eyeglasses that make everything appear upside down for several days, they will eventually perceive the world as normal again. If the glasses are then removed, the world again appears inverted until several days pass and it resumes its “correct” perceived orientation.

Visual Consciousness

It seems appropriate to conclude this exploration of human vision with some thoughts on perhaps the least understood of visual processes—visual consciousness. It is certainly paradoxical that it should be the least understood, since it is the most fundamental property of the human brain that (still) separates humans from machines. According to Christof Koch and Francis Crick, the primary function of visual consciousness is to produce the best interpretation of the visual world around us, taking into account previous experiences and, perhaps, our goals. A second role for consciousness is to make this information available to the brain regions involved in the planning and execution of motor outputs. Much current research is devoted to finding the neural substrates of conscious visual perception, largely by contrasting them with the neural substrates of unconscious vision. To date, no single brain area where this function resides, nor a single mechanism by which it arises, has been discovered. Instead, many, perhaps all, parts of the visual system appear to contribute to its existence. Like the prototypical “smart” robot in science fiction movies, with brightly colored electrical signals zooming across its central processing unit, it is possible that visual consciousness arises from the global, simultaneous activity of the entire visual system. Devising experiments able to capture this phenomenon mechanistically remains one of the greatest challenges facing visual neuroscience this century.