Sasha Archibald, after Alfred Yarbus, after Ilya Repin, They Did Not Expect Him (aka An Unexpected Visitor, 1884).
DB here again:
One blog about eyes deserves another–actually a couple more. These entries, however, won’t be about actors’ or characters’ eyes. They’re about yours and mine.
We use them when we watch movies, but there’s been surprisingly little talk about how we do it. Even film theorists who talk about the Gaze or Visual Culture have not devoted much time to studying how we actually see movies. The whole business is pretty complicated, I grant. But if you’re willing to start by thinking about how we use our eyes in getting through the world, and then move to thinking about how we look at pictures, we can pretty quickly gain some understanding about how we watch films. That’s the business of this entry and the next one.
Bottom-up or top-down?
Unless you’re reading this in a cyclotron or on a roller coaster (always a possibility in these days of mobile media), your surroundings seem pretty stable, no? Look up from your screen and you’ll register the continuous space of a room, or a city vista, or a landscape. What’s remarkable is that this sense of a visual environment that’s all of a piece is composed of thousands of probes. Our eyes sample our surroundings, and the pieces that we snatch somehow melt into a solid, coherent world.
Surprisingly, our eyes have a very limited ability to focus precisely. The fovea, that compact scoop of cells that registers fine detail, is very tiny (about a millimenter in diameter) and has an angular coverage of less than two degrees. Yet it’s a key conduit of information. About 50% of visual nerve fibers are dedicated to the fovea, and acuity falls off very fast beyond it. Other areas of the eye can detect grosser changes in the environment, but in order to see anything clearly, we must constantly shift our eyes to bring the fovea to bear on it. When we follow a moving object, our eyes execute what are called smooth pursuit movements. In viewing a more or less static visual array, we execute saccades, very fast jumps from one fixation to another. Vision is a matter of saccades and fixations, scanning and sampling. A striking fact about saccades is that from one fixation to the next we are mostly blind to what’s happening in the visual field.
But what guides that scanning and sampling? We usually think it’s a matter of attention, and that’s probably not far off. It’s hard to pay visual attention to something that isn’t the target of foveal fixation. When we examine something in detail, we’re clearly devoting mental resources to it, whether it’s a streaky tulip or a misplaced comma. But what triggers our attention, and thus our foveal activity, in the first place?
Commonly we say that something catches our eye, cries out to be noticed, grabs our attention. That is, something out there becomes salient, so we send a saccade to it and fixate on it to get information. This is what psychologists call a bottom-up account. A stimulus triggers our visual system, which in turn recruits our mind to make sense of what has popped out.
An example: Looking straight ahead, you’re starting to cross a street. Something registered on the periphery of your vision seems to be suddenly bearing down on you. You turn your head and look: a car isn’t slowing down for the stoplight and you involuntarily jerk yourself back out of harm’s way. The errant car was salient, your visual system kicked in, and your body obeyed—all in a flash. You might not even be able to identify the car or driver as it runs the light, and you might say: “I didn’t even have time to think about it.” This is bottom-up, stimulus-driven seeing and acting.
Contrast another way to use your eyes. You’ve parked your car outside the Mall of America. Hours later you come out, a little uncertain about where the car is. Once you get to the general vicinity you recall, you use your knowledge of the vehicle to search it out. Let’s see, silvery Toyota sedan. Hell, too many Toyata sedans, all silvery. Wait, mine has a faded Obama-Biden sticker on the bumper and a rabbit with glowing eyes in the back window. Aha, there it is. This is a mode of looking guided by ideas and prior knowledge. Here perception is top-down, idea-driven; vision is informed by what you expect, recall, or believe about the world.
Top-down perception can focus our attention so drastically that we miss some glaringly obvious things. Consider Dan Simons and Chris Chabris’s famous basketball video experiment. If you’re not aware of this demonstration, proceed immediately to this page and take the test yourself.
Using a video of several players passing basketballs to one another, Simons and Chabris asked volunteers to silently count the passes made by players in white. But what Simons and Chabris were really testing was the extent to which people display “inattentional blindness.” About half the viewers were so preoccupied with the task assigned that they missed a rather salient item in the display: a gorilla that walks onto the court, thumps its chest, and walks off.
Simons and Chabris weren’t concerned with tracking eye movements (though later researchers have tried with the video; see the end of this piece). What the gorilla experiment indicates, however, is that top-down control has the drawback of narrowing our attentive focus so drastically that we miss the obvious. The curious phenomenon of inattentional blindness has become a robust area of research in cognitive psychology.
They did not expect…what?
We might think that visual search like the one demanded by the gorilla experiment is a special case. Isn’t most looking, including those saccades we execute all the time, bottom-up? After all, we are fairly passive, and we must take what we’re given by the world around us. Our attention is drawn to what pops out. There are a lot of features of the world that seem salient—bright colors, movement, strong contrasts, things coming toward us, and so on.
There’s another school of thought, though, and it’s articulated carefully in Michael F. Land and Benjamin W. Tatler’s Looking and Acting: Vision and Eye Movements in Natural Behavior (Oxford University Press, 2009). In ordinary life, they argue, we don’t just float though the world. We’re taken up with tasks. We walk, read, and make sandwiches. The tasks we undertake tacitly shape how and where we look and what we see. Land and Tatler want us to remember the top-down guidance of vision—the mind in the eye, so to speak.
Their central chapters trace how our acts of looking serve two basic functions: “finding and identifying the objects needed for the various tasks and guiding the actions that make use of these objects” (p. 59). In reading and drawing, or even walking or hitting a ball, the authors show, our eyes serve our brain’s sense of what must be done moment by moment. Wearing a nifty lightweight eyepiece or pair of spectacles, the experimental subject can act quite naturally and allow her point of gaze to be tracked and recorded to video. The results show that the tasks we launch, from crossing a street to reading a piece of music, create a series of phases that our eyes recognize and help us through, all without much conscious effort.
What about pictures? We aren’t interacting with them in the way we interact with teacups and steering wheels; we can’t affect their unfolding. Do our eyes behave as they do in our ordinary activities? In 1965 the Russian psychologist Alfred Yarbus reported the results of experiments that tracked eye movements. In some of them, he used Ilya Repin’s classic painting They Did Not Expect Him (aka An Unexpected Visitor, 1884). The dramatic image depicts a hollow-eyed man, gaunt and wrapped in a patchy coat, striding into a comfortable middle-class parlor.
First Yarbus simply let his subjects view the picture without any instructions from him. Their saccadic patterns were typified by this subject’s result.
Each line represents the fast movement of the eyes from one location to another (saccades) and clusters of lines are the traces of fixations. The denser the lines, the longer and more often a point was fixated. Sasha Archibald’s reconstruction at the top of this entry superimposes this pattern on the original picture.
Then Yarbus tried asking his subjects questions about the image. Here is the result of his asking one subject to estimate the material circumstances of the family.
A very different trajectory of attention emerges. Now the scanning was more purposeful, and it focused on the areas most likely to fulfill the task of identifying the family’s social class–clothes, the piano, the children, and other items. Moreover, when given more time to examine the picture, subjects did not roam around every cranny of the frame but returned constantly to the areas they had already examined, the ones that were most relevant to the task. Hence the blotchy areas, which are nodes where the eyes fixated very often.
Artists often claim that color, composition, and other features attract a viewer’s attention. But Yarbus concluded that while some sorts of visual material, chiefly faces and bodies, were targeted during the undirected scanning, many other features, such as color, edges, light or dark regions, and so on were not. “The character of the eye movements is either completely independent of or only very slightly dependent on the material of the picture and how it was made. . . . Depending on the task in which a person is engaged, i.e., depending on the character of the information which he must obtain, the distribution of the points of fixation on an object will vary correspondingly” (pp. 190, 192).
Your mission in watching a movie
Generally speaking, in blocking and framing a shot, the most important thing is to make sure the audience is looking where you want them to look.
Like painters, film directors talk of guiding our attention, isolating this actor, throwing one plane out of focus in order to emphasize another one. And we commonly say that the movie is designed to grab our eyes and guide them through each shot. As Zemeckis’ remark suggests, directors direct actors but they also direct us; they direct our attention, and they do it by making certain things salient in each shot.
Or so we think. If Yarbus and Land and Tatler are right, are we deeply wrong about how movies work? I don’t think so, but convincing you requires that I unpack some assumptions.
First, the world doesn’t come to us in a frame. A film shot, like a still photo or a painting, is bounded by edges, and as Rudolf Arnheim and Jean Mitry have pointed out, the very existence of the frame inevitably organizes what is put inside it. It makes little sense to say that something is in the center of your visual environment—that depends on where you’re looking—but everyone will agree what is in the center of a picture. And we are very likely to look at that central area of a frame or screen; Land and Tatler call it a “bias” (p. 39).
Repin took advantage of this bias by composing the primary action around a central region. It’s not the geometrical center of the image, which falls on a fairly innocuous patch of gray near the elbow of the woman seated at the piano. But there is a cluster of heads and shoulders just above that center. Fans of the “rule of thirds” will point out the glances of the man, the women in the background, the woman at the piano, and the rising woman lie along a line marking off the top third of the picture. The frame, by being a certain shape, creates lines of force within the image, and these can attract our scrutiny.
Second, human faces are a special case. We are sublimely sensitive to them. Faces are recognized even in low-resolution images, they are detected faster than other configurations, and we readily project them into ambiguous patterns. Hence we see the Man in the Moon and the Savior on a Cheeto. Naturally, artists realize the power of faces and gestures to attract our attention. Repin’s compositional design facilitates our pickup of the human drama he presents.
Filmmakers follow suit. Knowing that faces and movements are zones of high information, directors light, frame, compose, and edit their shots so that these zones get highlighted. Indeed, we might say that today’s “intensified continuity” style of filmmaking, emphasizing singles and facial close-ups, goes with the flow, giving us a full dose of what we’d look for anyway.
Yarbus stresses the all-over quality of undirected vision, at least when compared with more specific tasks. But I’d say that the scanpaths we find in his free version line up pretty well with Repin’s compositional pattern and the pictorial roles he gave to faces, bodies, and gestures. True, there is a lot of visual search in unrewarding areas. Nonetheless, that high, slightly sloping area above the geometrical center attracts heavy traffic, as does the daringly edge-centered children on the far right. It is simply the line of least resistance, at least when all other considerations are equal.
Yarbus made other things unequal. He asked questions, which created more guided paths. Still, regardless of what task they were assigned, Yarbus’ informants seem largely to have followed the compositional path Repin laid down. Asked to estimate the ages of the people in the picture, viewers gave a tighter, simplified version of the default, undirected path. To determine ages, face and height matter; the left window and furniture didn’t have to be explored much. Here is one subject’s pattern of scanning for signs of age.
Asked to memorize the costumes, the subjects also stuck to the program, with more searching of the body areas. Here is one example.
And asked to estimate how long the visitor had been away from the family, another viewer’s gaze traces a comparably tight slope, dwelling especially on the children at the far right.
In short, Yarbus’s questions about age, clothing, and years of separation were best answered by the faces and bodies on display–exactly the areas highlighted by Repin’s composition, color, and ensemble staging. Unsurprisingly, however, if your task is to estimate the family’s wealth, you’ll probably roam to the periphery of the action, as one subject did.
And if you’re told to memorize the spatial layout–a very unusual task that you’d seldom impose on yourself–you will spread your net quite widely, as one viewer did.
Yarbus’ results suggest to me that representational pictures elicit a set of default strategies: Start from the approximate center of the format. Watch for faces and gestures and an exchange of looks. Then launch further exploration of the picture space, anchoring that to the main compositional vectors and human signals. And of course take the title into account. The “they” of They Did Not Expect Him (virtually a literal translation of the original Russian title) prompts us to look for the reactions of onlookers.
With film, of course, we have additional pointers: sound, especially dialogue; camera movement, which is constantly redirecting our attention; and figure movement, which is a powerful eye-catcher. All things being equal, these channels of information will usually work in tandem with composition and the human signal patterns at work in a scene. Most films can be thought of as massively redundant systems for drawing our visual attention to certain items in the frame, second by second.
Story as task
One more point. Most of the factors I’ve mentioned involve bottom-up cueing. If in ordinary life our saccadic probes are governed by top-down task assignment, what about still images or moving pictures? Are there no task dictates at work? I think there are.
Recall most of the questions that Yarbus asked his viewers: the figures’ ages and clothing, their activities before the man entered, the family’s material circumstances. These are relevant to the tale the painting tells. It’s what we call a narrative painting, and most of Yarbus’ pointers are addressed to filling out the story.
The story may not be obvious to us today, but most commentators seem to agree that image represents a political exile returning from a labor camp to his family. The woman rising in the foreground is his mother, while his wife can be seen stirring from her place at the piano. His children are on the right, and many commentators interpret the somewhat fearful or puzzled expression on the little girl to indicate that she is too young to remember him. The image developed out of Repin’s sympathy for Russian radical movements of the time, and it was widely circulated by the later Bolshevik regime. Very likely Yarbus’s subjects would have seen the picture before and known the story behind it.
The point I want to make is that we do take on tasks when we watch a film image. Perhaps the most basic one is maintaining our interest, seeking out something that will keep our attention engaged at a basic level. But one major way to achieve interest is to make an effort to grasp how what’s happening onscreen develops the story.
Once the movie has started, we know who the main characters are and thus whom to watch most closely in ensuing scenes. We know something of their minds and motives, and we are sensitive to anything that impinges on those matters. So our top-down hypotheses about what’s going on and what will happen next shape what we look at and when we look at it.
Since story comprehension is one of our primary tasks in watching a mainstream movie, we will tend to ignore other things. We will miss changes in objects’ position across cuts (“cheats”) and disparities of lighting from shot to shot (e.g., the opening office scenes of The Godfather). These would seem to provide equivalents for the invisible-gorilla effect, although bottom-up factors are at play in such cases as well. Alternatively, when we don’t have any narrative expectations, as when we’re confronted with a lyrical avant-garde film by Stan Brakhage or Nathaniel Dorsky, perhaps we will let our eyes roam around the images more freely. Confronted by a film that denies us a narrative, we attend to composition, color, and other qualities that we may not notice in most storytelling cinema.
I’m convinced that research into vision is important to understanding film, but I’m a duffer at this. Next time, we hear from Tim Smith, a sort of modern-day Yarbus who monitors how we watch movies. An early example of his work is here, but next time we’ll catch up with his recent efforts.
Yarbus’s book is Eye Movements and Vision (New York: Plenum, 1967). It is rare and expensive, but a pdf is available online here. Google “Yarbus” and “Repin” together and you will find a great many research articles on eye movements and imagery. I’m grateful to amonseuldesir’s sitefor providing sharper diagrams of Yarbus’ result than I could squeeze out of my copy of his book.
Sasha Archibald has made a valiant effort to map Yarbus’ reported results onto the painting, as indicated in the image at the top of this entry. I thank Cabinet magazine for permission to reprint her schema. Thanks as well to Maria Belodubrovskaya for confirming the correct translation of the painting’s title. She recalls that in school she and other children were asked to discuss the reactions of the family portrayed in the painting.
For more on Dan Simons and Chris Chabris’ work, see The Invisible Gorilla and Other Ways Our Intuitions Deceive Us. Daniel Memmert has performed eye-tracking experiments with children watching the gorilla video. (See “The Effects of Eye Movements, Age, and Expertise on Inattentional Blindness,” Consciousness and Cognition 15 , pp. 620-627.) Surprisingly, Memmert found that many subjects who fixated on the gorilla during the video still didn’t claim to notice it! Simons and Chabris use this finding to suggest that even fixations don’t guarantee awareness. It seems that fixation is a necessary but not sufficient condition for noticing something; once more, the task at hand can block out even things that we can see clearly.
Related to inattentional blindness is “change blindness.” As I mentioned in an earlier blog, Dan Levin, who worked with Simons, has explored how our inability to detect changes in images or in the real world can affect our understanding of edited scenes in films. Tim Smith has further studied “edit-blindness” as a cinematic parallel to change blindness.
Robert Zemeckis’ remark about guiding the viewer’s eye is quoted in Jay Holben, “Sole Survivor,” American Cinematographer 82, 1 (January 2001), p. 40. For more on that idea, see the opening chapter of my Figures Traced in Light: On Cinematic Staging. More broadly, I discuss cinematic experience, and especially story comprehension, as an interaction of top-down and bottom-up factors in Narration in the Fiction Film and the first chapter of Poetics of Cinema.