Saturday, November 3, 2007

PSC129: Chapter 4 Perceiving and Recognizing Objects

I) Middle Vision
1. Finding Edges
-Goal of middle vision: to organize the elements of a visual scene into groups that we can recognize as objects.
-Occasional lack of edge - visual system fills it in.
-Illusory contours: edges that are perceived even though nothing in the image physically marks them; Kanizsa figure; contradicts the structuralist view
-Structuralists: Wilhelm Wundt and Edward Titchener; sum of atoms of sensation gives perception; complex objects are made up of components
-Gestalt: the whole is greater than the sum of the parts; Max Wertheimer, Wolfgang Kohler, Kurt Koffka
-Gestalt grouping rules: guide the visual system in its interpretation of the raw retinal image
a. Good continuation: two elements will tend to group together if they seem to lie on the same contour; continuing in the same direction; group edges that have the same orientation
-Perceptual "committees": come together to voice opinion on what the stimulus is
-Occlusion: a contour stops because another contour is occluding it

2. Texture Segmentation and Grouping
b. Similarity: Gestalt principle; two features tend to group together if they are similar in color, size, orientation, or form
c. Proximity: Gestalt principle; two features tend to group together if they are close to each other
d. Parallelism: weaker principle; used for figure-ground assignment; parallel contours are likely to belong to the same figure.
e. Symmetry: symmetrical regions are more likely to be seen as a figure.
f. Common region: two features group together if they appear to be part of the same larger region
g. Connectedness: two items will tend to group together if they are connected
h. Common fate: items group together if they are doing the same thing or moving together; they form a perceptual group
i. Synchrony: when all items in a set change at the same time, they tend to be grouped together even if they change in different ways.
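The proximity rule above can be sketched in code. This is only an illustration, not a model from the chapter: it assumes image elements are 2-D points and that "close" means within a hypothetical distance threshold `max_gap`. Points join the same group whenever a chain of close neighbors links them.

```python
import math

def group_by_proximity(points, max_gap):
    """Group point indices: two points share a group if a chain of
    neighbors no farther apart than max_gap links them.
    (max_gap is a hypothetical threshold chosen for illustration.)"""
    unvisited = set(range(len(points)))
    groups = []
    while unvisited:
        start = min(unvisited)
        unvisited.remove(start)
        group, frontier = [start], [start]
        while frontier:
            i = frontier.pop()
            # Collect every still-ungrouped point that is close to point i.
            near = [j for j in unvisited
                    if math.dist(points[i], points[j]) <= max_gap]
            for j in near:
                unvisited.remove(j)
                group.append(j)
                frontier.append(j)
        groups.append(sorted(group))
    return groups

dots = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)]
print(group_by_proximity(dots, max_gap=1.5))  # [[0, 1, 2], [3, 4]]
```

With a smaller threshold (e.g. 0.5) every dot ends up alone, which mirrors how spreading elements apart weakens grouping by proximity.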

3. Perceptual Committees Revisited
-Middle vision: collection of specialists, each with a specific area of expertise about what the input might mean. Goal is to come to single answer.
-Feature demons, cognitive demons, decision demon.
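A toy sketch of the demon hierarchy may help. Everything specific here is an assumption for illustration: the feature names, the four letters, and the scoring rule (a cognitive demon "shouts" louder the fewer features mismatch) are hypothetical, not taken from the chapter.

```python
# Hypothetical feature counts for a few letters (illustrative only).
LETTER_FEATURES = {
    "A": {"oblique": 2, "horizontal": 1},
    "H": {"vertical": 2, "horizontal": 1},
    "L": {"vertical": 1, "horizontal": 1},
    "E": {"vertical": 1, "horizontal": 3},
}

def cognitive_demon_shout(expected, observed):
    """One cognitive demon: shout is 0 for a perfect match and becomes
    more negative with each feature count that disagrees."""
    names = set(expected) | set(observed)
    mismatch = sum(abs(expected.get(n, 0) - observed.get(n, 0)) for n in names)
    return -mismatch

def decision_demon(observed):
    """The decision demon listens to all shouts and picks the loudest."""
    shouts = {letter: cognitive_demon_shout(features, observed)
              for letter, features in LETTER_FEATURES.items()}
    winner = max(shouts, key=shouts.get)
    return winner, shouts

# Feature demons report what they found in the image.
winner, shouts = decision_demon({"vertical": 2, "horizontal": 1})
print(winner)  # H
```

Even when the feature demons miss something (say, only `{"vertical": 2}` is reported), the committee still settles on a single answer, here "H".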

4. Committee Rules: Honor Physics and Avoid Accidents
-Ambiguous figure: generates two or more plausible interpretations
-Necker cube: every image is ambiguous, but the perceptual committees come up with a final interpretation
-Accidental viewpoint: a viewing position that produces some regularity in the visual image that is not present in the world; e.g., four surfaces with different shapes, arranged in different orientations and at different distances, seen as four square regions at a very precise location

5. Figure and Ground
-The ability to distinguish figures (objects in the foreground) from ground (surfaces or objects lying behind the figures) is a process called figure-ground assignment
-Rubin Vase/Face figure: analogous to Necker cube; ambiguous
-Principles
a. Surroundedness: if one region is entirely surrounded by another, it is likely that the surrounded region is the figure
b. Size: the smaller region is likely the figure
c. Symmetry: a symmetrical region is more likely to be seen as figure.
d. Parallelism: regions with parallel contours are more likely to be seen as figure

6. Dealing with Occlusion
-In the real world, objects are often partially hidden by other objects
-Edges can relate across gaps: relatability: the degree to which two line segments appear to be part of the same contour
-Two edges are relatable if they can be connected with a smooth curve but not if the connection requires an S curve
-Nonaccidental feature: a feature of an object that is not dependent on the exact viewing position of the observer
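The relatability rule above (smooth curve yes, S curve no) can be sketched geometrically. The criterion used here is a simplifying assumption, not something stated in these notes: each edge is extended as a straight ray into the gap, and the two edges count as relatable if the rays meet and the contour would bend by at most 90 degrees. The function names and the tolerance `eps` are hypothetical.

```python
def relatable(p, d, q, e, eps=1e-9):
    """p, q: endpoints of the two visible edges at the gap.
    d, e: direction in which each edge continues into the gap.
    Assumed rule: the extensions must meet, and the bend must be <= 90 deg."""
    cross = d[0] * e[1] - d[1] * e[0]
    dx, dy = q[0] - p[0], q[1] - p[1]
    facing = d[0] * e[0] + d[1] * e[1]  # < 0 when the edges point toward each other
    if abs(cross) < eps:
        # Parallel extensions: only relatable if the edges lie on one line
        # and point at each other; an offset would need an S curve.
        collinear = abs(dx * d[1] - dy * d[0]) < eps
        ahead = dx * d[0] + dy * d[1] > 0
        return collinear and ahead and facing < 0
    # Solve p + t*d == q + s*e for the meeting point of the two extensions.
    t = (dx * e[1] - dy * e[0]) / cross
    s = (dx * d[1] - dy * d[0]) / cross
    meets_in_gap = t > 0 and s > 0
    # A bend of at most 90 degrees means d and -e have a non-negative dot product.
    return meets_in_gap and facing <= eps

print(relatable((0, 0), (1, 0), (4, 0), (-1, 0)))  # True: straight continuation
print(relatable((0, 0), (1, 0), (3, 3), (0, -1)))  # True: smooth 90-degree bend
print(relatable((0, 0), (1, 0), (4, 1), (-1, 0)))  # False: offset, needs an S curve
```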

7. Parts and Wholes
-Global superiority effect: properties of the whole object take precedence over parts of the object
-One heuristic used by perceptual committees when dividing objects into parts is the pair of concavities created when one object is pushed into another

8. Interim Summary
-Five principles
a. Bring together that which should be brought together: Gestalt principles and occlusion
b. Split asunder that which should be split asunder: edge-finding processes that divide regions from each other; figure-ground mechanisms
c. Use what you know: two dimensional edge configurations are taken to indicate 3-d corners or occlusion borders, and objects are divided into parts on the basis of an implicit knowledge of the physics of image formation
d. Avoid accidents: avoid interpretations that require the assumptions of highly specific, accidental combinations of features or accidental viewpoints.
e. Seek consensus and avoid ambiguity: using the preceding four principles, the "committees" of middle vision must come up with one interpretation that serves as input for the processes that will recognize objects

II) Object Recognition
1. Templates versus Structural Descriptions
-Naive template theory: visual system recognizes objects by matching the neural representation of the image with a stored representation of the same "shape" in the brain; problem: would require too many templates
-Structural description: objects that are the same share a basic structure; a specification of an object in terms of its parts and the relationships between the parts.
a. Key component: object parts are represented in the structural descriptions
b. Generalized cylinders; superquadrics (3D shapes); geons (geometric ions)
-Recognition by components model: Biederman's model of object recognition, which holds that objects are recognized by the identities and relationships of their component parts.
a. Geons and relationships can be recognized at any angle; they are viewpoint invariant
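To see why naive templates run into the "too many templates" problem noted above, here is a minimal sketch. It assumes images are tiny grids of characters and that a match means an exact cell-by-cell fit; the grid, the helper names, and the numbers passed to `templates_needed` are all hypothetical.

```python
# A stored "lock" for the letter L on a 3x4 grid (X = ink, . = blank).
TEMPLATE_L = (
    "X...",
    "X...",
    "XXX.",
)

def matches_template(image, template):
    """Naive template match: every cell must agree exactly."""
    return tuple(image) == tuple(template)

def shift_right(image):
    """Move the whole image one cell to the right."""
    return tuple("." + row[:-1] for row in image)

def templates_needed(n_objects, n_orientations, n_positions):
    """One template per object, per orientation, per position."""
    return n_objects * n_orientations * n_positions

shifted = shift_right(TEMPLATE_L)
print(matches_template(TEMPLATE_L, TEMPLATE_L))  # True
print(matches_template(shifted, TEMPLATE_L))     # False: same L, moved one cell
print(templates_needed(1000, 36, 100))           # 3600000
```

The same letter shifted by one cell no longer fits the stored template, so a purely template-based system would need a separate template for each position and orientation.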

2. Problems with Structural Description Theories
-Object recognition is not totally viewpoint invariant
-Geons may not be the language of recognition
-Template-like representations called views: the more the object was rotated, the longer it took to name it
a. Suggests that people store a template-like representation of the object and recognize the objects in the surprise phase by mentally rotating the misoriented objects back to the upright views they have stored in memory.
-Objects have a viewpoint-dependent component but view-based models have their own problems

3. Multiple Recognition Committees
-Takes longer to recognize objects at the subordinate levels than at the entry level.
-When naming an atypical member of a category, it is faster to name it at the subordinate level
-The visual system employs several different object recognition committees at the same time, each working with its own set of tools.

4. Faces: An Illustrative Special Case
-Prosopagnosia: an inability to recognize faces; can recognize that it's a face but can't tell whose
-The two levels of face recognition are not completely different systems
-Double dissociation: evidence for truly separate brain modules; each of the two functions can be damaged without harm to the other

III) Objects in the Brain
-Cells in primary visual cortex (striate cortex) respond to edges or lines; they have small and precise receptive fields
-Some striate cortex cells are involved in middle-vision processes such as grouping and texture segmentation
-Other middle-vision tasks, such as the completion of illusory contours, are handled in extrastriate cortex, which lies just outside the primary visual cortex.
a. From extrastriate cortex of the occipital lobe, visual information moves out along two main pathways.
b. One pathway heads up into the parietal lobe; processes information relating to the location of objects in space and the actions needed to interact with them; the where pathway
c. The second pathway leads to the temporal lobe; the what pathway; explicit acts of object recognition
-Agnosia: psychic blindness; failure to recognize objects typically due to brain damage
-Inferotemporal cortex (IT): part of cerebral cortex in the lower portion of the temporal lobe, important in object recognition
a. Cells in IT cortex have receptive fields that spread over half or more of the field of view.
b. Doesn't get excited by spots and lines but by objects
-Hierarchy of visual perception: the small receptive fields and simple features of visual cortex are combined with ever-greater complexity as one moves from striate cortex to IT cortex, eventually culminating in a cell that fires for one specific object; idea of the grandmother cell

Summary
1. After early visual processes have extracted basic features from the visual input, it is the job of middle vision to organize these features into the regions, surfaces, and objects that can, in turn, serve as input to object recognition and scene understanding processes.
2. Perceptual "committees" serve as an important metaphor in this chapter. The idea is that many semi-independent processes are working on the input at the same time. Different processes may come to different conclusions about the presence of an edge or the relationship between two elements in the input. Under most circumstances, we see the single conclusion that the committee settles on.
3. Multiple processes seek to carve the input into regions and to define the edges of those regions, and many rules are involved in this parsing of the image. For example, image elements are likely to group together if they are similar in color or shape, if they are near each other, or if they are connected to each other. Many of these grouping principles were first articulated by members of the Gestalt school.
4. Other, related processes seek to determine if a region is part of a foreground figure, like this black O, or part of the background, like the white area around the O. These rules of grouping and figure-ground assignment are driven by an implicit understanding of the physics of the world. Thus, events that are very unlikely to happen by chance (e.g., two contours parallel to each other) are taken to have meaning. (Those parallel contours are likely to be part of the same figure.)
5. The processes that divide up visual input into objects and background have to deal with many complexities. Among these are occlusion - the fact that parts of objects may be hidden behind other objects, and the fact that objects, themselves, have a structure. Is your nose an object or a part of a larger whole? What about glasses or hair or a wig?
6. Template models of object recognition hold that an object in the world is recognized when its image fits a particular representation in the brain in the way that a key fits a lock. It has always been hard to see how naive template models could work because we might need one "lock" for every object in every orientation in every position in the visual field.
7. Structural models propose that objects are recognized by the relationship of parts. Thus, an H could be defined as two parallel lines with a perpendicular line joining them between their centers. A cat would be more difficult, but similar in principle. Such models are viewpoint-independent: the orientation of the H doesn't matter. Object recognition, however, is often viewpoint-dependent, suggesting that the correct model lies between the extremes of naive template matching and pure structural description.
8. Faces are an interesting special case in which viewpoint is very important. Upright faces are much easier to recognize than inverted faces are. Moreover, some regions of the brain seem to be specifically interested in faces. They lie near other regions in the temporal lobes that are important for recognition of other sorts of objects.
