Dr Patrick R Andrews,
Imaging Research Centre,
De Montfort University,
Leicester, UK.


The Search for Shape

Dr Patrick R Andrews

Imaging Research Centre, De Montfort University

SERC, Hawthorn Building, Leicester Campus

pra@dmu.ac.uk

Abstract

A new approach to automated pattern recognition, based on neurophysiology, is described. Some examples are provided of the performance of this algorithm, together with a brief summary of its potential application as a visually intelligent Internet Agent.

Introduction

Although access to information has greatly increased with the widespread availability of the Internet, search algorithms remain a bottleneck. The most obvious example is that, even if every item in a dataset carries a hypertext label, finding any particular image, or subset of images, remains a non-trivial task.

This situation reflects the underlying failure of Artificial Intelligence to provide machines with any real visual capability (1),(2). The objective of this work has been to investigate the sufficient functional principles of natural shape recognition, with a view to transferring this ability to machines.

A straightforward, non-adaptive, non-recursive algorithm has been developed, with reference to neural networks in the mammalian visual cortex (3), which can store and retrieve information about an enormous range of images. It may thus soon be possible to design Agents able to interrogate large, unstructured pictorial databases rapidly with questions of the form: "find the three faces most like Marilyn Monroe."

The cortices of cats, monkeys and humans have been shown to analyse visual images using orientated local filters tuned to different spatial frequencies (4). It has never been clear what, if any, significance these cells have for the recognition of patterns. A very large part of the visual cortex in these species is devoted to the treatment of signals from the foveola, a tiny area near the centre of the retina (1/3 deg of visual arc). This region seems to have great significance for the recognition of patterns (5).

The visual system is able to deal with very large numbers of images, even those which fall on the small area of the foveola. This work involves an emergent definition of visual noise and highlights the fact that the combinatorial explosion is controlled by this differentiation between "textures" and "shapes". Here, the problem is restricted to the identification of monochrome images. (There is evidence that the human visual system analyses grey images in central vision as if they were binary: square and sine-wave contrast distributions look the same at spatial frequencies above 12 cycles/degree, i.e. in the foveola.)

As we manipulate an object or move around it, we foveate many successive small-angle views. In this model, each of these views produces a slightly different coding, so that similar views "clump together" in the representation space (5). The specificity of coding which we experience as visual acuity has been shown to derive directly from elementary number theory.

Novel images are thus automatically classified by virtue of their proximity in representation space to the codes of known images. Previously seen objects can reactivate their existing representations and those of their neighbours, i.e. they can be recognised. These ideas may relate to the work on the inferotemporal cell ensembles known to be selectively responsive to complex visual stimuli in higher primates (6). There is a potentially useful similarity between the 3-D representation space used here and actual cortical cell ensembles.

The central idea is that it is possible to generate three numbers (X,Y,Z) for any given image such that: they are unique to the image in question; the codes for images considered "similar" by people are numerically close; and image sub-regions lacking well-defined edge structure are constructively confused.

The values of X, Y and Z are obtained by applying the same algorithm along each of the three principal axes of the hexagonal, foveal receptor mosaic. The algorithm itself is described in detail in (5) and consists of a two-stage process. First, a small-scale spatial differencing process identifies and quantifies the degree to which image edges are aligned with each of the three principal axes of the receptor array. The resulting distribution is then subject to a second, similar analysis, but at a larger spatial scale. It is possible to choose a kernel for this process so that no two images with different outputs from stage one will have the same numerical output from stage two, i.e. the system will never be confused between two images with different edge distributions.
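The uniqueness property can be illustrated with a minimal sketch. This is not the algorithm of (5); everything in it is an assumption made for illustration: the patch is a small rhombus of binary receptors indexed by axial hexagonal coordinates (q, r), stage one simply counts intensity changes between neighbours along each line of receptors parallel to an axis, and the stage-two "kernel" weights those counts by powers of a base larger than any possible count. With such weights each code is a positional numeral, so two different stage-one outputs can never produce the same number. The names `stage_one`, `stage_two`, `xyz_code` and `make_image` are hypothetical.

```python
# Three principal axes of a hexagonal mosaic in axial (q, r) coordinates.
AXES = ((1, 0), (0, 1), (-1, 1))


def make_image(size, predicate):
    """Build a size x size rhombus of binary receptors from a predicate."""
    return {(q, r): int(predicate(q, r)) for q in range(size) for r in range(size)}


def stage_one(image, axis):
    """Assumed small-scale differencing: for every line of receptors parallel
    to `axis`, count the neighbouring pairs whose values differ."""
    dq, dr = axis
    counts = {}
    for (q, r), value in image.items():
        line = q * dr - r * dq          # constant while stepping along `axis`
        counts.setdefault(line, 0)
        neighbour = image.get((q + dq, r + dr))
        if neighbour is not None and neighbour != value:
            counts[line] += 1
    return [counts[line] for line in sorted(counts)]


def stage_two(counts, base):
    """Assumed larger-scale kernel: weights are successive powers of `base`.
    If every count is below `base`, distinct count lists give distinct sums."""
    if any(c >= base for c in counts):
        raise ValueError("base must exceed every stage-one count")
    return sum(c * base ** i for i, c in enumerate(counts))


def xyz_code(image, size):
    """Apply the same two stages along each axis to obtain (X, Y, Z)."""
    return tuple(stage_two(stage_one(image, axis), base=size) for axis in AXES)


bar = make_image(3, lambda q, r: q == 1)      # a bar along one axis
other = make_image(3, lambda q, r: r == 1)    # the same bar along another axis
print(xyz_code(bar, size=3))    # (26, 0, 48)
print(xyz_code(other, size=3))  # (0, 26, 48)
```

A line of at most `size` receptors contains at most `size - 1` changes, which is why `base=size` is sufficient here. Power-of-base weights guarantee uniqueness but only a coarse form of numerical closeness between similar images; the kernel actually used in (5) may differ.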

Results

A small number of representative images and their associated codes are shown in Figure 1:

1. Line drawing of a face, looking to 7 o'clock

2. Line drawing of a face, looking to 1 o'clock

3. Line drawing of a face, looking straight ahead

4. Line drawing of a face, turning to right

5. Line drawing of a face, profile

6. The number 7

7. The capital letter A

8. A star

9. A hexagon

10. A line running from 5 to 11 o'clock

These sets of three numbers may be plotted, for each image, in an x,y,z framework, as shown in Figure 2.

Each dot stands for a particular image so that when, for example, a large image is scanned, the sequence of X,Y,Z points generated by viewing each small region in turn forms a trajectory. This also means that images which are "similar" (in terms of the distribution of edges within them) will form points near to each other. It follows that if a previously unseen image (A) forms a code which is numerically close to that of some known image (B), then image A will be recognised as similar to image B.

The corresponding codes are plotted as points in Figure 2.
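Recognition by proximity amounts to a nearest-neighbour search in the X,Y,Z space, which is also how a query such as "find the three faces most like ..." could be answered. The sketch below assumes Euclidean distance as the measure of numerical closeness, which the text does not specify. The stored codes are invented placeholders, not the values from Figure 1, and `most_similar` is a hypothetical helper name.

```python
import heapq
import math


def most_similar(known, query_code, k=3):
    """Return the names of the k stored images whose codes lie closest
    to query_code (Euclidean distance is an assumption)."""
    return heapq.nsmallest(
        k, known, key=lambda name: math.dist(known[name], query_code)
    )


# Hypothetical (X, Y, Z) codes, for illustration only.
known = {
    "face_a": (10, 40, 22),
    "face_b": (12, 38, 25),
    "face_c": (15, 35, 30),
    "letter_A": (70, 12, 55),
    "hexagon": (90, 90, 90),
}

unseen = (11, 39, 24)  # code of a previously unseen image
print(most_similar(known, unseen))  # ['face_b', 'face_a', 'face_c']
```

A linear scan like this is adequate for small collections; for a large pictorial database a spatial index over the three coordinates would be the natural refinement.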

Conclusion

The system described here is very simple yet does seem to have some useful properties:


*it is potentially very fast: one-trial learning is the rule, without the need to pre-filter training data.


*scene segmentation can be achieved by looking at a small enough image area at a time: local shadows and occlusion can therefore be accommodated.


*it is general/flexible in terms of the range of category types recognisable.


*despite its minimal design, it is not subject to confusions between different shapes.


*only simple "technology" is required. It is no more memory intensive than competing approaches.


*robustness: two ways to associate things are incorporated: visual similarity, and experience of visual sequences (such as different views of the face shown above), which is stored as trajectories in coding space.
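The two kinds of association in the last point can be sketched together. The example assumes that each experienced sequence is kept as an ordered list of X,Y,Z codes, that similarity is Euclidean distance, and that association by experience means returning the stored codes adjacent to the best match along its trajectory. The function names and all numbers are hypothetical.

```python
import math


def nearest_point(trajectories, query):
    """Association by similarity: locate the stored code closest to query.
    Returns (trajectory name, index within that trajectory)."""
    best = None
    for name, points in trajectories.items():
        for index, point in enumerate(points):
            distance = math.dist(point, query)
            if best is None or distance < best[0]:
                best = (distance, name, index)
    return best[1], best[2]


def associates(trajectories, query):
    """Association by experience: return the matched code together with
    its immediate predecessor and successor on the same trajectory."""
    name, index = nearest_point(trajectories, query)
    points = trajectories[name]
    return name, points[max(0, index - 1): index + 2]


# Hypothetical trajectories, e.g. successive views of one face.
trajectories = {
    "face_views": [(10, 40, 22), (12, 38, 25), (15, 35, 30)],
    "digit_7": [(80, 5, 60)],
}

print(associates(trajectories, (12, 39, 24)))
# ('face_views', [(10, 40, 22), (12, 38, 25), (15, 35, 30)])
```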

Acknowledgements

The author is the holder of a Senior Research Fellowship at De Montfort University. Grateful thanks are due to MacWarehouse Limited and to Psion plc for donation of equipment.

References

1. Minsky, M., The Society of Mind. 1988, London: Pan Books.

2. Aleksander, I. and P. Burnett, Thinking Machines: the search for artificial intelligence. 1987, Oxford: Oxford University Press. 208.

3. Hubel, D., Eye, Brain and Vision. 1988, New York: Scientific American Library. 240.

4. Campbell, F.W. and R.W. Gubisch, Optical Quality of the Human Eye. J. Physiol., 1966. 186: p. 558-578.

5. Andrews, P., Complex Patterns, Simply Recognised, in Lectures in Complex Systems, SFI Studies in the Sciences of Complexity, L. Nadel and D. Stein, Editors. 1991, Addison-Wesley. p. 353-369.

6. Perrett, D., Smith, P.A.J., Potter, D.D., Mistlin, A.J., Head, A.S., Milner, A.D. and Jeeves, M.A., Neurones Responsive to Faces in the Temporal Cortex: Studies of Functional Organisation, Sensitivity and Relation to Perception. Human Neurobiology, 1984. 3: p. 197-211.
