Categorization
In Cognitive Computer Science


John F. Sowa

VivoMind LLC




UQÀM Summer Institute on Cognitive Science

30 June 2003











 

Categorization

Classification and categorization are fundamental to intelligence — in every species.

  • Similarity:  Recognition that two stimuli are signs of the same category.

  • Identity:  Recognition that two stimuli are signs of the same categories for all relevant purposes.

  • Generalization/specialization:  Recognition that some category includes another.

  • Negation:  Denial that some stimulus is a sign of some category.
Note:  these four operations, when combined in all possible ways, are sufficient to define first-order logic.
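A minimal sketch, not Sowa's formalization, shows the four operations on a toy taxonomy. The category names, the single-parent hierarchy, and the choice to represent a stimulus by its most specific category are assumptions made for illustration. The sketch does not attempt to show how the operations combine to define first-order logic.

```python
from typing import Dict, Optional, Set

# Hypothetical toy taxonomy: each category points to its immediate supertype.
PARENT: Dict[str, Optional[str]] = {
    "animal": None,
    "mammal": "animal",
    "cat": "mammal",
    "dog": "mammal",
}

def generalizes(general: str, special: str) -> bool:
    """Generalization/specialization: True if `general` includes `special`."""
    node: Optional[str] = special
    while node is not None:
        if node == general:
            return True
        node = PARENT[node]
    return False

def categories_of(stimulus_type: str) -> Set[str]:
    """All categories of which a stimulus of this most specific type is a sign."""
    return {c for c in PARENT if generalizes(c, stimulus_type)}

def similar(a: str, b: str, category: str) -> bool:
    """Similarity: both stimuli are signs of the same given category."""
    return category in categories_of(a) and category in categories_of(b)

def identical(a: str, b: str, relevant: Set[str]) -> bool:
    """Identity: the stimuli fall under the same categories for all relevant purposes."""
    return categories_of(a) & relevant == categories_of(b) & relevant

def negated(stimulus_type: str, category: str) -> bool:
    """Negation: deny that the stimulus is a sign of the category."""
    return category not in categories_of(stimulus_type)
```

For example, a cat and a dog are similar as mammals, and identical if only "animal" and "mammal" are relevant, but a cat is negated as a sign of "dog".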








 

Categorization and Reasoning

  • Deduction:  Applying a general principle to a special case.

  • Induction:  Deriving a general principle from special cases.

  • Abduction:  Guessing that some general principle can relate a given pattern of cases.

  • Analogy:  Finding a common pattern in different cases.









 

Peirce's Logic of Pragmatism










 

Sensory-Reasoning-Motor Cycle

  1. Induction:  From observations to generalizations to the "knowledge soup."

  2. Abduction:  Extract hypotheses from the soup to form a tentative theory.

  3. Revision:  More abductions to revise the theory.

  4. Deduction:  Use the theory to make predictions.

  5. Action:  Test predictions by changing the world.

  6. Repeat from step 1.
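The cycle can be sketched as a loop. This is a hypothetical toy, not a model from the talk: the "world" hides a rule that objects with density below 1.0 float, the knowledge soup is reduced to two bounds, and abduction simply guesses the midpoint between them. Each pass tests the current theory where it is least certain, so revision here amounts to bisection. The loop assumes it starts with at least one floating and one sinking observation.

```python
from typing import List, Tuple

Observation = Tuple[float, bool]  # (density, floated?)

HIDDEN_THRESHOLD = 1.0  # hypothetical rule of the toy world, unknown to the agent

def world_floats(density: float) -> bool:
    """The environment: an object floats if its density is below the hidden threshold."""
    return density < HIDDEN_THRESHOLD

def induce(observations: List[Observation]) -> Tuple[float, float]:
    """Induction: generalize the observations into two bounds (the "knowledge soup"):
    densities up to `lo` float, densities from `hi` upward sink."""
    lo = max(d for d, floated in observations if floated)
    hi = min(d for d, floated in observations if not floated)
    return lo, hi

def abduce(soup: Tuple[float, float]) -> float:
    """Abduction: guess a tentative theory, here a threshold between the bounds."""
    lo, hi = soup
    return (lo + hi) / 2

def deduce(theory: float, density: float) -> bool:
    """Deduction: use the theory to predict whether an object will float."""
    return density < theory

def run_cycle(observations: List[Observation], passes: int) -> Tuple[float, int]:
    """Run the cycle; return the final theory and the number of failed predictions."""
    surprises = 0
    for _ in range(passes):
        theory = abduce(induce(observations))  # steps 1-3: induce, abduce, revise
        probe = theory                         # test where the theory is least certain
        predicted = deduce(theory, probe)      # step 4: predict
        actual = world_floats(probe)           # step 5: act on the world and observe
        if predicted != actual:
            surprises += 1
        observations.append((probe, actual))   # step 6: feed the result back in
    return abduce(induce(observations)), surprises
```

Starting from the observations (0.5, floats) and (2.0, sinks), twenty passes bring the guessed threshold very close to the hidden value of 1.0.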









 

Replacing Sherlock Holmes










 

A Big Categorization Project

The Cyc project was started in 1984 by Doug Lenat.

  • The name comes from the stressed syllable of "encyclopedia."

  • Goal:  implement the commonsense knowledge of an average human being.

  • After $65 million and 650 person-years of work,
    600,000 categories
    defined by 2,000,000 axioms
    organized in 6,000 microtheories.

  • But it cannot compete with a 10-year-old child.
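Dividing the totals above gives rough averages. These ratios are simple arithmetic on the figures in this slide, not numbers reported by the Cyc project:

```latex
\begin{aligned}
\frac{\$65{,}000{,}000}{650\ \text{person-years}} &= \$100{,}000\ \text{per person-year} \\
\frac{2{,}000{,}000\ \text{axioms}}{600{,}000\ \text{categories}} &\approx 3.3\ \text{axioms per category} \\
\frac{2{,}000{,}000\ \text{axioms}}{6{,}000\ \text{microtheories}} &\approx 333\ \text{axioms per microtheory}
\end{aligned}
```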









 

Cyc Review

Two-day DARPA-sponsored review of Cyc in June 2003 with about two dozen AI experts.

Consensus:

  • Cyc is a unique and valuable resource:
    A great deal has been learned from it.
    Much more can be learned from it.
    If it were canceled, something like it would have to be done again.

  • Support for Cyc should be continued.

  • Cyc should be freely available for research purposes.

  • But there are many open questions about how Cyc relates to other R&D efforts.









 

Lexical Resources

Developers of WordNet (George Miller) and FrameNet (Chuck Fillmore) were also present.

Consensus:

  • Lexical resources are complementary to Cyc.

  • Extremely valuable for natural language projects.

  • Desirable to integrate contributions from various sources.

  • Integration would require relatively modest funding.

  • Word senses (synsets) can be linked to the categories of Cyc and other axiomatized ontologies.









 

Feigenbaum's Question

Ed Feigenbaum asked why Cyc has taken so long to become "intelligent".

  • In 1961, I. J. Good made a prediction:
    "It is more probable than not that, within the twentieth century, an ultraintelligent machine will be built and that it will be the last invention that man need make."

  • Why hasn't Good's prediction come to pass?

  • Is there some missing ingredient that the AI community hasn't discovered?

  • What is it?  Could it be added to Cyc?









 

Cyc's Piece of the Pie

  • Cyc does not replace Sherlock Holmes.

  • It requires people like him to write axioms.

  • Encoding one page of a textbook costs $10,000.









 

Ibn Taymiyya Contra Aristotle

  • Fourteenth-century Muslim legal scholar.

  • Admitted that deduction is necessary for pure mathematics.

  • But for reasoning about the world, deduction can be no more accurate than the induction that supplied its premises.

  • Given the same data, analogy can replace induction followed by deduction.









 

Ibn Taymiyya's Argument

  • A theory can be useful, if available.

  • But analogy can be used when no theory exists.
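A minimal sketch of the contrast, under assumptions that go beyond the slides: each case is a set of hypothetical features with a verdict, induction keeps the features shared by every case with the same verdict, and analogy copies the verdict of the most similar past case by Jaccard overlap. On a case covered by an induced rule, both routes agree; on a case that no rule covers, only analogy returns an answer.

```python
from typing import List, Optional, Set, Tuple

Case = Tuple[Set[str], str]  # (features, verdict)

# Hypothetical past cases; the features and verdicts are illustrative only.
CASES: List[Case] = [
    ({"fur", "claws", "sharp_teeth"}, "carnivore"),
    ({"fur", "hooves", "flat_teeth"}, "herbivore"),
    ({"scales", "claws", "sharp_teeth"}, "carnivore"),
    ({"fur", "hooves", "flat_teeth", "horns"}, "herbivore"),
]

def induce_then_deduce(cases: List[Case], new: Set[str]) -> Optional[str]:
    """Induction: for each verdict, keep the features shared by all its cases (a rule).
    Deduction: return the verdict of the first rule whose features the new case has."""
    for verdict in sorted({v for _, v in cases}):
        rule = set.intersection(*(f for f, v in cases if v == verdict))
        if rule and rule <= new:
            return verdict
    return None  # no theory covers the new case

def by_analogy(cases: List[Case], new: Set[str]) -> str:
    """Analogy: copy the verdict of the most similar past case."""
    def similarity(features: Set[str]) -> float:
        return len(features & new) / len(features | new)
    _, verdict = max(cases, key=lambda case: similarity(case[0]))
    return verdict
```

For a new case with fur, claws, sharp teeth, and stripes, both functions return "carnivore". For a case with fur, flat teeth, and paws, no induced rule applies, but analogy still answers "herbivore".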









 

VivoMind Analogy Engine

Three methods of analogy:

  1. Matching labels: 

    • Compare type labels on conceptual graphs.

  2. Matching subgraphs: 

    • Compare subgraphs independent of labels.

  3. Matching transformations: 

    • Transform subgraphs.

Methods #1 and #2 take O(N log N) time.

Method #3 takes polynomial time (analogies of analogies).
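One way to see why label matching can run in O(N log N) time is to sort the type labels of both graphs and then walk the two sorted lists together. This sketch is an assumption, not the VAE's published algorithm, and it pairs only identical labels; how the engine compares labels that differ in spelling is not described on these slides.

```python
from typing import List

def match_labels(labels_a: List[str], labels_b: List[str]) -> List[str]:
    """Return the type labels that occur in both graphs, with multiplicity.
    Sorting costs O(N log N); the merge walk afterwards is linear."""
    a, b = sorted(labels_a), sorted(labels_b)
    i = j = 0
    shared: List[str] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            shared.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return shared
```

For example, matching the labels Pyramid, Block, Pyramid, Color against Pyramid, Support, Block, Pyramid yields Block, Pyramid, Pyramid.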




 

Analogy of Cat to Car

Cat        Car
head       hood
eye        headlight
cornea     glass plate
mouth      fuel cap
stomach    fuel tank
bowel      combustion chamber
anus       exhaust pipe
skeleton   chassis
heart      engine
paw        wheel
fur        paint
fur paint

VAE used methods #1 and #2.

Source data from WordNet were mapped to conceptual graphs (CGs).






 

Matching Labels

Corresponding concepts have similar functions:

  • Fur and paint are outer coverings.

  • Heart and engine are internal parts with a regular beat.

  • Skeleton and chassis are structures for attaching parts.

  • Paw and wheel support the body, and there are four of each.
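These functional correspondences can be mimicked by describing each part with a few features and pairing each cat part with the car part whose description overlaps most (Jaccard similarity). The feature names are hypothetical, loosely taken from the bullets above. The slides say only that the VAE worked from WordNet data mapped to conceptual graphs, so this is an illustration of the idea, not its method.

```python
from typing import Dict, Set

# Hypothetical functional descriptions, loosely based on the bullets above.
CAT: Dict[str, Set[str]] = {
    "fur": {"outer_covering", "grows"},
    "heart": {"internal_part", "regular_beat", "pumps_blood"},
    "skeleton": {"frame_for_attaching_parts", "made_of_bone"},
    "paw": {"supports_body", "four_of_them", "has_claws"},
}
CAR: Dict[str, Set[str]] = {
    "paint": {"outer_covering", "colored"},
    "engine": {"internal_part", "regular_beat", "burns_fuel"},
    "chassis": {"frame_for_attaching_parts", "made_of_steel"},
    "wheel": {"supports_body", "four_of_them", "rotates"},
}

def jaccard(x: Set[str], y: Set[str]) -> float:
    """Overlap of two feature sets, from 0.0 (disjoint) to 1.0 (identical)."""
    return len(x & y) / len(x | y)

def best_matches(source: Dict[str, Set[str]],
                 target: Dict[str, Set[str]]) -> Dict[str, str]:
    """Pair each source part with the target part whose features overlap most."""
    return {part: max(target, key=lambda t: jaccard(feats, target[t]))
            for part, feats in source.items()}
```

With these descriptions, fur pairs with paint, heart with engine, skeleton with chassis, and paw with wheel. Because each part is matched independently, a fuller version would also have to enforce a one-to-one mapping.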









 

Matching Subgraphs

A pair of isomorphic subgraphs:

  • Cat:  head → eyes → cornea.

  • Car:  hood → headlights → glass plate.

Approximate match (missing esophagus and muffler):

  • Cat:  mouth → stomach → bowel → anus.

  • Car:  fuel cap → fuel tank → combustion chamber → exhaust pipe.
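Once a node correspondence is proposed, both kinds of subgraph match can be checked mechanically. In this sketch (an assumption, not the VAE's algorithm) an exact match requires every edge of one chain to map onto an edge of the other, while an approximate match lets an edge map onto a longer directed path, which tolerates a part that is present on only one side. The esophagus node illustrates that case.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]

def exact_match(src: List[Edge], tgt: List[Edge], mapping: Dict[str, str]) -> bool:
    """Every source edge maps onto a target edge."""
    tgt_edges = set(tgt)
    return all((mapping[a], mapping[b]) in tgt_edges for a, b in src)

def approximate_match(src: List[Edge], tgt: List[Edge], mapping: Dict[str, str]) -> bool:
    """Every source edge maps onto a directed path in the target graph."""
    succ: Dict[str, List[str]] = {}
    for a, b in tgt:
        succ.setdefault(a, []).append(b)

    def reachable(start: str, goal: str) -> bool:
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == goal:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(succ.get(node, []))
        return False

    return all(reachable(mapping[a], mapping[b]) for a, b in src)

CAT_EYE = [("head", "eye"), ("eye", "cornea")]
CAR_LIGHT = [("hood", "headlight"), ("headlight", "glass plate")]
EYE_MAP = {"head": "hood", "eye": "headlight", "cornea": "glass plate"}

CAR_FUEL = [("fuel cap", "fuel tank"), ("fuel tank", "combustion chamber"),
            ("combustion chamber", "exhaust pipe")]
# Cat digestive chain including the esophagus, which has no car counterpart here.
CAT_GUT = [("mouth", "esophagus"), ("esophagus", "stomach"),
           ("stomach", "bowel"), ("bowel", "anus")]
FUEL_MAP = {"fuel cap": "mouth", "fuel tank": "stomach",
            "combustion chamber": "bowel", "exhaust pipe": "anus"}
```

The eye and headlight chains match exactly. The fuel chain fails the exact test against the digestive chain that includes the esophagus, because fuel cap to fuel tank maps onto mouth to stomach, which is not a single edge, but it passes the approximate test.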







 

Relating Different Representations

Method #3 can relate data structures that represent equivalent information.

  • The same structure can be described in different ways:

  • English description:  "A red pyramid A, a green pyramid B, and a yellow pyramid C support a blue block D, which supports an orange pyramid E."

  • A relational database would use tables.

  • But there are many different options for choosing tables, rows, columns, and column labels.
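To make the point concrete, the following sketch stores the scene from the English description under two different relational designs, using Python's built-in sqlite3 module. Both schemas are hypothetical choices, not the tables in the talk's figure. The queries show that the same facts can be recovered from each design.

```python
import sqlite3
from typing import Set, Tuple

# The scene from the English description: (name, shape, color) and (supporter, supported).
OBJECTS = [("A", "pyramid", "red"), ("B", "pyramid", "green"), ("C", "pyramid", "yellow"),
           ("D", "block", "blue"), ("E", "pyramid", "orange")]
SUPPORTS = [("A", "D"), ("B", "D"), ("C", "D"), ("D", "E")]

def design_one() -> sqlite3.Connection:
    """Hypothetical design 1: one table of objects, one table of support links."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE object (id TEXT PRIMARY KEY, shape TEXT, color TEXT)")
    conn.execute("CREATE TABLE supports (supporter TEXT, supported TEXT)")
    conn.executemany("INSERT INTO object VALUES (?, ?, ?)", OBJECTS)
    conn.executemany("INSERT INTO supports VALUES (?, ?)", SUPPORTS)
    return conn

def design_two() -> sqlite3.Connection:
    """Hypothetical design 2: one table per shape, and support stored as 'rests on'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pyramid (name TEXT, hue TEXT)")
    conn.execute("CREATE TABLE block (name TEXT, hue TEXT)")
    conn.execute("CREATE TABLE rests_on (item TEXT, base TEXT)")
    for name, shape, color in OBJECTS:
        # The table name comes from the constant data above, never from user input.
        conn.execute(f"INSERT INTO {shape} VALUES (?, ?)", (name, color))
    conn.executemany("INSERT INTO rests_on VALUES (?, ?)",
                     [(supported, supporter) for supporter, supported in SUPPORTS])
    return conn

Facts = Tuple[Set[tuple], Set[tuple]]

def facts_one(conn: sqlite3.Connection) -> Facts:
    """Recover (objects, support pairs) from design 1."""
    objects = set(conn.execute("SELECT id, shape, color FROM object"))
    supports = set(conn.execute("SELECT supporter, supported FROM supports"))
    return objects, supports

def facts_two(conn: sqlite3.Connection) -> Facts:
    """Recover (objects, support pairs) from design 2."""
    objects = set(conn.execute("SELECT name, 'pyramid', hue FROM pyramid "
                               "UNION SELECT name, 'block', hue FROM block"))
    supports = set(conn.execute("SELECT base, item FROM rests_on"))
    return objects, supports
```

Comparing `facts_one(design_one())` with `facts_two(design_two())` gives equal sets, even though the two designs share no table or column names.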





 

Representation in a Relational DB












 

CG Derived from Relational DB
















 

CG Derived from English

"A red pyramid A, a green pyramid B, and a yellow pyramid C support a blue block D, which supports an orange pyramid E."














 

The Two CGs Look Very Different

  • CG from RDB has 15 concept nodes and 8 relation nodes.

  • CG from English has 12 concept nodes and 11 relation nodes.

  • No label on any node in the first graph is identical to any label on any node in the second graph.

  • But there are some structural similarities.

  • VAE uses method #3 to find them.













 

Transformations Found by VAE


The top transformation applied to 5 subgraphs.

The bottom one applied to 4 subgraphs.

One application could be due to chance, but 4 or 5 applications provide strong evidence for the mapping.
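A minimal sketch of how repeated applications can serve as evidence. Here each candidate transformation is written by hand as a Python predicate, and its support is the number of source subgraphs whose transformed image appears in the target graph. Everything is an illustrative assumption: the triple encodings are made up, they share the names A to E to keep the toy simple, and the sketch does not show how the VAE discovers transformations. The counts of 5 and 4 here simply reflect the five objects and four support links in the scene.

```python
from typing import Callable, Iterable, Set, Tuple

Triple = Tuple[str, str, str]

# Source facts, as rows of a hypothetical relational design.
RDB_OBJECTS = [("A", "pyramid", "red"), ("B", "pyramid", "green"), ("C", "pyramid", "yellow"),
               ("D", "block", "blue"), ("E", "pyramid", "orange")]
RDB_SUPPORTS = [("A", "D"), ("B", "D"), ("C", "D"), ("D", "E")]  # (supporter, supported)

# Target graph, as hypothetical triples parsed from the English sentence.
# s1 and s2 are support events with agents (agnt) and a theme (thme).
ENGLISH: Set[Triple] = {
    ("A", "type", "Pyramid"), ("A", "attr", "Red"),
    ("B", "type", "Pyramid"), ("B", "attr", "Green"),
    ("C", "type", "Pyramid"), ("C", "attr", "Yellow"),
    ("D", "type", "Block"), ("D", "attr", "Blue"),
    ("E", "type", "Pyramid"), ("E", "attr", "Orange"),
    ("s1", "type", "Support"), ("s1", "agnt", "A"), ("s1", "agnt", "B"),
    ("s1", "agnt", "C"), ("s1", "thme", "D"),
    ("s2", "type", "Support"), ("s2", "agnt", "D"), ("s2", "thme", "E"),
}

def object_rule(row: Tuple[str, str, str], target: Set[Triple]) -> bool:
    """Candidate 1: a table row becomes a type triple plus an attribute triple."""
    name, shape, color = row
    image = {(name, "type", shape.capitalize()), (name, "attr", color.capitalize())}
    return image <= target

def support_rule(pair: Tuple[str, str], target: Set[Triple]) -> bool:
    """Candidate 2: a support row becomes a Support event with an agent and a theme."""
    supporter, supported = pair
    events = {s for s, rel, o in target if rel == "type" and o == "Support"}
    return any((e, "agnt", supporter) in target and (e, "thme", supported) in target
               for e in events)

def applications(rule: Callable, items: Iterable, target: Set[Triple]) -> int:
    """Count the source subgraphs that the candidate transformation accounts for."""
    return sum(1 for item in items if rule(item, target))
```

A candidate that fits only once could be a coincidence; in this toy, each candidate fits every row it is tried on, which is the kind of repetition the slide treats as evidence.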








 

Evolutionary Pragmatism

Worm:  sensory-motor cycle.

Fish:  sensory-analogy-motor cycle.

Mammal:  sensory-reasoning-motor cycle.

Human:  sensory-induction-abduction-deduction-motor cycle.

Higher organisms include all the capabilities of the lower forms.








 

References

Paper on analogical reasoning by Sowa and Majumdar:

http://www.jfsowa.com/pubs/analog.htm

Paper on ontology, metadata, and semiotics:

http://www.jfsowa.com/ontology/ontometa.htm

Peirce's tutorial on existential graphs, with commentary by Sowa:

http://www.jfsowa.com/peirce/ms514.htm

Selected papers by Peirce on semeiotic and related topics; see his 1903 lectures on pragmatism in vol. 2 for material related to this talk:

Peirce, Charles Sanders (EP) The Essential Peirce, ed. by N. Houser, C. Kloesel, and members of the Peirce Edition Project, 2 vols., Indiana University Press, Bloomington, 1991-1998.

Cyc web sites:

http://www.cyc.com/

http://www.opencyc.org/

WordNet web site:

http://www.cogsci.princeton.edu/~wn/

FrameNet web site:

http://www.icsi.berkeley.edu/~framenet/


Copyright ©2003, John F. Sowa