Research Note 26 October 17, 2003

 

Supplemental Report to the REAL Program at DARPA

 

The SAIC – OntologyStream Proposal (September 3rd, 2003)

 

The supplement to the SAIC - OntologyStream REAL proposal is submitted.

 

Something interesting is developing in relation to how localized information is encoded using the class:object concept, rather than the hierarchical structures (trees) that CCM depends on.

 

To understand the “new paradigm” one has to acquire a full notion of the convolution operator that organizes some or all of a bag of localized atoms, those atoms having the form widely understood as software classes and objects.  The best way to think about the convolution is to think about the Riemann sum that, when taken to the infinite limit, is the normal integral of college calculus.  A more “physical” way to think about the convolution is to consider the “universal laws” that are involved in producing “localized” event structure in the natural world.
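For readers who want the calculus analogy written out, the standard textbook forms are given here. This note does not define its own convolution operator formally, so the following is offered only as the analogy the paragraph points to; the symbols f, g, a, b, n, and x_i^* are the usual textbook ones and are not taken from the proposal.

```latex
% Riemann sum and its limit (the "normal integral" of college calculus)
\int_a^b f(x)\,dx \;=\; \lim_{n \to \infty} \sum_{i=1}^{n} f(x_i^{*})\,\Delta x,
\qquad \Delta x = \frac{b-a}{n}, \quad x_i^{*} \in [\,a+(i-1)\Delta x,\; a+i\,\Delta x\,]

% Discrete convolution (a finite sum over local contributions)
(f * g)[k] \;=\; \sum_{m} f[m]\, g[k-m]

% Continuous convolution (the corresponding integral)
(f * g)(t) \;=\; \int_{-\infty}^{\infty} f(\tau)\, g(t-\tau)\, d\tau
```

In this reading, each localized atom plays the role of one term of the sum, and the convolution is the operation that aggregates those local terms into a single global object.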

 

How the atoms are created for differential and formative ontology is the subject of the research and development we continue to work on.  But the generalFramework theory has produced a method described in researchNote 24, Figure 2.  At:

 

http://www.ontologystream.com/beads/frameworks/pondFramework.htm

 

we show how very simple and complete the generalFramework is for encoding localized information.  By specifying the class structure with well-specified ontological primitives from a highly contextual framework, we create a means to create OWL ontology.  But also, in specific “measurements” of the real world, like a fish pond, the value that the class “instantiates” can be determined by direct measurement, using instruments, or by a human writing natural language into the slots that are the framework’s cells.  (I can talk more about this if it is not clear.)
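A minimal sketch of the idea of a framework whose cells are filled either by an instrument reading or by human-written text may help here. Everything in it is an assumption made for illustration: the `Framework` class, its method names, and the pond slot names are hypothetical and are not taken from the generalFramework specification or from the pondFramework page.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union

# A cell value is either an instrument reading or free text written by a human.
Value = Union[float, str]


@dataclass
class Framework:
    """Hypothetical framework: a fixed set of named slots (the 'classes')."""
    name: str
    slots: Tuple[str, ...]
    cells: Dict[str, Value] = field(default_factory=dict)

    def fill(self, slot: str, value: Value) -> None:
        """Instantiate a slot with a measured or human-supplied value."""
        if slot not in self.slots:
            raise KeyError(f"{slot!r} is not a slot of framework {self.name!r}")
        self.cells[slot] = value

    def atoms(self) -> List[Tuple[str, Value]]:
        """Return the filled cells as class:object pairs, in slot order."""
        return [(s, self.cells[s]) for s in self.slots if s in self.cells]


# Hypothetical slot names for a fish pond measurement.
pond = Framework("pond", ("water_temperature_c", "fish_count", "observer_note"))
pond.fill("water_temperature_c", 18.5)                       # instrument reading
pond.fill("observer_note", "water is cloudy near the inlet")  # human text
print(pond.atoms())
```

The slot names fix the class side of each pair in advance, while the object side is supplied at measurement time; unfilled slots simply produce no atom.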

 

We are discovering that the "new paradigm" for localization and globalization of information takes us directly back to the Peircean node, and Peircean graphs.  I hope soon to start funding time for John Sowa and Steven Newcomb for serious work on this.  This work will be collaborative and not owned by OntologyStream Inc, as we will make it public domain as soon as we think about it.  (This solved the ownership issues, completely.)

 

What I see in the new notation is something that is similar to, but extends, Liz Liddy's patent on knowledge representation.  This new notation was invented yesterday, as I thought about why CCM did not solve the natural language understanding problem.  The problem is easy to state now.  The problem is that contiguous relationships between term occurrences, as in n-grams, are almost meaningless.  Humans just do not use natural language in this way.  As stated in the SAIC – OntologyStream supplemental statement to the DARPA REAL project, the problem is also (what I often term) the Wittgenstein issue related to “what is not in the text”.  Human tacit knowledge evocation is necessary in the development of a model of the “relational connections” between language terms and ontological elements that are not in the text.
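The point about contiguity can be illustrated with a small sketch. It assumes whitespace tokenization and an invented example sentence, and the in-sentence co-occurrence pairs are only a contrast case; they are not the new notation described in this note.

```python
from itertools import combinations
from typing import List, Set, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Contiguous n-grams: only terms that sit next to each other are related."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def cooccurrence_pairs(tokens: List[str]) -> Set[Tuple[str, str]]:
    """All in-order term pairs in the sentence, contiguous or not."""
    return set(combinations(tokens, 2))


# Invented example sentence, tokenized on whitespace.
tokens = "john who was ill for weeks finally recovered".split()

bigrams = ngrams(tokens, 2)
pairs = cooccurrence_pairs(tokens)

# The subject and its verb are separated by a clause, so no bigram links them.
print(("john", "recovered") in bigrams)  # False
print(("john", "recovered") in pairs)    # True
```

Neither structure recovers relationships that are absent from the text itself, which is the second half of the problem stated above.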

 

Moreover, nothing about natural language is hierarchical, except the very specific organizational separation between class and object, localization and globalization.  The “confusion” that infects modern software and computer science is that the physical stratification involved in the natural use of memory and anticipatory mechanisms is falsely attributed to a biological-taxonomy type of theory of the world.  This theory is false.

 

The claim that the taxonomical organization of machine ontology does not reflect natural ontology is, at this point, a conjecture, and a highly controversial one.

 

I am inclined to make this public, and then file a patent application - as an exercise -  so that PTO makes a judgment about the originality of the "new paradigm" on localization and globalization of informational units having the form "class:object". 

 

The concept of an informational unit of the form type:value is a part of the innovation that appears to be the essence of the CCM patent.  But there are also two specific graph transformational methods that are unique, but also incorrect as far as the more powerful form of "localization" is concerned.  The "property" owned by ATS is related to these specific graph transformations, both of which are linear and "contiguous", so that relationships not in the data cannot be automatically derived. (This is a judgment based on an argument that is difficult to lay out to the layman.)

 

The localization has to form a single "atom" with a set of "valences" that is related to all possible environments of the "atom".  So we have what Robert Burch (1989), in A Peircean Reduction Thesis: The Foundations of Topological Logic (Lubbock, Texas: Texas Tech University Press), called the Unifying Logical Vision of Peirce,

 

which I represent as:

 

http://www.bcngroup.org/area2/KSF/Notation/notation.htm#_Section_3.1:_

 

and which was a large part of the motivation for the Soviet work on Applied Semiotics

 

http://www.bcngroup.org/area3/pprueitt/kmbook/Preface.htm

 

Amnon is familiar with the Liddy patent, because the PTO has indicated that his, Amnon's, patent application is similar to Liddy's.  Of course, the PTO was, at that time, failing to understand Amnon's patent.

 

We are in agreement with Amnon on this.  He said (October 17, 2003):

 

What I think you're saying is that the string "John" only exists in the sentences that it occurs in, which serve to define that string.  One can certainly construct such a world, but I would argue that in the real world, texts are grounded by assumptions about the world.  That is, if John is a particular human, then he has many default characteristics, behaviors, motivations, etc. that may never be described in a text, but that are necessary to understanding of text.

 

To understand texts about a person named John to any substantial degree, a machine will have to know much about people and about the world surrounding them.  Unsupervised processing of any number of texts will only get a machine so far, and I argue that it's not very far.

 

This may align with Paul's and others' notion of requiring a man in the loop, and perhaps with the Cyc notion of hand-building the commonsense knowledge to underpin machine processing.  While statistical NLP has been the rage in the past decade or more, I don't see that it has brought us closer to "solving" the NLP problem.