Research Note 26
October 17, 2003
Supplemental Report to the REAL Program at DARPA
The SAIC – OntologyStream Proposal (September 3rd, 2003)
The supplement to the SAIC - OntologyStream REAL proposal has been submitted.
Something interesting is developing in how localized information is encoded using the class:object concept, rather than the hierarchical structures (trees) that CCM depends on.
To understand the “new paradigm” one has to acquire a full notion of the convolution operator that organizes some or all of a bag of localized atoms, those atoms having the form widely understood as software classes and objects. The best way to think about the convolution is to think about the Riemann sum, which in the limit of infinitely many terms becomes the ordinary integral of college calculus. A more “physical” way to think about the convolution is to consider the “universal laws” that are involved in producing “localized” event structure in the natural world.
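The Riemann sum analogy can be made concrete with a small sketch. This illustrates only the calculus side of the analogy (many local contributions summed into one global value), not the convolution operator itself; the function names and the choice of the midpoint rule are assumptions made for illustration.

```python
def riemann_sum(f, a, b, n):
    """Midpoint Riemann sum of f over [a, b] using n equal subintervals."""
    width = (b - a) / n
    # Each term is a "local" contribution: the value of f at the midpoint
    # of one small interval, weighted by the width of that interval.
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width


def square(x):
    return x * x


# The exact integral of x^2 over [0, 1] is 1/3.
# The error of the sum shrinks as the number of local pieces grows.
errors = [abs(riemann_sum(square, 0.0, 1.0, n) - 1.0 / 3.0)
          for n in (10, 100, 1000)]
```

Each entry of `errors` is smaller than the one before it, which is the sense in which the sum "taken to the infinite limit" becomes the integral.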
How the atoms are created for differential and formative ontology is the subject of our continuing research and development. The generalFramework theory has, however, already produced a method, described in Research Note 24, Figure 2. At
http://www.ontologystream.com/beads/frameworks/pondFramework.htm
we show how simple and complete the generalFramework is for encoding localized information. By specifying the class structure with well-specified ontological primitives from a highly contextual framework, we obtain a means to create OWL ontology. In specific “measurements” of the real world, such as a fish pond, the value that the class “instantiates” can also be determined by direct measurement using instruments, or by a human writing natural language into the slots that are the framework’s cells. (I can talk more about this if it is not clear.)
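One way to picture a framework whose cells are filled either by an instrument or by a human is the sketch below. It is only an assumption about how such cells might be modeled: the names `Cell`, `Framework`, `water_temperature_c`, `observer_note` and `fish_count` are hypothetical and are not taken from the pondFramework page, and the sketch does not generate OWL.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union

Value = Union[float, str]


@dataclass
class Cell:
    """One slot of a framework: a class waiting for the value it instantiates."""
    class_name: str
    value: Optional[Value] = None
    source: Optional[str] = None  # hypothetical tag: "instrument" or "human"

    def fill(self, value: Value, source: str) -> None:
        self.value = value
        self.source = source

    def as_unit(self) -> str:
        # Render the cell as an informational unit of the form class:object.
        return f"{self.class_name}:{self.value}"


@dataclass
class Framework:
    """A named collection of cells, one per class in the framework."""
    name: str
    cells: Dict[str, Cell] = field(default_factory=dict)

    def add_class(self, class_name: str) -> None:
        self.cells[class_name] = Cell(class_name)

    def units(self) -> List[str]:
        # Only cells that have actually been measured or written are reported.
        return [c.as_unit() for c in self.cells.values() if c.value is not None]


pond = Framework("pond")
for name in ("water_temperature_c", "observer_note", "fish_count"):
    pond.add_class(name)

pond.cells["water_temperature_c"].fill(14.5, "instrument")
pond.cells["observer_note"].fill("algae along the north bank", "human")
filled = pond.units()
```

Here `filled` holds two class:object units, one from a direct measurement and one from human-written language, while the unfilled `fish_count` cell contributes nothing.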
We are discovering that the "new paradigm" for localization and globalization of information takes us directly back to the Peircean node and Peircean graphs. I hope soon to start funding time for John Sowa and Steven Newcomb for serious work on this. This work will be collaborative and not owned by OntologyStream Inc, as we will place it in the public domain as soon as we think of it. (This solves the ownership issues completely.)
What I see in the new notation is something that is similar to, but extends, Liz Liddy's patent on knowledge representation. This new notation was invented yesterday, as I thought about why CCM did not solve the natural language understanding problem. The problem is easy to state now: contiguous relationships between term occurrences, as in n-grams, are almost meaningless. Humans just do not use natural language in this way. As stated in the SAIC – OntologyStream supplemental statement to the DARPA REAL project, the problem is also what I often term the Wittgenstein issue, related to “what is not in the text”. The evocation of human tacit knowledge is necessary in developing a model of the “relational connections” between language terms and ontological elements that are not in the text.
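The limitation of contiguous relationships can be seen in a toy example. The sketch below is an illustration under stated assumptions, not the CCM method or the new notation: the two sentences and the helper names are invented, and sentence-level co-occurrence stands in for "non-contiguous relationship" only as the simplest available contrast.

```python
from itertools import combinations


def tokens(sentence):
    """Lowercase the sentence, drop a final period, split on whitespace."""
    return sentence.lower().rstrip(".").split()


def contiguous_pairs(words):
    """Unordered pairs of adjacent terms, i.e. what bigrams can see."""
    return {frozenset(pair) for pair in zip(words, words[1:])}


def sentence_pairs(words):
    """Unordered pairs of distinct terms anywhere in the same sentence."""
    return {frozenset(pair) for pair in combinations(set(words), 2)}


sentences = [
    "John fed the fish in the pond.",
    "The fish that John fed were in the pond.",
]

# The relation of interest: what was fed.
target = frozenset({"fed", "fish"})

in_bigrams = [target in contiguous_pairs(tokens(s)) for s in sentences]
in_sentence = [target in sentence_pairs(tokens(s)) for s in sentences]
```

In both sentences the pair "fed"/"fish" is absent from the contiguous pairs and present in the sentence-level pairs. Neither view, of course, says anything about what is not in the text, such as why a person would feed fish at all.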
Moreover, nothing about natural language is hierarchical, except the very specific organizational separation between class and object, localization and globalization. The “confusion” that infects modern software and computer science is that the physical stratification involved in the natural use of memory and anticipatory mechanisms is falsely attributed to a theory of the world modeled on biological taxonomy. This theory is false.
To say that taxonomical organizations of machine ontology do not reflect natural ontology is a conjecture at this point, and a highly controversial one.
I am inclined to make this public, and then file a patent application, as an exercise, so that the PTO makes a judgment about the originality of the "new paradigm" on localization and globalization of informational units having the form "class:object".
The concept of an informational unit of the form type:value is a part of the innovation that appears to be the essence of the CCM patent. But there are also two specific graph transformational methods that are unique, but also incorrect as far as the more powerful form of "localization" is concerned. The "property" owned by ATS is related to these specific graph transformations, both of which are linear and "contiguous", so that relationships not in the data cannot be automatically derived. (This is a judgment based on an argument that is difficult to lay out for the layman.)
The localization has to form a single "atom" with a set of "valences" that is related to all possible environments of the "atom". So we have what Robert Burch (1989), A Peircean Reduction Thesis: The Foundations of Topological Logic, Lubbock, Texas: Texas Tech University Press, called the Unifying Logical Vision of Peirce, which I represent as:
http://www.bcngroup.org/area2/KSF/Notation/notation.htm#_Section_3.1:_
and which was a large part of the motivation for the Soviet work on Applied Semiotics:
http://www.bcngroup.org/area3/pprueitt/kmbook/Preface.htm
Amnon is familiar with the Liddy patent, because the PTO has indicated that his patent application is similar to Liddy's. Of course, the PTO was, at that time, failing to understand Amnon's patent.
We are in agreement with Amnon on this. He said (October 17, 2003):
What I think you're saying is that the string "John" only exists in the sentences that it occurs in, which serve to define that string. One can certainly construct such a world, but I would argue that in the real world, texts are grounded by assumptions about the world. That is, if John is a particular human, then he has many default characteristics, behaviors, motivations, etc. that may never be described in a text, but that are necessary to understanding of text.
To understand texts about a person named John to any substantial degree, a machine will have to know much about people and about the world surrounding them. Unsupervised processing of any number of texts will only get a machine so far, and I argue that it's not very far.
This may align with Paul's and others' notion of requiring a man in the loop, and perhaps with the Cyc notion of hand-building the commonsense knowledge to underpin machine processing. While statistical NLP has been the rage in the past decade or more, I don't see that it has brought us closer to "solving" the NLP problem.