Saturday, July 10, 2004
Manhattan Project to Integrate Human-centric Information Production
Discussion on: a common language underlying Readware and InOrb Technology
As you stated, Letter Semantics corresponds to lexical representation through a direct "instrumented" measurement of the co-occurrence of special letter triples.
It is vital that the analyst understand the origin and justification for each of the Readware letter triples. Questions like "why three letters?" need to be answered. The answer is likely to involve some demonstration of a bell curve, where comparisons are made between different processes, each discovering a set of structural primitives. Then some numerical answer can be given showing that the Readware construction is optimal in some specific sense.
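To make the notion of an "instrumented" measurement concrete, here is a minimal sketch of extracting three-letter tokens from words and counting their co-occurrence across texts. This is not the Readware algorithm, whose triples are empirically curated; it only illustrates what "measuring the co-occurrence of letter triples" could mean mechanically.

```python
from collections import Counter
from itertools import combinations

def letter_triples(word):
    """Extract the overlapping three-letter tokens of a word."""
    w = word.lower()
    return [w[i:i+3] for i in range(len(w) - 2)]

def co_occurrence(texts):
    """Count how often pairs of triples appear together in the same text."""
    counts = Counter()
    for text in texts:
        triples = set()
        for word in text.split():
            triples.update(letter_triples(word))
        # Every unordered pair of triples present in this text co-occurs once.
        for pair in combinations(sorted(triples), 2):
            counts[pair] += 1
    return counts
```

A real instrument would of course restrict itself to a justified set of triples rather than all substrings, which is exactly the selection question raised above.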
Perhaps the proper answer is: "we found a process of producing three-letter tokens that measured, perhaps imperfectly but still quite usefully, the invariances in human communication needs."
At this point, it might be useful to use the language of J. J. Gibson and others in the ecological psychology community. They talk about linguistic affordances. As you and Tom have said, the nature of the need to communicate, the physical properties of the human mouth and brain, and other factors are all involved in creating a very stable set of substructural elements from which human thought, in each case, must arise.
Of course, the expression of individuality occurs as a perturbation from what is otherwise a very predictable situation. The predictability and the perturbation from what would otherwise be a purely deterministic flow of events play against each other. Over the long term, the stability of the substructural elements is established. One need only look at the periodic table of the elements in chemistry to see this substructural stability and the great variability that occurs in the chemical compositions we enjoy.
In any case, the reason we can talk about a national Manhattan-type project to produce Human-centric Information Production (HIP) is the possibility that a simple computer-based methodology may map the substructure of any type of complex phenomenon. The possibility of mapping rests on an "ontological claim": that a fixed set of affordances shapes the behavior of any type of complex system. The formation and presence of specific types of terrorism cells is then brought under the light of new information tools. The War on Terrorism, the War on Drugs, and several other types of social-political conflicts would be provided a science that measures complex phenomena and would therefore provide a PUBLIC viewing of relevant information about many types of threats to democracy.
The presence of HIP technology tools, as open source software, will allow the application of HIP techniques to complex manufacturing processes, such as the production of foods and medicines. The primary purpose of the National Project is thus to create the HIP technology as public domain technology and to provide a new K-12 curriculum in mathematics and computer science.
The answer to the question "why three-letter tokens?" is then placed into the context of creating a measurement device. A different set of tokens might have been discovered, but the ontological claim is that there is something "external" to the measurement that is being measured.
As I will propose in a later bead, we can also provide to the public a notational system and a theory as to why our theory of stratified semantics has a correspondence to how the real world works.
Before any notational system and underlying explicit theory is developed, scientists work with intuitions. My work and Tom's work are very rigidly grounded in a "personal metaphysics" and a specific way of thinking. Both he and I will give up favorite terminology as we get some theoretical integration settled. We need to justify stratification theory and create some way to expose the specific validation exercises that he has developed.
You say that "Letter Semantics is the measure," but saying this does not expose the detail. Our tutorials will explain precisely what this means. In the tutorials we will develop more detail about what is measured and how the interpretations about function are made.
Stratified theory suggests that the set of letter triples is a substructure for "compounds" of these letter triples. The total set of these compounds then creates a measurement device. It is appropriate to compare how this measurement is made with the well-known algorithms related to latent semantic indexing.
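The compound idea can be sketched as follows: a "concept" is modeled as a compound of letter triples, and a text is measured by how much of each compound it contains. The concept names and triples here are invented for illustration; a real ConceptBase would hold the empirically derived compounds, and latent semantic indexing would instead derive its dimensions statistically via matrix factorization.

```python
# Hypothetical ConceptBase: each concept is a "compound" of letter triples.
CONCEPT_BASE = {
    "writing": {"wri", "rit", "scr", "pen"},
    "reading": {"rea", "ead", "boo", "tex"},
}

def match_score(text, triples):
    """Fraction of a compound's triples found in the text (a crude indicator)."""
    found = sum(1 for t in triples if t in text.lower())
    return found / len(triples)

def detect_concepts(text, threshold=0.5):
    """Return the concepts whose compounds are sufficiently present in the text."""
    return {concept for concept, triples in CONCEPT_BASE.items()
            if match_score(text, triples) >= threshold}
```

The contrast with latent semantic indexing is that the substructure here is fixed and inspectable in advance, rather than emerging from a singular value decomposition of a term-document matrix.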
The correspondence between compounds and human concepts is then something that can be justified based on some set of objective metrics. I will speak to this some more in bead .
Ballard’s work is grounded in an ordered n-tuple,
< r, a(1), a(2), ..., a(n) >
It is unfortunate that I often talk about the ordered triple < a, r, b > as the most elementary construction. Ballard is right, but I am dealing with the pragmatics of encoding relationships that are NOT semantic relationships. There is more to say on this.
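The two constructions can be shown side by side as simple data structures. This is only a sketch of the formal shapes being discussed, not of Ballard's actual encoding: the n-tuple names one relation over n arguments, while the triple is the binary special case with the relation written between its two arguments.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class NTuple:
    """Ballard-style ordered n-tuple: < r, a(1), a(2), ..., a(n) >."""
    r: str
    args: Tuple[str, ...]

@dataclass(frozen=True)
class Triple:
    """Ordered triple < a, r, b >: the binary case, relation in the middle."""
    a: str
    r: str
    b: str

def to_ntuple(t: Triple) -> NTuple:
    """Every binary triple embeds into the n-tuple form with n = 2."""
    return NTuple(r=t.r, args=(t.a, t.b))
```

The embedding runs one way only: an n-tuple with n > 2 has no faithful single-triple form, which is one reason Ballard's construction is the more general one.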
The letter triples are not in themselves a relationship. Neither the middle nor the first element is itself a "relationship". The "relationships" in the Readware ConceptBase are established via the human empirical work that Tom and you did over the past two decades. The ConceptBase ties together several of the letter triples, in a way that is analogous to the way atoms are composed into chemical compounds.
Examples in the tutorials are therefore required to allow the expert user to understand the way our conceptual rollup engines work. These examples are the only way that the technology will find a market, because the conventional wisdom marginalizes both the notion of substructural semantics and the need for human-centric information production about the function of observed aggregations of substructure (see Figure 6, from Dmitri Pospelov's unpublished book, in Section 6 of my Chapter 2 in "Knowledge Foundations").
Some formal description of the set of letter triples needs to be made and justified. A precise number is important here, because we need to reduce the uncertainty as to what one is talking about. We have to separate the details of the description so that nothing is left ambiguous.
We are in agreement that our two companies may now, if funding is found, lay out an objective notation that is vital to designing applications based on the merging of the technologies. The Readware Provenance™ product is the first of a number of vertical applications that we can promise to investors.
But unless expert humans perform tasks related to the Provenance ontology services, the pollsters will never be able to use the pre-poll results that we make possible. So the educational process has to occur, and before this educational process can take hold there has to be a common language being used by Ballard, Sowa, you, me, and others.
You said: "Letter Semantics gave us the semantic distance between variable signs that make use of these representations for their encoding."
I have a principled argument that this is not the correct language for description.
I do not see what any concept of "semantic distance" can relate to. This is because the concept of distance does not apply to the native relationships between concepts. In fact, the only sense of "semantic distance" that does apply is this: relationships, as specific relationship types, are members of a set of specific relationships that, when aggregated together, creates the experience of a specific concept. So there is both a theory of type and a theory of substructure.
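The point can be illustrated structurally. Below, two concepts (the names and relationship types are invented for this example) are each modeled as an aggregation of typed relationships; comparing them yields shared structure, a set, rather than a number on a metric scale.

```python
# Each concept is an aggregation of (relationship-type, target) pairs,
# not a point in a metric space. Names here are purely illustrative.
storm = {("causes", "rain"), ("has_part", "wind"), ("type_of", "weather")}
hurricane = {("causes", "rain"), ("has_part", "wind"), ("type_of", "storm")}

def shared_structure(c1, c2):
    """Compare concepts by the relationships they share: a structural,
    not a metric, comparison -- the result is a set, not a distance."""
    return c1 & c2
```

One could collapse such an overlap into a single number, but doing so discards exactly the typed, substructural information that the comparison was built from.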
Meaning does not have a distance. Not in reality.
But the phrase "semantic distance" is attempting, poorly, to signify something. I think that this something can be described better using Maturana's and also Stuart Kauffman's terminology. I will talk more about this in bead .
The naturally occurring concept has to be internally “related by co-occurrence” since this is something that can be measured as being there or not. This is why InOrb notation uses the language “subject matter indicator”. Of course, co-occurrence is a crude measure, but with visualization and human reification this measure may be the next best step. Ballard’s work may be beyond this step.
And we need to make clear what "variable signs" means. Ballard and others need to know what you mean. Why does a sign vary? What is a sign?
You said: "The measurements for all known concepts (words in the lexicon) are pre-computed and indexed to the general ConceptBase. The distance for a new concept is computed as it is detected in a system start and the indexes to the (new) conceptbase is re-compiled."
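The precompute-then-extend pattern you describe can be sketched in a few lines. This is an assumption-laden toy, not the Readware implementation: it precomputes triple sets for a known lexicon at build time and folds a new word into the index the first time it is seen.

```python
def triples(word):
    """Compute the set of overlapping three-letter tokens of a word."""
    w = word.lower()
    return {w[i:i+3] for i in range(len(w) - 2)}

class ConceptIndex:
    def __init__(self, lexicon):
        # Pre-compute measurements for all known concepts, as with
        # the general ConceptBase.
        self.index = {word: triples(word) for word in lexicon}

    def lookup(self, word):
        # A new concept is computed when first detected, then the
        # extended index is re-used on every later lookup.
        if word not in self.index:
            self.index[word] = triples(word)
        return self.index[word]
```

The design choice worth surfacing in the tutorials is exactly this split: what is fixed at build time versus what is recompiled when a new concept appears.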
We need to give a full, clear explanation of what a concept is, one that people can become comfortable with.
The pre-computing of the “known concepts” is why this all works, and we have a lot here to talk about. There are some formal constructions that will make this “landscape of concept representations” very clear.
As we get the vagueness out of the language being used, we will find new science that everyone should be looking for.
Again, the Precision/Recall metrics that you have demonstrated are impressive, and yet the observation of good Precision/Recall metrics does not prove an underlying theoretical construction. Finding a common theoretical construction and making it available to others has to be the measure of success for our project in the IC.
By this we mean the total process of parsing and applying rules to the parsed text. In some sense, this is natural language processing by computer programs, even if how Readware does this is not exactly the same as traditional natural language parsing as found in the literature.