Communications on a National Project
3/14/2004 7:05 PM
Some discussion has occurred between Ken Ewell (ReadWare Inc) and Paul Prueitt (OntologyStream Inc) over the past week. From this discussion we have identified some core concepts that can be discussed within the community of scholars.
The following text is complex and yet concise. The issues are laid open very well. Paul Prueitt makes comments from hyperlink tags within the “[]” brackets. Anyone wishing to make a comment of this type is free to do so: just send a note to portal@ontologystream.com or write an independent communication about where the discussion on taxonomy and the National Project has been.
Hello Paul (Prueitt),
I appreciate your comments and I thank you for opening up the
debate to include some discussion of the architecture and methods implemented
in Readware. I think it is important to mention that Dr. Adi's computational
studies centered on complex control systems and his dissertation was on
"interface correctness" and the problems faced in compiling
code. It is probably also useful
to mention that the extent of our research is in the structural analysis of
messages and texts; analyzing the relations in dynamically formed text
structures as in query and retrieval systems. [%]
The "letter-semantic" primitives of Adi's model are
a kind of perfection, but they are certainly not all there is. Plato believed
in some perfections and "beauties of the earth" and in using them as
"stepping stones to that greater beauty". We believe that the phonetic alphabet is one of those
beauties. It is also one of those
things that was legislated into human memory, sort of like how John Sowa
described how meaning is legislated through UPC codes that might not even make
sense to someone. Plato was
chiefly responsible for the former legislation. Even there though, where they are like chemical compounds
and partial thought-atoms combined to form words, they are but conjecture
seeking a resolution in natural agreement (e.g. truth/correspondence) or
through legislation as we are by now well-aware. [&]
This model did solve, for us, the problem of producing a framework of measures that were not context-dependent, measures that could be applied in any context, in any subject domain.
As an oppositional scale, Readware's bipoles are neutral to
subjective situations like whether up is indeed down and they are useful to
every situation. In other places
Tom Adi refers to them as "binary activators" as he sees them as
active participants in a given situation.
In truth we cannot determine their significance in advance. The
significance is determined dynamically for every query put to every collection
in our axiomatic system of computation.
The Readware algorithms take snapshots and make measurements where the
results of each measurement are tallied, stored and used in a relevance
competition that is not decided until all the measurements are in. We happen to
do this very, very quickly. [^]
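A minimal sketch of a relevance competition of this general kind, in which every measurement is tallied and stored and nothing is ranked until all measurements are in. This is an illustration only, not the Readware algorithm: the function name, the document identifiers and the integer scores are all hypothetical.

```python
from collections import defaultdict

def relevance_competition(measurements):
    """Tally every (doc_id, score) measurement, then rank once all are in."""
    tally = defaultdict(int)
    for doc_id, score in measurements:
        tally[doc_id] += score  # store and accumulate; no early decision
    # The competition is decided only after the final measurement is tallied.
    # Ties are broken by document id so the ordering is deterministic.
    return sorted(tally.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical measurements for one query against a three-document collection.
measurements = [("doc_a", 4), ("doc_b", 9), ("doc_a", 7), ("doc_c", 2)]
ranking = relevance_competition(measurements)
print(ranking)
```

Note that doc_b holds the single highest measurement, yet doc_a wins once its two measurements are summed, which is the point of deferring the decision.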
Still, as you said, we also found this to be insufficient. [^^]
Yet we found a formalism and a very good one with a strong
axiomatic basis. [^^] We were in
need of some clarity about the forms and structures that were apt to be found
in texts.
Language change made every morphological or phonetic analysis
difficult, particularly between languages. In 1986 we developed some algorithms to analyze text
documents and messages. By 1987,
this grew into a DOS-based (command line) research program called The Research
Assistant. The program dealt with
morphemes and verbal expressions and processed queries against texts to produce
a numerical score of relevance; later it could highlight pertinent spots. When it was used to score a passing
grade on an LSAT reading comprehension test we were encouraged to take the next
steps. { extended comment }
Because of language change and what you call the arbitrary element of every situation, our program based on the formal model alone had problems. We could only guess at what might be called natural kinds (in our case, word roots) of the kind that can link an idea like the one represented by the term "place" to occurrences using the terms plant, replace, supplant, transplant and emplacement. We had good and bad guesses.
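The idea of linking one root to its many surface occurrences can be sketched with a simple lookup table. The table below is hypothetical and contains only the example terms given above; it does not show how Readware derives or stores its roots.

```python
# Hypothetical root table built only from the examples in the text.
ROOT_OF = {
    "place": "place",
    "plant": "place",
    "replace": "place",
    "supplant": "place",
    "transplant": "place",
    "emplacement": "place",
}

def occurrences_of_idea(idea, text):
    """Return the words in `text` whose root matches `idea`, in order."""
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    return [w for w in words if ROOT_OF.get(w) == idea]

hits = occurrences_of_idea(
    "place", "They plan to transplant the tree and replace the fence."
)
print(hits)
```

A word such as "plan" is not matched, because the lookup is on whole entries in the table rather than on shared letters.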
The Readware Research Assistant could not measure the
difference between trains and trends, for instance; it was idea-stupid. We needed to specify and define the
naturally occurring kinds of ideas.
This became necessary so we could more easily identify them in arbitrary
texts and message streams.
Because *if* our word roots can be relied upon, they can be
used as nodes to specify linkages, for regulating the relations we were capable
of discovering and for increasing clarity and precision. We were looking for more than the
one hundred great ideas AI'ers were fond of using.
Dr. Adi performed a taxonomic analysis of the Arabic language
{additional detail on Arabic
origins} and this taxonomy became the basis of the Readware
ConceptBase that was initially released to the public in 1992. Only the Germans with their strong
opinion on "worldview" applied the technology over a spectrum of
domains from agriculture to medicine, politics and on to the domains of social
commentary, science and technology.
We reformed the ideas and the Readware ConceptBase in 1996. Since then we have added fewer than one
thousand terms to the original ConceptBase and reformed the rules of German
spelling corresponding to their government's adopted spelling reform.
Since adding the ConceptBase, the model itself became less
important, more ubiquitous. Its purpose, after all, was to render measures of
"semantic distance" or fidelity.
To obtain these, we first measured all the relations between
the nodes in the concept base for a baseline. This all makes it possible to model expected text structures
within well understood contexts.
It gives some clarity.
Still the concept base with its broad coverage across the entire
dominion of human knowledge, as represented in a language (the Arabic language)
was not enough to capture the variety of representation and relevance we found
in texts of all kinds.
Eventually, circa 1999, we began to create still another
layer of linkages that exists not between one word or another, as the term
issue might relate to the term topic or to an emission of some kind, but
between groups of ideas, or "themes".
In these groups, the terms representing ideas will occur in
specific configurations where they are pertinent and in other configurations
where they are not. In turn, these
can be layered in a taxonomy where further (independent) relations can be
specified and defined.
Incidentally, we call this layer and the specification itself the
"culture". This layer,
unlike the inaccessible framework and the more strictly modifiable concept base,
is an open layer.
As an open layer, it is possible to define the topics,
categories, and issues one may want to identify and relate, in terms of the
language you are expecting to see referring to those items.
In summary, Readware has three layers of representation. These are the letter-semantic measures
based upon the mapping of thoughts-prototypical memories-letters, the
"concept base" that serves to link the ideas of thousands of
recurring root words found in languages, and the user-modifiable cultures that
serve the user to specify and define the terminology and characteristics
particular to the occurrence of particular ideas, or the themes of particular
ideas, in which they are interested.
Incidentally, after developing the concept base, we found
that the root-words we had entered into the concept base, and the corresponding
English and German lexicons, etc. accounted for an average 30% of the terms
found in millions of articles, messages and texts. Another 30% of the items
parsed were names of people and places and 30% was unique or unknown terms
(that are treated as constants). This gave us an indexing idea that gives the
Readware Information processor its speed.
It belies the fact that we have to do a lot of work for each
context as we have to dynamically create snapshots and make measurements.
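The text does not describe the index itself, so the following is only a sketch of the three-way partition of parsed items that it mentions: terms found in the lexicon, names of people and places, and unknown terms treated as constants. The lexicon, the name list and the sample tokens are hypothetical.

```python
from collections import Counter

def classify_tokens(tokens, lexicon, names):
    """Count tokens as lexicon terms, names, or unknown constants."""
    counts = Counter()
    for token in tokens:
        key = token.lower()
        if key in lexicon:
            counts["lexicon"] += 1
        elif key in names:
            counts["name"] += 1
        else:
            counts["constant"] += 1  # unknown terms are kept as constants
    return counts

# Hypothetical lexicon and name list, for illustration only.
lexicon = {"place", "plant", "replace"}
names = {"plato", "berlin"}
tokens = ["Plato", "would", "replace", "the", "plant", "in", "Berlin"]
counts = classify_tokens(tokens, lexicon, names)
print(counts)
```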
What we did not do and have not addressed is the kind of
inferential reasoning that is addressed in some of your work and in Ballard’s
work. I think Readware can be a
good foundation that will help anyone extract information and bracket spots of
texts and messages to reason upon.
I think the schemas in your
http://www.bcngroup.org/beadgames/taxonomyDiscussion/thirtyone.htm
can be represented in Readware cultures and that we might
find some ways to apply Ballard's theories as generalized queries for
supporting spots of texts.
I hope you can add this to the beadgames as a more complete
representation of the layers of knowledge representation within the ReadWare
framework.
Regards,
Ken Ewell