
 

Communications on a National Project

Taxonomy Issues

 

3/14/2004 7:05 PM

 

Some discussion has occurred between Ken Ewell (ReadWare Inc) and Paul Prueitt (OntologyStream Inc) over the past week.  From this discussion we have identified some core concepts that can be discussed within the community of scholars.

 

The following text is complex and yet concise.  The issues are laid open very well.  Paul Prueitt makes comments from hyperlink tags within the “[]” brackets.  Anyone wishing to make a comment of this type is free to do so; just send a note to portal@ontologystream.com or write an independent communication about where the discussion on taxonomy and the National Project has been.

 

 

Hello Paul (Prueitt),

 

I appreciate your comments and I thank you for opening up the debate to include some discussion of the architecture and methods implemented in Readware. I think it is important to mention that Dr. Adi's computational studies centered on complex control systems and his dissertation was on "interface correctness" and the problems faced in compiling code.  It is probably also useful to mention that the extent of our research is in the structural analysis of messages and texts; analyzing the relations in dynamically formed text structures as in query and retrieval systems.  [%]

 

The "letter-semantic" primitives of Adi's model are a kind of perfection, but they are certainly not all there is. Plato believed in some perfections and "beauties of the earth" and in using them as "stepping stones to that greater beauty".  We believe that the phonetic alphabet is one of those beauties.  It is also one of those things that was legislated into human memory, sort of like how John Sowa described how meaning is legislated through UPC codes that might not even make sense to someone.  Plato was chiefly responsible for the former legislation.  Even there, though, where they are like chemical compounds and partial thought-atoms combined to form words, they are but conjecture seeking a resolution in natural agreement (e.g. truth/correspondence) or through legislation, as we are by now well aware. [&]

 

This model did solve, for us, the problem of producing a framework of measures that were not context-dependent, measures that could be applied in any context, in any subject domain.

 

As an oppositional scale, Readware's bipoles are neutral to subjective situations like whether up is indeed down and they are useful to every situation.  In other places Tom Adi refers to them as "binary activators" as he sees them as active participants in a given situation.  In truth we cannot determine their significance in advance. The significance is determined dynamically for every query put to every collection in our axiomatic system of computation.  The Readware algorithms take snapshots and make measurements where the results of each measurement are tallied, stored and used in a relevance competition that is not decided until all the measurements are in. We happen to do this very, very quickly.  [^]
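
As a rough illustration only (this is not Readware's code; the function names, the two measures and the sample collection are all assumptions made for the sketch), the idea of tallying every measurement first and deciding the relevance competition last might look like this in Python:

```python
from collections import defaultdict


def term_overlap(query_terms, tokens):
    # Hypothetical measure: how many query terms occur in the snapshot.
    return float(sum(1 for term in query_terms if term in tokens))


def proximity(query_terms, tokens):
    # Hypothetical measure: reward query terms that occur close together.
    positions = [i for i, tok in enumerate(tokens) if tok in query_terms]
    if len(positions) < 2:
        return 0.0
    return 1.0 / (positions[-1] - positions[0])


def relevance_competition(query_terms, collection, measures):
    """Tally every measurement, then rank; nothing is decided early."""
    tallies = defaultdict(float)
    for doc_id, text in collection.items():
        tokens = text.lower().split()  # a crude stand-in for a "snapshot"
        for measure in measures:
            tallies[doc_id] += measure(query_terms, tokens)
    # The competition is decided only after all the measurements are in.
    return sorted(tallies.items(), key=lambda item: item[1], reverse=True)


# Invented sample data, for illustration only.
collection = {
    "a": "the train left the station",
    "b": "market trends and train schedules",
    "c": "nothing relevant here",
}
ranking = relevance_competition(
    {"train", "station"}, collection, [term_overlap, proximity]
)
print(ranking)
```

The point of the sketch is only the ordering of the work: the significance of any one measure is not fixed in advance, and the ranking is produced after every measure has contributed to every tally.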

 

Still, as you said, we also found this to be insufficient. [^^]

 

Yet we found a formalism and a very good one with a strong axiomatic basis. [^^] We were in need of some clarity about the forms and structures that were apt to be found in texts. 

 

Language change made every morphological or phonetic analysis difficult, particularly between languages.  In 1986 we developed some algorithms to analyze text documents and messages.  By 1987, this grew into a DOS-based (command line) research program called The Research Assistant.  The program dealt with morphemes and verbal expressions and processed queries against texts to produce a numerical score of relevance; later it could highlight pertinent spots.  When it was used to score a passing grade on an LSAT reading comprehension test we were encouraged to take the next steps. { extended comment }

 

Because of language change and what you call the arbitrary element of every situation, our program based on the formal model alone had problems.  We could only guess at what might be called natural kinds (in our case, word roots) of the kind that can link ideas like that represented by the term "place" to occurrences using the terms plant, replace, supplant, transplant and emplacement. We had good and bad guesses.

 

The Readware Research Assistant could not measure the difference between trains and trends, for instance; it was idea-stupid.  We needed to specify and define the naturally occurring kinds of ideas.  This became necessary so we could more easily identify them in arbitrary texts and message streams.

 

Because *if* our word roots can be relied upon, they can be used as nodes to specify linkages, for regulating the relations we were capable of discovering and for increasing clarity and precision.  We were looking for more than the one hundred great ideas AI'ers were fond of using. 

 

Dr. Adi performed a taxonomic analysis of the Arabic language {additional detail on Arabic origins} and this taxonomy became the basis of the Readware ConceptBase that was initially released to the public in 1992.  Only the Germans, with their strong opinion on "worldview", applied the technology over a spectrum of domains from agriculture to medicine, politics and on to social commentary, science and technology.  We reformed the ideas and the Readware ConceptBase in 1996.  Since then we have added fewer than one thousand terms to the original ConceptBase and reformed the rules of German spelling to correspond to their government's adopted spelling reform.

 

Since adding the ConceptBase, the model itself became less important, more ubiquitous. Its purpose, after all, was to render measures of "semantic distance" or fidelity. 

 

To obtain these, we first measured all the relations between the nodes in the concept base for a baseline.  This makes it possible to model expected text structures within well understood contexts.  It gives some clarity.  Still, the concept base, with its broad coverage across the entire dominion of human knowledge as represented in a language (the Arabic language), was not enough to capture the variety of representation and relevance we found in texts of all kinds. 

 

Eventually, circa 1999, we began to create still another layer of linkages that exists not between one word or another, as the term issue might relate to the term topic or to an emission of some kind, but between groups of ideas, or "themes".

 

In these groups, the terms representing ideas will occur in specific configurations where they are pertinent and in other configurations where they are not.  In turn, these can be layered in a taxonomy where further (independent) relations can be specified and defined.  Incidentally, we call this layer and the specification itself the "culture".  This layer, unlike the inaccessible framework and the more strictly modifiable concept base, is an open layer.

 

As an open layer, it is possible to define the topics, categories, and issues one may want to identify and relate, in terms of the language you are expecting to see referring to those items. 
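
To make the notion of a user-defined culture a little more concrete, here is a minimal sketch in Python. It is not the Readware culture format: the dictionary layout, the theme names, the term groups, the `themes_in` function and the window size are all assumptions chosen for illustration. A theme is taken to be pertinent when at least one term from each of its groups occurs within the same window of text.

```python
# Hypothetical culture: theme name -> groups of terms the user expects to see.
CULTURE = {
    "rail transport": [
        {"train", "trains", "railway"},
        {"station", "schedule", "track"},
    ],
    "market trends": [
        {"trend", "trends"},
        {"market", "price", "prices"},
    ],
}


def themes_in(passage, culture, window=8):
    """Return the themes whose term groups all co-occur within one window."""
    tokens = [tok.strip(".,;:!?\"'()").lower() for tok in passage.split()]
    found = []
    for theme, groups in culture.items():
        # Slide a fixed-size window over the passage (at least one window).
        for start in range(max(1, len(tokens) - window + 1)):
            snapshot = set(tokens[start:start + window])
            # Pertinent configuration: every group contributes a term.
            if all(snapshot & group for group in groups):
                found.append(theme)
                break
    return found


print(themes_in("The train was late leaving the station.", CULTURE))
```

Under these assumptions the same term can be pertinent in one configuration and not in another: "trains" on its own does not trigger the rail theme, while "train" near "station" does.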

 

In summary, Readware has three layers of representation.  These are the letter-semantic measures based upon the mapping of thoughts-prototypical memories-letters, the "concept base" that serves to link the ideas of thousands of recurring root words found in languages, and the user-modifiable cultures that serve the user to specify and define the terminology and characteristics particular to the occurrence of particular ideas, or the themes of particular ideas, in which they are interested.

 

Incidentally, after developing the concept base, we found that the root-words we had entered into the concept base, and the corresponding English and German lexicons, etc., accounted for an average 30% of the terms found in millions of articles, messages and texts. Another 30% of the items parsed were names of people and places and 30% were unique or unknown terms (that are treated as constants). This gave us an indexing idea that gives the Readware Information processor its speed.  That speed belies the fact that we have to do a lot of work for each context, as we have to dynamically create snapshots and make measurements. 
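
One way to picture the split described above is a single pass that sorts each parsed term into a known root word, a name, or an unknown term treated as a constant. The Python sketch below is an assumption-laden illustration, not the Readware indexer: the lexicon contents (borrowed from the "place" example earlier in this letter), the name list and the `classify_tokens` function are all hypothetical, and the real system's categories and proportions are only as stated in the paragraph above.

```python
from collections import Counter

# Hypothetical lexicon: surface term -> root node (after the "place" example).
ROOT_LEXICON = {
    "place": "place",
    "plant": "place",
    "replace": "place",
    "supplant": "place",
    "transplant": "place",
    "emplacement": "place",
}

# Hypothetical list of names of people and places.
KNOWN_NAMES = {"Paul", "Cairo"}


def classify_tokens(tokens, root_lexicon, known_names):
    """Count how many tokens are roots, names, or constants (unknown terms)."""
    counts = Counter()
    for token in tokens:
        word = token.strip(".,;:!?\"'()")
        if not word:
            continue
        if word.lower() in root_lexicon:
            counts["root"] += 1      # covered by the concept base lexicon
        elif word in known_names:
            counts["name"] += 1      # a person or place
        else:
            counts["constant"] += 1  # unique or unknown, kept as-is
    return counts


print(classify_tokens(["Paul", "transplant", "Cairo", "xj-42"],
                      ROOT_LEXICON, KNOWN_NAMES))
```

A pass of this kind only suggests why such a split could help indexing: terms already tied to root nodes can be looked up directly, while names and constants need no further analysis.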

 

What we did not do and have not addressed is the kind of inferential reasoning that is addressed in some of your work and in Ballard’s work.  I think Readware can be a good foundation that will help anyone extract information and bracket spots of texts and messages to reason upon. 

 

I think the schemas in your

 

http://www.bcngroup.org/beadgames/taxonomyDiscussion/thirtyone.htm

 

can be represented in Readware cultures and that we might find some ways to apply Ballard's theories as generalized queries for supporting spots of texts.

 

I hope you can add this to the beadgames as a more complete representation of the layers of knowledge representation within the ReadWare framework. 

 

Regards,

Ken Ewell