
Sunday, April 10, 2005

 

Global Information Framework

 

The Readware Semantic Extraction component

 

 

See the progression of the demonstration of Readware-style knowledge extraction at:

 

http://www.readware.com/cbp.html

 

The ConceptBase is a Readware glossary of 2,750 ontological structural definitions. Each definition is associated with several words in every language we have processed so far (English, German, French). Definitions are expressed in a few dozen abstract terms, which in turn yield hundreds of possible interpretations. Each interpretation is a concept structure expressed in simple language and has two cognitive functions:

 

a)     to organize vocabulary in the human mind, and

b)    to suggest query structures for knowledge extraction.

 

Query structures are realized using the Readware query language.  Up to now we have not developed a user interface that fully exploits this language.  We are also aware that the three part Adi structural ontology papers suggest that

 

Many computer scientists and engineers have asked us to take some common base of information and produce a presentation that shows how Readware is used as a Language Reference Model and Knowledge Extraction tool for ontological projects.

 

We define an ontological project as a study of entities and relations in data or information.  We have performed many indexing projects on arcane texts, particularly Aesop's fables, the Bible, American Indian texts, and Civil War texts.  These projects have allowed us to refine our internal use of the structural ontology that exists within Readware.

 

While discussing the requirements for semantic extraction tools for ontologies, Dr. Paul Prueitt, founder of the Behavioral Computational Neuroscience Group, suggested a collection of administrative rulings as a sophisticated, modern, real-world example of the information to be gained in an ontology project.

 

Due to the need to aggregate information into a few categories, Prueitt suggested an interrogatory framework for investigation that he has called an event Structured Ontology Framework (e-SOF).

 

While Readware does not have visualization tools, it can extract the semantic elements (entities and their relations) out of which the e-SOF emerges.  This can be seen in the demonstration. 

 

The demonstration also allows ad-hoc experimentation via the construction of different tuples:

 

<why,people,time point>,

<how,goods-item,time point>

 

and derivatives, e.g.,

 

<why,goods-item,where>

 

to extract from the file contents.

 

In our work with Dr. Prueitt, we added some categories and topics to the standard commercial release of Readware.

 

The tuple is used as a query.  The Readware functions exposed here perform a paring process over the 10,574 letters to retrieve those administrative rulings whose contents conform to this tuple.
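
The idea of a tuple acting as a query can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Readware query language or its algorithm: the `Letter` class, the `hits` mapping, and the sample letters are hypothetical, and a letter is assumed to "conform" when every category named in the tuple has at least one extracted item.

```python
from dataclasses import dataclass, field


@dataclass
class Letter:
    """A ruling letter with hypothetical, pre-extracted category hits."""
    title: str
    # category name -> text items found for that category (assumed annotation)
    hits: dict = field(default_factory=dict)


def conforms(letter, query):
    """True when every category in the query tuple has at least one hit."""
    return all(letter.hits.get(category) for category in query)


def retrieve(letters, query):
    """Keep only the letters whose contents conform to the tuple."""
    return [letter for letter in letters if conforms(letter, query)]


# Made-up sample data, for illustration only.
letters = [
    Letter("Letter 1", {"why": ["reason"], "people": ["importer"],
                        "time point": ["a date"]}),
    Letter("Letter 2", {"how": ["method"], "goods-item": ["an item"]}),
    Letter("Letter 3", {"why": ["reason"], "goods-item": ["an item"],
                        "where": ["a port"]}),
]

matches = retrieve(letters, ("why", "people", "time point"))
print([letter.title for letter in matches])
```

Swapping in a derivative tuple such as `("why", "goods-item", "where")` selects a different subset of the same letters, which is the kind of ad-hoc experimentation the demonstration allows.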

 

Clicking the title of the document will display the contents of the letter, highlighting relevant items.  You should be able to remember the tuple and recognize how the data fits the tuple. 

 

Dr. Prueitt’s e-SOF organizes the co-occurrence of responses to the set of 18 questions.  This work is still a bit in the future, but has its foundation in the formalisms Prueitt invented, called categorical abstraction and event chemistry. 
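
As a rough illustration of what organizing co-occurrence could mean, the sketch below counts how often two categories receive a response in the same letter. It is an assumption-laden toy and does not implement categorical abstraction or event chemistry; the function name `cooccurrence` and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(responses_per_letter):
    """Count, for each pair of categories, the letters answering both."""
    counts = Counter()
    for answered in responses_per_letter:
        # Sort so each unordered pair is always stored under the same key.
        for pair in combinations(sorted(set(answered)), 2):
            counts[pair] += 1
    return counts


# Made-up sample: each set lists the categories answered by one letter.
sample = [
    {"why", "people", "time point"},
    {"why", "goods-item", "where"},
    {"why", "people"},
]

counts = cooccurrence(sample)
print(counts[("people", "why")])
```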

 

As this work is completed, we will post an online presentation of Readware's capability to perform knowledge extraction from plain text, including the preparation of the topics and classes of things it takes to undertake such a study. 

 

Any individual should be able to use the online interface to study the contents of these administrative rulings in the context of who the importers are, what they are importing and how customs officials classify goods and deal with classifications and the disputes and controversy that arise as a result.

 

One should be able to spend an hour here and come away feeling informed about the details of commercial goods classification.  In that case, our vision for a product of Readware technology is achieved.  If you do not begin seeing results and becoming informed after ten minutes, call me at 1-352-371-5931 (speak up if a machine answers; I may still be reachable).  I can walk you through some sampling exercises.

 

The importance of four factors in this demonstration should not be underestimated.

 

a) The raw cost to produce this study for these 10,574 pages was roughly $0.30/page, not including IT resources.  This cost was spent on hand re-definition of parsing processes.  No annotation of source data is required.  No pre-processing or modification of source data is required. 

b) This collection can be extended to 100,000 or more documents at no additional cost, except storage and IT resource overhead. 

c) The schema, classes, entities, topics (everything) can be further refined, modified, and extended without reprogramming. Automatic compiling and incremental indexing are features of Readware. 

d) While the demonstration hardly made use of concepts in the Readware ConceptBase, the use of the ConceptBase as a language reference model allowed us to focus on the information extraction task without overriding concerns about the vocabulary. The software infrastructure

 

(space + knowledgebase (cultures) + intelligent structures (queries, text))

 

made it straightforward to implement.  It should be clear that Readware algorithms compute comparisons between structures.
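
The figures in (a) and (b) can be worked through as simple arithmetic. The sketch below assumes the $0.30/page figure is exact (the text says "roughly") and that extending the collection truly adds no study cost; storage and IT overhead are excluded, as they are in the text. Amounts are kept in whole cents to avoid floating-point rounding.

```python
PAGES = 10_574             # pages (letters) in the study, from (a)
COST_CENTS_PER_PAGE = 30   # roughly $0.30/page, treated as exact here


def dollars(cents):
    """Format a whole number of cents as a dollar amount."""
    return f"${cents // 100:,}.{cents % 100:02d}"


total_cents = PAGES * COST_CENTS_PER_PAGE
print(dollars(total_cents))  # $3,172.20

# If the same study cost covered 100,000 documents, as (b) suggests:
EXTENDED_DOCUMENTS = 100_000
cents_per_document = total_cents / EXTENDED_DOCUMENTS
print(f"{cents_per_document:.2f} cents per document")  # 3.17 cents per document
```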

 

Some changes to the topic tree have been effected since it was first posted online.  An explanation to ground the presentation has been added as information.

 

We still plan to sort the topic list shown with each result so that the topics of each class fall together.  We will then leave the demonstration in that state, publicly available, as we move to other projects that demand our attention.

 

Scientists and engineers should take note of the reasoning Readware performs on the indication of a date, for example, whether it lies in the past, present or future.  The subsumption can be seen in the culture files (culture 5 mainly) in the topics specifying the past, present and future.  It is not the only example or type of reasoning in this presentation.
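
To make the date reasoning concrete, here is a minimal sketch of classifying a mentioned date as past, present or future relative to a reference date. It does not reproduce the culture files or Readware's subsumption mechanism; the function name `temporal_class` and the choice of reference date are assumptions made for illustration.

```python
from datetime import date


def temporal_class(mentioned, reference):
    """Classify a mentioned date relative to a reference date."""
    if mentioned < reference:
        return "past"
    if mentioned > reference:
        return "future"
    return "present"


# Hypothetical reference point: the date this note was posted.
reference = date(2005, 4, 10)

print(temporal_class(date(2003, 5, 1), reference))   # past
print(temporal_class(date(2005, 4, 10), reference))  # present
print(temporal_class(date(2006, 1, 1), reference))   # future
```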

 

For AI and reasoning advocates, there are many examples of Horn-clause logic, which is used extensively in setting the context for the inclusions and exclusions necessary to the topic specification logic. This is similar to the description logic (DL) of OWL (OIL), etc.  Those knowledgeable in the art should be particularly interested in examining the structures of the relevant Readware culture files (the knowledgebase about the language of customs letters).
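
For readers less familiar with the style, the sketch below shows one way rules with inclusions and exclusions can be evaluated. It is not the syntax or semantics of the Readware culture files: the `TopicRule` class, the topic names, and the treatment of exclusions (checked only against the observed items, as a simple form of negation as failure, which goes beyond strict Horn clauses) are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TopicRule:
    """topic :- all of `include`, provided none of `exclude` was observed."""
    topic: str
    include: frozenset
    exclude: frozenset = frozenset()


def derive_topics(observed, rules):
    """Forward-chain over the rules until no new topic can be derived."""
    observed = frozenset(observed)
    known = set(observed)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if (rule.topic not in known
                    and rule.include <= known
                    and not (rule.exclude & observed)):
                known.add(rule.topic)
                changed = True
    return known - observed


# Hypothetical rules and observations, for illustration only.
rules = [
    TopicRule("past", frozenset({"date", "before-reference"})),
    TopicRule("prior-ruling", frozenset({"past", "ruling"}),
              exclude=frozenset({"proposed"})),
]

print(sorted(derive_topics({"date", "before-reference", "ruling"}, rules)))
```

Adding `"proposed"` to the observed items blocks the second rule while leaving the first untouched, which is the role an exclusion plays in narrowing a topic's context.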

 

See it, try it all at: http://www.readware.com/cbp.html

 

Ken Ewell