ORB Visualization

(soon)

Tuesday, June 01, 2004

First Tutorial on OrbSuite™

Nathan began to re-express the Orb development software. The complete description of the Orb development process is given in the notational paper. The primary limitation of OntologyStream's previous version of Orb generation and viewing was in the iterated modification of a Stop Word, or Go Word, list. The Go Word list is the set of words that remain after all words not listed have been removed. We will call a Go Word list that has been used to develop an Orb a preliminary taxonomy list.

Orbs are developed to show the occurrence patterns of words in text. The original files are re-written so that only the words from the Go Word list are placed into word-level n-gram containers (called Orb subject matter indicator neighborhoods).
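The re-writing step can be sketched in a few lines of Python. This is only an illustration, not the OrbSuite code: the function name keep_go_words and the simple tokenizer are assumptions, and the sample phrase and go-list are the ones used in the Appendix.

```python
import re

def keep_go_words(text, go_words):
    """Return the words of `text`, in their original order, that are on the go-list.

    Hypothetical helper: the tokenizer (lower-cased runs of letters and
    apostrophes) is an assumption, not the OrbSuite parser.
    """
    go = {w.lower() for w in go_words}
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t in go]

phrase = 'food and drink to me." Upon which the Wolf seized him and ate'
go_list = ["food", "drink", "wolf", "seized", "ate"]
print(keep_go_words(phrase, go_list))
# ['food', 'drink', 'wolf', 'seized', 'ate']
```

Everything not on the go-list is dropped, and the surviving words keep their order, which is what the later windowing step relies on.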

The tutorials on these concepts are developed in the BCNGroup Glass Bead Game thread: InOrb, and in some early descriptions of work on Upper Taxonomies for the FCC.

OntologyStream’s new work on OrbSuite™ is designed to provide the minimal functionality needed to iteratively develop an Upper Taxonomy for subject matter identification and retrieval using preliminary taxonomy lists. A preliminary taxonomy can be evolved into a controlled vocabulary and organized into a two-level upper taxonomy. OrbSuite™ assists humans in the generation and maintenance of controlled vocabulary and upper taxonomy.

We at OntologyStream are continuing to develop the underlying technology for the easy use of Orb conceptual discovery and indexing. Orb conceptual discovery has been integrated with other techniques such as rule-based scripts and latent semantic indexing. These other techniques discover subject indicators through feature extraction and clustering. Once the indicators are discovered, independent Orb sets, in the form of a set of triples

{ < a, r, b > }

are used to instrument a non-statistical recognition of subject indicators occurring in natural language. Categories of relationships are encoded into the Orb constructions and are available as easily manipulated situational ontology.

In the next tutorial we will index two fables and then iteratively adjust the taxonomy so that the subject indicator topology separates the scatter gather clustering of co-occurrence pairs of word occurrences. This is done to illustrate how single “subjects” can be instrumented from measurement using small Orb ontologies and simple rule sets.

The concept of a subject indicator topology applies also to measurements of non-linguistic data using the notion of a framework. Prueitt’s generalFramework (gF) theory allows for the measurement of event substructure, or features, using one or more dimensions of description. For example, the 5*6 Zachman framework has the two dimensions

{ what, how, where, who, when, why } and

{ planner, owner, designer, builder, subcontractor }

The cross product of these two description-lists gives 30 primitives. John Sowa’s 2*2*3 framework provides 12 semantic primitives. Non-linguistic data is acquired using a framework by assigning states to the cells of the framework. Once the data is acquired, one develops a controlled set of tokens, like a controlled vocabulary, and constructs a formal control ontology based on a differentiation of types as determined by various methods. These methods are in the process of being discovered, and rely on a technology we call Orb arithmetic.
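As a small illustration of the cross product, the 30 cells can be enumerated in Python. The variable names and the sample state are hypothetical and are not part of gF theory or of the OrbSuite; only the two description-lists come from the text above.

```python
from itertools import product

# The two description-lists given above (variable names are illustrative).
interrogatives = ["what", "how", "where", "who", "when", "why"]
perspectives = ["planner", "owner", "designer", "builder", "subcontractor"]

# Each cell of the framework is one (perspective, interrogative) pair.
cells = list(product(perspectives, interrogatives))
print(len(cells))  # 30

# Acquiring data means assigning a state to a cell; None marks an empty cell.
framework_state = {cell: None for cell in cells}
framework_state[("owner", "what")] = "example state"  # hypothetical value
```

A dictionary keyed by cell is only one convenient encoding; it is used here because it makes empty cells explicit.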

Like the SLIP browsers, the new OrbSuite™ browser is a standalone (does not require installation) Visual Basic executable (116K) running only in Windows. The data encoding is fractally scalable so that the more data one wishes to encode the less costly the new data is in terms of memory allocation and retrieval speeds.

This version is a prototype design and will be redeveloped, when the first round of significant funding occurs, using Python plus PHP code to run on any operating system. The redevelopment may use and license a number of additional technologies, including the Hilbert engine from Prementia Inc, VisualText parsing engine from Text Analysis International Corporation, SchemaLogic schema server from SchemaLogic Inc, Polylogic server from Pile Systems Inc, and NdCore technology from Applied Technical Systems Inc. However, each of these technologies will fit, or not fit, depending on the partnering relationship that develops.

Individual discussions regarding the value added by each of these technologies can be held under a limited and targeted NDA with OntologyStream. As the first round of funding occurs, we will select some of these auxiliary technologies to be incorporated into the Knowledge Sharing Core concept, as a base for curriculum development on the foundations of Anticipatory Web technology.

The OrbSuite™ browser needs a Projects folder to be located in the folder that the OrbSuite™ is placed into.

Figure 1: The OrbSuite browser

The OrbSuite has a main screen with four tabs: Project Overview, Project Options, Results and Orb Search.

Figure 2: The main screen of the OrbSuite Browser

Under the File tab one can create a new project. The screen for selecting the path to the text collection and making a title and comments is seen in Figure 3.

Figure 3: The New Project Screen

On return to the main screen (select the “Done” button) one is able to create or re-create an Orb. First, however, one should develop or assign a Stop or Go list. This is done using the second tab, “Project Options”, in the main screen.

Figure 4: The Project Options screen

In Figure 4 we have selected the Project Options tab and have also imported a stop list having several thousand words. (We will get to specific numbers in a future edit of this bead.) Also, the go-list will be automatically generated given a stop-list. The reason for this is that the go-list itself will evolve to be a controlled vocabulary and taxonomy. The Orb construction for a controlled vocabulary will in some cases be made interoperable with the Web Ontology Language (OWL) and Topic Maps standards. So one will be able to import a go-list from an existing OWL ontology, or bootstrap an OWL construction using a stop-list and the Orb development process. The ability to generate an OWL construction from text resources is highly dependent on human manipulation of the Orb construction process. However, this process is highly intuitive and will, in theory, allow for automated rapid development of situational ontology.

In Figure 5 we have selected create/re-create and produced an Orb over ten of the 312 Aesop fables. This produces 282 “concepts”.

Figure 5: The main screen after 282 Concepts are identified

These 282 concepts are arranged in tree structures and available for inspection from the third tab, “Results”.

Figure 6: the Results tab

The production of Orb constructions is quite easy using the OrbSuite. In Figure 7 we have created a new project, assigned the VisualText stop list, and targeted a folder with a single fable having 152 occurrences of words.

The go list is to be manipulated by various tools to produce a preliminary taxonomy list. When this list is further developed into a two layer Upper Taxonomy then our primary functional objective for this software will have been achieved. We are four weeks away from demonstrating this primary functionality in a robust set of prototype Visual Basic browsers. Redesign and redevelopment of this capability into Python and PHP is estimated to take one month.

Figure 7: The Orb constructions for one fable

In Figure 7 we see part of the tree structure consisting of 48 “concepts”, each one of them being the center of a word-level n-gram. The n-grams are developed by taking the sentence structure away, taking the stop words away, re-writing the 152-word fable as an ordered set of words, and then passing a five-word window across the ordered set (as discussed in the Orb Notational Paper).
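The windowing step can be sketched as follows. This is a minimal illustration and not the OrbSuite implementation: it assumes the five-word window is centered on each word, with up to two neighbors on each side, which is consistent with the groupings listed in the Appendix. The function name neighborhoods is hypothetical.

```python
from collections import defaultdict

def neighborhoods(ordered_words, window=5):
    """Map each word to the words that share a window centered on it.

    Assumption: the window is centered, so a five-word window gives up to
    two neighbors before and two after each occurrence of the center word.
    """
    half = window // 2
    result = defaultdict(list)
    for i, center in enumerate(ordered_words):
        lo = max(0, i - half)
        hi = min(len(ordered_words), i + half + 1)
        for j in range(lo, hi):
            if j != i:
                result[center].append(ordered_words[j])
    return dict(result)

# Ordered go-list words taken from the Appendix example.
words = ["food", "drink", "wolf", "seized", "ate"]
print(neighborhoods(words)["wolf"])
# ['food', 'drink', 'seized', 'ate']
```

A word that occurs several times in the ordered set collects the neighbors of every occurrence, which is how a single “concept” gathers its neighborhood.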

Figure 8: The same 48 “concepts” viewed using the SLIP browsers

The “wolf” concept is visualized in Figure 9. In the Orb Notational Paper we address the issues of a separation of the elements of the subject indicator neighborhood to allow a disambiguation of context. One should remember that natural language has adapted to the ambiguity of subject indicators and acts with human cognitive capability to evoke mental experience. The Orb subject matter indicators function in precisely this fashion. Thus one way to separate the element “voice” from the element “tasted”, in the event compound, is by visual inspection and manual annotation. As the next exercise will show, automated means are available to aid in making this separation.

Figure 9: The wolf concept

Appendix: Error to be studied and fixed.

From the eventChemistry browser we find:

Link: wolf

Atoms: voice seized pasture food feed born tasted meeting grass exclaimed drink lamb

From the OrbSuite we find:

Wolf

Lamb, meeting,

Voice, born, feed, pasture

Tasted, grass, drink, exclaimed

Food, drink, seized, ate

There is only one technical error, which we will track down. “ate” should be in the list of atoms for the link “wolf”.
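The discrepancy can be confirmed with a simple set comparison. The two word lists below are copied from the listings above; the variable names are illustrative only.

```python
# Atoms reported by the eventChemistry browser for the link "wolf".
event_chemistry_atoms = set(
    "voice seized pasture food feed born tasted meeting "
    "grass exclaimed drink lamb".split()
)

# Neighbors reported by the OrbSuite for "wolf" (all four groups merged).
orb_suite_neighbors = {
    "lamb", "meeting",
    "voice", "born", "feed", "pasture",
    "tasted", "grass", "drink", "exclaimed",
    "food", "seized", "ate",
}

print(sorted(orb_suite_neighbors - event_chemistry_atoms))  # ['ate']
print(sorted(event_chemistry_atoms - orb_suite_neighbors))  # []
```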

The Wolf and the Lamb

WOLF, meeting with a Lamb astray from the fold, resolved not to lay violent hands on him, but to find some plea to justify to the Lamb the Wolf's right to eat him. He thus addressed him: "Sirrah, last year you grossly insulted me." "Indeed," bleated the Lamb in a mournful tone of voice, "I was not then born." Then said the Wolf, "You feed in my pasture." "No, good sir," replied the Lamb, "I have not yet tasted grass." Again said the Wolf, "You drink of my well." "No," exclaimed the Lamb, "I never yet drank water, for as yet my mother's milk is both food and drink to me." Upon which the Wolf seized him and ate him up, saying, "Well! I won't remain supperless, even though you refute every one of my imputations." The tyrant will always find a pretext for his tyranny.

The words “food and drink to me." Upon which the Wolf seized him and ate” are converted to the go-list ordered set

{ food, drink, wolf, seized, ate }

The SLIP is not finding the ordered triple < x, wolf, ate > where x = “food”, “drink” or “seized”.
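One way to see which triples would be expected is sketched below. This is only one possible reading of the triple notation < a, r, b >, offered as an assumption: r is taken to be the center of a five-word window, and a and b are any two of its neighbors with a occurring before b. The construction actually used by the SLIP browsers is defined in the Orb Notational Paper and may differ. The function name window_triples is hypothetical.

```python
from itertools import combinations

def window_triples(ordered_words, window=5):
    """Return the set of (a, r, b) triples under the assumed reading.

    Assumption: r is the center of a window, and (a, b) is any pair of
    its neighbors with a appearing before b in the ordered set.
    """
    half = window // 2
    triples = set()
    for i, link in enumerate(ordered_words):
        lo = max(0, i - half)
        hi = min(len(ordered_words), i + half + 1)
        neighbors = [ordered_words[j] for j in range(lo, hi) if j != i]
        # combinations() keeps list order, so a always precedes b.
        for a, b in combinations(neighbors, 2):
            triples.add((a, link, b))
    return triples

triples = window_triples(["food", "drink", "wolf", "seized", "ate"])
for x in ("food", "drink", "seized"):
    print((x, "wolf", "ate") in triples)  # True for each x
```

Under this reading all three triples are generated from the ordered set, so a sketch like this could serve as a reference when tracking down where “ate” is lost.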

Link: lamb

Atoms: water tone tasted replied plea mournful meeting grass fold exclaimed drink drank bleated astray wolfs justify insulted eat

Lamb

wolf, meeting, astray, fold, plea, justify, wolfs, eat, insulted, bleated, mournful, tone, sir, replied, tasted, grass, drink, exclaimed, drank, water