Tuesday, February 01, 2005
Semantic extraction over sets of concepts, using InOrb technology
Last edit: 2/2/2005 9:34 PM
Version 1.2
Parts I – III of the Adi structural ontology.
Center of Excellence, proposal
New Tutorial on Orb technology →
Orb Data Relationship Toolkit (Orb-DRT): Users Manual
InOrb Technologies
Nathan Einwechter
Introduction
The Orb Data Relationship Toolkit (Orb-DRT) is designed as a test bed for applying Orb technology to delimited data of various types. We are considering cyber data, scientific instrumentation data and free-form text data. The main function of the software is to let the user create precisely defined Orbs and export them to SLIP format for visualization and categorization.

The control screen for the Orb-DRT (download)
Creating a New Project
To create a new project, simply click File > New.
Once the Create New Project dialog comes up, you may enter a project name and the author’s name. Next, select whether you want to process a single file or multiple files: click the radio button beside the appropriate option, then press Select to specify the file or directory you wish to include in the project.
Any files specified here are placed into the “Projects/project name/Text” directory of the Orb-DRT folder, where “project name” is the name of your project. Files can be added later by simply copying or moving them into this directory.
Once you have specified the file or directory to process, the program displays a two-line sample of the data. You can specify how the data is delimited at this point, and the Elements list will populate itself. You may also rename any of the columns, and use the check box to specify whether the first line should be processed, which is useful if the first line of each file contains column titles.
You can then click on “Done” to finalize the column definition options, and “Create” to initialize the project.
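The column-definition step above amounts to splitting each line on the chosen delimiter and mapping the fields to element names. Below is a minimal Python sketch of that idea, assuming simple single-character delimiters with no quoting. `parse_delimited` and the sample data are hypothetical and are not taken from the Orb-DRT itself.

```python
from typing import Dict, List


def parse_delimited(lines: List[str], delimiter: str, column_names: List[str],
                    skip_first_line: bool) -> List[Dict[str, str]]:
    """Split each line on the delimiter and map the fields to element names.

    Hypothetical helper illustrating the column-definition step only.
    """
    if skip_first_line:
        lines = lines[1:]  # the first line holds column titles, not data
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split(delimiter)
        records.append(dict(zip(column_names, fields)))
    return records


sample = ["src,dst,port", "10.0.0.1,10.0.0.2,80", "10.0.0.3,10.0.0.2,443"]
names = sample[0].split(",")  # reuse the title line as element names
rows = parse_delimited(sample, ",", names, skip_first_line=True)
print(rows[0])  # {'src': '10.0.0.1', 'dst': '10.0.0.2', 'port': '80'}
```

Renaming a column in the dialog corresponds, in this sketch, to passing a different `column_names` list.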
Element Tags
To distinguish between different elements that have the same value, the Orb-DRT provides a tagging system.
To assign a tag to an element, select it from the Elements list, type the tag to associate with it, and click Apply. The process is the same for viewing and modifying an element’s tag: click the element, view or change the text in the Tag text box, and click Apply if you made any changes.
It is important to note that any elements that may share values must each have a tag associated with them. Otherwise, equal values from different elements will be merged, because the system relies on the tag to tell which original element a value came from.
<We need an example here with screen shots.>
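Until screen shots are available, the merging problem can be shown in a few lines. This minimal sketch assumes that a tag is simply prefixed to each value (`tag:value`). The prefix scheme and the `collect_values` helper are hypothetical and do not describe the Orb-DRT's internal representation.

```python
from typing import Dict, List, Set


def collect_values(records: List[Dict[str, str]], tags: Dict[str, str]) -> Set[str]:
    """Return the distinct value nodes seen across all records.

    Hypothetical scheme: a tagged element's values become "tag:value",
    so equal values from different elements no longer collide.
    """
    nodes: Set[str] = set()
    for record in records:
        for element, value in record.items():
            tag = tags.get(element)
            nodes.add(f"{tag}:{value}" if tag else value)
    return nodes


records = [{"port": "100", "count": "100"}]
print(sorted(collect_values(records, {})))                           # ['100']
print(sorted(collect_values(records, {"port": "P", "count": "C"})))  # ['C:100', 'P:100']
```

Without tags, the port value and the count value collapse into one node. With tags, they stay separate.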
Rules Processing
In this initial version of the Orb-DRT, rules processing has been put on the back burner. Down the road, however, the toolkit will allow certain values and rows to be eliminated from Orb processing transforms, based on simple rules that the user defines through a rule wizard.
<We need a definition of an Orb processing transform. The formalism I have in mind is not completely thought through.>
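Because the rule wizard does not exist yet, the following is only a hypothetical sketch of what "eliminating values and rows" could mean. It assumes that row rules are predicates that drop a whole row, and that excluded values are listed per element. None of these names come from the Orb-DRT.

```python
from typing import Callable, Dict, List, Set

Record = Dict[str, str]
RowRule = Callable[[Record], bool]  # hypothetical: True means drop the row


def apply_rules(records: List[Record], row_rules: List[RowRule],
                excluded: Dict[str, Set[str]]) -> List[Record]:
    """Drop rows matched by any rule, then drop excluded values per element."""
    kept: List[Record] = []
    for record in records:
        if any(rule(record) for rule in row_rules):
            continue  # the whole row is eliminated
        kept.append({element: value for element, value in record.items()
                     if value not in excluded.get(element, set())})
    return kept


records = [{"src": "10.0.0.1", "port": "80"}, {"src": "10.0.0.9", "port": "0"}]
rules = [lambda r: r.get("port") == "0"]  # hypothetical rule: drop port-0 rows
print(apply_rules(records, rules, {"src": {"10.0.0.1"}}))  # [{'port': '80'}]
```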
Centers and Relationships
The Orb-DRT is flexible in how it lets the user define Orb relationships. There are two types of definition process: the measurement process and the convolution, or process transform.
The measurement can come from Readware or from InOrb word-level n-gram analysis, with other methods soon to come.
Orbs generalize the tree structure (using the CCM patents, 1992–1994) derived from the “word-level n-gram” used in NdCore technology to the notion of a center and the elements in its neighborhood (all elements of an n-gram window start at the same “distance” from the center).
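To make the n-gram window concrete, the sketch below turns a word list into center | neighborhood lists by sliding a window of n words across it. It assumes that every word in a window acts as a center and that the other words in the same window become its neighbors, all at equal distance. `ngram_orb` is a hypothetical helper, not the NdCore or InOrb implementation.

```python
from typing import Dict, List


def ngram_orb(words: List[str], n: int) -> Dict[str, List[str]]:
    """Build center -> neighborhood lists from sliding word windows.

    Every other word in a window joins the center's neighborhood, so all
    of them start at the same distance from the center.
    """
    orb: Dict[str, List[str]] = {}
    for start in range(len(words) - n + 1):
        window = words[start:start + n]
        for i, center in enumerate(window):
            neighbors = window[:i] + window[i + 1:]
            orb.setdefault(center, []).extend(neighbors)
    return orb


print(ngram_orb(["there", "is", "no", "more"], 2))
# {'there': ['is'], 'is': ['there', 'no'], 'no': ['is', 'more'], 'more': ['no']}
```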
In fact, the notion of distance is weakened in the deeper theory, where only set membership and category theory are present {1} {2}. The use of convolution is based on the notion that a process touches each element of a set. When touched, the element may trigger something, or some subsetting function may be applied.
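Here is one possible reading of convolution as code. A process visits each element of a set exactly once, and each visit may trigger an event, a subsetting test, or both. This is a loose, hypothetical sketch. `touch` and `keep` are illustrative stand-ins for whatever the process transform actually does, which the formalism above leaves open.

```python
from typing import Callable, Iterable, List, Optional, Set, Tuple, TypeVar

T = TypeVar("T")


def convolve(elements: Iterable[T],
             touch: Callable[[T], Optional[str]],
             keep: Callable[[T], bool]) -> Tuple[List[str], Set[T]]:
    """Visit every element once; collect triggered events and a subset."""
    events: List[str] = []
    subset: Set[T] = set()
    for element in elements:
        event = touch(element)  # touching may trigger something
        if event is not None:
            events.append(event)
        if keep(element):       # or a subsetting function may apply
            subset.add(element)
    return events, subset


words = ["there", "is", "no", "more", "me"]
events, subset = convolve(words,
                          touch=lambda w: f"touched {w}" if w == "me" else None,
                          keep=lambda w: len(w) > 2)
print(events, sorted(subset))  # ['touched me'] ['more', 'there']
```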
Centers are the elements around which relationships are built, by gathering the measured structures (patterns or atoms) to the center. This process stands up event or concept indicators, which can therefore be reified as a set of RDF triples (RDF statements) and equipped with inference mechanisms such as OWL. Alternatively, the indicators can be reified as elements of a formative topic map.
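As an illustration of reifying centers as RDF statements, the sketch below writes each distinct center-neighbor pair as one N-Triples line. The `http://example.org/orb/` namespace and the `hasNeighbor` predicate are hypothetical placeholders. Values are assumed to be IRI-safe words, since no escaping is done.

```python
from typing import Dict, List

# Hypothetical namespace and predicate; not defined by the Orb-DRT.
BASE = "http://example.org/orb/"
PREDICATE = f"<{BASE}hasNeighbor>"


def orb_to_ntriples(orb: Dict[str, List[str]]) -> List[str]:
    """Emit one N-Triples statement per distinct (center, neighbor) pair."""
    lines = []
    for center, neighbors in orb.items():
        for neighbor in dict.fromkeys(neighbors):  # de-duplicate, keep order
            lines.append(f"<{BASE}{center}> {PREDICATE} <{BASE}{neighbor}> .")
    return lines


for line in orb_to_ntriples({"me": ["there", "is", "no", "more"]}):
    print(line)
```

The resulting lines could then be loaded into any RDF store and combined with an OWL ontology for inference.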
Using the measurement process definition, one may define many centers.
To define a center, select “it” from the drop-down list, then click the “Add” button. To remove a center, select it in the center list and click the “Remove” button.
<We need screen shots here. The “it” is a cA, not an instance of a pattern, atom or word phrase.>
You will notice that the center list is displayed as a checklist.
If you are not using the Second Level Orb option, discussed in Encoding Types and Second Level Orb, then you can simply ignore the checkmark portion of this list, as it is irrelevant.
When you are using this option, however, the checklist lets you define which center elements are “keys”. Keys are ultimately the only centers that appear in the encoded Orb. More details about their function are given in the next section.
For each center, you must define the elements that you wish to relate to it. To do this, highlight the center of interest in the center list. The two lists beside it (Available Elements and Related Elements) will then populate to show the elements that are available to be related and those already related to the center.
To add a relation, simply highlight the element you want to relate to the center, and click the “>” button.
To add all elements, press the “>>” button. To remove a single relationship, highlight the element you wish to remove and click the “<” button. To clear all relationships for the highlighted center, click the “<<” button.
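The buttons above act on a simple mapping from each center to its list of related elements. This hypothetical Python model mirrors the “>”, “>>”, “<” and “<<” operations. It also assumes that a center is never listed as available to itself, which the manual does not state.

```python
from typing import Dict, List


class Relationships:
    """Hypothetical model of the center list and its Related Elements lists."""

    def __init__(self, elements: List[str]) -> None:
        self.elements = elements                 # all defined elements
        self.related: Dict[str, List[str]] = {}  # center -> related elements

    def add_center(self, center: str) -> None:
        self.related.setdefault(center, [])

    def remove_center(self, center: str) -> None:
        self.related.pop(center, None)

    def available(self, center: str) -> List[str]:
        """Available Elements list (assumption: a center excludes itself)."""
        return [e for e in self.elements
                if e != center and e not in self.related[center]]

    def relate(self, center: str, element: str) -> None:    # ">" button
        if element not in self.related[center]:
            self.related[center].append(element)

    def relate_all(self, center: str) -> None:              # ">>" button
        self.related[center].extend(self.available(center))

    def unrelate(self, center: str, element: str) -> None:  # "<" button
        if element in self.related[center]:
            self.related[center].remove(element)

    def clear(self, center: str) -> None:                   # "<<" button
        self.related[center] = []


rel = Relationships(["src", "dst", "port"])
rel.add_center("src")
rel.relate("src", "port")
rel.relate_all("src")          # adds the remaining "dst"
print(rel.related["src"])      # ['port', 'dst']
rel.unrelate("src", "port")
print(rel.available("src"))    # ['port']
```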
Encoding Types and Second Level Orb
Currently, the Orb-DRT software offers two encoding types plus an extra encoding aggregation method.
The two types are Classic Encoding and Compressed Encoding.
Classic Encoding: This is the typical encoding type used in the past, in which duplicate neighbors are allowed. It preserves the exact number of relationships between a specific center and each of its neighbors, but takes up extra space.
Compressed Encoding: This is a new form of encoding that allows only one instance of any given neighborhood to exist in relation to a specific center. This reduces the size of the Orb in memory and can help reduce processing time.
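The difference between the two encodings can be sketched as a list that keeps duplicates versus a set that keeps a single copy. This sketch assumes that “neighborhood” in the compressed case refers to each neighboring element. The function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (center, neighbor)


def classic_encoding(pairs: List[Pair]) -> Dict[str, List[str]]:
    """Keep every occurrence, so relationship counts are preserved."""
    orb: Dict[str, List[str]] = {}
    for center, neighbor in pairs:
        orb.setdefault(center, []).append(neighbor)
    return orb


def compressed_encoding(pairs: List[Pair]) -> Dict[str, Set[str]]:
    """Keep at most one instance of each neighbor per center."""
    orb: Dict[str, Set[str]] = {}
    for center, neighbor in pairs:
        orb.setdefault(center, set()).add(neighbor)
    return orb


pairs = [("me", "there"), ("me", "no"), ("me", "there")]
print(classic_encoding(pairs))                   # {'me': ['there', 'no', 'there']}
print(sorted(compressed_encoding(pairs)["me"]))  # ['no', 'there']
```

This shows the trade-off described above. The classic form can still report that “there” co-occurs with “me” twice, and the compressed form gives up that count in exchange for size.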
The extra data aggregation method is currently described as a Second Level Orb. This operation consists of two stages. The first stage completes the Orb encoding as usual. Once this encoding is complete, the second stage looks at which elements have been flagged as “keys”, as described in Centers and Relationships.
The neighborhood of each key center is collected.
Each entry in that neighborhood is then expanded into the entry’s own neighborhood if the entry is itself a center. Entries that are not centers are kept as they are.
To illustrate, consider the following example.
First stage:
me | there, is, no, more
there | me, no, more, is
is | no, more, there, me
more | there, me, no, is
If the keys are “me” and “there”, then the second stage creates:
me | (me, no, more, is), (no, more, there, me), no, (there, me, no, is)
there | (there, is, no, more), no, (there, me, no, is), (no, more, there, me)
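The second stage can be sketched directly from the example. For each key, every neighbor that is itself a center is replaced by that center's first-stage neighborhood, and any other neighbor (such as “no”) is kept as-is. The `second_level_orb` helper is hypothetical. Run on the first-stage table, it reproduces the “me” row of the example.

```python
from typing import Dict, List, Tuple, Union

Entry = Union[str, Tuple[str, ...]]


def second_level_orb(orb: Dict[str, List[str]],
                     keys: List[str]) -> Dict[str, List[Entry]]:
    """Expand each key center's neighbors into their own neighborhoods.

    A neighbor that is not itself a center (e.g. "no") is kept as-is.
    Only the key centers appear in the result.
    """
    result: Dict[str, List[Entry]] = {}
    for key in keys:
        expanded: List[Entry] = []
        for neighbor in orb[key]:
            if neighbor in orb:
                expanded.append(tuple(orb[neighbor]))
            else:
                expanded.append(neighbor)
        result[key] = expanded
    return result


first_stage = {
    "me": ["there", "is", "no", "more"],
    "there": ["me", "no", "more", "is"],
    "is": ["no", "more", "there", "me"],
    "more": ["there", "me", "no", "is"],
}
second = second_level_orb(first_stage, ["me", "there"])
print(second["me"])
# [('me', 'no', 'more', 'is'), ('no', 'more', 'there', 'me'), 'no', ('there', 'me', 'no', 'is')]
```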
To select the encoding type, click the menu option Orb Tools > Encoding and choose the encoding type.
You may also select whether or not to use Second Level Orb processing by placing or removing the checkmark beside “Second Level Orb” under Orb Tools.
Creating the Orb
Once you have defined the tags, centers, relationships and keys (if applicable), you can simply click Orb Tools > Create Orb to start the Orb creation process.
Exporting to SLIP Format
Once the Orb has been created, you can export the results to SLIP Datawh format by clicking Orb Tools > Export To SLIP. This option is only available after the Orb has been created. The export procedure writes the Orb to a Datawh.txt file in the project’s main folder.
Down the Road
Ultimately, this system can also be modified to perform the same type of operations over part-of-speech-tagged text, or over other datasets or text in which specific classes or elements are associated with various values.
Eventually, this technology will be merged with our previous OrbSuite natural-text technology to allow many different operations on many different types of data.
Nathan