
 4/22/2004 6:53 AM

 

Key questions on Common Upper Ontology

 

 

Communication on a design for low-cost thematic analysis of social discourse

 

·            Time required to complete an operational beta: 1 month

·            Full API (Application Program Interface): 2 months

·            Time required to tune the operational system for stand-alone intelligence use: 3 months

 

Total cost: $12,000 per month

 

1: Project-Based Session Saving/Loading

1.1: Individual projects are given their own directory structure for all operations

1.2: Each project has the capability to Export/Import Orbs and Orb Concepts to/from other projects
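A minimal sketch of items 1.1 and 1.2, assuming each project gets its own directory tree and that an Orb can be written out as plain JSON text. The sub-directory names (`raw`, `processed`, `orbs`, `index`) and the functions `create_project`, `export_orb` and `import_orb` are hypothetical; the design does not specify a layout or file format.

```python
import json
from pathlib import Path

# Hypothetical sub-directory layout; the design does not specify one.
PROJECT_SUBDIRS = ("raw", "processed", "orbs", "index")


def create_project(root: Path, name: str) -> Path:
    """Create an isolated directory tree for one project and return its path."""
    project = root / name
    for sub in PROJECT_SUBDIRS:
        (project / sub).mkdir(parents=True, exist_ok=True)
    return project


def export_orb(project: Path, orb_name: str, orb: dict[str, dict[str, int]]) -> Path:
    """Write an Orb (assumed here to be word -> {neighbor: count}) as ASCII JSON."""
    path = project / "orbs" / f"{orb_name}.json"
    path.write_text(json.dumps(orb, sort_keys=True), encoding="ascii")
    return path


def import_orb(source: Path, target_project: Path) -> dict[str, dict[str, int]]:
    """Copy an exported Orb file into another project and return its contents."""
    text = source.read_text(encoding="ascii")
    (target_project / "orbs" / source.name).write_text(text, encoding="ascii")
    return json.loads(text)
```

Keeping the exported Orb as plain text means a project can share it without sharing its document collection.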

 

2: Web harvesting of text

2.1: Text Retrieval

2.1.1: Automated retrieval of text from sources found online (HTTP and FTP), as well as local and networked text sets

2.1.2: Text can be extracted from HTML, TXT, PDF, Microsoft Word, and most other standard text formats as required.

2.1.3: Support for compressed text sets is also possible

2.2: Optional text archiving on local system

2.3: All retrieved text is logged into an index to allow for future re-retrieval and update checking, as well as other functions within the program
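The retrieval and logging steps in 2.1 and 2.3 might look like the following standard-library sketch. `fetch_text` relies on `urllib.request.urlopen`, which handles `http://`, `ftp://` and `file://` URLs. The index schema (`IndexEntry` with a SHA-256 fingerprint and a timestamp) and `log_retrieval` are assumptions for illustration, chosen so that a later re-retrieval can tell whether a source has changed. Extraction from PDF or Word (2.1.2) would need third-party libraries and is not shown.

```python
import hashlib
import time
import urllib.request
from dataclasses import dataclass


@dataclass
class IndexEntry:
    source: str          # URL or local path the text came from
    sha256: str          # content fingerprint used for update checking
    retrieved_at: float  # Unix timestamp of the last retrieval


def fetch_text(url: str, timeout: float = 30.0) -> str:
    """Retrieve text over HTTP, FTP or file:// using only the standard library."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def log_retrieval(index: dict[str, IndexEntry], source: str, text: str) -> bool:
    """Record a retrieval in the index; return True if the text is new or changed."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    previous = index.get(source)
    index[source] = IndexEntry(source, digest, time.time())
    return previous is None or previous.sha256 != digest
```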

 

3: Text Processing

3.1: HTML markup is automatically stripped from all HTML files to provide cleaner text sets

3.2: Many options for further text processing

3.2.1: Removal of stopwords given in a user-defined list

3.2.2: Expansion of contractions to their full words

3.2.3: Removal of non-alphanumeric characters

3.2.4: Optional removal of numeric characters

3.2.5: Division of text into phrase sections according to user-defined boundaries (e.g., end of paragraph, a specific number of words)

3.3: API to Text Analysis International’s system for building text analysis systems (optional)
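The processing options in 3.1 and 3.2 can be chained as a small pipeline. This is a sketch under stated assumptions: HTML is stripped with the standard-library `html.parser` (skipping `<script>` and `<style>` content), the word-expansion option is illustrated with a tiny hypothetical table of contracted forms such as "don't", and the phrase boundary is a fixed number of words, one of the user-defined boundaries mentioned in 3.2.5.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect character data, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def strip_html(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


# Hypothetical, deliberately tiny expansion table; a real system would load a full list.
EXPANSIONS = {"don't": "do not", "can't": "cannot", "it's": "it is", "won't": "will not"}


def process(text: str, stopwords: set[str], drop_numbers: bool = False,
            words_per_section: int = 5) -> list[list[str]]:
    """Clean text and split it into phrase sections of a fixed number of words."""
    words: list[str] = []
    for token in text.lower().split():
        token = token.strip(".,;:!?\"()")
        for word in EXPANSIONS.get(token, token).split():  # expand contracted forms
            word = re.sub(r"[^a-z0-9]", "", word)  # drop non-alphanumeric characters
            if drop_numbers:
                word = re.sub(r"[0-9]", "", word)  # optional removal of digits
            if word and word not in stopwords:  # user-defined stopword list
                words.append(word)
    # Phrase boundary used here: every `words_per_section` words start a new section.
    return [words[i:i + words_per_section] for i in range(0, len(words), words_per_section)]
```

Other boundaries, such as end of paragraph, would replace only the last line of `process`.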

 

4: Resultant text is optionally archived

4.1: Differential and formative ontology assumes that an interface will be developed to allow standard Latent Semantic Indexing (LSI; patent held by SAIC) and stochastic LSI (patents held by Recommind Inc.) to address categories of text elements selected manually using Orb visualization.

4.1.1: It may be that no fully operational intelligence system yet exists that uses LSI tools in ways that are fully iterative with taxonomy and ontology development tools.

4.1.2: The integration of LSI with the Differential Ontology Framework will use the Ontology Lens (invented by Prueitt in 2002).

 

5: Ontology referential base (Orb) Creation

5.1: Takes phrases created by the text processor and encodes them into the Orb-encoded format

5.2: A conceptual index is created and a visual interface is provided

5.3: The text set is minimally indexed according to the Orb results

5.4: Orb re-use is demonstrated: subject matter indicator neighborhoods from an Orb developed over one document collection are used to address, and retrieve from, a separate document collection for which no Orb has yet been produced
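Section 5 does not define the Orb-encoded format. Purely as an illustration, the sketch below assumes an Orb is a table of word co-occurrence counts within phrase sections. A subject matter indicator neighborhood is then a center word plus its most strongly linked words, and re-use (5.4) means scoring documents of a second collection by their overlap with that neighborhood. All names (`build_orb`, `neighborhood`, `minimal_index`, `retrieve`) and the overlap threshold are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical Orb representation: word -> Counter of words that co-occur with
# it in the same phrase section. The design does not specify the real format.
Orb = dict[str, Counter]


def build_orb(phrases: list[list[str]]) -> Orb:
    """Count each unordered word pair once per phrase section, in both directions."""
    orb: defaultdict = defaultdict(Counter)
    for phrase in phrases:
        for a, b in combinations(sorted(set(phrase)), 2):
            orb[a][b] += 1
            orb[b][a] += 1
    return dict(orb)


def neighborhood(orb: Orb, center: str, size: int = 3) -> set[str]:
    """Subject matter indicator neighborhood: the center word plus its strongest links."""
    links = orb.get(center, Counter())
    return {center} | {word for word, _ in links.most_common(size)}


def minimal_index(orb: Orb, docs: dict[str, list[str]]) -> dict[str, set[str]]:
    """Index documents only under words that appear in the Orb."""
    index: dict[str, set[str]] = {}
    for doc_id, words in docs.items():
        for word in words:
            if word in orb:
                index.setdefault(word, set()).add(doc_id)
    return index


def retrieve(neigh: set[str], other_docs: dict[str, list[str]], min_overlap: int = 2) -> list[str]:
    """Orb re-use: rank documents from a collection that has no Orb of its own."""
    scores = {doc_id: len(neigh & set(words)) for doc_id, words in other_docs.items()}
    hits = [doc_id for doc_id, score in scores.items() if score >= min_overlap]
    return sorted(hits, key=lambda doc_id: (-scores[doc_id], doc_id))
```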

 

5: Categorization

5.1: Automated Thesaurus-Based Categorization

5.2: A thesaurus is used to provide conceptual links among Orb center words, from which categories emerge

5.3: The Prueitt Voting Procedure is used to create a dynamic category policy
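The categorization steps can be sketched as follows, assuming the thesaurus is a simple mapping from a word to its related words. Center words connected by a chain of thesaurus links are grouped into one emerging category using union-find. The `vote` function is a generic plurality vote included only to show where a voting step fits; it is not the Prueitt Voting Procedure, whose details are not given here.

```python
from collections import Counter
from typing import Optional


def thesaurus_categories(centers: list[str], thesaurus: dict[str, set[str]]) -> dict[str, set[str]]:
    """Group Orb center words into categories: words joined by a chain of
    thesaurus links land in the same category, labeled by its alphabetically first word."""
    parent = {word: word for word in centers}

    def find(word: str) -> str:
        while parent[word] != word:
            parent[word] = parent[parent[word]]  # path halving keeps trees shallow
            word = parent[word]
        return word

    for word in centers:
        for related in thesaurus.get(word, set()):
            if related in parent:  # ignore thesaurus entries that are not center words
                a, b = find(word), find(related)
                if a != b:
                    parent[max(a, b)] = min(a, b)  # root stays the smallest word
    categories: dict[str, set[str]] = {}
    for word in centers:
        categories.setdefault(find(word), set()).add(word)
    return categories


def vote(words: list[str], categories: dict[str, set[str]]) -> Optional[str]:
    """Plurality vote: each word votes for the category containing it.
    A generic stand-in, not the Prueitt Voting Procedure."""
    member_of = {w: label for label, members in categories.items() for w in members}
    tally = Counter(member_of[w] for w in words if w in member_of)
    return tally.most_common(1)[0][0] if tally else None
```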

 

6: Human Refinement of Categories

6.1: Allows the user to further refine categories by performing a number of categorical operations on the in-memory Orb data encoding (the so-called “Rib” line arithmetic):

6.1.1: Delete

6.1.2: Merge

6.1.3: Move

6.1.4: Rename

6.2: Optional further conceptual indexing over specific categories, using polylogics and schemalogics

6.3: Allows for a transparent and immediate view of concepts within a category

6.4: Allows the user to view the text files within a category, along with each file’s percent match for that category versus the other categories in which the file is found

6.5: Leads towards automated taxonomy generation using controlled vocabularies (as prototyped for the FCC)
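The refinement operations in 6.1 and the percent match in 6.4 can be illustrated with a small in-memory store. This sketch assumes categories are sets of concept words keyed by label, and it defines percent match as the share of a file’s categorized words that fall into each category; both the `CategoryStore` class and that definition are hypothetical.

```python
from collections import Counter


class CategoryStore:
    """In-memory categories: label -> set of concept words (hypothetical layout)."""

    def __init__(self, categories: dict[str, set[str]]) -> None:
        self.categories = {label: set(words) for label, words in categories.items()}

    def delete(self, label: str) -> None:
        self.categories.pop(label, None)

    def merge(self, source: str, target: str) -> None:
        """Fold every concept of `source` into `target` and drop `source`."""
        self.categories[target] |= self.categories.pop(source)

    def move(self, concept: str, source: str, target: str) -> None:
        self.categories[source].discard(concept)
        self.categories[target].add(concept)

    def rename(self, old: str, new: str) -> None:
        if new in self.categories:
            raise ValueError(f"category {new!r} already exists")
        self.categories[new] = self.categories.pop(old)

    def percent_match(self, words: list[str]) -> dict[str, float]:
        """Share of a file's categorized words that fall in each category."""
        hits = Counter(label for w in words
                       for label, concepts in self.categories.items() if w in concepts)
        total = sum(hits.values())
        return {label: 100.0 * n / total for label, n in hits.items()} if total else {}
```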

 

7: Automated Update Engine

7.1: Automatically checks for new/updated text files within the defined remote or local text archive

7.2: Logs the new additions and updates to the text set

7.3: Re-creates the Orb and Conceptual Index based on new data

7.4: Reorganizes the categories and places new/updated documents into them (a continually updated taxonomy)
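For a local archive, the update check in 7.1 and the log in 7.2 can be sketched with content fingerprints: a snapshot maps each file to a SHA-256 digest, and comparing it with the previously logged snapshot yields the new and updated files. Remote sources would need a fetch step first. The function names, the `*.txt` pattern and the tab-separated log format are assumptions; the returned lists are what would drive the Orb rebuild and re-categorization in 7.3 and 7.4.

```python
import hashlib
import time
from pathlib import Path


def snapshot(archive: Path, pattern: str = "*.txt") -> dict[str, str]:
    """Map each text file (path relative to the archive) to a SHA-256 digest."""
    return {str(p.relative_to(archive)): hashlib.sha256(p.read_bytes()).hexdigest()
            for p in sorted(archive.rglob(pattern)) if p.is_file()}


def check_for_updates(archive: Path, previous: dict[str, str]) -> tuple[list[str], list[str], dict[str, str]]:
    """Return (new files, updated files, current snapshot) relative to a logged snapshot."""
    current = snapshot(archive)
    new = [f for f in current if f not in previous]
    updated = [f for f in current if f in previous and previous[f] != current[f]]
    return new, updated, current


def log_changes(log_path: Path, new: list[str], updated: list[str]) -> None:
    """Append one tab-separated line per change: timestamp, kind, file."""
    stamp = time.strftime("%Y-%m-%dT%H:%M:%S")
    with log_path.open("a", encoding="utf-8") as log:
        for f in new:
            log.write(f"{stamp}\tNEW\t{f}\n")
        for f in updated:
            log.write(f"{stamp}\tUPDATED\t{f}\n")
```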

 

8: The automated update creates a real-time presentation of thematic analysis

8.1: The update cycle can be as short as a few microseconds

8.1.1: The Referential Information Base (Rib) encoding provides very fast in-memory operations: a set-membership query takes only n machine cycles, where 2^n is the number of Rib points, so query cost grows with the logarithm of the number of points. A Rib point is encoded in a key-less hash table, with the ASCII value of the categoricalAbstraction (cA) atom encoded as the Rib point value. A Rib point payload is exactly a hash-table “bucket”, and is used to express an encryption of data that can expand into as large an information space as one might wish.

8.1.2: The Rib encoding has fractal scalability, so the more data that is addressed, the fewer resources are needed to encode new information.

8.2: In-memory Orbs have very small memory requirements and can be directly subjected to merge and split operations (without reference to document collections). The result of such an operation is again an Orb, and any Orb can be transmitted as simple ASCII text and then used in any circumstance as a back-of-the-book index. Orbs can track the evolution of the themes in a social discourse at several levels of organization. This is easy to show.
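The logarithmic query cost in 8.1.1 (n steps for 2^n points) and the merge and split operations in 8.2 can be illustrated with a stand-in structure. This sketch is not the key-less hash-table Rib encoding described in 8.1.1: it is a sorted array of ASCII atoms searched by bisection, which needs at most n + 1 probes for 2^n points, the same logarithmic order. `SortedRib` and its methods are hypothetical; `to_ascii` and `from_ascii` show the plain-ASCII transmission mentioned in 8.2.

```python
import bisect
from typing import Iterable


class SortedRib:
    """Stand-in for a Rib: a sorted list of ASCII atoms (not the actual Rib encoding)."""

    def __init__(self, atoms: Iterable[str]) -> None:
        self.atoms = sorted(set(atoms))
        if not all(a.isascii() for a in self.atoms):
            raise ValueError("atoms must be ASCII")

    def contains(self, atom: str) -> tuple[bool, int]:
        """Binary search; return (found, number of probes)."""
        lo, hi, probes = 0, len(self.atoms), 0
        while lo < hi:
            mid = (lo + hi) // 2
            probes += 1
            if self.atoms[mid] == atom:
                return True, probes
            if self.atoms[mid] < atom:
                lo = mid + 1
            else:
                hi = mid
        return False, probes

    def merge(self, other: "SortedRib") -> "SortedRib":
        return SortedRib(self.atoms + other.atoms)

    def split(self, pivot: str) -> tuple["SortedRib", "SortedRib"]:
        """Atoms below `pivot` go left; the rest go right."""
        i = bisect.bisect_left(self.atoms, pivot)
        return SortedRib(self.atoms[:i]), SortedRib(self.atoms[i:])

    def to_ascii(self) -> str:
        return "\n".join(self.atoms)

    @classmethod
    def from_ascii(cls, text: str) -> "SortedRib":
        return cls(line for line in text.splitlines() if line)
```

Merge and split here touch only the atoms, never a document collection, which is the property 8.2 emphasizes.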