
ORB Visualization

(soon)

 

 

 

Communicated from Paul Prueitt  12/22/2003 8:50 AM

 

 

Form based communication and computational reasoning

Towards a BCNGroup Communities Bead Game

Edited from the 1999 Paper

 

Editing completed : 12/22/2003 4:02 PM

 

 

The overview

Decision engines

Basic notation and simulation research

Testing and curricular design

Adaptive technology design for interactive curriculum

 

 


The overview

 

A many-to-one and one-to-many web-based communication manager (CM) is required to facilitate the identification and movement of intelligence in networked communities.  The idea is that the thematic structure of social discourse can be extracted from the text of a web-based discussion involving a number of people, and that a visualization of that thematic structure can then be made available as a retrieval mechanism.

 

Specific types of Knowledge Management (KM) technologies are required to provide background processes to assist in minimally structuring the activities of human participants.  But these KM technologies/methods need the underlying thematic analysis that is provided by the InOrb Technologies Inc ORB engine, and the visualization provided by the OntologyStream Inc SLIP browsers. 

 

Thematic structure is generated and visualized.  Members of the community then retrieve documents and locate the paragraphs within those documents that indicate the various themes.  The InOrb technology facilitating this is partially developed at www.InOrb.com.

 

In mid-2003, we began to use the language developed in the OntologyStream Inc Notational Paper:

 

Notational System for the Ontology Reference Base (ORB)

 

Specifically we want to start using the language on Subject Matter Indicator neighborhoods.  The SM-Indicator is, when considered in the abstract, a pattern of linguistic variation that is used in normal language to provide signs to those whom one wishes to communicate with.  Specific examples of linguistic co-occurrence are at www.dataRenewal.com and www.inOrb.com.

 

Natural language use is more flexible than formal language.  In correspondence to this difference, the exact form of the SM-Indicator varies, and is often incomplete while relying on the tacit knowledge of both the speaker and the hearer.  Again, the linguistic theory is difficult and abstract, but the results of using the SM-Indicator neighborhoods are to be measured by the satisfaction of the users.

 

In the most critical situations, those affecting life and death for example, the ontology reference has to allow a formative nature that is constrained by human control.  The work completed in 2002 on differential and formative ontology is one way to achieve this high degree of perceptual acuity using both the qualities of human perception and the capabilities of computer algorithms.

 

A formal representation of linguistic variation in the ORB constructions is then also incomplete and must rely on human cognitive acuity to make final determinations on exactly what is to be included in the ORB construction.

Figure 1: The CM manages the flow of text between many users

and a single user (or a single process).

 

In the figure above we use a diagram developed in 1999 for what was then called a three-channel device (3CD).  This work was done four years ago (as of Dec 2003), but the idea is the same.  Many people should be able to communicate to the group as a whole, and then some sort of synthesis of the social discourse should occur.  The synthesis is what we want to stand up in the form of Ontology Referential Bases, with a visualization of the local topology of this ontology.

 

In other words, we understand that the ORB is a high dimensional graph (what this means is something that needs to be discussed in a formal way).  The ORB's structure, being a graph, can be viewed locally as a two dimensional graph, and these local views of ORB structure are the same as the Subject Matter Indicator neighborhoods.
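The notion of a local view can be illustrated with a small sketch. This is only an illustration under our own assumptions: the ORB is approximated here as an undirected graph stored in a dictionary, and the function name `neighborhood` and the sample terms are hypothetical and are not part of the ORB notation.

```python
from collections import deque

def neighborhood(graph, center, radius=1):
    """Return the set of nodes within `radius` edges of `center` (breadth-first search)."""
    depth = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if depth[node] == radius:
            continue  # do not expand beyond the requested radius
        for nxt in graph.get(node, ()):
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    return set(depth)

# Hypothetical toy graph of terms; an edge means that two terms are related.
orb = {
    "ontology": {"graph", "theme"},
    "graph": {"ontology", "node"},
    "theme": {"ontology", "discourse"},
    "node": {"graph"},
    "discourse": {"theme"},
}
local_view = neighborhood(orb, "ontology", radius=1)
```

The small set returned for a given center and radius is what would be drawn as a two dimensional picture, while the full graph stays in the background.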


 

Decision engines

 

Decision engines provide a key simulation feature to be used while the various species of communications managers are being developed and tested.  This is necessary as a means to test a robust BCNGroup Communities Bead Game.  However, the robust simulation work can be delayed while we integrate the SLIP, Instant Index and ORB technologies.

 

Before we move on, we should say that the Decision Engine design is quite simple. The engine will simulate the pairing of one element of a finite state machine, S, with one element of a finite state machine G.

 

S is often, but not always, interpreted as the states of the world that require a response. G is often, but not always, the set of all possible gestured responses to states of the world.  So the simulation of interactions within the BCNGroup Communities Bead Game can be seen as human-to-human, human-to-machine, machine-to-human, or machine-to-machine.

 

In some cases, S is a set of questions and G is a set of answers. In these cases, the paradigm is exceedingly simple and quite natural to the user, and may be used with polling instruments.  Richard Ballard's Knowledge Foundations knowledge base system, the Mark 3, follows this notion.  A focus on questions and answers allows a reduction of the social discourse to something that is useful.  Of course, not all social discourse is composed of questions and answers.  But this fact should not allow us to ignore the value in question / answer paradigms.

 


 

Basic notation and simulation research

 

In this section, a simple notation for the data structures in the communications manager is given.

 

The notation’s simplicity hides a great variety of supporting technologies, each of which may contribute to the core functionality of CMs.

 

The notation and related data structures provide a surface similarity between different types of knowledge technologies.  By knowledge technology we mean a technology, like a typewriter, that is used by humans and that is rather useless if no human is in the loop.

 

Suppose we have a set of world states

 

S = { s_i | i = 1, ..., n }

 

and a set of gestures

 

G = { g_j | j = 1, ..., m }

 

and a location

 

L_k ∈ { L_k | k = 1, ..., r } = L.

 

The decision engine is a simple simulation engine that randomly selects a world state, s_i, and assigns a gesture, g_j, thus creating a pair, (s_i, g_j), at each of a number of locations. At each of these locations, the pairs may be accumulated and then batch transmitted to a single service center mailbox.
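A minimal sketch of such an engine follows. It is an illustration under stated assumptions, not the BCNGroup software: the gesture is assigned uniformly at random (the text does not fix the assignment rule), and the names `decision_engine`, `pairs_per_location` and `seed` are hypothetical.

```python
import random

def decision_engine(states, gestures, locations, pairs_per_location, seed=None):
    """Create random (state, gesture) pairs at each location and batch them.

    Returns the service center mailbox as a list of (location, batch) entries.
    The uniform random choice of gesture is an assumption made for illustration.
    """
    rng = random.Random(seed)
    mailbox = []
    for location in locations:
        batch = [(rng.choice(states), rng.choice(gestures))
                 for _ in range(pairs_per_location)]
        mailbox.append((location, batch))  # one batch transmission per location
    return mailbox

S = ["s1", "s2", "s3"]   # world states
G = ["g1", "g2"]         # gestures
L = ["L1", "L2"]         # locations
mailbox = decision_engine(S, G, L, pairs_per_location=4, seed=0)
```

Passing a seed makes a simulation run repeatable, which is convenient when different communication manager designs are compared against the same stream of pairs.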

 

Again, our aim in late December 2003 is not to build the simulation, but rather to give an indication of how the BCNGroup Communities Bead Game software will work once the InOrb technologies are fully integrated (we hope in January 2004). The simulations will come later.

 

The simulation accounts for the following types of transactions:

 

Simple transmission:

 

1.     Decision pairing occurs at locations that are not connected to transmission devices. Decision pairs are then placed into a queue. When a transmission device is available, then the accumulated pairs are sent to the service center.

2.     Decisions may be grouped at the location into a series that has a beginning and an end. A transmission of a data record in the form ( j, ("start", "start")) and ( j, ("end", "end")) is made to insert start and end statements into a data base in the service center mailbox.

3.     The set of world states and the set of gestures are finite state machines, with perhaps different types of relationships between states in each machine. These relationships, between states, can be sent as metadata.

 

Complex transmission:

 

4.     In the most abstract form of the theory, posed here, the series is called a "passage". Transmission of multiple passages may be intermingled, as in human conversation. A communication manager that has proper annotation of context must manage this complex transmission. The manager must also have a computational substructure that records the common representation of invariance in the universe of discourse. We propose that this be done with ORBs.

5.     Each element in either finite state machine can be associated with a representational form. The representational forms are indications of the causal and logical features of states and are defined to be part of themespaces or concept spaces. These forms are used to encode statistics and reinforcement learning into an "implicit" memory of the past state experience, as suggested by Don Mitchell. The expression of this implicit memory is via voting procedures, and is thus very simple. Implicit memory is expressed via associations made in representation spaces.

6.     The transmission of a decision pairing may be either as a data packet or as transformed data, in analogy to the Fourier transform, as discussed below.

 

The decision engine provides a randomized selection and transmission of decision forms. The transmission may be simple, in which case no complex processing occurs. If the transmission is complex, then the randomization is constrained by conditions placed on the relationships between a selection of elements of the two finite state machines, as well as on the representational methodology.
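Items 1 and 2 of the simple transmission list can be sketched as a queue that is flushed whenever a transmission device becomes available. This is a sketch only; we assume that the index j in the record ( j, ("start", "start")) identifies the location, and the class and method names are hypothetical.

```python
from collections import deque

class Location:
    """A location that queues decision pairs until a transmission device is available."""

    def __init__(self, index):
        self.index = index      # assumed to be the j in ( j, ("start", "start"))
        self.queue = deque()

    def record_series(self, pairs):
        """Queue one series of pairs, bracketed by start and end records."""
        self.queue.append((self.index, ("start", "start")))
        for pair in pairs:
            self.queue.append((self.index, pair))
        self.queue.append((self.index, ("end", "end")))

    def transmit(self, mailbox):
        """Send every queued record to the service center mailbox, in order."""
        sent = 0
        while self.queue:
            mailbox.append(self.queue.popleft())
            sent += 1
        return sent

location = Location(1)
location.record_series([("s1", "g2"), ("s3", "g1")])
service_center_mailbox = []
sent = location.transmit(service_center_mailbox)
```

Because every record carries the location index and each series is bracketed, the service center can rebuild a series even when batches from several locations arrive interleaved.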

 

Using the InOrb technology, the data moving into an e-forum can be parsed and some results placed into an ORB.  The e-forum ORB can be visualized as an Upper Taxonomy.  Visualizations will allow a high fidelity retrieval of SM-Indicators.

 

The notion of a complex transmission would provide "interpretive" steps between locations during transmission. Thus a type of "machine knowledge" is possible wherein the complex transmission acts to transform a signal into a spectral domain (the ORB) and then perform an inverse transform from the spectral domain into the form of a simple transmission.

 

The identification of useful patterns requires two essential ingredients. First, the real world must have a generator that produces an actual pattern that is repeated. This pattern can then be seen using measurements on co-occurrence of tokens in bit streams. The second ingredient is specific knowledge of when the pattern begins and when it ends.

 

In simple cases, this is not an issue. For example, the co-occurrence of terms in the distribution of word frequencies, or the co-occurrence of the range in which numerical data falls, is often within a context that easily establishes the beginning and end of the event. However, most natural patterns are complex, incomplete and / or not properly measured.
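For the simple case, co-occurrence can be measured with a few lines of code. The sketch below assumes that each paragraph marks the beginning and end of an event and that tokens are whitespace-separated words; the function name `cooccurrence` and the sample paragraphs are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(units):
    """Count how often each unordered pair of tokens appears in the same unit.

    Each unit (here, a paragraph) is assumed to delimit one event.
    """
    counts = Counter()
    for unit in units:
        tokens = sorted(set(unit.lower().split()))  # each token counts once per unit
        counts.update(combinations(tokens, 2))      # pairs are in alphabetical order
    return counts

paragraphs = [
    "ontology graph theme",
    "graph theme discourse",
    "theme ontology",
]
counts = cooccurrence(paragraphs)
```

Pairs that recur across many units are candidates for the repeated pattern that the first ingredient requires; the choice of unit supplies the second ingredient, the knowledge of where a pattern begins and ends.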

 

During complex transmission, the CM provides a Fourier-like spread of a signal into a specific decomposition involving the use of a substructural "vector" basis. The vector basis is a mathematical notion from Fourier analysis; applied to light, for example, it identifies the energy wavelengths in the electromagnetic spectrum. The decomposition is analogous to wave transformations seen in quantum mechanics.

 

No one has done this, but the theory is firmly established.  We need only organizational support and some minimal funding (as of Dec 2003). 

 

In data stream decomposition of a signal, the repeated patterns in the signal are the signal's "spectrum". This signal spectrum can describe the content of the stream.  The ORBs do this whether the data is text or scientific data.  The iterated update of the ORB as new information is acquired is the task that Nathan Einwechter and Paul Prueitt are working on as of late December 2003.  The InOrb Technologies "product" is the system that will do this for any e-forum that follows certain formatting standards (so that the parsing and indexing is easy).

 

In the ORB theory, derived from Pribram’s work in cognitive neuroscience, signal spread is followed by signal processing in a "spectral domain" and then by the inverse transformation of the signal into a new bit stream. This re-localization is called, "a collapse of the wave" and is where any "interpretation" of information must occur.  In our context, this is where information from machine representation of ontology can be mixed with the data structure itself – providing both linguistic and ontology services to the formation of new small ontology structures that are highly situational in nature.  Again we reference the work on formative and differential ontology. 
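The sequence of spread, spectral-domain processing and inverse transformation can be illustrated with an ordinary discrete Fourier transform. This is an analogy only: the ORB is not claimed to be a Fourier transform, and the threshold rule used for the processing step is our own assumption for the sake of the example.

```python
import cmath

def dft(signal):
    """Forward transform: spread a real signal into its frequency components."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def inverse_dft(spectrum):
    """Inverse transform: re-localize a spectrum into a real-valued signal."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

# A pattern that repeats every four samples, observed for sixteen samples.
signal = [1.0, 0.0, -1.0, 0.0] * 4
spectrum = dft(signal)
magnitudes = [abs(c) for c in spectrum]

# Processing in the spectral domain: keep only the strong components (assumed rule).
threshold = 0.5 * max(magnitudes)
filtered = [c if abs(c) >= threshold else 0 for c in spectrum]
recovered = inverse_dft(filtered)
```

The repeated pattern shows up as two strong components in the spectrum (a conjugate pair), and the inverse transform of those two components alone reproduces the original sixteen samples.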

 

"Knowledge" is regarded as only existing during this collapse, a collapse involved in the formation of the mental event experienced by a human. The CM follows this analogy in managing the complex transmission. The theory is grounded in neuropsychology and in the widely available experimental evidence regarding the processing of the flow of energy from the eye into brain regions.

Figure 2: Simple and Complex transmission of data streams

 

In simple transmission, no processing of the data stream is allowed. The data transmission is said to be Newtonian and simple. In complex transmission, a "sign system" is created that allows the "cross level" decomposition of the meaning of specific information in specific contexts and having specific pragmatics.  The cross level process is to be performed using the Prueitt Voting Procedure.  This voting procedure was adequately studied in 1999 and before, but again it is not an immediate item on our development and deployment path (as of December 2003).

 

The sign system also provides structured annotation of context, and thus may shape the interpretation during the re-localization of information. The method for ambiguation/disambiguation is developed for this purpose.  If memory is available, in the form of a class of representations of substructural patterns, then the stratified many-to-many communication theory proposed by Prueitt is realized.

 

Traversal of an information gap, generically called an epistemic gap in the literature, requires either a forward transformation or an inverse transformation of the signal. It is assumed here that interpretation must involve the traversal of an epistemic gap. Once a data stream is decomposed into semantic invariance, various computational argumentations can occur in a spectral domain built from theme and / or concept spaces. The semantic invariance may be statistically defined, as in the Dynamic Reasoning Engines (DREs) available from the company Autonomy Inc. The computational argumentation may be defined using quasi-axiomatic theory, Mill's logic, and a class of procedures related to "voting procedures".

 

Computational argumentation, in the substrate, may change the relational linkage in ORB space, leading to different event chemistry.  Recomposition of ORB structure based on these changes may use voting procedures to perform the inverse transform.  The new ORB structure will have well-established similarity and dissimilarity to the original data.  This follows the so-called Mill's logic.  The details will vary but this is the essence of the idea.

 

Ultimately, the natural objective of a knowledge extraction methodology is to produce a set of topics, perhaps organized into taxonomies. The set of topics is to be as complete as possible while respecting the content within areas that correspond to viewpoint. To respect the viewpoint, established by context, each area can be treated separately. Reconciliation of differences in terminology linkage, between various contexts and community viewpoints, can be managed with a technology like the SchemaServer from SchemaLogic Inc.  The measurement of consistency and completeness is made within areas and not across contexts.


 

Testing and curricular design

 

A specific methodology is introduced here and is related to both the BCNGroup Communities Bead Game software design specification and the Knowledge Technology Toolkit for Kids CD.

 

In this section we provide communication theory when the states of the world are regarded as questions and gestures are regarded as answers.

 

Software specs (2-21-99)

 

Develop a table and screen to create a new universe of discourse, U. Each new U should have a unique identifier.

 

The methodology involves four steps.

1.     Name and briefly describe the universe of discourse.

2.     Partition the universe into a small but complete set of areas of discourse.

3.     For each area, enumerate in a descriptive fashion a list of topics. The list should be complete and each element should be as independent as possible from any of the other topics in that area.

4.     For each topic, create one or more question / answer pairs.

 

During the four steps, particularly the last, it is possible to identify source material that provides prerequisite knowledge about each of the topics. Thus the methodology will produce testing material and curriculum. Curriculum can be properly defined through the enumerative procedure of the four steps.
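The four steps suggest a nested data structure: a universe holds areas, an area holds topics, and a topic holds question / answer pairs. The sketch below is one possible layout, not the 1999 specification; all class and field names, the identifier "U-001" and the arithmetic example are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Topic:
    phrase: str                                    # descriptive phrase from step 3
    qa_pairs: list = field(default_factory=list)   # (question, answer) tuples from step 4

@dataclass
class Area:
    name: str                                      # one element of the partition from step 2
    topics: list = field(default_factory=list)

@dataclass
class Universe:
    identifier: str                                # unique identifier for this U
    description: str                               # name and brief description from step 1
    areas: list = field(default_factory=list)

    def all_qa_pairs(self):
        """Flatten every question / answer pair in the universe into one list."""
        return [qa for area in self.areas
                for topic in area.topics
                for qa in topic.qa_pairs]

u = Universe("U-001", "Elementary arithmetic")                   # step 1
u.areas.append(Area("Addition"))                                 # step 2
u.areas[0].topics.append(Topic("adding single digits"))          # step 3
u.areas[0].topics[0].qa_pairs.append(("What is 2 + 3?", "5"))    # step 4
```

The flattened list of pairs is the raw material for tests, while the area and topic levels preserve the organization that the knowledgeable person imposed.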

 

The first step is for a knowledgeable person to specify a universe of discourse related to the target domain.

 

The second step is to "partition" the universe of discourse into areas that are as independent as possible. The expression of these areas follows the same path as the development of "axioms" in formal systems, such as geometry.

 

The third step considers, one at a time, each of the areas identified in the partition of U. Again the critical issue is the control of focus. We want the knowledgeable person to focus on developing a set of descriptive phrases that collectively could be used to describe any aspect of the area in focus.

 

The fourth and last step is to take each of the phrases that were given in the previous step, one at a time, and produce question / answer pairs.

 

Once all areas have been enumerated, then we need to allow the knowledgeable person to make changes. They will use a standard add / remove interface object to move over each of the phrases from a "tentative list" into a final list. This will require that the knowledgeable person review the list as a whole. While making this review, the knowledgeable person may choose to leave out a phrase or two. This can easily be done.

 

The purpose of the use-philosophy is to give the knowledgeable person a justification for building a complete set of topics. The use-philosophy justifies the fact that it will be harder to add new phrases to the tentative list, than to remove them.

 

Iterative refinement is expected. The user interface can again employ a Communications Manager (CM) to distribute the process of developing gestures during each of the four steps. The states of the CM are the phrases that were shown to the knowledgeable person.

 

Any of the steps may be managed in a collaborative fashion using the Internet browsers. For example, using InOrb technology, many knowledgeable persons can work together and the InOrb technology will create question/answer pairs for each of the descriptive phrases.

 

Most knowledgeable persons will have specific ways of organizing a discourse. We want to capture this organization as a "knowledge artifact". The InOrb knowledge management software, when built, can be used to capture and refine these "high level" partition elements of any universe of discourse.

 

An individual can fill out the topics of a universe of discourse, and then turn this work over into a Multiple User Environment (MUE).

 

After the question/answer pairs are completed, the questions will be composed into tests, and a different CM will use these test elements as states, s_j, at a location, L_i. The decision engine can then be used to simulate the answering of these questions (states) with answers (gestures). Automatic grading follows in a natural way.
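A sketch of the simulated test and its grading follows. We assume here that the simulated respondent picks uniformly at random among the answers that appear in the test; the function names and the sample questions are hypothetical.

```python
import random

def simulate_test(qa_pairs, rng):
    """Pair each question (state) with a randomly chosen answer (gesture)."""
    answers = [answer for _, answer in qa_pairs]
    return [(question, rng.choice(answers)) for question, _ in qa_pairs]

def grade(qa_pairs, responses):
    """Return the fraction of responses that match the answer key."""
    key = dict(qa_pairs)
    correct = sum(1 for question, given in responses if key[question] == given)
    return correct / len(qa_pairs)

qa = [
    ("What is 2 + 3?", "5"),
    ("What is 4 + 4?", "8"),
    ("What is 1 + 1?", "2"),
]
responses = simulate_test(qa, random.Random(0))
score = grade(qa, responses)
```

Since the answer key is simply the set of pairs produced in step 4, grading reduces to a lookup, which is the sense in which it follows in a natural way.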


 

Adaptive Technology Design For Interactive Curriculum

 

A "three channel-group device" allows the sharing of profile information between channel groups. Profiles generally come in three types: key word, semantic net, or frames. The notation that follows assumes that the profile is a key word type profile.

 

Let us notate the collection C of curricular units at the level of individual lessons, l_k, i.e.,

 

C = { l_k | k ranges over an index set, 1, 2, ..., r }

 

Lessons can be grouped together into units for testing purposes.

 

We wish to have a representation of the skill level of the learner. We propose that this level of skill can be approximated by an inventory of themes that are expressed in the lessons.

 

To obtain a computational handle on this inventory idea, we use the formalism of an ORB themespace.

 

Themespaces are part of a basic technology developed by academics and industry, and deployed in most Information Retrieval systems. They are high dimensional vector spaces defined using the set of theme words. For example, many of the web search engines use themespace technology.  The ORBs are a new and simple means of viewing the themespaces.

 

Defining the Themespace for a Curriculum:

 

For each lesson we take the text and send it to a word frequency parser or to a natural language parser. The parser output is processed to produce a set of key words or, in the case of natural language parsing, a set of theme phrases.

 

l_k → { t_1, t_2, ..., t_10 }

 

We use the symbol T_k to designate the set of themes for the k-th lesson.

 

T_k = { t_1, t_2, ..., t_10 }

 

Generally the key words (themes) are ranked by numerical values. However, for the purpose of the theme profile of a lesson we will take only the top ten of the phrases and treat them equally.
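The word frequency route to a theme profile can be sketched as follows. The stop word list, the tokenization rule and the function name `theme_profile` are assumptions made for this illustration; a natural language parser would produce theme phrases rather than single words.

```python
import re
from collections import Counter

# A deliberately tiny, hypothetical stop word list.
STOPWORDS = {"a", "an", "and", "the", "of", "is", "to", "in"}

def theme_profile(text, size=10):
    """Return the `size` most frequent non-stop words of a lesson as an unranked set."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    # The ranking is used only to cut the list; the themes kept are treated equally.
    return {word for word, _ in counts.most_common(size)}

profile = theme_profile("The graph of a graph is a graph theme", size=2)
```

Returning a set rather than a ranked list reflects the decision to treat the retained themes equally.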

 

The themespace that we need is the one that is defined from the set that contains all themes from all lessons. This set is written in the following way:

 

U = union of the T_k as k ranges over the index set J = { 1, 2, ..., r }

 

The space so defined is called the universal themespace for the lessons. Now, note that each lesson defines a point in this universal themespace. In fact any subset of U defines a point in the universal space, so any union of lesson profiles defines a point in the space.
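One concrete reading of "defines a point" is a vector of zeros and ones with one coordinate per theme. The sketch below uses that reading; the binary encoding, the function name `as_point` and the three sample lessons (with two themes each rather than ten, to keep the example short) are our assumptions.

```python
# Hypothetical theme sets T_k for three lessons.
lesson_themes = {
    1: {"fraction", "ratio"},
    2: {"ratio", "percent"},
    3: {"percent", "decimal"},
}

# The universal themespace: the union of all T_k, with a fixed order of axes.
universal = sorted(set().union(*lesson_themes.values()))

def as_point(themes, axes):
    """Encode a set of themes as a 0/1 vector, one coordinate per axis."""
    return [1 if theme in themes else 0 for theme in axes]

point_1 = as_point(lesson_themes[1], universal)
point_1_and_2 = as_point(lesson_themes[1] | lesson_themes[2], universal)
```

A union of lesson profiles is encoded in exactly the same way as a single lesson, which is why any subset of the universal set can be treated as a point.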

 

Prerequisite order:

 

In most lesson plans, the lessons have prerequisites that should be mastered before starting the lesson. These prerequisites provide a partial order on the lessons that naturally places the lessons into a tree-like structure called a lattice.
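A prerequisite order can be turned into a valid sequence of lessons with a topological sort. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later); the lesson names and their prerequisites are hypothetical.

```python
from graphlib import TopologicalSorter

# Each lesson maps to the set of lessons that must be mastered first.
prerequisites = {
    "counting": set(),
    "fractions": {"counting"},
    "ratios": {"fractions"},
    "percent": {"fractions"},
}

# static_order() yields every lesson after all of its prerequisites.
order = list(TopologicalSorter(prerequisites).static_order())
```

A partial order usually admits several valid sequences (here "ratios" and "percent" may come in either order), so the sorter returns one of them; a cycle among prerequisites raises `graphlib.CycleError`.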

 

Thematic order:

 

One way of selecting the next lesson to study is to start at the root of the tree and move towards tree branch endings. However, it is often the case that user knowledge will be spotty and one would like to study lessons on an "as needed" basis. In this case we would like to follow the curriculum by managing the knowledge of which themes have been involved in prior learning or experience.

 

Thematic order is a complex subject that can be approached most simply using elementary set operations on the lesson profiles. We can define the profile of a user to be the union of the profiles of the lessons that the user has learned.

 

U_I = union of the T_k as k ranges over an index set I, where I is a subset of J.

 

Determining which lessons have been learned is done via standard tests.

 

Nodal Forest Learning Strategy

 

The Nodal Forest Learning Strategy was developed and tested over a period of about ten years while Prueitt was teaching university mathematics courses. It is based on an itemization of the topics in a curriculum, and the use of published principles from theoretical immunology and associative neural networks. The itemization is used to produce a themespace. The Strategy has a simple implementation that uses the voting procedures, also developed by Prueitt, to produce category and placement policies defined by the elements from the themespace.

 

The consequence of using the Nodal Forest Learning Strategy is an adaptive presentation of new materials to a learner, based on thematic content.

 

Thematic selection of next lesson:

 

The user profile defines a topological neighborhood, which can be matched to the Subject Matter Indicator neighborhoods in an ORB.  Each lesson has neighborhoods related to it. Due to the metric in the space, the lessons that the user has learned will be close to the user profile.

 

This means that, in general, lessons learned will be in a small neighborhood of the user profile, but as one gradually increases the radius of the neighborhood one finds the closest lesson not learned.  This provides an order of presentation of the material which is adaptive to the learning experience of the learner.
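The selection of the closest lesson not yet learned can be sketched with set operations. The text does not specify the metric, so the Jaccard distance used below is an assumption, as are the function names and the sample lessons.

```python
def jaccard_distance(a, b):
    """Distance between two theme sets: 0 when identical, 1 when disjoint."""
    union = a | b
    return 1 - len(a & b) / len(union) if union else 0.0

def next_lesson(lesson_themes, learned):
    """Return the unlearned lesson whose themes lie closest to the user profile."""
    # The user profile is the union of the profiles of the lessons learned.
    profile = set().union(*(lesson_themes[k] for k in learned))
    candidates = [k for k in lesson_themes if k not in learned]
    if not candidates:
        return None  # every lesson has been learned
    return min(candidates, key=lambda k: jaccard_distance(profile, lesson_themes[k]))

lesson_themes = {
    1: {"fraction", "ratio"},
    2: {"ratio", "percent"},
    3: {"decimal", "place value"},
}
choice = next_lesson(lesson_themes, learned={1})
```

In this example lesson 2 shares the theme "ratio" with the profile and lesson 3 shares nothing, so lesson 2 is presented next. Taking the minimum distance is equivalent to widening the neighborhood radius until the first unlearned lesson is reached.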

 

The same class of procedures can be applied to obtaining documents from the repository of learning materials that are indexed by the elements of the ORBs.