In preparation for distribution and demonstration

March 4, 1999

 

Form based communication and computational reasoning

Paul S Prueitt

 

Section 1: Software components

Section 2: Basic notation and simulation research

Section 3: Transmission, processing and interpretation of signal

Section 4: Test and curricular design.

Section 5: Adaptive Technology Design For Interactive Curriculum

Section 1: Software components

In this design, there are four types of software components:

  1. Multiple channel device
  2. Communications manager
  3. Decision engine
  4. Knowledge artifact design tool

1.1: The multi-channel devices are designed around the separation of synchronous from asynchronous communication and the separate management of information services. This separation is grounded in a philosophical and scientific understanding of the human mind-body problem and in current-generation Information Technology (IT).

Each of these three functions (synchronous communication, asynchronous communication, and information services) is associated with a group of communication channels. In the Multi-Channel Device (MCD) software code, each group of communication channels has many virtual, but dedicated, information channels that connect the contents of Web browser frames to other locations on the web.

Dedicated channels are separated using MCD port addresses that are independent of IP addresses and ports.

Figure 1.1: High level schematic of the MCD.

Several prototype versions of the MCD are available from links at

www.bcngroup.org/admin/links.htm,

where the prototypes are in use in the mediation of several projects.

A number of technologies are accommodated by the existence of specific kinds of data structures involved in channeled transmission of information and in knowledge extraction. These technologies are being co-developed with a suite of data structures that standardize the interaction format between MCDs.

Specifically, a Three Channel Device (3CD) (Figure 1.2) arranges context in a specific fashion that enhances collaborative communication and knowledge capture. The content is captured as annotation by preserving the state of the device when user events, such as text transmission, occur.

 

Figure 1.2: The prototype 3CD as of 2/15/99.

In Figure 1.2, channel group one is in the lower right part of the browser. Above channel group one is channel group two. Channel group three occupies the left side of the browser.

1.2: A many-to-one and one-to-many web based communication manager is required to facilitate the identification and movement of intelligence in networked communities.

We have researched the scholarly literature on knowledge management. From this literature we have identified the notions of "intellectual property mining" and "corporate virtual intelligence". These notions provide underlying objectives for software design and development in my lab.

Using the MCDs, we can specify and organize a first approximation to mature knowledge artifacts representing intelligence about some situation or set of situations. A collaborative process is then supported in which many individuals may examine the knowledge and make comments. The context of these comments is managed using the browser-based MCDs and a web-transmission-based Communications Manager (CM).

The knowledge management technologies are required to provide background processes to complex data transmission as well as to assist in minimally structuring the activities of human participants.

Figure 1.3: The CM manages the flow of text between many users and a single user.

As an example of how a CM might work, we may examine a prototype that was developed by my staff around the collective knowledge in a "universe of discourse" regarding the "Generalized Phone System".

Figure 1.4: Virtual intelligence mining with the multiple channel device

In this prototype we simulated a virtual discussion about what a phone system is. One of our developers produced a set of seven diagrams that delineated the universe of discourse in a rough but fair fashion. He then wrote short text contents for a function, a purpose, and a remarks text channel. A number of domain experts were then asked to visit the web site and enter comments into the input box (below the output box in Figure 1.4). As the responses were made, the state of the browser was recorded into a database. The result was a refinement of the diagrams and of the text contents of the three aspects: function, purpose, and remarks.

CMs, as a class, share common characteristics that will be developed over time. In some, but not all, cases a CM technology implementation will have a human in the loop. In our experiment, the author handled the routing and the summarization of text leading to a revision of the diagrams and text.

1.3: Decision engines provide a key simulation feature to be used while the various species of communications managers are being developed and tested.

The Decision Engine (DE) is quite simple. It simulates the pairing of one element of a finite state machine, S, with one element of a finite state machine, G. S is often, but not always, interpreted as the set of states of the world that require a response. G is often, but not always, the set of all possible gestured responses to states of the world.

Several abstract formalisms were used as a specific model for this pairing. The NDE contains features that are an abstraction of features that are seen in a number of academic disciplines. The NDE also has a rich mathematical and logical grounding.

In some cases, S is a set of questions and G is a set of answers. In these cases, the paradigm is exceedingly simple and quite natural to the user, and may be used with polling instruments.

1.4: Knowledge artifact design tools are needed to start, and refine, a collaborative discussion about some specified "universe of discourse". The current single Artifact Design Tool (ADT) is a FoxPro 2.6 suite of tools that uses a number of other commercial systems to identify the areas of discussion for a specific project (example: The Generalized Phone System).

A methodology for completely specifying the discourse in the form of a large (or small) number of topics has been developed. It is the core methodology for building mature knowledge artifacts through managed collaborative discussion using an MCD.

Software supporting this core technology has been designed but not yet built or tested. The methodology is developed, in the context of collaborative distance learning, in Section 4.

Section 2: Basic notation and simulation research

In this section, the notation for the data structures in the communications manager is given. The notation’s simplicity hides a great variety of supporting technologies, each of which may contribute to the core functionality of CMs. It is important to note, however, that MCDs can operate either with or without complex processing. A distinction is made between these two types of processing.

In either case, standards in the surface notation and related data structures provide a surface functionality.

Suppose we have a set of world states

S = { s_i | i = 1, ..., n }

and a set of gestures

G = { g_j | j = 1, ..., m }

and a location

L_k ∈ L = { L_k | k = 1, ..., r }.

The decision engine is a simple simulation engine that randomly selects a world state, s_i, and assigns a gesture, g_j, thus creating a pair, (s_i, g_j), at each of a number of locations. At each of these locations, the pairs may be accumulated and then batch transmitted to a single service center mailbox (see Figure 3.1).

The simulation accounts for the following types of transactions:

Simple transmission:

  1. Decision pairing occurs at locations which are not connected to transmission devices. Decision pairs are then placed into a queue. When a transmission device is available, then the accumulated pairs are sent to the service center.
  2. Decisions may be grouped at the location into a series that has a beginning and an end. Data records of the form ( j, ("start", "start")) and ( j, ("end", "end")) are transmitted to insert start and end statements into a database in the service center mailbox.
  3. The set of world states and the set of gestures are finite state machines, with perhaps different types of relationships between states in each machine. These relationships, between states, can be sent as metadata.
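
The simple transmission cases above can be sketched in a few lines of Python. This is only an illustrative sketch, not the lab's decision engine: the function names, the use of a seeded random generator, and the reading of j in ( j, ("start", "start")) as a location index are all assumptions made for the example.

```python
import random
from collections import defaultdict

def simulate_decisions(states, gestures, locations, pairs_per_location, seed=0):
    """Randomly pair world states with gestures at each location.

    Each location accumulates its records in a local queue, bracketed by
    ("start", "start") and ("end", "end") records (assumed record layout).
    """
    rng = random.Random(seed)  # seeded so that a simulation run can be repeated
    queues = defaultdict(list)
    for loc in locations:
        queues[loc].append((loc, ("start", "start")))
        for _ in range(pairs_per_location):
            pair = (rng.choice(states), rng.choice(gestures))
            queues[loc].append((loc, pair))
        queues[loc].append((loc, ("end", "end")))
    return queues

def batch_transmit(queues, mailbox):
    """Send every queued record to the single service center mailbox."""
    for loc in sorted(queues):
        mailbox.extend(queues[loc])
        queues[loc] = []  # the local queue is empty once it has been sent
    return mailbox

# Hypothetical state and gesture labels, used only for illustration.
states = ["s1", "s2", "s3"]
gestures = ["g1", "g2"]
queues = simulate_decisions(states, gestures, locations=[1, 2], pairs_per_location=3)
mailbox = batch_transmit(queues, [])
```

Because no processing is applied between the queue and the mailbox, what the service center receives is exactly what each location sent, which is the defining property of simple transmission.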

Complex transmission:

  1. In the most abstract form of the theory, posed here, the series is called a "passage". Transmission of multiple passages may be intermingled, as in human conversation. A communication manager that has proper annotation of context must manage this complex transmission. The manager must also have a computational substructure that records the common representation of invariance in the universe of discourse.
  2. Each element in either finite state machine can be associated with a representational form. The representational forms are indications of the causal and logical features of states and are defined to be part of themespaces or concept spaces. These forms are used to encode statistics and reinforcement learning into an "implicit" memory. This implicit memory is expressed through associations made in representation spaces, using voting procedures, and its expression is thus very simple.
  3. The transmission of a decision pairing may be either as a data packet or as transformed data, in analogy to the Fourier transform discussed below.

The decision engine provides a randomized selection and transmission of decision forms. The transmission may be simple, in which case no complex processing occurs. If the transmission is complex, then the randomization is constrained by conditions placed on the relationships between a selection of elements of the two finite state machines, as well as on the representational methodology.

Evolutionary programming can be employed here. Moreover, the theory being developed in quantum computing serves as a guide to use-philosophy. Although the theory is very esoteric, the system, when completed, will hide the complexity and show only a new ability of the web browser: to dialog about things, rather than merely retrieve information. Such dialog requires an interpretation of information, and this interpretation can be provided computationally.

 

Section 3: Transmission, processing and interpretation of signal

3.1: Web browsers currently manage only simple data transmission.

Using a data-mining paradigm, the data moving into and out of a browser can be parsed and some results placed into a database back end to the browser. Information Technology professional services provide this capability.

In the case of a simple transmission of data, the patterns of data can be identified using various methods. We define the terms "data", "information", "knowledge", and "wisdom" by distinguishing each within a hierarchy:

Data is data

Information is the organization of data

Knowledge is the interpretation of information

Wisdom is the "correct" use of knowledge.

The transmission of data and information can both be simple transmission, i.e., what one location receives is what the other location sends.

However, the transmission of "human knowledge" is, by definition, not possible. It is observed, however, that notational systems such as the periodic table of chemical elements may be transmitted as a simple transmission.

Knowledge artifacts such as natural language may be transmitted between locations. Complex transmission would provide "interpretive" steps between locations during transmission. Thus a type of "machine knowledge" is possible wherein the complex transmission acts to transform a signal into a spectral domain (themespace) and then perform an inverse transform from the spectral domain into the form of a simple transmission.

If the analogy is very strong to the actual mechanisms involved in brain and perception, then we have the right to call this "machine knowledge".

3.2: MCDs manage complex data transmission.

The identification of useful patterns requires two essential ingredients. First, the real world must have a generator that produces an actual pattern that is repeated. This pattern can then be seen, sometimes, using measurements on co-occurrence of tokens in bit streams. The second ingredient is specific knowledge of when the pattern begins and when it ends.

In simple cases, this is not an issue. For example, the co-occurrence of terms in the distribution of word frequencies, or the co-occurrence of the range in which numerical data falls, is often within a context that easily establishes the beginning and end of the event. However, most natural patterns are complex, incomplete, and/or not properly measured.
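
The simple case can be illustrated with a short Python sketch that counts token co-occurrence when the beginning and end of each event are already known. The function name and the example tokens are hypothetical; the sketch assumes the stream has already been segmented into events, which is exactly the second ingredient described above.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(events):
    """Count how often each pair of tokens appears within the same event.

    Each event is a list of tokens whose start and end are already known.
    """
    counts = Counter()
    for tokens in events:
        # Sorting makes (a, b) and (b, a) count as the same pair.
        for pair in combinations(sorted(set(tokens)), 2):
            counts[pair] += 1
    return counts

# Hypothetical, pre-segmented events.
events = [["ring", "dial", "tone"], ["dial", "tone"], ["ring", "busy"]]
counts = cooccurrence(events)
```

If the event boundaries were unknown, the same counting would mix tokens from unrelated events, which is why the boundary knowledge matters as much as the repeated pattern itself.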

During complex transmission, the CM provides a Fourier-like spread of a signal into a specific decomposition involving the use of a substructural "vector" basis. The vector basis is a mathematical notion from Fourier analysis; applied to light, for example, it describes the light by identifying the energy at each wavelength of the electromagnetic spectrum. The decomposition is also analogous to a bit-stream-to-wave transformation seen in quantum mechanics. In the decomposition of a data stream, the repeated patterns in the signal are the signal's "spectrum". This signal spectrum can describe the content of the stream.

The spread is followed by signal processing in a "spectral domain" and then by the inverse transformation of the signal into a new bit stream. This re-localization is called "a collapse of the wave" and is where any "interpretation" of information must occur. "Knowledge" is regarded as only existing during this collapse. The CM follows this analogy in managing the complex transmission. The theory is grounded in neuropsychology and in the widely available experimental evidence regarding the processing of the flow of energy from the eye into brain regions.
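
The three stages of the analogy (spread, processing in the spectral domain, inverse transformation) can be made concrete with the ordinary discrete Fourier transform. The sketch below shows only the classical Fourier round trip that the text uses as an analogy; it is not the CM's themespace decomposition, and the thresholding step is an assumed stand-in for "signal processing in a spectral domain".

```python
import cmath

def dft(signal):
    """Forward transform: spread a signal over a basis of complex waves."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def inverse_dft(spectrum):
    """Inverse transform: re-localize the spectrum into a real-valued signal."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

# A pattern that repeats every four samples.
signal = [1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0, 0.0]
spectrum = dft(signal)

# Processing in the spectral domain: keep only the strongest components.
threshold = 0.5 * max(abs(c) for c in spectrum)
filtered = [c if abs(c) >= threshold else 0 for c in spectrum]

recovered = inverse_dft(filtered)
```

Here the repeated pattern shows up as a small number of non-zero spectral components, and the inverse transform reproduces the original samples from those components alone.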

Figure 3.1: Simple and Complex transmission of data streams

In simple transmission, no processing of the data stream is allowed. The data transmission is said to be Newtonian and simple. In complex transmission, a "sign system" is created that allows the "cross level" decomposition of the meaning of specific information in specific contexts and having specific pragmatics. The sign system also provides structured annotation of context, and thus may shape the interpretation during the re-localization of information. If memory is available, in the form of a class of representations of substructural patterns, then the stratified communication theory proposed by Prueitt is realized.

Traversal of an information gap, generically called an epistemic gap in the literature, requires either a forward transformation or an inverse transformation of the signal. It is assumed here that interpretation must involve the traversal of an epistemic gap. Once a data stream is decomposed into semantic invariance, various computational argumentations can occur in a spectral domain built from theme and / or concept spaces. The semantic invariance may be statistically defined, as in the Dynamic Reasoning Engines (DREs) available from the company Autonomy Inc. The computational argumentation may be defined using quasi-axiomatic theory, Mill’s logic, and a class of procedures called "voting procedures".

The computational argumentation, in the substrate, changes the position of tokens in the theme or concept space. Recomposition uses voting procedures to perform the inverse transform and to produce a new data packet with well-established similarity and dissimilarity to the original data.

Ultimately, the natural objective of a knowledge extraction methodology is to produce a set of topics, perhaps organized into taxonomies. This set of topics is to be as complete as possible while respecting the content within areas that correspond to viewpoint. To respect the viewpoint, established by context, each area is to be treated separately. Thus, the measurement of consistency and completeness is made within areas and not across context.

Section 4: Test and curricular design.

A specific methodology is introduced here. This methodology is grounded in the previously published work of Prueitt in logic, learning theory and knowledge representation.

In this section we provide an interpretation of Prueitt’s communication theory when the states of the world are regarded as questions and gestures are regarded as answers.

4.1: Software specs (2-21-99)

Develop a table and screen to create a new universe of discourse, U. Each new U should have a unique identifier (for example, use the FoxPro key generator), a name, and description. A user id should identify who created the universe.
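
As a rough illustration of this table, the following Python sketch stores one record per universe of discourse. It is an assumption-laden stand-in, not the FoxPro design: `uuid4` replaces the FoxPro key generator, the table is an in-memory dictionary, and the field names, the description text, and the user id "user-001" are hypothetical.

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class Universe:
    name: str
    description: str
    created_by: str  # id of the user who created the universe
    # Stand-in for a key generator: a random, practically unique identifier.
    universe_id: str = field(default_factory=lambda: uuid.uuid4().hex)

def create_universe(table, name, description, user_id):
    """Create a new universe record and store it in the table by its id."""
    universe = Universe(name, description, user_id)
    table[universe.universe_id] = universe
    return universe

universes = {}
phone = create_universe(universes, "Generalized Phone System",
                        "What a phone system is", "user-001")
```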

Now we need a set of screens and a set of states, S = { s_j }, where the states are prompts that will be used by a "Communications Manager" (CM) to keep the knowledgeable person focused on each step in a multi-step process.

Two basic screens, an O/I, I/O device and a Checking device screen, will be re-used within each of the steps in the multi-step process. At each step, the basic screens will be modified slightly to indicate clearly the context of that step.

The basic screen for managing Input/Output and Output/Input processes is seen in the O/I, I/O figure.

Figure 4.1: O/I, I/O device

The CM paradigm is used to associate state-gesture pairs, (s_j, g_i), and thus the O/I, I/O device fits into the common framework that is required by CMs.

In particular, the O/I, I/O device can be used to extract knowledge artifacts from a distributed or virtual environment. This extraction process requires the persistent availability of synchronous and asynchronous channels, a new memory and context support technology, and a CM that has adaptive pattern recognition technology.

The decision engines allow the simulation of knowledge artifact extraction in a Multiple User Environment. These Multiple User Environments are called MUEs.

In the current design phase, the decision engines simulate the judgment process through which a human makes a response, as an input gesture, to a statement that is displayed as an element of the finite state space, S. These simulations are discussed in a study of human decision making.

The basic screen for managing Checking processes is seen in the Checking device figure.

Figure 4.2: Checking device

Once the CM obtains a set of responses, the CM allows the user(s) an opportunity to review the list of responses. The review requires a movement of items from the left list box to the right list box.

Some items can be left behind, thus reducing the complexity of the list. Adding a new item to the left list requires a user to return to the O/I, I/O device. These two design elements are directed at producing a minimal set of descriptors. A "use-philosophy" is developed as a consequence of these design elements.
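
The behavior of the Checking device can be sketched as a single function, under the assumption that the left and right list boxes are plain lists. The function name and the example responses are hypothetical. Note that the function can only move or leave behind items already in the left list; it offers no way to add one, which mirrors the design choice described above.

```python
def review(tentative, selected):
    """Move the selected items from the left list to the right list.

    Returns (final, left_behind). Items that are not already in the
    tentative (left) list are ignored, so nothing new can be added here.
    """
    selected = set(selected)
    final = [item for item in tentative if item in selected]
    left_behind = [item for item in tentative if item not in selected]
    return final, left_behind

# Hypothetical responses collected by the O/I, I/O device.
tentative = ["response A", "response B", "response C"]
final, left_behind = review(tentative, selected=["response A", "response C"])
```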

4.2: Methodology

Our multi-step process follows a methodology defined by a research group working in one of the national labs. The base methodology, called "Ultrastructure Methodology", is related to both Knowledge Management techniques and the Nodal Forest Learning Strategy developed by Prueitt. This methodology has recently been extended to support the creation and storage of extracted knowledge in MUEs using collective virtual intelligence mining techniques.

The methodology involves four steps.

    1. Name and briefly describe the universe of discourse.
    2. Partition the universe into a small but complete set of areas of discourse.
    3. For each area, enumerate in a descriptive fashion a list of topics. The list should be complete and each element should be as independent as possible from any of the other topics in that area.
    4. For each topic, create one or more question / answer pairs.

During the four steps, particularly the last, it is possible to identify source material that provides prerequisite knowledge about each of the topics. Thus the methodology will produce testing material and curriculum. Curriculum can be properly defined through the enumerative procedure of the four steps.
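
The four steps produce a nested structure: a universe contains areas, an area contains topics, and a topic carries question / answer pairs. A minimal Python sketch of that structure follows. The class names, the helper functions, and the placeholder content are assumptions made for illustration and are not part of the designed software.

```python
from dataclasses import dataclass, field

@dataclass
class Topic:
    phrase: str
    qa_pairs: list = field(default_factory=list)  # step 4: (question, answer) tuples

@dataclass
class Area:
    name: str
    topics: list = field(default_factory=list)    # step 3

@dataclass
class UniverseOfDiscourse:
    name: str                                     # step 1
    description: str
    areas: list = field(default_factory=list)     # step 2

def question_bank(universe):
    """Flatten the hierarchy into (area, topic, question, answer) rows."""
    return [(a.name, t.phrase, q, ans)
            for a in universe.areas for t in a.topics for q, ans in t.qa_pairs]

def topics_without_questions(universe):
    """List topics for which step 4 has not yet been carried out."""
    return [(a.name, t.phrase)
            for a in universe.areas for t in a.topics if not t.qa_pairs]

# Placeholder content only.
universe = UniverseOfDiscourse(
    name="Example universe",
    description="Placeholder description",
    areas=[Area("Area 1", [Topic("topic phrase 1", [("Question 1?", "Answer 1")])]),
           Area("Area 2", [Topic("topic phrase 2")])],
)
bank = question_bank(universe)
pending = topics_without_questions(universe)
```

Keeping the area name on every row of the question bank preserves the context in which each question was written.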

The first step is for a knowledgeable person to specify a universe of discourse related to the target domain. Stating a name and a short description will do this. (See similar naming and description in creating a new conference on the O’Reilly WebBoard.)

The second step is to "partition" the universe of discourse into areas that are as independent as possible. The expression of these areas follows the same path as the development of "axioms" in formal systems, such as geometry.

The third step considers, one at a time, each of the areas identified in the partition of U. Again the critical issue is the control of focus. We want the knowledgeable person to focus on developing a set of descriptive phrases that collectively could be used to describe any aspect of the area in focus.

The fourth and last step is to take each of the phrases that were given in the previous step, one at a time, and produce question / answer pairs.

Once all areas have been enumerated, we need to allow the knowledgeable person to make changes. They will use a standard add / remove interface object to move each of the phrases from a "tentative list" into a final list. This will require that the knowledgeable person review the list as a whole. While making this review, the knowledgeable person may choose to leave out a phrase or two. This can easily be done.

The purpose of the use-philosophy is to give the knowledgeable person a justification for building a complete set of topics. The use-philosophy justifies the fact that it will be harder to add new phrases to the tentative list, than to remove them.

Iterative refinement is expected. The user interface can again employ a Communications Manager (CM) to distribute the process of developing gestures during each of the four steps. The states of the CM are the phrases that were shown to the knowledgeable person.

Any of the steps may be managed in a collaborative fashion using the Internet browsers. For example, using MUEs, many knowledgeable persons can work together to create one or more question/answer pairs for each of the descriptive phrases.

4.3: Extraction of Knowledge Artifacts in MUEs.

Most knowledgeable persons will have specific ways of organizing the discourse. We want to capture this organization as a "knowledge artifact". Our proposed MUE software, when built, can be used to capture and refine these "high level" partition elements of any universe of discourse.

An individual can fill out the topics of a universe of discourse, and then turn this work over into a Multiple User Environment (MUE).

The work by Kang Xu on the Generalized Phone System is an example of this knowledge extraction. We are planning to conduct a refinement of this artifact as soon as the web version of the Interactive learning Center is completed.

4.4: O/I, I/O dialog

Because of "controlled randomness", the software should appear to "dialog" with the knowledgeable person. The dialog is managed by the CM’s decision engine, and can be simulated by the Decision Engine.

The set of states in the CM’s decision engine is the set of prompts that are used to ask the user to list a minimal set of descriptive phrases. These prompts can be developed in a generic fashion to support the "drawing" of gestures from a user. This minimal descriptive enumeration draws from knowledgeable persons the knowledge that knowledgeable persons have about the area.

The software will select states, in the form of prompts, and ask the user to supply gestures, in the form of human language sentences. For example, in step 3, the sentences should be focused on one topic and one view of that topic. Some of the sentences will become questions, and related answers, that will be stored in a question bank for later use in multiple choice tests.

Using the engine, experiments can be completed that determine how systems will scale in size and how performance can be monitored.

4.5: Creating test banks

After question/answer pairs are completed, the questions will be composed into tests, and a different CM will use these test elements as states, s_j, at a location, L_i. The decision engine can then be used to simulate the answering of these questions (states) with answers (gestures). Automatic grading follows in a natural way.
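
A small sketch shows why grading follows naturally once questions are states and answers are gestures: a response is just a state-gesture pair, and grading compares it with the stored pair. The function names, the seeded random choice, and the placeholder questions are assumptions; the scoring rule (fraction of matching answers) is one simple possibility, not a rule stated in the design.

```python
import random

def simulate_answers(questions, gestures, seed=0):
    """Pair each question (state) with a randomly chosen answer (gesture)."""
    rng = random.Random(seed)
    return {q: rng.choice(gestures) for q in questions}

def grade(answer_key, responses):
    """Return the fraction of questions whose response matches the key."""
    correct = sum(1 for q, a in answer_key.items() if responses.get(q) == a)
    return correct / len(answer_key)

# Placeholder test bank.
answer_key = {"Question 1?": "Answer 1", "Question 2?": "Answer 2"}
gestures = ["Answer 1", "Answer 2", "Answer 3"]
responses = simulate_answers(list(answer_key), gestures)
score = grade(answer_key, responses)
```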

Section 5: Adaptive Technology Design For Interactive Curriculum

A "three channel-group device" allows the sharing of profile information between channel groups. Profiles generally come in three types: key word, semantic net, or frames. The notation that follows assumes that the profile is a key word type profile.

Let us notate the collection C of curricular units at the level of individual lessons, l_k, i.e.,

C = { l_k | k ranges over an index set, 1, 2, ..., r }

Lessons can be grouped together into units for testing purposes.

We wish to have a representation of the skill level of the learner. We propose that this level of skill can be approximated by an inventory of themes that are expressed in the lessons.

To obtain a computational handle on this inventory idea, we use the formalism of a themespace.

Themespaces are part of a basic technology developed by academics and industry, and deployed in most Information Retrieval systems. They are high dimensional vector spaces defined using the set of theme words. For example, many of the web search engines use themespace technology.

5.1: Defining the Themespace for a Curriculum:

For each lesson we take the text and send it to a word frequency parser or to a natural language parser. The parser output is processed to produce a set of key words or, in the case of natural language parsing, a set of theme phrases.

l_k → { t_1, t_2, ..., t_10 }

We use the symbol T_k to designate the set of themes for the k-th lesson.

T_k = { t_1, t_2, ..., t_10 }

Generally the key words (themes) are ranked by numerical values. However, for the purpose of the theme profile of a lesson we will take only the top ten of the phrases and treat them equally. The theme profile for one lesson defines a 10-dimensional themespace, each phrase defining exactly one new dimension.
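
For the word frequency case, building a theme profile can be sketched as follows. This is a minimal illustration, assuming a very simple tokenizer and an optional stop-word list; the function name and the sample text are hypothetical, and a natural language parser producing theme phrases would replace the counting step.

```python
import re
from collections import Counter

def theme_profile(text, stop_words=frozenset(), size=10):
    """Return the `size` most frequent words of a lesson as an unranked set."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    # The ranking is used only to pick the top words; the result treats them equally.
    return {word for word, _ in counts.most_common(size)}

# Hypothetical lesson text, with a profile size of 2 to keep the example small.
lesson_text = "limit limit limit derivative derivative slope the the the the"
profile = theme_profile(lesson_text, stop_words={"the"}, size=2)
```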

The themespace that we need is the one that is defined from the set that contains all themes from all lessons. This set is written in the following way:

U = ∪ T_k, as k ranges over an index set, J = {1, 2, ..., r}

The space so defined is called the universal themespace for the lessons. Now, note that each lesson defines a point in this universal themespace. In fact any subset of U defines a point in the universal space, so any union of lesson profiles defines a point in the space.
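
One way to make "each lesson defines a point" concrete is to give every theme in U one coordinate and mark it 1 when the theme is present. The sketch below assumes this 0/1 encoding, which the text does not prescribe; the function names and theme words are hypothetical.

```python
def universal_themespace(profiles):
    """Return all themes from all lessons, sorted; each theme is one dimension."""
    return sorted(set().union(*profiles.values()))

def as_point(themes, dimensions):
    """Represent a set of themes as a 0/1 point in the universal themespace."""
    return [1 if d in themes else 0 for d in dimensions]

# Hypothetical theme profiles for two lessons (shortened to two themes each).
profiles = {1: {"limit", "slope"}, 2: {"slope", "derivative"}}
dimensions = universal_themespace(profiles)
lesson_1 = as_point(profiles[1], dimensions)
both = as_point(profiles[1] | profiles[2], dimensions)
```

The union of the two lesson profiles is also a subset of U, so it maps to a point in the same space in exactly the same way.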

5.2: Prerequisite order:

In most lesson plans, the lessons have prerequisites that should be mastered before starting the lesson. These prerequisites provide a partial order on the lessons that naturally places the lessons into a tree-like structure called a lattice.
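
The partial order can be used directly to decide which lessons a learner is ready for. The sketch below assumes the prerequisites are stored as a mapping from each lesson to the lessons that must come first; the function name and the lesson labels are hypothetical.

```python
def available_lessons(prerequisites, learned):
    """Lessons not yet learned whose prerequisites have all been learned."""
    learned = set(learned)
    return sorted(lesson for lesson, reqs in prerequisites.items()
                  if lesson not in learned and set(reqs) <= learned)

# Hypothetical lesson plan: l4 requires both l2 and l3, which each require l1.
prerequisites = {"l1": [], "l2": ["l1"], "l3": ["l1"], "l4": ["l2", "l3"]}
first = available_lessons(prerequisites, learned=[])
after_l1 = available_lessons(prerequisites, learned=["l1"])
```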

5.3: Thematic order:

One way of selecting the next lesson to study is to start at the root of the tree and move towards tree branch endings. However, it is often the case that user knowledge will be spotty and one would like to study lessons on an "as needed" basis. In this case we would like to follow the curriculum by managing the knowledge of which themes have been involved in prior learning or experience.

Thematic order is a complex subject that can be approached most simply using elementary set operations on the lesson profiles. We can define the profile of a user to be the union of the profiles of the lessons that the user has learned.

U = ∪ T_k, as k ranges over an index set, I ⊂ J.

Determining which lessons have been learned is done via standard tests.

 

5.4: Nodal Forest Learning Strategy

The Nodal Forest Learning Strategy was developed and tested over a period of about ten years while Prueitt was teaching university mathematics courses. It is based on an itemization of the topics in a curriculum, and the use of published principles from theoretical immunology and associative neural networks. The itemization is used to produce a themespace. The Strategy has a simple implementation that uses the voting procedures, also developed by Prueitt, to produce category and placement policies defined by the elements from the themespace.

The consequence of using the Nodal Forest Learning Strategy is an adaptive presentation of new materials to a learner, based on thematic content.

5.5: Thematic selection of next lesson:

The user profile defines a point in the high dimensional universal themespace. Each of the lessons also defines a point in this same space. Due to the metric in the space, the lessons that the user has learned will be close to the user profile. This means that, in general, lessons learned will be in a small neighborhood of the user profile, but as one gradually increases the radius of the neighborhood one finds the closest lesson not learned.

Now this may seem a little odd, but the metric of themespaces can be changed fairly easily. Neural network clustering or thematic relational logic can modify the procedure for selection of the closest-lesson-not-learned.

The same class of procedures can be applied to obtaining documents from the repository of learning materials that are indexed by a CM.
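
The selection of the closest-lesson-not-learned can be sketched with elementary set operations. The text does not fix a metric, so the sketch assumes a Hamming distance between theme sets (the number of themes on which two profiles differ); the function names and lesson themes are hypothetical, and a different metric could be substituted by replacing `distance`.

```python
def user_profile(lesson_themes, learned):
    """Union of the theme profiles of the lessons the user has learned."""
    profile = set()
    for k in learned:
        profile |= lesson_themes[k]
    return profile

def distance(a, b):
    """Hamming distance between two theme sets viewed as 0/1 points (assumed metric)."""
    return len(a ^ b)

def closest_lesson_not_learned(lesson_themes, learned):
    """Return the unlearned lesson nearest to the user profile, or None."""
    profile = user_profile(lesson_themes, learned)
    candidates = [k for k in lesson_themes if k not in learned]
    if not candidates:
        return None
    # Ties are broken by lesson index so that the result is deterministic.
    return min(candidates, key=lambda k: (distance(lesson_themes[k], profile), k))

# Hypothetical theme profiles, shortened to two themes per lesson.
lesson_themes = {1: {"limit", "slope"}, 2: {"slope", "derivative"}, 3: {"matrix", "vector"}}
next_lesson = closest_lesson_not_learned(lesson_themes, learned={1})
```

In this example lesson 2 shares a theme with the learned lesson and is therefore selected before lesson 3, which shares none.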

5.6: Selection of relevant archived material:

The user profile can autonomously select a ranked list of materials that is user specific. The learning profile itself can be used to provide a focus to instruction. However, a "retrieval" profile can also be easily developed based on the themespace procedures that we will deliver.

Several profiles for each user can be stored locally and then used in different circumstances. This introduces the "ring of modes" to capture the different learning modes of a single individual. This will be demonstrated.

5.7: Learner loyalty:

The adaptive selection of learning material and curriculum employs a new paradigm that is being developed in electronic commerce. The paradigm is based on the notion that the customer should feel that the "system" treats him or her as an individual. By treating the customer as an individual, the customer develops loyalty to the system.

5.8: Innovation:

Adaptive technologies are new. We have just begun to explore simple ways in which profile representation can be adapted by a user’s behavior. The new technology is interesting to individuals because it is the individual’s own actions that shape the profiles. The shaping process is indirect and thus is often surprising. As long as true learning is occurring, then these surprises can only aid in the overall process.