In preparation for distribution and demonstration
March 4, 1999
Form based communication and computational reasoning
Paul S. Prueitt
Section 1: Software components
Section 2: Basic notation and simulation research
Section 3: Transmission, processing and interpretation of signal
Section 4: Test and curricular design.
Section 5: Adaptive Technology Design For Interactive Curriculum
In this design, there are four types of
software components:
1.1: The multi-channel devices
are designed around a separation of synchronous from asynchronous communication
and the separate management of information services. We find that the
separation is grounded in a philosophical and scientific understanding of the
human mind body problem and on current generation Information Technology (IT).
Each of these three functions has been
associated with a group of communication channels. In the Multi-Channel Device
(MCD) software code, each group of communication channels has many
virtual, but dedicated, information channels that connect the contents of Web
Browser frames to other locations in the web.
Dedicated channels are separated using MCD
port addresses that are independent of IP addresses and ports.
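One way to picture this separation is a router keyed by MCD port addresses rather than by IP addresses and ports. This is a minimal sketch only; the class name, method names, and the string form of the addresses are assumptions for illustration, not the original MCD code.

```python
from collections import defaultdict

# Illustrative sketch: many dedicated virtual channels multiplexed over
# one transport, selected by an MCD port address that is independent of
# any IP address and port. All names here are assumptions.
class MCDRouter:
    def __init__(self):
        self._channels = defaultdict(list)  # MCD port -> pending messages

    def send(self, mcd_port, message):
        # The MCD port alone selects the destination channel.
        self._channels[mcd_port].append(message)

    def receive(self, mcd_port):
        # Drain and return the pending messages for one channel.
        msgs, self._channels[mcd_port] = self._channels[mcd_port], []
        return msgs

router = MCDRouter()
router.send("group1/annotation", "state saved")
router.send("group2/dialog", "hello")
assert router.receive("group1/annotation") == ["state saved"]
```

The point of the sketch is only that channel identity lives above the transport layer, so Web Browser frames can be wired to dedicated channels without reference to the underlying network addressing.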
Figure 1.1: High level schematic of the MCD.
Several prototype versions of the MCD are
available from links at
www.bcngroup.org/admin/links.htm,
where the prototypes are in use in the mediation
of several projects.
A number of technologies are accommodated by
the existence of specific kinds of data structures involved in channeled
transmission of information and in knowledge extraction. These technologies are
being co-developed with a suite of data structures that standardize the
interaction format between MCDs.
Specifically, a Three Channel Device (3CD)
(Figure 1.2) arranges context in a specific fashion that enhances collaborative communication
and knowledge capture. The content is captured as annotation by preserving the
state of the device when user events, such as text transmission, occur.
Figure 1.2: The prototype 3CD as of 2/15/99.
In the Figure, we see channel group one in
the lower right part of the browser. Above channel group one is channel group
two. Channel group three occupies the left side of the browser.
1.2: A many-to-one and one-to-many web based communication
manager is required to facilitate the identification and movement of
intelligence in networked communities.
We have researched the scholarly literature
on knowledge management. From this literature we have identified the notions of
"intellectual property mining" and "corporate virtual
intelligence". These notions provide underlying objectives for software
design and development in my lab.
Using the MCDs we can specify and organize a
first approximation to mature knowledge artifacts representing intelligence
about some situation or set of situations. Then a collaborative process is
supported where many individuals may examine the knowledge and make comments.
The context of these comments is managed using the browser based MCDs and a
web transmission based Communications Manager (CM).
The knowledge management technologies are
required to provide background processes to complex data transmission as well
as to assist in minimally structuring the activities of human participants.
Figure 1.3: The CM manages the flow of text between many users and a single user.
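The many-to-one and one-to-many flow managed by a CM might be sketched as follows; the class and method names are assumptions for illustration, not the prototype implementation.

```python
# Hedged sketch of a Communications Manager (CM): text from many users
# fans in to a single moderator's inbox, and the moderator's text fans
# back out to every participant. Names are illustrative assumptions.
class CommunicationsManager:
    def __init__(self):
        self.inbox = []     # many-to-one: (user, comment) pairs
        self.users = set()

    def comment(self, user, text):
        # Each comment is kept with its source so context is preserved.
        self.users.add(user)
        self.inbox.append((user, text))

    def broadcast(self, text):
        # one-to-many: a summary or revision goes to every participant.
        return {user: text for user in self.users}

cm = CommunicationsManager()
cm.comment("alice", "the diagram omits billing")
cm.comment("bob", "purpose text is unclear")
out = cm.broadcast("revised diagrams posted")
assert set(out) == {"alice", "bob"}
```

In the experiment described below, the routing and summarization steps were performed by a human in the loop; the sketch only shows the shape of the flow.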
As an example of how a CM might work, we may
examine a prototype that was developed by my staff around the collective
knowledge in a "universe of discourse" regarding the
"Generalized Phone System".
Figure 1.4: Virtual intelligence mining with the multiple channel device.
In this prototype we simulated the virtual
discussion about what a phone system is. One of our developers developed a set
of seven diagrams that delineated the universe of discourse in a rough but fair
fashion. Then he developed short text contents for a function, purpose and
remarks text channel. A number of domain experts were then asked to visit the
web site and make comments into the input box (below the output box in Figure
1.4). As the responses are made, the state of the browser is recorded into a
database. The result was a refinement of the diagrams and the text contents of
the three aspects function, purpose and remarks.
The class of all CMs shares common
characteristics that will be developed over time. In some, but not all, cases a
CM technology implementation will have a human in the loop. In our experiment,
the author handled the routing and the summarization of text leading to a
revision of the diagrams and text.
1.3: Decision engines provide a
key simulation feature to be used while the various species of communications
managers are being developed and tested.
The Decision Engine (DE) is quite simple. It
simulates the pairing of one element of a finite state machine, S, with
one element of a finite state machine G. S is often, but not
always, interpreted as states of the world that require a response. G is
often, but not always, the set of all possible gestured responses to states of
the world.
Several abstract formalisms were used as a
specific model for this pairing. The DE contains features that are an
abstraction of features seen in a number of academic disciplines. The
DE also has a rich mathematical and logical grounding.
In some cases, S is a set of questions
and G is a set of answers. In these cases, the paradigm is exceedingly
simple and quite natural to the user, and may be used with polling instruments.
1.4: Knowledge artifact design tools
are needed to start, and refine, a collaborative discussion about some
specified "universe of discourse". The current single Artifact Design
Tool (ADT) is a FoxPro 2.6 suite of tools that uses a number of other
commercial systems to identify the areas of discussion for a specific project
(example: The Generalized Phone System).
A methodology for completely specifying the
discourse in the form of a large (or small) number of topics has been
developed. It is the core methodology for building mature knowledge artifacts
through managed collaborative discussion using a MCD.
Software supporting this core technology has
been designed but not yet built or tested. The methodology is developed, in the
context of collaborative distance learning, in Section 4.
In this section, the notation for the data
structures in the communications manager is given. The notation’s simplicity
hides a great variety of supporting technologies, each of which may contribute
to the core functionality of CMs. However, it is important to note that MCDs
can operate with no complex processing, as well as with complex processing.
A distinction is made between these two types of processing.
However, standards in the surface notation
and related data structures provide a surface functionality.
Suppose we have a set of world states
S = { si | i = 1, . . . , n },
a set of gestures
G = { gj | j = 1, . . . , m },
and a location
Lk ∈ L = { Lk | k = 1, . . . , r }.
The decision engine is a simple simulation
engine that randomly selects a world state, si, and assigns a
gesture, gj, thus creating a pair, (si, gj),
at each of a number of locations. At each of these locations, the pairs may be
accumulated and then batch transmitted to a single service center mailbox (see
Figure 3.1).
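The simulation just described can be sketched directly; the function name, the list-based mailbox, and the seeding are assumptions added for illustration.

```python
import random

# Minimal sketch of the Decision Engine (DE): at each location, randomly
# pair a world state s_i with a gesture g_j, accumulate the pairs, then
# batch transmit them to a single service center mailbox.
def decision_engine(states, gestures, locations, per_location, seed=0):
    rng = random.Random(seed)   # seeded for repeatable simulation runs
    mailbox = []
    for loc in locations:
        batch = [(rng.choice(states), rng.choice(gestures))
                 for _ in range(per_location)]
        mailbox.append((loc, batch))   # one batch transmission per location
    return mailbox

S = ["s1", "s2", "s3"]
G = ["g1", "g2"]
L = ["L1", "L2"]
mail = decision_engine(S, G, L, per_location=5)
assert len(mail) == 2 and all(len(batch) == 5 for _, batch in mail)
assert all(s in S and g in G for _, batch in mail for s, g in batch)
```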
The simulation accounts for the following
types of transactions:
Simple transmission
Complex transmission
The decision engine provides a randomized
selection and transmission of decision forms. The transmission may be simple,
in which case no complex processing occurs. If the transmission is complex, then
the randomization is constrained by conditions placed on the relationships
between a selection of elements of the two finite state machines, as well as on
the representational methodology.
Evolutionary programming can be employed
here. Moreover, the theory being developed in quantum computing serves as a
guide to use-philosophy. While the theory is very esoteric, the system, when
completed, will hide the complexity and show only a new ability of the web
browser: to dialog about things rather than merely retrieve information. Such
dialog requires an interpretation of information, and this interpretation can
be provided computationally.
3.1: Web browsers currently manage only
simple data transmission.
Using a data-mining paradigm, the data moving
into and out of a browser can be parsed and some results placed into a database
back end to the browser. Information Technology professional services provide
this capability.
In the case of a simple transmission of data,
the patterns of data can be identified using various methods. In our definition
of the terms "data", "information", "knowledge",
and "wisdom"; we distinguish each within a hierarchy.
Data is data.
Information is the organization of data.
Knowledge is the interpretation of information.
Wisdom is the "correct" use of knowledge.
The transmission of data and information can
both be simple transmission, i.e., what one location receives is what the other
location sends.
However, the transmission of "human
knowledge" is, by definition, not possible. It is, however, observed that
notational systems such as the periodic table of chemical elements may be
transmitted as a simple transmission.
Knowledge artifacts such as natural language
may be transmitted between locations. Complex transmission would provide
"interpretive" steps between locations during transmission.
Thus a type of "machine knowledge" is possible wherein the complex
transmission acts to transform a signal into a spectral domain (themespace) and
then perform an inverse transform from the spectral domain into the form of a
simple transmission.
If the analogy is very strong to the actual
mechanisms involved in brain and perception, then we have the right to call
this "machine knowledge".
3.2: MCDs manage complex data transmission.
The identification of useful patterns
requires two essential ingredients. First, the real world must have a generator
that produces an actual pattern that is repeated. This pattern can then be
seen, sometimes, using measurements on co-occurrence of tokens in bit streams.
The second ingredient is specific knowledge of when the pattern begins and when
it ends.
In simple cases, this is not an issue. For
example the co-occurrence of terms in the distribution of word frequencies, or
the co-occurrence of the range in which numerical data falls, is often within a
context that easily establishes the beginning and end of the event. However,
most natural patterns are complex, incomplete, and/or not properly measured.
During complex transmission, the CM provides
a Fourier like spread of a signal into a specific decomposition involving the
use of a substructural "vector" basis. The vector basis, a mathematical
notion from Fourier analysis, describes the nature of light by identifying
energy wavelengths in the electromagnetic spectrum. The decomposition is also
analogous to a bit stream to wave transformation seen in quantum mechanics. In
data stream decomposition of signal, the repeated patterns in the signal are the
signal's "spectrum". This signal spectrum can describe the content of
the stream.
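The analogy can be made concrete with a toy decomposition in which a word-frequency count stands in for the substructural vector basis; the function name and the frequency-based stand-in are assumptions, not the actual CM decomposition.

```python
from collections import Counter

# Illustrative sketch of the "spectral" analogy: decompose a text stream
# into a theme spectrum, i.e. its repeated-token frequencies, by analogy
# with spreading a signal onto a vector basis. A word-frequency basis is
# an assumed stand-in for the substructural basis described in the text.
def theme_spectrum(stream, top=5):
    tokens = stream.lower().split()
    # Each distinct token is one "dimension"; its count is its "energy".
    return Counter(tokens).most_common(top)

sig = "phone system phone network phone billing network"
spec = theme_spectrum(sig)
assert spec[0] == ("phone", 3)        # the dominant repeated pattern
assert dict(spec)["network"] == 2
```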
The spread is followed by signal processing
in a "spectral domain" and then by the inverse transformation of the
signal into a new bit stream. This re-localization is called, "a collapse
of the wave" and is where any "interpretation" of information
must occur. "Knowledge" is regarded as only existing during this
collapse. The CM follows this analogy in managing the complex transmission. The
theory is grounded in neuropsychology and in the widely available experimental
evidence regarding the processing of the flow of energy from the eye into brain
regions.
Figure
3.1: Simple and Complex transmission
of data streams
In simple transmission, no processing of the
data stream is allowed. The data transmission is said to be Newtonian and
simple. In complex transmission, a "sign system" is created that
allows the "cross level" decomposition of the meaning of specific
information in specific contexts and having specific pragmatics. The sign
system also provides structured annotation of context, and thus may shape the
interpretation during the re-localization of information. If memory is
available, in the form of a class of representations of substructural patterns,
then the stratified communication theory proposed by Prueitt is realized.
Traversal of an information gap, generically
called an epistemic gap in the literature, requires either a forward
transformation or an inverse transformation of the signal. It is assumed here
that interpretation must involve the traversal of an epistemic gap. Once a data
stream is decomposed into semantic invariance, various computational
argumentations can occur in a spectral domain built from theme and / or concept
spaces. The semantic invariance may be statistically defined, as in the Dynamic
Reasoning Engines (DREs) available from the company Autonomy Inc. The
computational argumentation may be defined using quasi-axiomatic theory, Mill’s
logic, and a class of procedures called "voting procedures".
The computational argumentation, in the
substrate, changes the position of tokens in the theme or concept space.
Recomposition uses voting procedures to perform the inverse transform and to
produce a new data packet with well-established similarity and dissimilarity to
the original data.
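One minimal reading of a voting procedure is sketched below, under the assumption that several simple scorers vote on candidate tokens and recomposition keeps the top-ranked ones. The scorers are illustrative stand-ins, not Prueitt's published procedures.

```python
from collections import Counter

# Hedged sketch of a "voting procedure" used during recomposition:
# several scorers each nominate tokens, and the recomposed packet keeps
# the tokens with the most votes, giving well-established similarity to
# the original data. Scorers here are arbitrary illustrations.
def vote(candidates, scorers, keep):
    tally = Counter()
    for scorer in scorers:
        for token in scorer(candidates):
            tally[token] += 1
    return [token for token, _ in tally.most_common(keep)]

scorers = [
    lambda cs: [c for c in cs if len(c) > 4],   # prefer longer tokens
    lambda cs: cs[:3],                          # prefer early tokens
    lambda cs: [c for c in cs if "e" in c],     # arbitrary theme test
]
packet = vote(["theme", "of", "signal", "spectrum", "a"], scorers, keep=2)
assert len(packet) == 2
```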
Ultimately, the natural objective of a
knowledge extraction methodology is to produce a set of topics, perhaps
organized into taxonomies. This set of topics is to be as complete as possible
while respecting the content within areas that correspond to viewpoint. To
respect the viewpoint, established by context, each area is to be treated
separately. Thus, the measurement of consistency and completeness is made
within areas and not across context.
A specific methodology is introduced here.
This methodology has a grounding in the previously published work of Prueitt in
logic, learning theory and knowledge representation.
In this section we provide an interpretation
of Prueitt’s communication theory when the states of the world are regarded as
questions and gestures are regarded as answers.
4.1: Software specs (2-21-99)
Develop a table and
screen to create a new universe of discourse, U. Each new U
should have a unique identifier (for example, use the FoxPro key generator), a
name, and description. A user id should identify who created the universe.
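A sketch of this record follows, with a simple counter standing in for the FoxPro key generator; the class and field names are assumptions for illustration.

```python
import itertools
from dataclasses import dataclass, field

# Sketch of the spec in 4.1: a record for a new universe of discourse U
# with a unique identifier, name, description, and creating user. The
# counter below stands in for the FoxPro key generator named in the text.
_keygen = itertools.count(1)

@dataclass
class Universe:
    name: str
    description: str
    created_by: str
    uid: int = field(default_factory=lambda: next(_keygen))

u1 = Universe("Generalized Phone System", "what a phone system is", "kxu")
u2 = Universe("Billing", "billing subdomain", "psp")
assert u1.uid != u2.uid      # every new U gets a unique identifier
```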
Now we need a set
of screens and a set of states, S = { sj }, where the states
are prompts that will be used by a "Communications Manager" (CM) to
keep the knowledgeable person focused on each step in a multi-step process.
Two basic screens,
an O/I, I/O device and a Checking device screen, will be re-used within each of
the steps in the multi-step process. At each step, the basic screens will be
modified slightly to indicate clearly the context of that step.
The basic
screen for managing Input/Output and Output/Input processes is seen in the O/I, I/O figure.
Figure 4.1: O/I,
I/O device
The CM paradigm is
used to associate state-gesture pairs, (sj , gi ), and
thus the O/I, I/O device fits into the common framework that is required by
CMs.
In particular, the
O/I, I/O device can be used to extract knowledge artifacts from a distributed
or virtual environment. This extraction process requires the persistent
availability of synchronous and asynchronous channels, a new memory and context
support technology, and a CM that has adaptive pattern recognition technology.
The decision
engines allow the simulation of knowledge artifact extraction in a Multiple
User Environment. These Multiple User Environments are called MUEs.
In the current
design phase, the decision engines simulate the judgment process through which
a human makes a response, as an input gesture, to a statement that is displayed
as an element of the finite state space, S. These simulations are
discussed in a study of human decision making.
The basic
screen for managing Checking processes is seen in the Checking device figure.
Figure 4.2:
Checking device
Once the CM obtains
a set of responses, the CM allows the user(s) an opportunity to review the list
of responses. The review requires a movement of items from the left list box to
the right list box.
Some items can be
left behind, thus reducing the complexity of the list. Adding a new item to the
left list requires a user to return to the O/I, I/O device. These two design
elements are directed at producing a minimal set of descriptors. A
"use-philosophy" is developed as a consequence of these design
elements.
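The tentative-to-final movement behind these design elements can be sketched as a filter; the function and variable names are illustrative assumptions.

```python
# Sketch of the Checking device: phrases move from a tentative list
# (left list box) to a final list (right list box); items may be left
# behind, which keeps the descriptor set minimal. Adding a new item
# requires returning to the O/I, I/O device, so it is deliberately
# harder to add than to remove.
def check(tentative, accept):
    final = [p for p in tentative if accept(p)]
    left_behind = [p for p in tentative if not accept(p)]
    return final, left_behind

tentative = ["dial tone", "ring cadence", "misc notes", "call routing"]
final, dropped = check(tentative, accept=lambda p: p != "misc notes")
assert final == ["dial tone", "ring cadence", "call routing"]
assert dropped == ["misc notes"]
```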
4.2: Methodology
Our multi step process follows a methodology
defined by a research group working in one of the National labs. The base
methodology, called "Ultrastructure Methodology", is related to both
Knowledge Management techniques and the Nodal Forest Learning strategy developed
by Prueitt. This methodology has been recently extended to support the creation
and storage of knowledge extraction in MUEs using collective virtual
intelligence mining techniques.
The methodology involves four steps.
During the four steps, particularly the last,
it is possible to identify source material that provides prerequisite knowledge
about each of the topics. Thus the methodology will produce testing material
and curriculum. Curriculum can be properly defined through the enumerative
procedure of the four steps.
The first step is for a knowledgeable person to specify a universe
of discourse related to the target domain. Stating a name and a short
description will do this. (See similar naming and description in creating a new
conference on the O’Reilly WebBoard.)
The second step is to "partition" the universe of discourse
into areas that are as independent as possible. The expression of these areas
follows the same path as the development of "axioms" in formal
systems, such as geometry.
The third step considers, one at a time, each of the areas
identified in the partition of U. Again the critical issue is the
control of focus. We want the knowledgeable person to focus on developing a set
of descriptive phrases that collectively could be used to describe any aspect
of the area in focus.
The fourth, and last, step is to take each of the phrases that were given in
the previous step, one at a time, and produce question / answer pairs.
Once all areas have been enumerated, then we
need to allow the knowledgeable person to make changes. They will use a
standard add / remove interface object to move over each of the phrases from a
"tentative list" into a final list. This will require that the
knowledgeable person review the list as a whole. While making this review, the
knowledgeable person may choose to leave out a phrase or two. This can easily
be done.
The purpose of the use-philosophy is to give
the knowledgeable person a justification for building a complete set of topics.
The use-philosophy justifies the fact that it will be harder to add new phrases
to the tentative list, than to remove them.
Iterative refinement is expected. The user
interface can again employ a Communications Manager (CM) to distribute the
process of developing gestures during each of the four steps. The states of the
CM are the phrases that were shown to the knowledgeable person.
Any of the steps may be managed in a
collaborative fashion using the Internet browsers. For example, using MUEs,
many knowledgeable persons can work together to create one or more
question/answer pairs for each of the descriptive phrases.
4.3: Extraction of Knowledge Artifacts in
MUEs.
Most knowledgeable persons will have specific
ways of organizing the discourse. We want to capture this organization as a
"knowledge artifact". Our proposed MUE software, when built, can be
used to capture and refine these "high level" partition elements of
any universe of discourse.
An individual can fill out the topics of a
universe of discourse, and then turn this work over into a Multiple User
Environment (MUE).
The work by Kang Xu on the Generalized Phone
system is an example of this knowledge extraction. We are planning to conduct a
refinement of this artifact as soon as the web version of the Interactive
Learning Center is completed.
4.4: O/I, I/O dialog
Because of "controlled randomness",
the software should appear to "dialog" with the knowledgeable person.
The dialog is managed by the CM’s decision engine, and can be simulated by the
Decision Engine.
The set of states in the CM’s decision engine
is the set of prompts that are used to ask the user to list a minimal set of
descriptive phrases. These prompts can be developed in a generic fashion to
support the "drawing" of gestures from a user. This minimal
descriptive enumeration draws from knowledgeable persons the knowledge
they have about the area.
The software will select states, in the form
of prompts, and ask the user to supply gestures, in the form of human language
sentences. For example, in step 3, the sentences should be focused on one topic
and one view of that topic. Some of the sentences will become questions, and
related answers, that will be stored in a question bank for later use in
multiple choice tests.
Using the engine, experiments can be
completed that determine how systems will scale in size and how performance can
be monitored.
4.5: Creating test banks
After question/answer pairs are completed
the questions will be composed into tests and a different CM will use these
test elements as states, sj, at a location, Li. The
decision engine can then be used to simulate the answering of these questions
(states) with answers (gestures). Automatic grading follows in a natural way.
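This simulation-and-grading loop might be sketched as follows, with an illustrative answer key and scoring rule assumed for the example.

```python
import random

# Sketch of 4.5: a decision engine answers test questions (states s_j)
# with gestures (answers), and automatic grading compares each gesture
# to the answer key. The key, choices, and scoring are illustrative.
def simulate_test(questions, answer_key, choices, seed=1):
    rng = random.Random(seed)
    responses = {q: rng.choice(choices) for q in questions}
    score = sum(responses[q] == answer_key[q] for q in questions)
    return responses, score

qs = ["q1", "q2", "q3"]
key = {"q1": "a", "q2": "c", "q3": "b"}
_, score = simulate_test(qs, key, choices=["a", "b", "c", "d"])
assert 0 <= score <= len(qs)
```

Experiments of this form can exercise a test-bank CM before real learners are involved, which is the scaling role the decision engine plays above.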
A "three channel-group device"
allows the sharing of profile information between channel groups. Profiles
generally come in three types: key word, semantic net, or frame. The notation
that follows assumes that the profile is a key word type profile.
Let us notate the collection C of curricular
units at the level of individual lessons, lk, i.e.,
C = { lk | k ranges over an index set, 1, 2, . . . , r }.
Lessons can be grouped together into units
for testing purposes.
We wish to have a representation of the skill
level of the learner. We propose that this level of skill can be approximated
by an inventory of themes that are expressed in the lessons.
To obtain a computational handle on this
inventory idea, we use the formalism of a themespace.
Themespaces are part of a basic technology
developed by academics and industry, and deployed in most Information Retrieval
systems. They are high dimensional vector spaces defined using the set of theme
words. For example, many of the web search engines use themespace technology.
5.1: Defining the Themespace for a
Curriculum:
For each lesson we take the text and send it
to a word frequency parser or to a natural language parser. The parser output
is processed to produce a set of key words or, in the case of natural language
parsing, a set of theme phrases.
lk → { t1, t2, . . . , t10 }
We use the symbol Tk to designate
the set of themes for the kth lesson.
Tk = { t1, t2, . . . , t10 }
Generally the key words (themes) are ranked
by numerical values. However, for the purpose of the theme profile of a lesson
we will take only the top ten of the phrases and treat them equally. The theme
profile for one lesson defines a 10 dimensional themespace, each phrase
defining exactly one new dimension.
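A sketch of this profile construction follows; a small stop-word list is added as an assumption, since the original word-frequency parser is an external component.

```python
from collections import Counter

# Sketch of 5.1: send lesson text through a word-frequency parser, keep
# the top ten themes, and treat them equally, giving an unweighted theme
# set T_k. The stop-word list is an assumption added for illustration.
STOP = {"the", "a", "of", "and", "to", "is", "in"}

def theme_profile(lesson_text, top=10):
    words = [w for w in lesson_text.lower().split() if w not in STOP]
    # Keep only the top-ranked phrases, then discard the rankings.
    return {w for w, _ in Counter(words).most_common(top)}

T = theme_profile("The derivative of a sum is the sum of the derivatives")
assert "derivative" in T and "the" not in T
```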
The themespace that we need is the one that
is defined from the set that contains all themes from all lessons. This set is
written in the following way:
U = ∪ Tk as k ranges over an index set, J = { 1, 2, . . . , r }
The space so defined is called the universal
themespace for the lessons. Now, note that each lesson defines a point in this universal
themespace. In fact any subset of U defines a point in the universal space, so
any union of lesson profiles defines a point in the space.
5.2: Prerequisite order:
In most lesson plans, the lessons have
prerequisites that should be mastered before starting the lesson. These
prerequisites provide a partial order to the lessons that naturally places the
lessons into a tree-like structure called a lattice.
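The prerequisite ordering can be sketched with a topological sort over an illustrative lesson graph; the lesson names are assumptions for the example.

```python
from graphlib import TopologicalSorter

# Sketch of 5.2: prerequisites impose a partial order on lessons, and a
# topological sort yields one valid study sequence through the lattice.
# The mapping is lesson -> its prerequisite lessons (illustrative data).
prereqs = {
    "limits": [],
    "derivatives": ["limits"],
    "integrals": ["limits"],
    "series": ["derivatives", "integrals"],
}
order = list(TopologicalSorter(prereqs).static_order())
assert order.index("limits") < order.index("derivatives")
assert order.index("derivatives") < order.index("series")
```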
5.3: Thematic order:
One way of selecting the next lesson to study
is to start at the root of the tree and move towards tree branch endings.
However, it is often the case that user knowledge will be spotty, and one would
like to study lessons on an "as needed" basis. In this case we would
like to follow the curriculum by managing the knowledge of which themes have
been involved in prior learning or experience.
Thematic order is a complex subject that can
be approached most simply using elementary set operations on the lesson
profiles. We can define the profile of a user to be the union of the profiles of
the lessons that the user has learned.
P = ∪ Tk as k ranges over an index set, I ⊂ J, writing P for the user profile to distinguish it from the universal themespace U.
Determining which lessons have been learned
is done via standard tests.
5.4: Nodal Forest Learning Strategy
The Nodal Forest Learning Strategy was
developed and tested over a period of about ten years while Prueitt was
teaching university mathematics courses. It is based on an itemization of the
topics in a curriculum, and the use of published principles from theoretical
immunology and associative neural networks. The itemization is used to produce
a themespace. The Strategy has a simple implementation that uses the voting
procedures, also developed by Prueitt, to produce category and placement
policies defined by the elements from the themespace.
The consequence of using the Nodal Forest
Learning Strategy is an adaptive presentation of new materials to a learner,
based on thematic content.
5.5: Thematic selection of next lesson:
The user profile defines a point in the high
dimensional universal themespace. Each of the lessons also defines a point in
this same space. Due to the metric in the space, the lessons that the user has
learned will be close to the user profile. This means that, in general, lessons
learned will be in a small neighborhood of the user profile, but as one
gradually increases the radius of the neighborhood one finds the closest lesson
not learned.
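This neighborhood search can be sketched concretely, assuming Jaccard distance as a stand-in for the themespace metric and illustrative lesson data.

```python
# Sketch of 5.5: the user profile is the union of learned-lesson theme
# sets, and the closest lesson not learned is the candidate whose theme
# set is nearest the profile. Jaccard distance is an assumed stand-in
# for the themespace metric; the lesson data is illustrative.
def jaccard_distance(a, b):
    return 1 - len(a & b) / len(a | b) if (a | b) else 0.0

lessons = {
    "l1": {"limit", "sequence"},
    "l2": {"limit", "derivative"},
    "l3": {"matrix", "vector"},
}
learned = {"l1"}
profile = set().union(*(lessons[k] for k in learned))
# Grow the neighborhood around the profile: rank unlearned lessons by
# distance and take the nearest one.
candidates = {k: jaccard_distance(profile, lessons[k])
              for k in lessons if k not in learned}
next_lesson = min(candidates, key=candidates.get)
assert next_lesson == "l2"    # shares "limit" with the profile
```

Swapping in a different metric, as the next paragraph suggests, changes only the `jaccard_distance` function; the neighborhood search itself is unchanged.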
Now this may seem a little odd, but the
metric of themespaces can be changed fairly easily. Neural network clustering
or thematic relational logic can modify the procedure, for selection of the
closest-lesson-not-learned.
The same class of procedures can be applied
to obtaining documents from the repository of learning materials that are
indexed by a CM.
5.6: Selection of relevant archived material:
The user profile can autonomously select a
ranked list of materials that is user specific. The learning profile itself can
be used to provide a focus to instruction. However, a "retrieval"
profile can also be easily developed based on the themespace procedures that we
will deliver.
Several profiles for each user can be stored
locally and then used in different circumstances. This introduces the
"ring of modes" to capture different the learning modes of a single
individual. This will be demonstrated.
5.7: Learner loyalty:
The adaptive selection of learning material
and curriculum employs a new paradigm that is being developed in electronic
commerce. The paradigm is based on the notion that the customer should feel
that the "system" treats him or her as an individual. By treating the
customer as an individual, the customer develops loyalty to the system.
5.8: Innovation:
Adaptive technologies are new. We have just
begun to explore simple ways in which profile representation can be adapted by
a user’s behavior. The new technology is interesting to individuals because it
is the individual’s own actions that shape the profiles. The shaping process is
indirect and thus is often surprising. As long as true learning is occurring,
then these surprises can only aid in the overall process.