Communicated from Paul Prueitt 12/22/2003 8:50 AM
Form based communication and
computational reasoning
Towards a BCNGroup Communities
Bead Game
Editing completed :
12/22/2003 4:02 PM
Basic notation and simulation research
Adaptive technology design for interactive curriculum
A many-to-one and
one-to-many web based communication manager is required to
facilitate the identification and movement of intelligence in networked
communities. The idea here is that the
thematic structure of social discourse can be somehow extracted from the text
in a web based discussion involving a number of people and then a visualization
of that thematic structure made available as a retrieval mechanism.
Specific types of Knowledge
Management (KM) technologies are required to provide background processes to
assist in minimally structuring the activities of human participants. But these KM technologies/methods need the
underlying thematic analysis that is provided by the InOrb Technologies Inc ORB
engine, and the visualization provided by the OntologyStream Inc SLIP
browsers.
Thematic structure is
generated and visualized. Members of
the community then retrieve documents and locate paragraphs within those
documents that indicate the various themes. The InOrb technology facilitating this is partially developed at www.InOrb.com.
Beginning in mid 2003, we
began to use the language developed in the OntologyStream Inc Notational Paper:
Notational System for the
Ontology Reference Base (ORB)
Specifically we want to
start using the language on Subject Matter Indicator
neighborhoods. The
SM-Indicator is, when considered in the abstract, a pattern of linguistic
variation that is used in normal language to provide signs to those whom one
wishes to communicate with. Specific
examples of linguistic co-occurrence are at www.dataRenewal.com and www.inOrb.com.
Natural language use is more
flexible than formal language. In
correspondence to this difference, the exact form of the SM-Indicator varies,
and is often incomplete while relying on the tacit knowledge of both the
speaker and the hearer. Again, the
linguistic theory is difficult and abstract, but the results of the use of the
SM-Indicator neighborhoods are to be measured by the satisfaction of the users.
In the most critical
situations, those affecting life and death for example, the ontology references
have to allow a formative nature that is constrained by human control. The work completed in 2002 on differential and formative
ontology is one way to achieve a high degree of perceptual acuity
using both the qualities of human perception and the capabilities of computer
algorithms.
A formal representation of
linguistic variation in the ORB constructions is then also incomplete and must
rely on human cognitive acuity to make final determinations on exactly what is to
be included in the ORB construction.

Figure 1: The CM manages the flow of
text between many users
and a single user (or a
single process).
In the figure above we use a
diagram developed in 1999 for what was then called a three-channel device
(3CD). This work was done four years ago (as
of Dec 2003), but the idea is the
same. Many people should be able to
communicate to the group as a whole, and then some sort of synthesis of the
social discourse should occur. The synthesis
is what we want to stand up in the form of Ontology Referential Bases, with a
visualization of the local topology of this ontology.
In other words, we
understand that the ORB is a high dimensional graph (what this means is
something that needs to be discussed in a formal way). The ORB’s structure, being a graph, can be
viewed locally as a two dimensional graph, and these local views of ORB
structure are the same as the Subject Matter Indicator
neighborhoods.
Decision engines provide a key simulation feature
to be used while the various species of communications managers are being
developed and tested. This is necessary
as a means to test a robust BCNGroup Communities Bead Game. However, the robust simulation work can be
delayed while we integrate the SLIP, Instant Index and ORB technologies.
Before we move on, we should
say that the Decision Engine design is quite simple. The engine will simulate
the pairing of one element of a finite state machine, S, with one
element of a finite state machine G.
S is often, but not always,
interpreted as the states of the world that require a response. G is often,
but not always, the set of all possible gestured responses to states of the
world. So the simulation of
interactions within the BCNGroup Communities Bead Game can be seen as
human-to-human, human-to-machine, machine-to-human, or machine-to-machine.
In some cases, S is a
set of questions and G is a set of answers. In these cases, the paradigm
is exceedingly simple and quite natural to the user, and may be used with
polling instruments. Richard Ballard’s
Knowledge Foundations knowledge base system, the Mark 3, follows this
notion. A focus on questions and
answers allows a reduction of the social discourse to something that is
useful. Of course, not all social
discourse is composed of questions and answers. But this fact should not allow us to ignore the value in question
/ answer paradigms.
In this section, a simple notation
for the data structures in the communications manager is given.
The notation’s simplicity
hides a great variety of supporting technologies, each of which may contribute
to the core functionality of CMs.
The notation and related
data structures provide a surface similarity between different types of
knowledge technologies. By knowledge
technology we are referring to a technology, like a typewriter, that is used by
humans and which is rather useless if no human is in the loop.
Suppose we have a set of
world states
S = { s_i | i = 1, . . . , n }
and a set of gestures
G = { g_j | j = 1, . . . , m }
and a location
L_k ∈ L = { L_k | k = 1, . . . , r }.
The decision engine is a
simple simulation engine that randomly selects a world state, s_i,
and assigns a gesture, g_j, thus creating a pair, (s_i, g_j),
at each of a number of locations. At each of these locations, the pairs may be
accumulated and then batch transmitted to a single service center mailbox.
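The decision engine described above can be sketched in a few lines of Python. This is only an illustrative sketch and not the BCNGroup software: the function names run_decision_engine and batch_transmit, the uniform random choice of gesture, and the plain list standing in for the service center mailbox are all assumptions made for the example.

```python
import random
from collections import defaultdict


def run_decision_engine(states, gestures, locations, pairs_per_location, seed=None):
    """Randomly pair a world state with a gesture at each location.

    Returns a dict mapping each location to its accumulated list of
    (state, gesture) pairs, in the order they were generated.
    """
    rng = random.Random(seed)
    queues = defaultdict(list)
    for location in locations:
        for _ in range(pairs_per_location):
            state = rng.choice(states)      # randomly selected world state s_i
            gesture = rng.choice(gestures)  # gesture g_j (uniform choice is an assumption)
            queues[location].append((state, gesture))
    return dict(queues)


def batch_transmit(queues):
    """Send every location's accumulated pairs to one mailbox in a single batch."""
    mailbox = []
    for location, pairs in queues.items():
        for pair in pairs:
            mailbox.append((location, pair))
    return mailbox


# Hypothetical labels for the elements of S, G and L.
states = ["s1", "s2", "s3"]
gestures = ["g1", "g2"]
locations = ["L1", "L2"]

queues = run_decision_engine(states, gestures, locations, pairs_per_location=4, seed=0)
mailbox = batch_transmit(queues)
```

Each mailbox record keeps the location next to its pair, so the service center can still tell where a decision was made after the batch arrives.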
Again, we are not interested
in building the simulation in late December 2003, but rather in giving an
indication of how the BCNGroup Communities Bead Game software will work once
the InOrb technologies are fully integrated (we hope in January 2004). The
simulations will come later.
The simulation accounts for
the following types of transactions:
Simple transmission:
1. Decision pairing occurs at locations that are not connected to transmission devices. Decision pairs are then placed into a queue. When a transmission device is available, then the accumulated pairs are sent to the service center.
2. Decisions may be grouped at the location into a series that has a beginning and an end. A transmission of a data record in the form ( j, ("start", "start")) and ( j, ("end", "end")) is made to insert start and end statements into a data base in the service center mailbox.
3. The set of world states and the set of gestures are finite state machines, with perhaps different types of relationships between states in each machine. These relationships, between states, can be sent as metadata.
Complex transmission:
4. In the most abstract form of the theory, posed here, the series is called a "passage". Transmission of multiple passages may be intermingled, as in human conversation. A communication manager that has proper annotation of context must manage this complex transmission. The manager must also have a computational substructure that records the common representation of invariance in the universe of discourse. We propose that this be done with ORBs.
5. Each element in either finite state machine can be associated with a representational form. The representational forms are indications of the causal and logical features of states and are defined to be part of themespaces or concept spaces. These forms are used to encode statistics and reinforcement learning into an "implicit" memory of the past state experience, as suggested by Don Mitchell. The expression of this implicit memory is via voting procedures, and is thus very simple. Implicit memory is expressed via associations made in representation spaces.
6. The transmission of a decision pairing may be either as a data packet or as transformed data, in analogy to the Fourier transform, as discussed below.
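The simple transmission cases in items 1 and 2 can be sketched as a queue that holds records at a location until a transmission device is available. The class name LocationQueue and its methods are hypothetical, and the record layout simply follows the form ( j, ("start", "start")) given in item 2, with j taken here to be a location index.

```python
from collections import deque


class LocationQueue:
    """Holds decision pairs at one location until a transmission device is available."""

    def __init__(self, location_index):
        self.location_index = location_index
        self.pending = deque()

    def begin_series(self):
        # Marker record that opens a series of decisions.
        self.pending.append((self.location_index, ("start", "start")))

    def add_pair(self, state, gesture):
        self.pending.append((self.location_index, (state, gesture)))

    def end_series(self):
        # Marker record that closes the series.
        self.pending.append((self.location_index, ("end", "end")))

    def transmit(self, mailbox):
        """Flush all pending records to the mailbox, oldest first."""
        sent = 0
        while self.pending:
            mailbox.append(self.pending.popleft())
            sent += 1
        return sent


mailbox = []  # stands in for the service center mailbox
queue = LocationQueue(location_index=3)
queue.begin_series()
queue.add_pair("s1", "g2")
queue.add_pair("s4", "g1")
queue.end_series()
sent_count = queue.transmit(mailbox)
```

Because the start and end markers travel in the same queue as the pairs, the order of records in the mailbox is enough to recover where each series begins and ends.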
The decision engine provides
a randomized selection and transmission of decision forms. The transmission may
be simple, in which case no complex processing occurs. If the transmission is
complex, then the randomization is constrained by conditions placed on the
relationships between a selection of elements of the two finite state machines,
as well as on the representational methodology.
Using the InOrb technology,
the data moving into an e-forum can be parsed and some results placed into an
ORB. The e-forum ORB can be visualized
as an Upper Taxonomy. Visualizations
will allow a high fidelity retrieval of SM-Indicators.
The notion of a complex
transmission would provide "interpretive" steps between
locations during transmission. Thus a type of "machine knowledge"
is possible wherein the complex transmission acts to transform a signal into a
spectral domain (the ORB) and then perform an inverse transform from the
spectral domain into the form of a simple transmission.
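The forward and inverse steps can be illustrated, purely by analogy, with an ordinary discrete Fourier transform. The sketch below is not the ORB construction, which the text does not specify at this level of detail. It only shows a signal being spread into a spectral domain and then recovered by the inverse transform; the function names dft and inverse_dft are chosen for the example.

```python
import cmath


def dft(signal):
    """Forward transform: express a sequence as complex frequency coefficients."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def inverse_dft(spectrum):
    """Inverse transform: rebuild the original sequence from its coefficients."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


# A short repeating pattern: two full cycles of a cosine over eight samples.
signal = [1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0, 0.0]
spectrum = dft(signal)
recovered = [value.real for value in inverse_dft(spectrum)]
```

In this toy case the repeated pattern shows up as energy concentrated in two coefficients of the spectrum, and the inverse transform returns the original samples up to rounding error.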
The identification of useful
patterns requires two essential ingredients. First, the real world must have a
generator that produces an actual pattern that is repeated. This pattern can
then be seen using measurements on co-occurrence of tokens in bit streams. The
second ingredient is specific knowledge of when the pattern begins and when it
ends.
In simple cases, this is not
an issue. For example, the co-occurrence of terms in the distribution of word
frequencies, or the co-occurrence of the range in which numerical data falls,
is often within a context that easily establishes the beginning and end of the
event. However, most natural patterns are complex, incomplete and / or not
properly measured.
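For the simple case, where the beginning and end of each event are already known, a co-occurrence measurement can be sketched as follows. The function name cooccurrence_counts and the sample tokens are assumptions for illustration. The sketch counts how many events contain both tokens of a pair, which is only one of several possible co-occurrence measures.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(token_streams):
    """Count how often each unordered pair of distinct tokens shares an event.

    Each inner list is one event whose beginning and end are already known.
    """
    counts = Counter()
    for tokens in token_streams:
        unique_tokens = sorted(set(tokens))  # ignore repeats within one event
        for pair in combinations(unique_tokens, 2):
            counts[pair] += 1
    return counts


# Hypothetical events, each already delimited.
events = [
    ["ontology", "theme", "graph"],
    ["theme", "graph", "lesson"],
    ["ontology", "graph"],
]
pair_counts = cooccurrence_counts(events)
```

Pairs are stored in alphabetical order, so ("graph", "theme") and ("theme", "graph") are counted as the same pair.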
During complex transmission,
the CM provides a Fourier-like spread of a signal into a specific decomposition
involving the use of a substructural "vector" basis. The vector basis, a
mathematical notion from Fourier analysis, describes the nature of light by
identifying energy wavelengths in the electromagnetic spectrum. The
decomposition is analogous to wave transformations seen in quantum mechanics.
No one has done this, but
the theory is firmly established. We
need only organizational support and some minimal funding (as of Dec
2003).
In data stream decomposition
of a signal, the repeated patterns in the signal are the signal's
"spectrum". This signal spectrum can describe the content of the
stream. The ORBs do this whether the
data is text or scientific data. The
iterated update of the ORB as new information is acquired is the task that
Nathan Einwechter and Paul Prueitt are working on as of late December
2003. The InOrb Technologies “product”
is the system that will do this for any e-forum that follows certain formatting
standards (so that the parsing and indexing are easy).
In the ORB theory, derived
from Pribram’s work in cognitive neuroscience, signal spread is followed by
signal processing in a "spectral domain" and then by the inverse
transformation of the signal into a new bit stream. This re-localization is
called "a collapse of the wave" and is where any
"interpretation" of information must occur. In our context, this is where information from machine
representation of ontology can be mixed with the data structure itself –
providing both linguistic and ontology services to the formation of new small
ontology structures that are highly situational in nature. Again we reference the work on formative and differential
ontology.
"Knowledge" is
regarded as only existing during this collapse, a collapse involved in the
formation of the mental event experienced by a human. The CM follows this
analogy in managing the complex transmission. The theory is grounded in
neuropsychology and in the widely available experimental evidence regarding the
processing of the flow of energy from the eye into brain regions.

Figure 2: Simple and Complex
transmission of data streams
In simple transmission, no
processing of the data stream is allowed. The data transmission is said to be
Newtonian and simple. In complex transmission, a "sign system" is
created that allows the "cross level" decomposition of the meaning of
specific information in specific contexts and having specific pragmatics. The cross level process is to be performed
using the Prueitt Voting Procedure. This voting procedure was adequately
studied in 1999 and before, but again is something that is not immediately on our
development and deployment path (as of December 2003).
The sign system also
provides structured annotation of context, and thus may shape the
interpretation during the re-localization of information. The method for
ambiguation/disambiguation is developed for this purpose. If memory is available, in the form of a
class of representations of substructural patterns, then the stratified
many-to-many communication theory proposed by Prueitt is realized.
Traversal of an information
gap, generically called an epistemic gap in the literature, requires either a
forward transformation or an inverse transformation of the signal. It is assumed
here that interpretation must involve the traversal of an epistemic gap. Once a
data stream is decomposed into semantic invariance, various computational
argumentations can occur in a spectral domain built from theme and / or concept
spaces. The semantic invariance may be statistically defined, as in the Dynamic
Reasoning Engines (DREs) available from the company Autonomy Inc. The
computational argumentation may be defined using quasi-axiomatic theory, Mill’s
logic, and a class of procedures related to "voting procedures".
Computational argumentation,
in the substrate, may change the relational linkage in ORB space, leading to
different event chemistry.
Recomposition of ORB structure based on these changes may use voting
procedures to perform the inverse transform.
The new ORB structure will have well-established similarity and
dissimilarity to the original data.
This follows the so-called Mill’s logic. The details will vary but this is the essence of the idea.
Ultimately, the natural
objective of a knowledge extraction methodology is to produce a set of topics,
perhaps organized into taxonomies. The set of topics is to be as complete as
possible while respecting the content within areas that correspond to
viewpoint. To respect the viewpoint, established by context, each area can
be treated separately. Reconciliation of terminology linkage differences,
between various contexts and community viewpoints, can be managed with a
technology like the SchemaServer from SchemaLogic
Inc. The measurement of
consistency and completeness is made within areas and not across context.
A specific methodology is
introduced here and is related to both the BCNGroup Communities Bead Game
software design specification and the
Knowledge Technology Toolkit for Kids CD.
In this section we provide
a communication theory for the case in which the states of the world are regarded as questions and
gestures are regarded as answers.
Software specs (2-21-99)
Develop a table and screen
to create a new universe of discourse, U. Each new U should have
a unique identifier.
The methodology involves
four steps.
1. Name and briefly describe the universe of discourse.
2. Partition the universe into a small but complete set of areas of discourse.
3. For each area, enumerate in a descriptive fashion a list of topics. The list should be complete and each element should be as independent as possible from any of the other topics in that area.
4. For each topic, create one or more question / answer pairs.
During the four steps,
particularly the last, it is possible to identify source material that provides
prerequisite knowledge about each of the topics. Thus the methodology will
produce testing material and curriculum. Curriculum can be properly defined
through the enumerative procedure of the four steps.
The first step is for a knowledgeable
person to specify a universe of discourse related to the target domain.
The second step is to "partition"
the universe of discourse into areas that are as independent as possible. The
expression of these areas follows the same path as the development of
"axioms" in formal systems, such as geometry.
The third step considers, one at a time,
each of the areas identified in the partition of U. Again the critical
issue is the control of focus. We want the knowledgeable person to focus on
developing a set of descriptive phrases that collectively could be used to
describe any aspect of the area in focus.
The fourth, and last, step is to take each of the
phrases that were given in the previous step, one at a time, and produce
question / answer pairs.
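The four steps suggest a nested data structure: a universe of discourse with a unique identifier, its areas, the topics in each area, and the question / answer pairs for each topic. The following is a minimal sketch under that reading. The class and field names, and the sample content, are assumptions for illustration and are not taken from the 1999 software specification.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    phrase: str                                    # descriptive phrase from step 3
    qa_pairs: list = field(default_factory=list)   # (question, answer) tuples from step 4


@dataclass
class Area:
    name: str                                      # one element of the partition from step 2
    topics: list = field(default_factory=list)


@dataclass
class Universe:
    identifier: str                                # unique identifier for this U
    name: str                                      # step 1: name
    description: str                               # step 1: brief description
    areas: list = field(default_factory=list)

    def all_qa_pairs(self):
        """Flatten every question / answer pair in the universe into one list."""
        return [
            pair
            for area in self.areas
            for topic in area.topics
            for pair in topic.qa_pairs
        ]


# Hypothetical content, used only to show the shape of the structure.
universe = Universe(identifier="U-001", name="Fractions",
                    description="Working with simple fractions")
area = Area(name="Addition of fractions")
topic = Topic(phrase="adding fractions with a common denominator")
topic.qa_pairs.append(("What is 1/4 + 2/4?", "3/4"))
area.topics.append(topic)
universe.areas.append(area)
```

Keeping the pairs attached to their topic and area preserves the context in which each question was written, which matters later when consistency is measured within areas rather than across them.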
Once all areas have been
enumerated, then we need to allow the knowledgeable person to make changes.
They will use a standard add / remove interface object to move over each of the
phrases from a "tentative list" into a final list. This will require that
the knowledgeable person review the list as a whole. While making this review,
the knowledgeable person may choose to leave out a phrase or two. This can
easily be done.
The purpose of the
use-philosophy is to give the knowledgeable person a justification for building
a complete set of topics. The use-philosophy justifies the fact that it will be
harder to add new phrases to the tentative list, than to remove them.
Iterative refinement is
expected. The user interface can again employ a Communications Manager (CM) to
distribute the process of developing gestures during each of the four steps.
The states of the CM are the phrases that were shown to the knowledgeable
person.
Any of the steps may be
managed in a collaborative fashion using Internet browsers. For example,
using InOrb technology, many knowledgeable persons can work together and the
InOrb technology will create question/answer pairs for each of the descriptive
phrases.
Most knowledgeable persons
will have specific ways of organizing a discourse. We want to capture this
organization as a "knowledge artifact". The InOrb knowledge
management software, when built, can be used to capture and refine these
"high level" partition elements of any universe of discourse.
An individual can fill out
the topics of a universe of discourse, and then turn this work over to a
Multiple User Environment (MUE).
After the question/answer pairs
are completed, the questions will be composed into tests and a different CM will
use these test elements as states, s_j, at a location, L_i.
The decision engine can then be used to simulate the answering of these
questions (states) with answers (gestures). Automatic grading follows in a
natural way.
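A minimal sketch of this use of the decision engine is given below. It assumes that grading means comparing each simulated answer with a stored answer key and reporting the fraction that match. The function names, the sample questions, and that grading rule are assumptions for illustration.

```python
import random


def simulate_answers(questions, gestures, seed=None):
    """Pair each question (state) with a randomly chosen answer (gesture)."""
    rng = random.Random(seed)
    return {question: rng.choice(gestures) for question in questions}


def grade(responses, answer_key):
    """Return the fraction of questions whose response matches the key."""
    if not answer_key:
        return 0.0
    correct = sum(
        1 for question, expected in answer_key.items()
        if responses.get(question) == expected
    )
    return correct / len(answer_key)


# Hypothetical test items and candidate answers.
answer_key = {"What is 1/2 + 1/4?": "3/4", "What is 1/2 of 10?": "5"}
gestures = ["3/4", "5", "2/6"]

responses = simulate_answers(list(answer_key), gestures, seed=1)
score = grade(responses, answer_key)
```

The same grade function can be applied to answers given by a human learner, so the simulation and the real test share one grading path.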
A "three channel-group
device" allows the sharing of profile information between channel groups.
Profiles generally come in three types: key word, semantic net or frames. The
notation that follows assumes that the profile is a key word type profile.
Let us notate the collection
C of curricular units at the level of individual lessons, l_k, i.e.,
C = { l_k | k ranges over an index set, 1, 2, . . . , r }
Lessons can be grouped
together into units for testing purposes.
We wish to have a
representation of the skill level of the learner. We propose that this level of
skill can be approximated by an inventory of themes that are expressed in the
lessons.
To obtain a computational
handle on this inventory idea, we use the formalism of an ORB themespace.
Themespaces are part of a
basic technology developed by academics and industry, and deployed in most
Information Retrieval systems. They are high dimensional vector spaces defined
using the set of theme words. For example, many of the web search engines use
themespace technology. The ORBs are a
new and simple means to view the themespaces.
Defining the Themespace for
a Curriculum:
For each lesson we take the
text and send it to a word frequency parser or to a natural language parser.
The parser output is processed to produce a set of key words or, in the case of
natural language parsing, a set of theme phrases.
l_k is mapped to { t_1 , t_2 , . . . , t_10 }
We use the symbol T_k
to designate the set of themes for the kth lesson.
T_k = { t_1 , t_2 , . . . , t_10 }
Generally the key words
(themes) are ranked by numerical values. However, for the purpose of the theme
profile of a lesson we will take only the top ten of the phrases and treat them
equally.
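The word frequency route to a lesson's theme set can be sketched as follows. The function name lesson_themes, the small stop word list, and the tokenizing rule are assumptions for illustration. A real parser would be considerably more careful, and natural language parsing would yield phrases rather than single words.

```python
import re
from collections import Counter

# A deliberately tiny stop word list, assumed for this example only.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "are", "for"}


def lesson_themes(text, limit=10):
    """Return the set of the most frequent non-stop words in a lesson text.

    The ranking is used only to pick the top `limit` words; the chosen
    themes are then treated equally, so the result is an unordered set.
    """
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(word for word in words if word not in STOP_WORDS)
    return {word for word, _ in counts.most_common(limit)}


sample_text = "Fractions, fractions and fractions; the ratio of a ratio. Decimal."
top_two = lesson_themes(sample_text, limit=2)
```

With the default limit of ten, the returned set plays the role of the theme set for one lesson.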
The themespace that we need
is the one that is defined from the set that contains all themes from all
lessons. This set is written in the following way:
U = Union of the T_k as k ranges over
the index set J = {1, 2, . . . , r }
The space so defined is called
the universal themespace for the lessons. Now, note that each lesson defines a
point in this universal themespace. In fact any subset of U defines a point in
the universal space, so any union of lesson profiles defines a point in the
space.
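One way to make "a point in the universal themespace" concrete is to give each theme its own axis and to represent a set of themes as a vector of zeros and ones. The text does not fix this representation, so the sketch below is an assumption, as are the function names and the sample lesson profiles.

```python
def universal_themespace(lesson_profiles):
    """Return the sorted list of all themes that occur in any lesson profile."""
    universe = set()
    for themes in lesson_profiles.values():
        universe |= themes
    return sorted(universe)


def as_point(themes, axes):
    """Represent a set of themes as a 0/1 vector with one coordinate per axis."""
    return [1 if axis in themes else 0 for axis in axes]


# Hypothetical lesson profiles (real ones would hold ten themes each).
lesson_profiles = {
    "l1": {"fraction", "ratio"},
    "l2": {"ratio", "decimal"},
}

axes = universal_themespace(lesson_profiles)
point_l1 = as_point(lesson_profiles["l1"], axes)
point_union = as_point(lesson_profiles["l1"] | lesson_profiles["l2"], axes)
```

Any subset of the themes, including a union of lesson profiles, maps to a point in the same way.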
Prerequisite order:
In most lesson plans, the
lessons have prerequisites that should be mastered before starting the lesson.
These prerequisites provide a partial order on the lessons that naturally places
the lessons into a tree-like structure called a lattice.
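A partial order of this kind can be turned into one admissible study sequence with a topological sort. The sketch below uses graphlib from the Python standard library (Python 3.9 or later). The lesson names and the function name study_order are hypothetical.

```python
from graphlib import TopologicalSorter


def study_order(prerequisites):
    """Return the lessons in an order where every prerequisite comes first.

    `prerequisites` maps each lesson to the set of lessons that must be
    mastered before it. graphlib.CycleError is raised if the requirements
    form a loop.
    """
    return list(TopologicalSorter(prerequisites).static_order())


# Hypothetical lessons and their prerequisites.
prerequisites = {
    "counting": set(),
    "fractions": {"counting"},
    "decimals": {"fractions"},
    "ratios": {"fractions"},
}
order = study_order(prerequisites)
```

The partial order usually admits several valid sequences, for example "decimals" and "ratios" may come in either order, and the sort returns just one of them.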
Thematic order:
One way of selecting the
next lesson to study is to start at the root of the tree and move towards tree
branch endings. However, it is often the case that user knowledge will be
spotty and one would like to study lessons on an "as needed" basis. In
this case we would like to follow the curriculum by managing the knowledge of
which themes have been involved in prior learning or experience.
Thematic order is a complex
subject that can be approached most simply using elementary set operations on
the lesson profiles. We can define the profile of a user to be the union of the
profiles of the lessons that the user has learned.
U_I = Union of the T_k
as k ranges over an index set I, where I is a subset of J.
Determining which lessons
have been learned is done via standard tests.
Nodal Forest Learning
Strategy
The Nodal Forest Learning
Strategy was developed and tested over a period of about ten years while
Prueitt was teaching university mathematics courses. It is based on an
itemization of the topics in a curriculum, and the use of published principles
from theoretical immunology and associative neural networks. The itemization is
used to produce a themespace. The Strategy has a simple implementation that
uses the voting procedures, also developed by Prueitt, to produce category and
placement policies defined by the elements from the themespace.
The consequence of using
the Nodal Forest Learning Strategy is an adaptive presentation of new
materials to a learner, based on thematic content.
Thematic selection of next
lesson:
The user profile defines a
topological neighborhood, which can be matched to the Subject Matter Indicator
neighborhoods in an ORB. Each lesson
has neighborhoods related to it. Due to the metric in the space, the lessons
that the user has learned will be close to the user profile.
This means that, in general,
lessons learned will be in a small neighborhood of the user profile, but as one
gradually increases the radius of the neighborhood one finds the closest lesson
not learned. This provides an order of
presentation to the material which is adaptive to the learning experience of
the learner.
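The selection rule can be sketched with elementary set operations. The text does not specify the metric, so the sketch assumes a simple distance-like score: the number of themes in a lesson that are missing from the user profile. Under that assumption every learned lesson sits at distance zero, and the next lesson is the unlearned one that introduces the fewest new themes. The function names and sample profiles are hypothetical.

```python
def user_profile(lesson_profiles, learned):
    """Union of the theme sets of the lessons the user has learned."""
    profile = set()
    for lesson in learned:
        profile |= lesson_profiles[lesson]
    return profile


def next_lesson(lesson_profiles, learned):
    """Pick the unlearned lesson closest to the user profile.

    Distance here is the count of lesson themes missing from the profile
    (an assumed measure). Ties are broken by lesson name so that the
    result is deterministic. Returns None when nothing is left to learn.
    """
    profile = user_profile(lesson_profiles, learned)
    candidates = [lesson for lesson in lesson_profiles if lesson not in learned]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda lesson: (len(lesson_profiles[lesson] - profile), lesson),
    )


# Hypothetical lesson profiles.
lesson_profiles = {
    "l1": {"fraction", "ratio"},
    "l2": {"ratio", "decimal"},
    "l3": {"angle", "triangle", "proof"},
}
suggestion = next_lesson(lesson_profiles, learned={"l1"})
```

As the learner completes lessons, the profile grows and the suggested order adapts. A metric taken from the ORB neighborhoods could replace the assumed score without changing the overall shape of the procedure.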
The same class of procedures
can be applied to obtaining documents from the repository of learning materials
that are indexed by the elements of the ORBs.