Notational Foundation to Future Semantic Science
Paul Stephen Prueitt, PhD
Stratification into layers
delineated by time is observed within physical processes. Metabolic processes, for example, are organized
within what might be referred to as a layer, separate from the behavioral
intentions of the living system. The
language to talk about stratification of processes and the interaction between
layers of stratification has been largely missing within our scientific
literatures. A notational foundation is
offered as a means to set the stage for developing language of this type.
Section 1: Our justification for using a stratified model
Formative event models and stratification
Measurement using the word level n-gram
Generalized n-grams, frames and scripts
Simple and Differential Convolution
Localization and Organization Processes
Our
stratified model uses a process for collapsing occurrences into categories to
create persistent data structure and relationships. Observation leads to structural information about various targets
of observation.
The
stratified model is not one that can be modeled by classical mathematics
alone. The model reflects how physical
components work together to express real world complex behavior. The concept of complexity is defined as
something that cannot be reduced to algorithms, the so-called Rosen complexity. [1] The point of Rosen’s definition of
complexity is that many aspects of natural systems are not modeled using
Hilbert-type mathematics. These aspects
include free will, intentionality and the natures of memory and anticipation.
In
our view, modern science suggests that natural processes are not truly
computational. In our view there is
causation lying outside of any formal model that may govern transitions in any
natural system. In spite of this, a
great deal of actual knowledge can be acquired about any natural system. For example, any natural system depends on
the emergence of structural configuration to fulfill functions necessary to
that system.
The
concepts about coherence and what is a system are relevant. In our view, stratification is a means to
separate systems from environments.
Stratification is also nested in nature, and this introduces a natural
complexity that most people cannot see immediately. The mix of stratification and nestedness leads us to talk about
a relative stratification: each nested structure has its own stratified
layers, yet at any given time scale one finds the same organizational features
that appear, at that time scale, within other nested structures. [2]
These organizational layers are encapsulated within organisms, but still share
information as if non-locally connected.
In our view this connection is in the natural categories that emerge and
reside in the organizational layers.
One
result from this new viewpoint is the tri-level architecture for human-computer
interfaces and information management. [3]
In the tri-level architecture we develop category and structural knowledge
about the invariance across many things (memory), and we develop category and
predictive knowledge about global processes that are being driven by emerging
signal between complex systems. [4] An example of the biological sciences
related to emerging signal expression is seen in the science of gene and cell
signal pathways. [5]
A
stratification of simulations about the natural processes has to be
justified. One way to justify the
stratification of processes is to demonstrate a new information science in the
marketplace. However, the
capitalization mechanisms of the last part of the twentieth century and the
first part of the twenty-first century are controlled by the old information
science. So the capitalization of not
only stratification theory, but also of related innovations, has not been
allowed. Until capitalization processes
give the stratified theory a fair test, this justification must examine the
limitations of formal advanced mathematics.
In this examination we make the argument that far less is being done on
critical economic/environmental/social balances than could be.
In
the mainstream of science there are assumptions made regarding the universality
of mathematics. However, the processes
involved in human awareness have not been shown to be algorithmic in nature. [6]
We have the view that the abstraction and assumptions that science uses, until
now, may be missing some essential aspect.
Specifically, the nature and foundation of Hilbert mathematics may not
be able to reflect many properties involved in human use of information.
We
will make some conjectures about the nature of non-locality in physical
systems. In making these conjectures,
we are suggesting that the limitations in Hilbert mathematics (related to
determinism), the limitations in classical logic (related to inference and
induction), and the limitations due to not having a formal theory of
non-locality are all different manifestations of the same limitation. We will be offering new language that
attempts to make the underlying reality clear, and to thus lift the limitation
through extension to logic, mathematics and physical science. The extensions are made via an examination
of physical coherence, logical inference and physical systems that have layers
of organization (delineated by time scale).
Is
the discussion above relevant to modern computer science and the current mess
in information systems? We make the
argument, simply, that yes this discussion is relevant. First, information as defined in computer
science is not the type of “thing” that biologists talk about when they talk
about cell signaling of information between various parts of a cell.
The
limitations in Hilbert mathematics are seen when one attempts to understand the
interpretative and perceptual acts of humans.
Hilbert mathematics plays a modeling role when one is dealing with
engineering and certain categories of physical phenomena. As such, ontological modeling has the
potential to extend classical mathematics.
The nature of ontological models and a comparison to Hilbert mathematics
and classical logic is developed in the “Foundations”.
The
notation that we offer is simple, but implicitly recognizes that the
ontological model is incomplete without a human acting in an informed
fashion. The solution may be to create
open form formalisms in a specific fashion, using a stratified system of
symbols and
The
completion of a formative ontological model occurs when a human becomes aware
of some information. The human makes
the model into a complex process through the act of interpretation. This means that the model itself cannot be
subject to the types of considerations that are implicit in strongly logical
systems, such as the W3C supported description logics.
Our
viewpoint is that this work on description logics, which is mainstream and
highly funded, is a variant of the artificial intelligence discipline and has moved
very far away from a grounded understanding of natural science.
We
can examine this further. In classical
mathematics completeness and consistency play a critical role, one that is
addressed by pure mathematicians and logicians. This work forms the core of scientific literatures. In the mainstream academic disciplines, the
work by Gödel is marginalized. Also on
the margins is the notion of explanatory coherence. Underlying the work by Paul Thagard [7]
on explanatory coherence is the notion that inference depends on
“coherence”. Such work on explanatory
coherence is typically regarded as non-mainstream. As we move a bit further away from classical formalism we find
the modern academic fields of neural networks, genetic algorithms,
connectionism, and evolutionary programming.
These academic disciplines all put some type of intellectual pressure on
the concepts of completeness and consistency found in classical mathematics.
It
is our viewpoint that the failures in mathematics, and computer science, can be
partially accommodated using what we have called “stratified theory”.
The stratified model has two forms:
(1) conceptual and notational, and
(2) implementation as computer processes.
In the tri-level architecture for computational
intelligence, our stratified model motivates the use of the co-occurrence of parts
of words, words and phrases as indicating functional roles. A formalization relating the parsing of
words by algorithms and co-occurrence patterns is expressed in our notational
system. Parsing finds patterns and uses
encoded structural information to produce organization at one level. Higher order patterns then form from an
aggregation of this encoded structural information. The meaning of these high order patterns is then subject to
interpretation. Over time, the encoding
of interpretation, to the degree possible, results in a knowledge base that
then can be used to assist in future information gathering and future interpretations.
We say that certain of the concepts are motivating a
notational system. This notational system
is the basis for what we are calling the “.vir” subnet standards. The “.vir” standards are designed to allow
certain types of information structures to pass quickly from one processor to
another, within a grid architecture.
The standards also align with basic research on how humans interact with
information.
Stratification gives emergence context. For example, ambiguation/disambiguation
addresses issues of complexity directly.
In linguistic systems the points of complexity are where there are
specific word relationships that cause interpretation depth. Interpretation is situational. Function-structure relationships need
ambiguation and disambiguation as part of differential responses based on what
is available to respond with and what needs to be accomplished. At the cell and gene expression levels the
function-structure relationships are to be understood using quasi-axiomatic
theory and qualitative structure function analysis. These methodologies are not difficult to follow. [8]
Computational knowledge representation should take
into account an under-constraint that allows choices to be made at times when
an aggregation of substance is emerging to address a specific function. For purposes of what is often called
“semantic extraction from text”, the representation should balance the
limitation of computer technology with human-in-the-loop influence and control.
In natural settings the emergence of function from
the aggregation of substructure passes through choice points. At these points, in both time and space, the
specifics of environmental conditions, such as the distribution of actual
substructural elements, form a type of negotiation over actual synthesis of
function. It is suggested that during
this function-structure negotiation nature builds symmetry inductions and that
these symmetry inductions produce secondary consequences related to the creation
or modification of natural category.
Actionable intelligence depends on organizational
information to assist in decision-making.
We developed the nine-step actionable intelligence process model (Figure
1) in 2002, by modifying a seven-step actionable intelligence process model
that was widely discussed, and used, in the American intelligence
community.
The intelligence community’s seven-step model left
out two aspects. With a structural
disconnect between measurement and the formation of natural category, the seven-step
model is incomplete. The nine-step actionable intelligence process model is
dependent on the development of “under-constrained” ontological models and the
active participation of humans in real time synthesis of information. During the development of a situational
model, the participation by humans brings external knowledge into a
re-enforcement of the model.

Figure 1: The Actionable Intelligence Process Model (AIPM)
To
be clear we recognize the value of the classical notions of logic and formal
systems. The classical foundations of
logic serve us well, up to a point. But
this foundation is absent a complete understanding of perceptual measurement,
and the physical properties related to those physical phenomena. The emergence of new category and
modification of existing category is core to the essence of intelligence.
A
more general statement can be made, namely that the paradigms available
to intelligence agencies and the Department of Defense are based on an
incomplete analysis about what computer algorithms can do. The so-called intelligent agents, funded
heavily by DARPA, do not have perceptual interfaces to the real world, as do
living systems. Heavy funding invested
the contractors and the government in a model that would not work well in most
situations.
The
nine-step model is a process model. To
instantiate this process model we needed an ontology modeling standard. In the late 1990s I developed the notion of
a referential base. In a referential
base, of informational bits, computational representation of information is
treated as being incomplete, and requiring of additional constraints from a
measurement of what the information is referring to.
There are both algorithmic and non-algorithmic
processes. The referential base
supports analysis based on a direct and precise instrumentation and measurement
of structure in data. This measurement
occurs as part of an instrumented system for mechanical control of
environmental systems or for a system designed to represent the concepts being
discussed within a community or community of communities.
We have developed a cyclic process that produces a
specific discrete model of concepts being expressed within communities. The representation is of co-occurrence
patterns that, when perceived by a knowledgeable human, provide a clear,
complete and consistent perception.
The model of concepts being expressed involves a
model of relationships between classes, objects, and between class:object
pairs. One can see that a large number
of enumerated object-to-object relationships can be thought of as one layer of
organization to human communicative acts in real time. The class-to-class relationships can be seen
as the essence of how human language is formed and used. The class-to-object pairing allows
information to be encoded about how specific communicative acts, in real time,
are building or dissolving categories of meaning as understood commonly by a
community of humans involved in communicative acts.
These two processes become integral and situational
when humans are involved. A specific
model is produced in a similar fashion as a relational database model, except
the data is designed to be stored as simple separate bit structures, as opposed
to within a specific fixed schema, thus allowing organizational process to
express the data with greater flexibility.
The simple bit-structure is related to two classes of innovations that
are integrated in the “.vir” standards, using specific innovations related to a
stratification of category formation and the aggregation of semantic primitives
into single coherent models.
Our
group holds that discrete analysis, about reality, must involve a sophisticated
use of both machine ontology and human based reification cycles. The machine ontology has to be both very
simple and not yet organized into rigid constructions.
The
role of the AIPM in developing proper ontology and reification processes was
first seen in the application of referential systems to text
understanding. Our Orb (Ontological referential
base) based text analysis involved the measurement of co-occurrence and
frequency of terms, phrases and patterns in text. Using the Orb data constructions, classical techniques from
knowledge discovery in text technology were combined with advanced linguistics
and ontology services. The result was
that the referential system inventoried meaningful variation in text. Humans were then able to
annotate the patterns and invariances.
Stochastic
methods are used to organize information from large data sets. A huge academic literature exists on this
subject, and there are very large funded research programs dedicated to these
methods. Two types of stochastic
methods exist to identify linguistic variation closely associated with subject
matter indicators. Latent semantic
indexing and probabilistic latent semantic indexing are one type, though the two
are really quite different from each other. Hidden
Markov Models are the other type.
Regardless of the type of method used, the result can be encoded into a
simple notational formalism.
In 2001, we developed a specific methodology for translating the results from any stochastic methods directly into a set of ordered triples having the form
{ < a, r, b > }
Using this representation, it is really simple to
provide rapid computer processes for subject matter retrieval over large data
sets.
The ordered triples can be encoded into computer
memory using three hash tables, with the set of first elements, the set of
second elements and the set of third elements being made into the hash keys,
and the related data being placed into the hash container. This encoding is discussed below. However, it is important to note that a
slight modification of the hash table moves the technology into an area that
seems dominated by the Gruenwald patents on representing the text string as a
base 64 number [9]. When this happens we have a key-less hash
table and an efficiency change of one to two orders of magnitude.
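The three-table encoding described above can be sketched as follows. This is a minimal illustration, assuming Python dictionaries stand in for the hash tables; the class name `TripleStore`, its methods, and the sample words are hypothetical and are not part of the Orb or “.vir” specifications. The key-less variant is not shown.

```python
from collections import defaultdict


class TripleStore:
    """Index ordered triples <a, r, b> with three hash tables (a sketch)."""

    def __init__(self):
        # One table per position; each key maps to the triples containing it.
        self.by_first = defaultdict(set)
        self.by_second = defaultdict(set)
        self.by_third = defaultdict(set)

    def add(self, a, r, b):
        triple = (a, r, b)
        self.by_first[a].add(triple)
        self.by_second[r].add(triple)
        self.by_third[b].add(triple)

    def match(self, a=None, r=None, b=None):
        """Return the triples that agree with every argument that is given."""
        hits = []
        if a is not None:
            hits.append(self.by_first.get(a, set()))
        if r is not None:
            hits.append(self.by_second.get(r, set()))
        if b is not None:
            hits.append(self.by_third.get(b, set()))
        if not hits:
            # No constraint given: return every stored triple.
            return set().union(*self.by_first.values())
        return set.intersection(*hits)


store = TripleStore()
store.add("attack", "co-occurs", "network")
store.add("attack", "co-occurs", "server")
store.add("probe", "co-occurs", "network")
```

A retrieval such as `store.match(a="attack", b="network")` touches only the two relevant tables and intersects their entries, so no scan over the whole triple set is needed.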
We enter a mystery and come out understanding
something that seems magical in nature.
The magic is twofold. First, the
entire information space is easily encoded as a simple low-resolution image
file of a few hundred thousand bits.
Second, the search and retrieval function is almost instantaneous due to
the holonomic feature discussed by Nan Gelhard and myself. [10]
The key to understanding critical scalability issues
is to see empirically that the size of the set of primitives (the first, second
and third elements) is limited by a categorization process that collapses
occurrences into operational categories.
From large data sets there is a type of compression of data into
structural relationships. This
compression builds categories and after a while all categories are found,
unless the data set itself changes.
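The saturation described here can be illustrated with a small sketch. It assumes a categorization function that maps each occurrence to a category; the function name `category_growth` and the lower-casing rule used as the categorizer are placeholders chosen for illustration only.

```python
def category_growth(occurrences, categorize):
    """Count the distinct categories seen after each occurrence."""
    seen = set()
    growth = []
    for occurrence in occurrences:
        seen.add(categorize(occurrence))
        growth.append(len(seen))
    return growth


stream = ["Attack", "attack", "ATTACK", "network", "Network", "attack"]
growth = category_growth(stream, str.lower)
```

Once the count stops rising, further data from the same source adds occurrences but no new categories.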
In my work on the problems related to the organization of
very large data sets, an index is developed in the form { < a, r,
b > }. Each of these hash tables is ordered using the
key-less index, where a point in this finite 3-D Hilbert space corresponds to
an individual triple. The index takes on
an organization using mathematical operations that are well understood and
which do not involve statistical inferences. A set of categorical definitions and
relationships between categories is mapped precisely to a finite 3-D Hilbert
space.
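One way to read the index is as a map from each triple to a point with three integer coordinates, one coordinate per position. The sketch below assumes that reading; the sorted ordering of each axis and the names `build_index` and `decode` are choices made for illustration, not the ordering used by the key-less index.

```python
def build_index(triples):
    """Map each triple <a, r, b> to an integer point (i, j, k)."""
    # One sorted axis per position: first, second and third elements.
    axes = tuple(sorted({t[pos] for t in triples}) for pos in range(3))
    lookup = tuple({value: i for i, value in enumerate(axis)} for axis in axes)
    points = {tuple(lookup[pos][t[pos]] for pos in range(3)) for t in triples}
    return axes, points


def decode(axes, point):
    """Recover the triple that a point stands for."""
    return tuple(axes[pos][coord] for pos, coord in enumerate(point))


triples = {
    ("attack", "co-occurs", "network"),
    ("attack", "co-occurs", "server"),
    ("probe", "co-occurs", "network"),
}
axes, points = build_index(triples)
```

The mapping is a plain enumeration with no statistical inference, and the set of points can be stored apart from the data it indexes.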
Moreover, the index itself can be separated from the
data and further organized into an ontological model about the meaning that the
data may have in various contexts. In
this way, ontological models of social discourse can be developed. The twofold magical qualities are preserved
by the social conventions encoded in natural language use. The same techniques may also, in theory, be
applied to various types of data such as data derived from scientific
instrumentation. The point is that the
Orb notation condenses the output from the many data mining and data harvesting
systems, resulting in a well-specified structure and having certain formal
properties related to the discovery of function of what is observed
algorithmically. Orb notation will act
like a standard common integration of the acquisition of data structure. Also, since the result of computational
processes is close to a well-specified ontological model, existing ontological
models can be used as a pattern recognition system, and for other
purposes. Foundational research on
brain and behavior may suggest that actual brain behavior systems may work
based on a very similar principle. [11]
One of these purposes is realized if the ontological
model is represented as a Topic Map.
The actionable intelligence cycle, discussed in the previous section,
acquires data from databases or from some measurement process. Humans are able to see organizational
structure and to use existing ontological models as well as personal insight to
produce quality models of whatever is the target of the cycle. The Topic Map
standard is then used to create a well-organized index into information
structures, even those structures that were not involved in the initial
measurement. This cycle follows the
action-perception cycle found in living interactions with environments. Within this cycle, data non-interoperability issues do not
arise.
The existence of ontological models brings up the
question of interfaces between the output of statistically based analysis,
linguistic analysis and human annotation of these results. Using ontological modeling, the computed
results are deterministic and can be controlled by a human to achieve a fine
resolution over an event space. In
theory, the encoding of co-occurrence leads to a type of scientific
investigation of phenomena that are complex in the Rosen sense. I have elsewhere talked about Rosen
complexity as being a necessary consideration when modeling living systems and
do not wish to diverge too far from the initial discussion of the Orb
representation. The point to make is
that ontological models, of the type I am describing, can serve in the same
role as Hilbert mathematics. The
difference is that Hilbert mathematics does not capture the essential
non-algorithmic nature of certain aspects of living systems. The similarity is that there can be an
induction of a well-specified model by a human mind and the externalization of
individual observations into that model.
The model can then be shared between members of a scientific
community.
If people act in a way that leads to these types of
models and people use such models to communicate structural information, then
they act in an objective fashion even if they are concerned with phenomena
that have not had successful Hilbert mathematics modeling.
The new ontological models also have a reasonable
expression within distributed communities.
A co-occurrence measurement phase in the action-perception cycle has a
simple encoding mechanism that uses hash tables to encode informational
bits. The bit structure expresses the
form (class, object) where the class gets its definition from what is a
stochastically defined neighborhood “around” all of the occurrences of the
object, and the object gets its definition from a specific occurrence of a
neighborhood. These formal
constructions allow the human to add the required complexity to a symbol system
through the process of annotation and interpretation.
In the next section we give a complete set of
elementary formal indicators.
We may define relationships between objects:
< o(j), r, o(i) >
where o(i) and o(j) are objects from a set O = { o(i) | i is over an index set }.
We may separately define relationships between classes:
< c(k), r, c(j) >
where c(k) and c(j) are classes from a set C = { c(i) | i is over an index set }.
Classes
and objects have relationships related to categorization. Without imposing any type of classical or
modern description logics, these categorical statements can be encoded into the
holonomic structure discussed in the previous section. The non-imposition of logics is part of the
removal of formal semantics from the ontological models based on Orbs. This removal of semantics then allows the
real time imposition of “meaning” via human interpretation. The interpretation is evoked by standard
construction Topic Maps acting as a sign system. [12]
Consistent
with the stratified organization of categories, we may define relationships
between class:object pairs (nodes):
< n(k), r, n(j) >
where n(k) and n(j) are nodes from a set N = { n(i) | i is over an index set }.
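The three kinds of relationship (between objects, between classes, and between class:object nodes) can be held in one small data structure. The sketch below assumes Python dataclasses; the names `Node` and `Relation` and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Node:
    """A class:object pair, written n(i) in the notation above."""
    cls: str
    obj: str


@dataclass(frozen=True)
class Relation:
    """An ordered triple <left, r, right> over objects, classes or nodes."""
    left: Union[str, Node]
    r: str
    right: Union[str, Node]


# Objects and classes are plain strings here; a node pairs one of each.
object_level = Relation("attack#17", "co-occurs", "network#18")
class_level = Relation("attack", "co-occurs", "network")
node_level = Relation(Node("attack", "attack#17"), "co-occurs",
                      Node("network", "network#18"))
```

Because the dataclasses are frozen, relations are hashable and can be used directly as members of a set of triples.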
In addition to the class object distinctions we find it useful to have notation for
a set of atomic constructions
A = { a },
and a set of compound constructions
C = { c }.
The
relationship between atomic construction and compound constructions may be used
to reify instance:class information. It
was proposed, starting in the mid-1960s, by the cybernetics school of Pospelov
and Finn that a quasi-axiomatic inference apparatus allows the development of
structural knowledge about the formative process by which function is expressed from
the aggregation of substructure. [13] This work was virtually unknown in the US
until the Army Research Lab conferences starting in 1994. [14]
My
notation allows nesting of structure:function information. For example, atoms can be represented as
class:object pairs. The atom object is
a simple occurrence, but in some nested cases the atom object would be an
occurrence of a category. The atom
class is the invariance at the level of organization that is the substrate to
the categories in a level of organization one layer above in the nested
structure. These nested structures are
self-organizing in nature, and also in the stratified ontological structures based
on the Orb notation. The formation of
a simple object compound is then shaped by the function required by a complex
(living) system. This compound is then
“participatory” in the definition of a category of compounds expressed from the
occurrence of class compounds at a lower level of organization.
My
notation is original work that is designed to provide a provably optimal
information encoding standard for developing simple Topic Map interfaces to
emerging ontological structure based on action-perception cycles involving
human to computer interactions.
The
annotation, by humans, of the observed meaning/function of compounds can be
derived from empirical analysis similar to the scientific method. The stratified model allows us to do the
bookkeeping about category formation over time and at multiple levels of
organization (of the same reality) as expressed in real time.
Orb constructions can, therefore, play the role, in complex
control and analysis, that Hilbert mathematics plays in engineering science.
An
interpretive act is involved in human awareness of information. Of course, computers have no similar
function. An objectively observed
correspondence between word co-occurrence and subject matter experienced by
humans is essential to design interfaces.
The
process in which language is created is seen as part of an ecological process
where patterns of co-occurrences come to make sense to a human as part of
social conversation or as part of the reading experience.
Looking
at a general and abstract model of mental event formation processes further
grounds a theory of process stratification.
The mental event is seen as the central phenomenon that we refer to when
we think about the experience of subject matter when text is read. But the formation of natural language within
community occupies a similar important position in the theoretical
framework. The mental event occurs
within its own world and yet is separately influenced by the human
community. The stratification model
allows complexity to be part of the model.
We
can use co-occurrence between significant words and make the observation that
certain co-occurrence is predictive of subject indicators. The whole is then regarded as a composition
of elements that are abstract representations of occurrence of significant
words across multiple instances.

Figure 2: The graph neighborhood with center at the word “attack”
The
net of significant words is expressed as a graph and a topology is developed on
this graph with “neighborhoods” having centers on significant words and
non-center elements precisely those significant words that are actually
co-occurring within the text under study.
The co-occurrence related to each significant word is visualized as this
graph neighborhood.
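A neighborhood of this kind can be computed directly from a list of ordered triples. The sketch assumes co-occurrence is treated as symmetric, so that a word appears in the neighborhood of each word it co-occurs with; the function name and the sample triples are hypothetical.

```python
from collections import defaultdict


def neighborhoods(triples):
    """Return {center word: set of words co-occurring with it}."""
    hood = defaultdict(set)
    for a, _relation, b in triples:
        hood[a].add(b)
        hood[b].add(a)  # assumption: co-occurrence is symmetric
    return dict(hood)


triples = [
    ("attack", "co-occurs", "network"),
    ("attack", "co-occurs", "server"),
    ("probe", "co-occurs", "network"),
]
hoods = neighborhoods(triples)
```

Here `hoods["attack"]` holds the non-center elements of the neighborhood centered on “attack”.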
An
abstract model of emergence brings one to fundamental physics and to the
phenomena involved in the physical emergence of something. Obviously this category of phenomena has
been difficult to model formally using Hilbert mathematics. This issue is discussed well by I. Prigogine
in his book, The End of Certainty. Pribram
has also developed a certain presentation of what he has called scientific
realism, where an underlying theory of thermodynamical structural constraints
is involved in the emergence of chemical compounds and in the emergence of
field coherence, as a general principle.
In our opinion, Gerald Edelman makes a similar presentation in his book
“Neural Darwinism”.
The
differential ontology framework is based on this same type of scientific
realism. In differential ontology a
“semantic” compound is composed of (co-occurrence) relationships between
significant words. Because these
co-occurrence relationships repeat in many contexts, they become identified as
an invariant across these multiple occurrences. A reality, out there, is necessary in order that the invariants
have situations in which to aggregate.
This aggregation process is not fully constrained by known natural law,
and thus the function of an aggregation is underconstrained.
The
collection of all co-occurrence relationships that include the center word is
treated as a single “subject indicator”.
These neighborhoods can be corrected for words having more than one
meaning, and the neighborhoods can be incomplete. The function of an aggregation of invariances is
underconstrained, while at the same time the structure of the aggregation
required to fulfill a specific real time function is degenerate in precisely
the fashion discussed by Edelman.
The
differential process, in the abstract, can be thought of as a model of event
formation. Using a hypothesis called
the process compartment hypothesis, the parallel between a purely algorithmic
process and natural processes is exposed (Prueitt, 1995). As a result new
computing processes can be developed both as fundamental cognitive science and
new computer algorithms.
Consider
the case where atoms are class:object pairs and these pairs are regarded as
graph nodes. Relationships between nodes may be derived from class and/or object
relationships. For example, consider
< n(k), r, n(j) >
if the object in the node n(k) is the object in the node n(j). In this case the relationship is some type
of categorical equivalence. The two
objects may be different, but are regarded by the ontology as being the
same.
A
collection of the relationships between nodes can always be rendered as a
graph. The categorical equivalence
collapses more than one node into a single node, as illustrated in the figure
below.

Figure 3: Convolutions over sets of nodes may collapse into a category
In
Figure 3, we are illustrating the convolution over three occurrences that are
deemed to be within an equivalence class.
This illustration might be particularized if in three cases, a single
word stem was co-occurring in specific text.
The convolution would bring these three nodes together as a single
representation.
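The collapse can be sketched as a rewrite of the graph under an equivalence key. Here each node is a (position, word) occurrence and the key is a deliberately crude suffix-stripping stem; both are assumptions made for illustration, as are the names `collapse` and `crude_stem`.

```python
def crude_stem(word):
    """Strip a few English suffixes (a stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def collapse(nodes, edges, key):
    """Merge nodes with equal key(node) and carry the edges over."""
    category = {node: key(node) for node in nodes}
    merged_nodes = set(category.values())
    merged_edges = {
        (category[a], r, category[b])
        for a, r, b in edges
        if category[a] != category[b]  # drop edges inside one category
    }
    return merged_nodes, merged_edges


nodes = [(3, "attacks"), (17, "attacked"), (42, "attacking"), (18, "network")]
edges = [((17, "attacked"), "co-occurs", (18, "network")),
         ((3, "attacks"), "co-occurs", (42, "attacking"))]
merged_nodes, merged_edges = collapse(nodes, edges, lambda n: crude_stem(n[1]))
```

The three occurrences that share a stem become one node, and the edge that connected two of them disappears because both ends now fall in the same category.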
A
graph is a set of nodes and connections between some of these. The set of nodes is indicated, enumerated,
as a set { a(i) }. One can use very
elementary constructions from graph theory to encode specific, precise and
exact information about any of a number of measurements.
The
knowledge representation problem is challenging. Human sensory and cognitive
acuity measures type and differentiation of type in the development of specific
knowledge, awareness and anticipation.
But cognitive function does not follow classical logic. The specific failures of logic seem
clear. Classical logic has not been
shown to have the capability to deal effectively with natural
function/structure phenomena.
Orb analysis measures a single relationship, co-occurrence, r, of two
words, a and b, within a certain specified
proximity. This produces the exact and precise measurement “< a, r, b >”. A document
collection is processed and then visualized.
The result is the Orb, whether represented as a graph or as a set of
ordered triples:
{ < a, r, b > }
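A minimal sketch of this measurement follows. It assumes that "proximity" means a forward window counted in words and that the order of a triple follows the order of the words in the text. Neither assumption is fixed by the notation, and the function name cooccurrence_triples is hypothetical.

```python
def cooccurrence_triples(words, window=2, relation="r"):
    """Return the set of ordered triples (a, r, b) for words within `window`.

    Assumption: b must appear at most `window` positions after a.
    """
    triples = set()
    for i, a in enumerate(words):
        for b in words[i + 1 : i + 1 + window]:
            if a != b:  # a word is not paired with a repeat of itself
                triples.add((a, relation, b))
    return triples


orb = cooccurrence_triples("a b c d".split(), window=2)
```

The result is a plain set, so repeated co-occurrences of the same pair collapse into one triple, which matches the set notation { < a, r, b > }.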
A
semantic topology is used to “cover” the subject matter with topological
neighborhoods having the center of the neighborhood an element of an upper
taxonomy or controlled vocabulary. Each
of the neighborhoods is presented to humans for annotation.
Before
a connection can be made, one needs to have things to connect. In the Orb notational system, we call these
“things” either atoms or compounds.
Again, if we have a theory of relational type, then relational types can
be atoms or compounds in a theory of relational type. But we have only one type of relationship, co-occurrence. The withholding of any theory of type
accomplishes three important results.
1) Meaning is not assumed to be captured in the precise and exact measurement of word
occurrence. Because this capture is not assumed, it is clear that the knowledge technology
based on Orbs requires active human reification cycles to be properly used.
2) The precise and exact structure of co-occurrence can be restricted to a small
number of key terms. The Orb projection
from the largest Orb, where all terms are considered significant, to a smaller
and visually accessible Orb is done in a single pass over the Orb
structure.
3) What is called “mutual induction” is supported.
In building Orbs we can define algorithms on the
simple ASCII text list of ordered triples, or on the graph structure.
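As an illustration of an algorithm defined directly on the ASCII list of triples, the projection onto a small set of key terms can be sketched as a single pass. The line format "< a, r, b >", the rule that both ends of a triple must be key terms, and the names parse_triple and project_orb are assumptions made for this sketch.

```python
def parse_triple(line):
    """Parse one ASCII line of the assumed form '< a, r, b >'."""
    a, r, b = (part.strip() for part in line.strip().strip("<>").split(","))
    return a, r, b


def project_orb(triples, key_terms):
    """Keep only triples whose two terms are both key terms (one pass)."""
    key_terms = set(key_terms)
    return [(a, r, b) for (a, r, b) in triples if a in key_terms and b in key_terms]


# Toy lines; the terms are placeholders, not measured data.
lines = [
    "< fcc, r, ruling >",
    "< fcc, r, the >",
    "< ruling, r, spectrum >",
]
small_orb = project_orb(map(parse_triple, lines), {"fcc", "ruling", "spectrum"})
```

Because the projection only filters, the smaller Orb is always a subset of the larger one, and no triple is examined more than once.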
A correspondence is made between a specific
class:object pairing and a node. We
should be clear here that simple co-occurrence does not depend strongly
on naturally occurring class:object structure.
Structural patterns are measured precisely because the co-occurrence
structure is there in the text or in the data.
The patterns have to be understood.
Benchmarking on cyber intrusion data demonstrated a fractal compression
of data into information structure simply due to the collapse of categories
into single constructions. Fractal
compression means two things:
(1) self-similarity is observed at multiple levels of scale, and
(2) after an initial period, the rate of growth in the size of the encoding mechanism,
an Orb, begins to decrease and becomes very nearly zero.
Various implications follow from these two features
of fractal compression. One of these is
fractal scalability. The second is that
the identification of event structure can tolerate measurement error and
incompleteness.
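The second feature can be observed by recording the size of an Orb as successive batches of triples are merged into it. The sketch below is one hedged way to do that. The batches are toy placeholders, not the cyber intrusion benchmark data, and the function names orb_growth and growth_rates are hypothetical.

```python
def orb_growth(batches):
    """Return the Orb size (number of distinct triples) after each batch."""
    orb = set()
    sizes = []
    for batch in batches:
        orb.update(batch)  # triples already present add nothing to the Orb
        sizes.append(len(orb))
    return sizes


def growth_rates(sizes):
    """Return the number of new triples contributed by each batch."""
    return [after - before for before, after in zip([0] + sizes, sizes)]


# Toy batches of triples (placeholders only).
batches = [
    {("a", "r", "b"), ("b", "r", "c")},
    {("a", "r", "b"), ("c", "r", "d")},
    {("b", "r", "c"), ("a", "r", "b")},
]
sizes = orb_growth(batches)
rates = growth_rates(sizes)
```

When later batches mostly repeat categories that are already encoded, the growth values fall toward zero, which is the behavior described in item (2).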
The formation of classes has two parts, the
structural and the functional. These
parts are kept separate so that human intuition can play in situational
judgments, in real time. The exercise
of this judgment in conjunction with Orb processing is called
“mutual-induction”, and creates both a type of inference and a work product
having a standard form.
The mechanisms supporting mutual induction
bring human tacit knowledge into the work product. The visual form of the Orb pattern invokes an induction of some
type of mental experience.

Figure 4: Subject Matter Indicator neighborhood within the 1997 – 2003 FCC public rulings
What mutual induction is acting on can be simple, or
more complicated depending on the nature of the problem at hand.
The simple co-occurrence relationship is encoded as
a link between two nodes. Taken alone,
a class:object pair may be a node without (necessarily) having any links to any
other node. This happens to be somewhat
uninteresting, when compared to a rich theory of semantic type. We assume that the use of advanced theories
of semantic type cannot be accomplished without a knowledgeable and informed
human in the loop.
The
connections between terms in language expression are not always best measured
from the co-occurrence of terms in text.
But it is difficult to come up with some other way to lay down a basic
measurement process that is simpler and yet achieves such a high level of
success. The understanding of text by
humans involves most, if not all, of the capacities of the human perceptual and
cognitive systems.
With
Orbs, the computer technology works in a precise fashion to create a retrieval
of those documents with specific co-occurrence patterns that are visualized in
the local subject indicator neighborhood.
The
development of co-occurrence connections is not ultimately the only objective of
our work. Our objective is to develop a
representation of the flow of knowledge within a social system. This means that the currency of human
knowledge exchanges has to be detected and inventoried, and then various theories
developed that allow one to judge, in an automatic but modifiable fashion, when
specific elements of this currency are being expressed.
The
Orb technology simplifies what one attempts to do with text understanding
systems. Humans already have this
ability to understand text and to communicate with each other. What we need is not to “understand the text
using machines”, but rather to develop a detection capability that targets
patterns in expressions. These patterns
can be found without there being any understanding, claimed or otherwise, by
the computer. Then when the patterns
are presented to any user, the meaning is immediate. Mutual induction occurs and produces a mental event. The mechanisms involved in the production of
this mental event are not exclusive to the computing machine.
The
Orb technology can be embedded into a graphical user interface that supports
annotation, various manipulations of data and other features expected from the
fully operational software.
The technologists in our group have developed some techniques for mapping
co-occurrence and for reifying, or making human-like, the linguistic variation
as types expressed as visual symbols.
Slightly different methods will be proposed depending on whether we are
developing a memetic expression detection system or a knowledge management
system based on general framework theory.
The methodology for properly developing and using Orbs is discussed in
the next section.
Discrete
analysis using Orbs allows organizational process to express class:object data
in various ways. This type of analysis
can be done about any phenomenon.
When the phenomenon is complex, i.e., having at least one non-deterministic state
transition, then the encoding of discrete analysis as ontology is a reasonable
way to achieve objective representations.
The analysis is localized into a formation of categories and patterns of
categories. Classical text analysis
initially involves the measurement of co-occurrence and frequency of patterns
in text. But other methods such as
latent semantic indexing and scatter-gather methods can also be used to develop
a model of the relevant classes and objects as indicators of concepts and
intentions being expressed in text.
Various work products include the development of broad-term / narrow-term upper subject
matter taxonomy and back-of-the-book indexing for subject matter
retrieval. Automated taxonomy
generation for un-indexed document repositories was demonstrated in our work on
an FCC taxonomy in 2003. The FCC
taxonomy was developed using a topological construction defined on Orb graphs.
All
of these methods feed into the simple set theoretical constructions, having the
form of a set of syntagmatic units:
{ < a, r, b > }.
The
measurement of term occurrences is a first step but only a first step. There is an overriding principle. One uses a specific philosophy that
separates structure from function and allows the human to make judgments about function.
Measurement
by word level n-grams produces an ordered set
A = { ( w(1), w(2), . . . , w(j), . . . , w(n) ) }
If n is an odd number, then w((n+1)/2) is the center of the n-gram, and two
branches of a special type of graph, a tree, can be rendered from this
n-gram.
For example, during a word level n-gram
measurement process, the sentence with words:
a b c d e f g h
is output as a set of eight 5-grams:
{ (-,-,a,b,c), (-,a,b,c,d), (a,b,c,d,e), (b,c,d,e,f), (c,d,e,f,g), (d,e,f,g,h), (e,f,g,h,-), (f,g,h,-,-) }
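The measurement above can be reproduced with a short sketch. It assumes, as the example suggests, that one n-gram is centered on each word and that "-" pads positions that fall outside the sentence. The function name centered_ngrams is hypothetical.

```python
def centered_ngrams(words, n=5, pad="-"):
    """Return one n-gram per word, with that word at the center position."""
    if n % 2 == 0:
        raise ValueError("n must be odd so that each n-gram has a center word")
    words = list(words)
    half = n // 2
    padded = [pad] * half + words + [pad] * half
    # The window starting at padded[i] has words[i] at its center.
    return [tuple(padded[i : i + n]) for i in range(len(words))]


sentence = "a b c d e f g h".split()
five_grams = centered_ngrams(sentence, n=5)
three_grams = centered_ngrams(sentence, n=3)
```

With this convention a sentence of eight words always yields eight n-grams, whatever odd value of n is chosen.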
Each
of these 5-grams can be used to label a graph, for example the one in Figure 5.

Figure 5: The simple tree developed from a word level 5-gram
One can use the center word as a root
node and develop branches with the left part of the n-gram and the right part
of the n-gram. On the other hand, the
n-gram can be used to label a single “branch” as in Figure 6.
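A hedged sketch of the first construction follows: the center word becomes the root and each side of the n-gram becomes a branch. The sketch assumes that each branch is a chain running outward from the center and that padding symbols are dropped. The figures may order the nodes differently, and the name ngram_to_tree is hypothetical.

```python
def ngram_to_tree(ngram, pad="-"):
    """Return (root, left_edges, right_edges) for an odd-length n-gram.

    Each edge is a (parent, child) pair; each branch is a chain that
    starts at the root and moves away from the center word.
    """
    n = len(ngram)
    if n % 2 == 0:
        raise ValueError("the n-gram must have odd length to have a center")
    mid = n // 2
    root = ngram[mid]

    def chain(words):
        edges, parent = [], root
        for word in words:
            if word == pad:  # padding marks the sentence boundary
                break
            edges.append((parent, word))
            parent = word
        return edges

    left = chain(reversed(ngram[:mid]))
    right = chain(ngram[mid + 1 :])
    return root, left, right


root, left, right = ngram_to_tree(("a", "b", "c", "d", "e"))
```

For an n-gram at the start or end of a sentence, the padded side simply produces a shorter branch or no branch at all.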
It is noted that variable length word
level n-grams and other methods also produce tree branches. For example, there are eight 3-grams for the
sentence with words:
a b c d e f g h
A word level 3-gram analysis is output
as:
{ (-,a,b), (a,b,c), (b,c,d), (c,d,e), (d,e,f), (e,f,g), (f,g,h), (g,h,-) }
and produces eight small trees.
More general graph constructions can be
built using n-grams.

Figure 6: A branch developed from a word level 5-gram
In our system we will not use only
standard word level n-grams. A more
sophisticated means to produce the graph constructions is used as well. This means is called generalized n-grams and
may use a framework theory (as discussed in the next section). An additional rule engine can be present
when the co-occurrence of significant words is being determined.
The general Framework (gF) notational
system produces a different type of data source than does text. gF Orbs are defined below.
The Zachman Framework is a
well-known business framework. Two
lesser-known examples of frameworks are the 12-primitive-element Sowa Framework
and 18-primitive-element Ballard Framework for knowledge base
construction. These Frameworks are
three of many that could be adopted.
The measurement output from an 18-element
framework has the form of a 19 tuple:
< a(0), a(1), a(2), . . . , a(18) >
where
the value of a(0) is set by a pre-process that categorizes the event that the
Framework will be used to characterize.
When any of these frameworks is used, one produces an n-tuple where each
element may have a class type and a value.
The type is derived from the semantic primitive’s definition. The user, or some other means, supplies the
value.
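One way to picture this measurement output is as a tuple of typed cells. The sketch below is only an assumption about a convenient encoding: the names Cell and make_measurement, the placeholder primitive names p1 to p18, and the sample event label are hypothetical and do not come from any of the frameworks named above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cell:
    class_type: str               # derived from the semantic primitive's definition
    value: Optional[str] = None   # supplied by the user or by some other means


def make_measurement(event_type, primitives, values):
    """Build the tuple < a(0), a(1), ..., a(n) > for one event.

    a(0) holds the event category set by a pre-process; a(1)..a(n) hold
    one cell per framework primitive. Unfilled cells keep value None.
    """
    cells = [Cell("event", event_type)]
    cells += [Cell(name, values.get(name)) for name in primitives]
    return tuple(cells)


# Placeholder names standing in for an 18-element framework.
primitives = [f"p{i}" for i in range(1, 19)]
measurement = make_measurement("event-type-1", primitives, {"p1": "some value"})
```

The tuple has 19 elements for an 18-element framework, with the event type in position 0 as in the notation above.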
Suppose
that 100 events have been considered.
Domain space = { E(i) | i = 1, . . . , 100 }
A
prototype Framework Browser, designed in 2002 by OntologyStream Inc, stores the
cell values as strings, and inventories these strings into ASCII text. A key-less hash table management system is
used rather than a relational database.
The Browser elicits knowledge from the human clerk and then stores this
in a convenient way. The software is
operating system independent and occupies less than 200K of 32-bit computer
memory. The prototype builds gF Orbs
and stores this data as independent and editable ASCII files. As larger data sources are addressed, these
independent ASCII files exhibit the nature of fractal compression of data.
Suppose that a parsing program produces a correlation analysis and the results from this
analysis are encoded into a “derived” 5-tuple:
< a(0), a’(1), a’(2), a’(3), a’(4) >
where
a(0) is the event type and a’(1), . . . , a’(4) are each slot-fillers that
minimally sign the cell contents.
The
derivation process involves a reification of the slot-fillers in the context of
the framework, and this means that a theory of type may be developed for each slot
and a theory of relationship may be developed between various slots.
In one version of a frame filling process, there is a reduction of a free form of
writing to a set of standard fillers for cells. Over time, the filling of cells is made from a pick list and the
pick list is maintained empirically. In
practice, we feel that a community-based reconciliation process is
necessary. There is always a potential
requirement to introduce new types of fillers at any moment. It is easy to imagine a type of “open logic”
governing the processes. When new
structure is encountered, a provision is made to adjust the underlying set of
atoms and compounds, as well as the rules over which event chemistries are
used.
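The empirically maintained pick list can be pictured with a small sketch. It assumes that "maintained empirically" means fillers are ranked by how often they have been chosen and that an unseen filler may be added at any moment. The class name PickList and the sample fillers are hypothetical.

```python
from collections import Counter


class PickList:
    """A pick list for one framework slot that stays open to new fillers."""

    def __init__(self, fillers=()):
        # Known fillers start with a usage count of zero.
        self.counts = Counter({filler: 0 for filler in fillers})

    def choose(self, filler):
        """Record a choice; return True if the filler was new to the list."""
        is_new = filler not in self.counts
        self.counts[filler] += 1
        return is_new

    def options(self):
        """Return the fillers ordered from most used to least used."""
        return [filler for filler, _ in self.counts.most_common()]


slot = PickList(["filler-a", "filler-b"])
first = slot.choose("filler-b")    # already on the list
second = slot.choose("filler-c")   # new structure encountered
slot.choose("filler-b")
ranked = slot.options()
```

A return value of True from choose is the point at which a community-based reconciliation step could decide whether the new filler becomes a standard one.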
The set of fillers for each framework cell (a cell is also called a slot in
Schank’s script theory (1977)) becomes the set of natural kinds that are
observed to be the structural components of the event under consideration. These structural components are the
substance of events, such as cyber, memetic or genetic expression, and the
discoveries of relationships between structural elements are achieved using
categoricalAbstraction (cA) and eventChemistry (eC) interfaces.
The
similarity between gF Orbs and full text Orbs is straightforward. The slots’ functional dependencies are
rendered visually in the framework browsers.
In the full text Orbs the co-occurrence of terms is rendered.
A
predictive analysis methodology using cA/eC is also fulfilled in a nice
way. Predictive analysis methodology
supports what we call “mutual-induction”, where human and computer processes
are entangled in real time data processing.
To restate, we call this Human-centric Information Production (HIP).
Suppose
that 100 events have been considered.
Domain space = { E(i) | i = 1, . . . , 100 }
In
each case, the framework has been filled out through:
· Interactive knowledge elicitation involving human dialog and/or
· Some artificial intelligence process that fills in anticipated cell values using a
theory of type related to each framework slot.
The domain space is now described by 500
individual data pieces
{ < a(0), a(1), a(2), . . . , a(4) >(k) | k = 1, . . . , 100 }