Chapter 5
A General Framework for
Computational Intelligence
Re-edited Sunday, May 04,
2003
Section 1: Introducing a
Process Model for Human-mediated Information Production
The previous chapters established some preliminary notions that serve as background, reaching from the natural sciences to the computer sciences. In this chapter we take up the issue of representing knowledge as a data structure, and then using this data structure as a machine ontology for various purposes.
First, let us look at measurement, instrumentation, representation and encoding as a single step. This step is often not examined in detail by all of the stakeholders involved in the use of the information, and so the systems developed in this way have limitations. Why measurement, instrumentation, representation and encoding are not done well involves two aspects. The first is cultural, and often this means either institutional resistance or the design practices of companies who provide information tools. The second aspect has to do with the limited level of comprehension that we have about the human experience of knowledge.
The current level of development of computer science is also a factor. Many have discussed the issue of whether computer science has been developed on a foundation that is not consistent with natural science, in particular the life sciences. This issue of foundations is addressed in the BCNGroup roadmap for semantic technology adoption [1].
How the data is encoded into active memory or into a file (perhaps using just ASCII letters) is important. If one has a standard relational database, the instrumentation and measurement are really combined in the form of source or input data. The representation occurs sometimes in the mind of a human who is formulating input, and sometimes in a machine process that is acting according to some set of rules.
The open source movement has been attempting to improve the quality of information systems. But there are issues that control the quality of measurement, instrumentation, representation and encoding, and these issues lie beyond what current computer science has addressed. A resolution of these issues is where we feel the foundations of knowledge science are to be found.
If the data is encoded poorly, then a set of limitations arises as a direct consequence of that encoding. We will look at the theme vector representation of meaning in a moment. The problem, as we will see, is not simply that the relational model is too rigid to be agile; it is that the numerical model is used in a way in which it should not be used.
XML is of great value because it represents the localization of information as something that can be given a name and properties. Resource Description Framework (RDF) and Topic Maps make a further refinement in what can be done using an underlying graph-theoretical model. The nodes of the graph are localizations of knowledge about something, the labels can be properties and other metadata, and the links can be relationship variables that can occur (or are occurring) between localizations. We will here refer to a node as a location in order to reinforce the generality of the graph model.
XML is a standard for placing structured information into a text file. Text files of this sort can then be parsed so as to produce something based on the information that is made accessible by the XML. But what is essential about how the information can be placed into XML? The answer has a lot to do with having an agreed-upon standard for expressing information in a highly structured way. A tag for the information acts as a type of object, and in fact the attributes of the tag can be used as if these attributes are internal data and even internal procedures (or methods). So the mental visualization of a subject (to use the Topic Maps term for what a “node” stands for) is given a form that allows a specific evocation of human mental experiences, much like writing the words of a natural language onto a piece of paper. Words are written into a structure that is loose and has a great deal of variation. Community agreement about the use of words and grammar is why the representation of information in language is useful.
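As a small illustration of the tag-as-object point above (the tag and attribute names here are hypothetical, chosen only for the example):

    # A tag localizes information; its attributes behave like internal data.
    import xml.etree.ElementTree as ET

    doc = '<location name="D1" category="TraceRoute" count="24"/>'
    node = ET.fromstring(doc)

    print(node.tag)                   # location
    print(node.attrib["category"])    # TraceRoute
    print(int(node.attrib["count"]))  # 24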
The meaning of the words comes from the interpretation of information by a mind. This being said, one recognizes the great value that computer-based knowledge representation has if the computer can be used to move the information around and produce secondary results, such as the algorithmic clustering of words that co-occur in a text [2] and the retrieval of information that is stored remotely.
Text mining and knowledge representation have been trying to address this task, so that knowledge referenced in natural language can be accessed in ways that were not anticipated during the production of the natural language. This has been, so far, a hard problem, largely because of the cognitive aspects and the social aspects of knowledge representation.
The knowledge representation problem has been addressed using graph-theoretical constructs and some other methods, which we will discuss in this chapter. The notion of differential and formative ontological models is developed as a means to tie together the graph-theoretical model (of implicit ontology) with a continuum mathematical model of the nearness and other types of “relationships” that occur in language use, and in the modeling of structure-to-function relationships. The concepts of categoricalAbstraction (cA) and eventChemistry (EC) are also introduced in this chapter, to help formulate a general framework for computational intelligence that is grounded in the remarks of the previous four chapters.
Detecting facts and events, producing and using models, and discovering relationships lead to the development of ontological work product. The work product needs an encoding that allows reuse in the various contexts noted: generating options in a decision support environment, developing reporting mechanisms and auxiliary work product, and triggering consequences. It is through measurement, instrumentation, representation and encoding that we establish fidelity between contexts and knowledge representation.
Section 2: Process Model,
measurement, instrumentation, representation and encoding
Let us start this discussion by looking closely at part of the process model in Figure 1. We should compare closely the type of theory that has been developed in the previous chapters and think carefully about what instrumentation and measurement mean in this context. The issues of representation and encoding are so tightly linked to understanding and performance that we have to look at measurement in this light.
Figure 1: A process model
Our encoding strategy is to develop syntagmatic representations in the form of

O = { < c1, r, c2 > }

that can exist in either mono-level or tri-level knowledge structures. The difference is that in tri-level structures the set O is regenerated in real time and made subject to "pragmatics" at the point of decision, whereas in mono-level structures the set O is a predefined resource supporting decision making.
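As a minimal sketch of this distinction (the names here are illustrative and not taken from the SLIP software), a mono-level structure stores the set O once, while a tri-level structure regenerates O at the point of decision:

    # A sketch of mono-level versus tri-level storage of O = { < c1, r, c2 > }.
    # All identifiers are illustrative assumptions, not the SLIP implementation.
    from typing import Callable, FrozenSet, Iterable, Tuple

    Triple = Tuple[str, str, str]  # < c1, r, c2 >

    class MonoLevelOntology:
        """O is a predefined resource supporting decision making."""
        def __init__(self, triples: Iterable[Triple]):
            self.O: FrozenSet[Triple] = frozenset(triples)  # fixed at load time

    class TriLevelOntology:
        """O is regenerated in real time at the point of decision."""
        def __init__(self, observe: Callable[[], Iterable[Triple]]):
            self._observe = observe

        @property
        def O(self) -> FrozenSet[Triple]:
            return frozenset(self._observe())  # rebuilt on each access

    static = MonoLevelOntology([("c1", "r", "c2")])
    dynamic = TriLevelOntology(lambda: [("c1", "r", "c2")])
    print(static.O == dynamic.O)  # True here, but dynamic.O tracks new data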
The difference can be seen in the notion of static versus dynamic. A static ontology would have the form of O, but its origins and consequences might not be present. One represents the ontological model as a reference to a real world. So the representational issue of static versus dynamic is an essential one.
How situatedness is to be brought into the tri-level knowledge structure is still an open question; however, we have thought about the use of the theory of singular perturbation from dynamical systems as one possibility.
Again, consider a set of “q” coupled oscillators whose spin is described by the following (simple) differential equation

dj(i)/dt = w + SUM_k ( c(i,k) G( j(k) ) ),   i = 1, . . . , q

(with j the oscillation phase, w the intrinsic (constant) oscillation frequency, c the coupling and G any non-linear function), having various types of network connections (architectures) and initial conditions. The architecture would be expressed in coupling that may be positive or negative. The coupling may also be variable and reflect certain regular features of the circuit dynamics of metabolic reactions.
The global behaviors of the
system of systems are observed to partially or fully synchronize the relative
phase of individual oscillations.
We assume that the system develops systemic and regular behavior that acts as a partial control over the values that the coupling takes on. This control is from the higher level (of organization), of the two levels, to the lower level. A middle “level” emerges as the system defines itself and separates itself from an environment. Thus we use the term “tri-level”.
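As a numerical illustration only (the text does not fix G; we assume the common Kuramoto-style choice in which G acts on phase differences), partial or full synchronization of relative phase can be observed in a few lines:

    # q coupled oscillators: dj(i)/dt = w + SUM_k c(i,k) G( j(k) ).
    # Assumption: G = sin applied to phase differences, one standard choice
    # of non-linear G that exhibits the synchronization described above.
    import numpy as np

    q = 8
    rng = np.random.default_rng(0)
    w = rng.normal(1.0, 0.05, q)          # intrinsic (near-constant) frequencies
    c = 0.5 * np.ones((q, q))             # coupling; may be positive or negative
    theta = rng.uniform(0, 2 * np.pi, q)  # initial phases j(1), ..., j(q)

    dt = 0.01
    for _ in range(5000):
        coupling = (c * np.sin(theta[None, :] - theta[:, None])).sum(axis=1) / q
        theta = (theta + dt * (w + coupling)) % (2 * np.pi)

    # An order parameter near 1 indicates full synchronization of relative phase.
    r = abs(np.exp(1j * theta).mean())
    print(f"order parameter r = {r:.3f}")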
Section 3: Associative memories between theme space and semantic space
(link to discussion on
generative methodology)
Section 4: Definitions and Theorems on the
decomposition of a SLIP data set
This is edited and extended from “Definitions and Theorems on the decomposition of a SLIP data set: Summary of Results, July – October 2001, SLIP Intrusion Detection Technology” (74 pages).
The SLIP technology does a Fourier-type decomposition of all invariance in the data set into clusters that have tight intra-cluster linkage and weak inter-cluster linkage. This invariance is encoded as the nodes of a tree (see Figure 2).
Figure 2: A trace route event that is
gathered into category D1
In Figure 2 we see the SLIP interface version 1.6
(October 17th, 2001). The
software is available from OntologyStream Inc.
By scrolling the top left window we would see that a RealSecure
Intrusion Detection System (IDS) classified each of the 24 elements of category
D1 as a Trace Route event. The only
computational means for gathering these elements together is the non-specific
relationship between Source IP plus Source Port and Target IP plus Target
Port.
In the notational paper [3] we develop a notation for theme space analysis where each occurrence of a type is given its own dimension. The SLIP work has a similar encoding process. We encode the data into a Hilbert space, as before, but we start with the rows and columns of the flat table.
In SLIP we start the differential ontology by assuming that a flat data table exists, with columns defining “type” and, in each column, a specific (finite but open) set of “values” for that type. One can take each cell of the flat table as being defined uniquely by a column and a row. Spreadsheets often regard the column and row as defining a cell. The cell can then be defined by the CCM (Contiguous Connection Model) as a ( type : value ) pair.
SLIP works between the two types to identify the simplest link analysis having a correlation between a value in one type and distinct values in the second type. The process in words is something like this:

“Select two types. Select one type to be a relationship. Look at each value in the first type and check to see if this value ‘occurs with’ a value in the second type. If this happens more than once, then localize new information in the form

< ( type(2) : value(2) ), ( type(1) : value(1) ), ( type(2) : value(2)′ ) >”

where value(2) and value(2)′ are two distinct values of the second type that occur with value(1).
The type information is discarded because the information is localized around the correlation between values. One may invert the type–value relationship and encode the values as numbers on a Hilbert line, with the old type information as a linked packet. Here, however, we are in the simplest case, and are considering only two types at a time. The number of values, however, can be large (or small).
The ( type : value ) pairs can be drawn from a relational database, from text files, or from native CCM databases. The notion of occurrence can be defined as appropriate in each case. Occurrence is a measurement and instrumentation aspect of the AIPM.
In SLIP we encode new information from existing information with the transformation

( a1 , b ) + ( a2 , b )  →  < a1 , r, a2 >

where a(1) and a(2) are the values (of type two) that share a single value b (of type one). The type-one value is then deemed to be a relationship between the two type-two values.
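A minimal sketch of this pairing step (using toy records and names, not the OntologyStream file formats) is:

    # SLIP pairing: records ( a, b ) that share the same b produce the
    # localized triple < a1, r, a2 >, with r the non-specific relationship.
    from collections import defaultdict
    from itertools import combinations

    records = [("a1", "b"), ("a2", "b"), ("a3", "b2")]  # a toy two-column mart

    by_shared = defaultdict(set)
    for a, b in records:
        by_shared[b].add(a)

    triples = set()
    for b, linked in by_shared.items():
        # any two values sharing b are linked through the non-specific relation
        for a1, a2 in combinations(sorted(linked), 2):
            triples.add((a1, "r", a2))

    print(triples)  # {('a1', 'r', 'a2')}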
Section 5: Technical observation and theorems
On the question of non-crisp categories: The clusters themselves are gathered from the separation of high-level categories within the limiting distribution of the scatter-gather process. The set of all of these clusters will have pairs of clusters that have non-empty intersections. What lies outside of a core might be considered to be an environment, and may have significance. However, we focus first on the cores of these categories.
Figure 3: Characterization of a core
Our method for automatically generating a framework to start analysis was chosen to eliminate the environment. This is an essential part of the process of categorical abstraction. We were looking for what stays the same as one moves from one event type to another. The category is about that sameness. The core is the center of a category, where this center is invariant across several limiting distributions. The purpose of the SLIP Framework is to make available to domain experts the invariance that is produced by the non-specific relationship.
The set of ending nodes (sometimes called the leaves) of the SLIP Tree produces a crisp partition of the original set of atoms. This is because children are always produced in such a way as to provide this crisp partition, with no overlap between the memberships of the tree leaves. The memberships of the tree leaves also union to produce the complete membership of the parent node in the SLIP Framework tree. This is the important notion of disjoint union.
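The disjoint-union property can be stated operationally; the following small check (illustrative, not the SLIP code) verifies that leaf memberships are pairwise disjoint and union to the parent membership:

    # Check the disjoint-union (crisp partition) property of SLIP tree leaves.
    def is_crisp_partition(parent, leaves):
        union = set()
        for leaf in leaves:
            if union & leaf:      # any overlap breaks crispness
                return False
            union |= leaf
        return union == parent    # leaves must exhaust the parent membership

    atoms = {"a1", "a2", "a3", "a4"}
    print(is_crisp_partition(atoms, [{"a1", "a2"}, {"a3"}, {"a4"}]))  # True
    print(is_crisp_partition(atoms, [{"a1", "a2"}, {"a2", "a3"}]))    # False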
A series of theorems can be given. However, the observation now is that a non-crisp partition can be developed in order to reflect what we conjecture would be category entanglement (again look at the separation of context and core seen in Figure 3).
In more advanced implementations of the fundamental SLIP theory, one can use feature analysis and a voting procedure to route new events into a type of non-crisp classification based on profiles of categories. This introduces the notion of a tri-level architecture for information routing, categorization and information retrieval. These more advanced implementations await the completion of the first fully stand-alone, full-function SLIP interface. The reason why we mention the advanced implementations is that when the SLIP Frameworks are being created and shared [4], then a number of intelligent programs will become possible.
Figure 5: The SLIP Interface taking
comments on an event category
The SLIP methods produce context-free core category memberships that appear in different environments. The method is a direct intersection of sets rather than a statistical method.

Metadata can be associated with these nodes. This was done, in the 2002 prototype, using a scripting language from the command line (see Figure 5). Comments made by analysts are appended to a file pointed to by a metadata tag within the node tag in XML.
On the fundamental SLIP theory: A set of definitions establishes an abstract mathematical language. This language is useful for two basic reasons:

1) The language allows one to conjecture about and prove properties that one can find experimentally by developing algorithms and software.

2) The language allows a peer review of the underlying intuitions about how the SLIP technology might be used.
Section 6: Primary concepts:
Datamart: The SLIP datamart consists of a table with two columns. In the non-database version of the SLIP technology, a CCM repository is used instead of a relational table. In the full-text version an ASCII text file is the datamart.
The selection of the datamart is important. The following issues might have an impact on what type of data is selected.

1) A time period may delimit an event or group of events.

2) A domain expert may analyze the nature of the data itself and produce an Analytic Conjecture that relates two types.
In the relational model the types are columns. For example, the first column value might be the defensive addresses and the second column value might be the system calls that occurred during the time interval in question. In a CCM repository, a “type” is represented by the set of all ( type : value ) pairs where the first part of the pair has the same ASCII string. Category theory can make an equivalence relationship between the elements of a thesaurus ring, so in this case the CCM type is defined by an equivalence relationship. Similar variation in type and value representation can be instrumented for full text parsing.
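As an illustration of this definition (the thesaurus entries and pair values below are hypothetical), a CCM type can be computed as an equivalence class over ( type : value ) pairs:

    # A CCM "type" as the set of ( type : value ) pairs whose first part is the
    # same ASCII string, widened by a (hypothetical) thesaurus equivalence.
    from collections import defaultdict

    thesaurus = {"src_ip": "source", "source_ip": "source"}  # toy thesaurus ring

    def canonical(t):
        return thesaurus.get(t, t)

    pairs = [("src_ip", "10.0.0.1"), ("source_ip", "10.0.0.2"), ("call", "open")]

    types = defaultdict(set)
    for t, v in pairs:
        types[canonical(t)].add((t, v))  # group pairs under the equivalence

    print(dict(types))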
It is important to realize that much of the value of
the SLIP technology will come about when domain experts develop data with
specific investigations in mind.
Non-specific relations: Pairs of first column values are
identified by parsing the Datamart and finding occurrences where a second
column value (b) appears in more than one record.
The critical issues are only
that each line in the Pairs text file represents a record and that the record
represents a single event.
Figure 6: The non-specific relationship between the atoms a1 and a2
Two values from the first column are paired if the associated value in the second column is the same. This situation is represented in Figure 6.
Once this pairing is done,
then the pairs are used to specify atoms.
The atoms are those elements that are in one or more pairs.
Formally we have:
( a1 , b ) + ( a2 , b )  →  < a1 , r, a2 >
where r is the non-specific
relationship.
Pairs.dbf: Pairs.dbf (formerly called Two.dbf) is the data source that contains the pairs of secondnames. The table is denoted with a script bold Q.
All of this computational process is formal. No meaning is assigned until the domain expert looks at the membership, uses the membership to produce a report from the original data source, and makes human judgments about the nature of the core category. These judgments are then collected when the domain expert types into the comment property of the core category.
The notion of nearness and the topological notions of analytic mathematics are used to produce a retrieval of elements into a core category. The sets of elements produced in this way will actually be related in exactly the fashion defined by the chaining that occurs through the non-specific relationship.
Ratio atoms/secondname: This ratio is a computed value for any mart table. This ratio can be computed and used to tune the import of SLIP data marts. For example, consider the dataset that produces the sample SLIP Framework (see Figure 2). The mart columns were originally selected to reflect scanning for Trojans and the follow-up use of a port identified in Trojan scans. A quick computation from RealSecure database reports can indicate whether there is significant Trojan scanning in this dataset. If there is, and one wishes to compute the full SLIP Framework, then this is possible.
This ratio and others like it can be used to produce
a “selective attention” that automates some of the intelligence functions of
ID, particularly those involving either data visualization or link
analysis.
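A minimal sketch of the computation (over a toy mart table; the values are illustrative):

    # The atoms/secondname ratio for a two-column mart table: the count of
    # distinct first-column values over the count of distinct second-column
    # values. High ratios suggest heavy chaining through shared secondnames.
    records = [("a1", "b"), ("a2", "b"), ("a3", "b2"), ("a4", "b2"), ("a5", "b3")]

    atoms = {first for first, _ in records}
    secondnames = {second for _, second in records}
    ratio = len(atoms) / len(secondnames)
    print(f"atoms/secondname = {len(atoms)}/{len(secondnames)} = {ratio:.2f}")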
Distribution of A on the circle: The first use that I know of, of the circle for scatter-gather, was my own in 1996. The technique has not been publicly reported as yet. Scatter-gather generally requires both a pushing apart of atoms and a pulling of them together. However, because the one-point compactification of the line interval is a manifold with no boundary, a pulling together is sufficient to separate those groups that are interlinked (called prime cores). The use of the collection Q (derived from link analysis) to gather was invented (by Prueitt) in 2001, as far as I know.
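The following is a small sketch of a gather-only pass, under our illustrative assumption that each step pulls the two members of a randomly chosen pair from Q a little way toward each other along the shorter arc of the circle:

    # Gather-only scatter-gather on the circle: no pushing apart is needed,
    # since the circle is a manifold with no boundary. Pairs come from Q.
    import math, random

    random.seed(1)
    atoms = ["a1", "a2", "a3", "a4", "a5"]
    Q = [("a1", "a2"), ("a2", "a3"), ("a4", "a5")]       # the pair table
    pos = {a: random.uniform(0, 2 * math.pi) for a in atoms}

    for _ in range(2000):
        x, y = random.choice(Q)
        # signed shortest angular difference from x to y
        d = math.atan2(math.sin(pos[y] - pos[x]), math.cos(pos[y] - pos[x]))
        pos[x] = (pos[x] + 0.05 * d) % (2 * math.pi)
        pos[y] = (pos[y] - 0.05 * d) % (2 * math.pi)

    for a in atoms:
        print(a, round(pos[a], 3))  # {a1, a2, a3} condense to one arc, {a4, a5} to another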
The open question is whether or not we have a unique factorization theorem, as one does in number theory.

Conjecture 1 (October 17th, 2001): There exists a unique decomposition of a set of atoms into primes using a purely algorithmic process. This unique decomposition will be produced each time the SLIP algorithms are run, given that the algorithms find any splitting subset S that exists in any prime core.
The SLIP algorithms and data structure form a new type of information system. This system is a non-traditional database, similar to various non-relational structures (as distinct from third-normal-form-type relational databases), called Referential Information Bases (RIBs). These RIB structures are being developed as static in-memory structures by a number of groups. Query and data management features are different in that the RIBs do NOT allow delete or append functions. These update functions are accomplished by completely unloading and remapping the data into a formal finite state machine. This process is slow compared to relational database updates.
However, once the remapping occurs the update is complete, and very fast data aggregation and emergent computing processes can occur. The RIB technologies are being developed as a new generation of data warehousing technologies, where append and delete are managed in a batch process. SLIP uses many of the concepts that have been developed by others involved in the RIB-type technologies.
The following is some preliminary analysis about how
the input data might be configured so as to make a specific search for events
of a particular type.
Figure 7: A chain relationship
between a(1) and a(n)
The events in Figure 7 have a specific nature if the first and second columns are (1) attacker locations { x, a(1), a(2), a(3), a(4), . . . , a(n) } and (2) defender location, { y }.
We have seen two different interpretations of a
chain relationship of this type given these two columns in the data mart. The first interpretation is about the
identification of a Trojan and the consequent use of this knowledge by the
attacker. The second interpretation is
about session hijacking.
So what is the difference between these two types of events?
The first interpretation was the motivation for the example SLIP Framework that the demonstration (version 1.4) displays and navigates. A port scan, from a source a(i), is used to trigger a response from a Trojan existing at location y. Given a successful response from the scan, it is assumed that a(i) communicates off line to x, and that x then addresses the same port at location y and establishes a session that in some way uses the Trojan.
A different source IP, a(j), is used against a different target IP, again denoted in Figure 7 as y. The identity of y is not used in the SLIP scatter-gather, and thus the fact that the y may change location is not accounted for. We create a category Y of y locations and treat any member of the category without distinction:

y(i), y(j) elements of Y  →  y(i) ~ y(j)
If, in each case, a single IP is used once a Trojan is identified, then we have established the chain relationship. The source IP locations { x, a(1), a(2), a(3), a(4), . . . , a(n) } will form all or part of a prime core.
The second interpretation is quite a bit different, and yet it has almost the same characteristics when put into the SLIP Framework. One form of session hijacking occurs when a SYN is sent from x to y, but x is spoofing its location, using the location a(i).
Some considerable work needs to be done in order to catch a reply from one of the spoofed locations. Route traces will sometimes work, if the administrator has not turned off route tracing. Then, given the ability to catch the reply, the source must manage to guess the session id number. Here is where a vulnerability of the attacker occurs. The ability to guess the session id is completely dependent on there being very little time elapsed between the SYN and the ACK reply. A SLIP signature for this type of element could be the membership in E4 of the first example SLIP Framework.
Figure 8: Five spoofed addresses
chained to a port 1080 attack
Motivation: The motivation for creating a batch of test sets and examining the cluster cores is to classify types of attacks from the patterns given in chain relationships within the pair table. These chain relationships are due to shallow link analysis. The clustering merely shows us where to look. The halting condition for clustering has been shown to be equivalent to a formal property about chain relations. So one has a computational foothold on the chains themselves. The chaining in turn reveals real, but non-specific, linkage between data elements that exists in the data due to specific causes.
The notion of degeneracy is used by Nobel Laureate Gerald Edelman to indicate a one-to-many and many-to-one relationship that captures the flexibility that can be seen experimentally in the subcellular protein circuits that support the acquisition of specific real-time connection patterns in brain regions in response to experience (Edelman, 1987). The word "degeneracy", when used in this way, points to stochastic theories of causation in which a probability distribution spreads potential along a finite and discrete set of paths, from one location - the present - into the future. This spread of paths from single nodes is realized in standard Bayesian analysis using graphs and trees. Each node is characterized by a path leading from the past, or a representation of the past, to the node (the present). From this node we have the n paths leading away into the future.
A representation of the concept of “response degeneracy”
Associated with each path is a conditional probability. So, in the abstract, the future is determined by a random variable that expresses these conditional probabilities. The state transition is degenerate in the same way that Edelman networks are, on the surface. But the mechanisms that express degeneracy in biology are quite different from those of the Bayesian inference engine.
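A minimal sketch of such a degenerate one-step transition (the states and probabilities below are hypothetical):

    # One node of a Bayesian state graph: n paths lead into the future, each
    # carrying a conditional probability; the next state is a random draw.
    import random

    random.seed(0)
    transitions = {
        "present": [("future_a", 0.5), ("future_b", 0.3), ("future_c", 0.2)],
    }

    def step(state):
        options, weights = zip(*transitions[state])
        return random.choices(options, weights=weights, k=1)[0]

    print(step("present"))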
The paths from each node, in a Bayesian network, lead to yet other nodes, and thus the number of nodes soon explodes. This requires a specific methodology to address this unrealistic character of the network. We can treat the nodes as states followed by gestures. The state is what is observed by an actor, something that can express a gesture toward any state. The gestures and states are a composition of different substrates into a structure that is known via category policies.
We also see that an absence of flexibility characterizes methods that rely on theme representations of text for semantic content. Theme representations are generally products of linguistic analysis and/or knowledge engineering. Often what is being sought are the facts of the case: who said what, and when. This surface is a one-level representation of a complex system. The picture given is, however, not the complete picture of human inner thought or behavior. We immediately understand that word phrases, by themselves, cannot capture the degenerate set of all possible interpretations of the meaning of the author and the understandings of various types of readers. Variable response potential is necessary, and it is circumscribed by subjective constraints. The question that the chapters of this book put to the reader regards the conjecture that subjective constraints are possible in machine intelligence, given that the intelligence is expressed in the tri-level architecture.
We make the claim that variable response is not possible except through the development of a memory structure, where the invariant substructures are encoded and made available for real-time remembering within the context established by variable category policies.
As will be discussed further in Chapter 10, the special Quasi Axiomatic Theories (QAT) were developed in Russia by Finn and his colleagues (Finn, 1991) in order to address the need for open logics. C. S. Peirce's foundational logics helped the Russians establish not only an open logic but also a system that is stratified. The QAT languages manage an assignment of meaningfulness during the aggregation of substructure, in a step-by-step fashion that allows a strict separation between logic atoms and evaluation functions. The assignment of truth-value is made under the rules of algorithms that are specified in degenerate situational logics. Moreover, the rules of deduction and the rules for assignment of meaning are to be modified according to the open systems theory developed, also in Russia, by Pospelov (1986). Thus specific interpretations, about a specific input, can be modeled as a simple constraint on a larger class of interpretations.
Response degeneracy is also constrained by the state of the environment, whether the metabolic environment or some interpretive environment such as mental events. In metabolic environments, a certain type of circuit dynamic exists where one state leads to another and environmental populations of reactants support each state. But in a degenerate case, the dynamic is incomplete without additional constraint. As Edelman's work illustrates, these circuits exist as protein conformational state changes, and in metabolic reactions, in the immune and neural systems. The expression is governed ultimately by an image of self that is the complex expression of the whole system. In interpretive environments, we need a similar notion. This notion is the notion of a "system" image.
In order for a text understanding system to have a feature that is similar to response degeneracy, we need situational logic. We also need syntagmatic representations of the form <a, r, b>, where a and b are locations in a semantic net and r is a class of relationships that can reasonably exist between two concepts.
Proper methods for the evolutionary linking of elementary syntagmatic units into situational models can be chosen after we handle the difficult software issues regarding theme representation and visualization.
Visualization tools for semantic spaces have a natural similarity to tools for visualizing chemical graphs. Scientific visualization has concentrated for two decades on visualization of knowledge about chemical graphs, and thus the technology and the methodology are widely used and understood. The cognitive graphs are more complex in certain ways, perhaps by an order of magnitude, but in other ways the chemical graph and the cognitive graph are exactly the same.
We expect that, in the near future, specific semantic net structures can be delivered to the human client via visualization tools designed for chemical graphs. The delivery to the user can be in the form of a text composed by automated means.
[2] See the SLIP technology index at: http://www.ontologystream.com/cA/index.htm

[3] See the notational paper at: http://www.bcngroup.org/area2/KSF/Notation/notation.htm

[4] See the BCNGroup roadmap for adopting semantic technology at: http://www.bcngroup.org/area1/2005beads/GIF/RoadMap.htm