Monday, October 11, 2004
Manhattan Project to Integrate Human-centric Information Production
Two PowerPoints on Generative Methodology for Developing Substructural Ontology
This White Paper is for anyone in our group to edit into proposals or White Papers that use categoricalAbstraction, eventChemistry and generalFramework theory to propose development and deployment of integrated technology from our group.
One-page PowerPoint “Quads” on this White Paper
Architecture for Managing Incident Information (AMII)
Using Orbs (Ontology referential bases)
White Paper, to be submitted to a DHS BAA
Principal Investigator: Peter R. Stephenson
The Center for Regional and National Security, Eastern Michigan University
peter.Stephenson @ emich.edu
Co-Principal Investigator: Paul S. Prueitt
Research Professor, George Washington University
Paul @ ontologystream.com
Executive Summary. This White Paper reaches into an emerging paradigm in information science founded on a principled simplification of the application of computer science and on well-established principles in cognitive and behavioral science. A group of scientists and technologists propose a prototype. The prototype is to be based on existing and deployed systems, as well as on a synthesis of certain cutting-edge innovations in data organization and encoding. Our prototype may provide a unified incident command and decision support resource within the future National Incident Management System. No increase in hardware is envisioned. The management system will see an increased use of data regularity in context, formally constructed frameworks for organized and organizing patterns in information, and resilience due to self-sustaining modularity. Several layers of real-time agile abstraction of information can be shown to produce simple visual command, control and communication interfaces. The key is abstraction that occurs in real time and that simplifies the presentation of informational structure while retaining nuance.
Any critical real time
information environment is likely to have inconsistent and incomplete
information. Within this environment,
the emergency responder has a need to acquire, synthesize, communicate and
share critical information. In
addition, incident commanders will coordinate responses into specific types of
pre-structured incident scenarios.
Experience informs us that information management systems having pre-structured incident scenarios can be too brittle. Too strong a form of cognitive engineering will reduce flexibility and miss key common-sense insights that humans develop during the incident. Various studies detail Intelligence Community (IC) transition problems that have impacted all major software deployments. A strong form of cognitive engineering is often criticized [1]. The fog of confusion hampers the optimal flow of information, while individuals engage in a series of action-perception cycles immersed within an ebb and flow of information. Moving to the wrong scenario can lead to irreversible damage.
Once considerations related to behavioral science are addressed, i.e., the proper use of cognitive engineering, the most important thing is information. The most important information is the information being shared in real time using natural language. In hindsight, the improper use of information is always seen as the source of all, or almost all, errors.
Interfaces are required with legacy systems having information encoded into XML or OWL (the Web Ontology Language). Our prototypes are designed to be open with standard APIs, use XML, and be modular, scalable and evolutionary. Our data structures can be configured to be compatible with low-bandwidth wireless transmission. A simple interface with legacy databases will extract all information and create Orbs [2] that are compatible with situational information harvested from language expression. A soft form of cognitive engineering will help identify situational aspects, such as how to help a person in a crisis, or how to help a person who may be only partially engaged because the crisis seems remote.
The National Incident
Management System has local, state and federal levels. Our prototype has scalability, authentication and security features that allow flexible operation at each of these levels and provide multiple secure peer-to-peer channels between any first responder or incident commander and any other.
1 Technical Approach. We suggest a
revolutionary capability. Data is
organized into small compact informational structures, in a way very similar to
data compression methods. Rather than being merely focused on compression, these structures are created to recognize data invariance and produce higher orders of abstraction based on co-occurrence patterns. In our text understanding structures, the structures encode categories of word occurrences developed by software algorithms and human inspections.
Text is harvested for patterns of categories of words expressed in
messages. Categories are treated as
abstract elements. Symbols or icons can
be produced and used like terms in a natural language. The categorical abstractions, together with a human-in-the-loop reification process, produce a graphical representation of category [3] co-occurrence patterns modified by data mining and data synthesis processes.
The prototype is human-centric and yet relies on the computer,
electromagnetic spectrum, or telephone for transmission of data. The prototype is platform independent.
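The core mechanism described above, harvesting text for co-occurrence patterns of word categories, can be sketched in a few lines of Python. The word-to-category map below is a hypothetical stand-in for illustration only; in the prototype such categories would be developed by software algorithms and human inspection.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical word-to-category map (an assumption for this sketch);
# real categories come from algorithms plus human inspection.
CATEGORIES = {
    "fire": "hazard", "smoke": "hazard",
    "engine": "responder", "ambulance": "responder",
    "school": "location", "bridge": "location",
}

def category_cooccurrence(messages):
    """Count how often pairs of word categories co-occur in a message."""
    counts = defaultdict(int)
    for msg in messages:
        cats = {CATEGORIES[w] for w in msg.lower().split() if w in CATEGORIES}
        for pair in combinations(sorted(cats), 2):
            counts[pair] += 1
    return dict(counts)

msgs = ["Fire near the school", "Send an engine to the bridge"]
print(category_cooccurrence(msgs))
# {('hazard', 'location'): 1, ('location', 'responder'): 1}
```

The resulting category pairs, not the raw words, are what would be rendered as icons and graph structures for the incident commander.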
An iterative process between human inspection and data mining output produces robust and situational taxonomies and machine ontologies with classical logical inferences, similar to and compatible with OWL ontologies. The Orb “structured” ontology, however, has a four- or five-level abstraction hierarchy. Unlike classical Artificial Intelligence,
the patterns of category occurrence are converted into an icon representation
whose interpretation depends on human cognitive and anticipatory
responses. This dependency is seen by
users to be very natural. The structured
ontology is grounded in an abstraction framework developed through an enumeration
of aspects that scan the core invariances in a data stream. At the lowest level this enumeration is of
the patterns of code or message parts, measured with what is technically a bit
level n-gram window with variable size and a rule set for filling the
contents of the window. The technical
details here are shared by a number of innovators who have discussed these
details. Part of the uniqueness of our
proposal is that these individuals have formed an informal collaboration on the
issue of generative methodologies for the formation of structured
ontologies. Perhaps most interesting is
that the structured ontology can arise completely from real time input of
patterns measured in human expression and from the measurement of instruments
and devices.
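As a minimal illustration of the lowest-level measurement just described, the sketch below slides variable-size windows over a byte stream and keeps the patterns that recur. The window sizes and recurrence threshold are assumptions for illustration, not the actual rule set of the Orb encoding.

```python
from collections import Counter

def ngram_invariants(data, sizes=(2, 3), min_count=2):
    """Slide variable-size windows over a byte stream and keep the
    byte patterns that recur at least min_count times, a crude
    stand-in for measuring core invariance in a data stream."""
    counts = Counter()
    for n in sizes:
        for i in range(len(data) - n + 1):
            counts[data[i:i + n]] += 1
    return {gram: c for gram, c in counts.items() if c >= min_count}

print(ngram_invariants(b"abcabcab"))
```

The recurring patterns found this way would form the substructural layer over which the higher layers of abstraction are built.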
A.
Innovations
The
capability we are prototyping has four essential innovations:
1. Produces tunable aggregation into several layers of abstraction of themes and concepts expressed as graphical subject-matter indicators.
2. Produces tunable aggregation into several layers of abstraction of threat and attack events in cyber space.
3. At each layer of abstraction, provides tools that enable the manipulation of graph structures to produce event models and to project model consequences into possible outcomes.
4. At any layer of abstraction, creates a projection of graph structures into a well-written textual report, with footnotes and a drill-down capability.
Each of these innovations has been separately
developed. All four are well understood
by the core team. Two of the four,
Acappella Software and Readware, are commercially productized. The Mark 3 knowledge base and the Orb data
encoding are long-term research projects of Drs. Richard Ballard and Paul
Prueitt. The core team has knowledge of
the CoreTalk and CoreSystem innovation in structural encoding of data
regularity and icon programming language, developed by innovator Sandy
Klausner. The PI is the primary
innovator in the area of cyber attack and vulnerabilities taxonomy. In our prototype, the extension of his work is in the direction of measuring the overall health of a supporting information infrastructure throughout a critical incident.
The prototype has four types of input and four
types of outputs.
1. Natural language being communicated in real time between individuals.
2. Information on communication infrastructure, including geographic and equipment status data.
3. Elements of a tri-level ontology management capability expressing a model of the evolution of abstract models of events at several time scales.
4. Auxiliary data resources in XML.
The automation requires pre-existing artifacts
developed by human experts. Natural
language communication will feed into structured ontology at the (1) substructural, (2) event, and (3) environmental levels. These artifacts are designed, and staged, to be available when incidents occur. They are, however, flexible due to the ease with which the layers of abstraction build structure based on current inputs. Considerable
effort, however, is expended to develop these artifacts, which template
abstract elements occurring within the context of specific incident
models.
Some artifacts have the form of question
templates whose answers are either provided directly by a human or through a
process that produces an Orb construction.
Artifacts and ontologies are modular and atomic, thus providing a potential
range of pre-established services to any real time contingency. The ontologies are viewable as graphical
constructions with navigation and drill down and drill up capabilities. Services are made in an agile fashion
subject to human interpretation and manipulation in real time.
The written reports will use the existing and
deployed Acappella Software innovation.
Structured ontologies are subsetted and these subsets are input into artifacts developed using Acappella. Using these artifacts Acappella produces readable reports on the incident, and creates real-time written assessments of sub-events. The Acappella Software innovation can be demonstrated by contacting the company directly on the web.
B. Tunable concept and
theme aggregation
The prototype has several different standard
taxonomies and these are broken down into configurable modules linked to an
underlying structured ontology framework.
Metrics delineate a level of coherence present within messages when
interpreted with elements from this framework.
Concept and topic interpretations can be tuned to a specific type of
information space. From the harvesting
of discussions, a series of messages are grouped in discourse units. Messages and discourse units are parsed to
produce structured ontologies encoded as Orb structures. Once data sources are expressed as Orb structures, one can aggregate subsets and reason via abstraction to produce suggestive inferences, and create visualizations for incident commanders and first responders. These human-centric information production (HIP) functions can be accomplished with low bandwidth, memory, and processor requirements.
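The grouping of a message stream into discourse units can be sketched as follows. The time-gap rule used here is an illustrative assumption; the prototype would tune such grouping to the specific information space.

```python
def group_discourse_units(messages, max_gap=60):
    """Group (timestamp_seconds, text) messages into discourse units:
    consecutive messages within max_gap seconds share a unit.
    The max_gap rule is an assumption for this sketch."""
    units, current, last_t = [], [], None
    for t, text in messages:
        if last_t is not None and t - last_t > max_gap:
            units.append(current)
            current = []
        current.append(text)
        last_t = t
    if current:
        units.append(current)
    return units

stream = [(0, "smoke reported"), (30, "engine dispatched"), (200, "scene clear")]
print(group_discourse_units(stream))
# [['smoke reported', 'engine dispatched'], ['scene clear']]
```

Each resulting unit would then be parsed into Orb structures for the aggregation and visualization steps described above.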
C. Manipulation of Orb
constructions
The prototyping could demonstrate a demystified
information science complete with mathematical foundations, efficient and
scalable data encoding standards and a firm grounding in cognitive and behavioral
science. Orbs allow non-computer scientists to manipulate subject-matter indicators in a concept representational space that is independent of natural language. For text, there are two primary candidates: (1) Ballard’s Mark 3 structures or (2) Adi’s Readware framework
structures. For cyber attack space we
have the Stephenson Cyber Attack Taxonomy.
The manipulation of constructions can be
automated, following work by Sowa, with similarity analysis of small graph
representations of co-occurrence of regular patterns. Structured similarity analysis is performed on links and nodes and is elevated to statements about similarity and types of similarity between small cognitive graph constructions.
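A minimal sketch of such link-and-node similarity analysis, using Jaccard overlap as an assumed stand-in for the richer Sowa-style structured comparison:

```python
def graph_similarity(g1, g2):
    """Compare two small graphs, each given as a set of (node, node)
    links, by Jaccard overlap of their node sets and link sets.
    Jaccard is an illustrative choice, not the prototype's method."""
    n1 = {v for link in g1 for v in link}
    n2 = {v for link in g2 for v in link}
    node_sim = len(n1 & n2) / len(n1 | n2)
    link_sim = len(g1 & g2) / len(g1 | g2)
    return node_sim, link_sim

a = {("hazard", "location"), ("location", "responder")}
b = {("hazard", "location"), ("hazard", "responder")}
print(graph_similarity(a, b))
# (1.0, 0.3333333333333333)
```

Separate scores for nodes and links make it possible to state not just that two graph constructions are similar, but in what way.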
A formal mathematical theory exists. Part of the theory is called “differential
ontology” because of a formal correspondence between continuum mathematics and
discrete mathematics. The differential ontology framework provides a potential homology between specific continuum mathematical models of linguistic variation / basins of attraction and Orb constructions. Again, what is
surprising is that all of these capabilities are possible with very small
bandwidth and any processor.
Long term issues: Orb based knowledge
processors might soon come to parallel neural and immune models of memory,
awareness, selective attention and anticipation. This possibility indicates a
long-term future for Orb technologies.
The theory regarding language independent representation of pure
information is the subject of a discussion between the team members and those
scientists and innovators with whom we have the freedom to discuss. There are a number of open questions, and
the work is controversial in nature.
D.
Projection of Orb constructions as well written reports
Our architecture supports many practical capabilities; one of these is automatic report writing, using the Acappella Software and a designed interface to Orb-encoded structured ontology. As already mentioned, it is possible to
instrument a projection of subject matter indicators into a situationally
specific generated product produced by Acappella Software. Prueitt served as science advisor to
Acappella Software Inc in 2002 and understands the narrative generation
techniques very well. Specific information from aggregated Orb representation can be used to trigger answers to poll-type questions developed by human experts, conditioned on anticipated scenarios. The Acappella technology
follows from its narrative generation patent (2003) that produces an
interpretive natural language report that can further include footnotes to
specific text sections from the source data.
Narrative generation by the Acappella technology can be demonstrated; the key is having a poll-type informational source. This type of informational source is produced by the Orb constructions.
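The triggering of poll-type answers from an aggregated Orb representation can be sketched as below. The question texts and category keys are illustrative assumptions, not the expert-authored artifacts themselves.

```python
# Hypothetical expert-authored poll: (question, Orb category key).
POLL = [
    ("What is the primary hazard?", "hazard"),
    ("Which responders are engaged?", "responder"),
]

def answer_poll(indicators):
    """indicators: dict mapping a category to its most salient term,
    as might be read off an aggregated Orb representation."""
    return {question: indicators.get(cat, "unknown") for question, cat in POLL}

print(answer_poll({"hazard": "structure fire", "responder": "Engine 7"}))
```

A fielded record of this kind is exactly the poll-type informational source a narrative generator can turn into a readable report.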
2 Personnel and Performer Qualifications and Experience. The
development team includes several scientists, practitioners and researchers of
long (average 20 years plus) experience and successful development of the
technologies discussed in this white paper.
Over a decade of research by Prueitt et al. suggests strongly that the
technologies described will have a significant impact upon Unified Incident
Command and Decision Support systems at the local, state and national
levels. Several of the techniques
developed by the participants have been commercialized successfully and
currently are in general use.
3 Costs, Work and
Schedule
We anticipate the project will have a one-year
span for Phase One and an additional one-year span for Phase Two. The general output of Phase One is a proof
of concept prototype that, if accepted, will be developed into a production
system and deployed in Phase Two. The
anticipated top-level project plan is:
PHASE ONE
§ Quarter 1:
Specific requirements research and plan development. During this quarter a series of workshops
will be held to determine how, specifically, to apply the proposed technologies
to the requirements of individual incident command and support
communities. Proposals will already have undergone advanced benchmarking and feasibility tests. We are developing a collaboration with State of Michigan departments responsible for the human-factors interface to first responders.
§ Quarter 2: The proposed technologies will be tailored to the needs of the individual incident command communities and first responder communities identified in Quarter 1.
§ Quarter 3: A
series of workshops will be held with the same groups identified in Quarter 1
and the prototype implementations of the proposed technologies will be
demonstrated and critiqued. Real world
simulations will be offered and training requirements studied.
§ Quarter 4: The
prototype implementations of the proposed technologies will be updated to
reflect the results of the workshops in Quarter 3. Simulated scenario deployments with a small number of human
operators will be demonstrated to HSARPA.
A work plan for implementation and deployment of a production version of
the system will be prepared and submitted for approval.
PHASE TWO
§ Quarters 1 and 2: The
proposed system as approved in Phase One will be developed and deployed as a
production system. A proposed training
and rapid deployment plan will be developed.
Outcome metrics will be proposed.
§ Quarter 3: The
deployed system(s) will be tested in situ and any required debugging will be
performed. Training, deployment and
technology transition exercises will be performed and outcome metrics used to
evaluate critical factors.
§ Quarter 4: Final
system documentation and user training curriculum will be published. Full system production, acceptance and
deployment.
NOTES:
[1] The strong form of cognitive engineering and artificial intelligence is part of what we simplify as we step into a natural science paradigm that conforms less to the artificial intelligence disciplines.
[2] Technical note on concept-covers:
The
Ontology referential base provides a simple means to encode, in small computer memory allocations, natural language tokens, words or phrases, in co-occurrence patterns with associations to a set of 2319 primitive concepts (inventoried by
the Readware software product). The
concepts have been developed so as to provide a “cover” over what might be
called the set of all possible concepts [4].
Underlying this concept cover is a “substructural ontology” framework
having two symmetries and a power set defined from three semantic primitives {element,
domain, order} [5]. The Readware
framework has a three dimensional matrix with 2, 2, and 8 [6] elements in the
dimensions, creating a type of periodic table.
The nature of this table, or to be more precise the nature of the origin
of language, is a subject of various works on semantic primitives by
individuals like C. S. Peirce, Dmitri Pospelov, John Sowa and Richard
Ballard. The value of the emerging
theory of language origins is that, in theory and as demonstrated in some
existing prototypes, the mapping of concepts expressed in social discourse
leads one to anticipate behaviors of individuals. The same system is used to help develop a global model of the
information in a specific incident space and a structural model of how this
information is flowing within functional channels.
[3] We use the term “pattern” here in a
stochastic and categorical sense. The
language needed to talk about such patterns has been developed within the
papers written by Prueitt on categoricalAbstraction and eventChemistry.
[4] The cover is a real set of identified
concepts, but the set of all concepts is something that one cannot make a claim
to have enumerated. In standard taxonomies there are usually layers: the upper layer is a set of “broad terms” and the second layer is a set of terms that fall under the broad terms but are narrower. A more precise notion of a cover actually comes from mathematical topology.
[5] There are several descriptions of
primitives of this type. Perhaps one
can think about this in terms of sufficiency to make the concept cover
adequate. How primitives are derived
and how covers are derived is the essence of the information science we are
developing.
[6] 8 = 2^3. The power set over a set with n elements has 2^n elements.
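The arithmetic in notes [5] and [6] is easy to check directly; a minimal sketch:

```python
from itertools import chain, combinations

def power_set(s):
    """All subsets of s, returned as a list of sets."""
    items = list(s)
    return [set(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

primitives = {"element", "domain", "order"}
print(len(power_set(primitives)))  # 8, i.e. 2^3
print(2 * 2 * 8)                   # 32 cells in the 2 x 2 x 8 matrix
```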