Monday, October 11, 2004
Manhattan Project to Integrate Human-centric Information Production
Two PowerPoints on Generative Methodology for Developing Substructural Ontology
This White Paper is for anyone in our group to edit into proposals or White Papers that use categoricalAbstraction, eventChemistry and generalFramework theory to propose development and deployment of integrated technology from our group.
One-page PowerPoint “Quads” on this White Paper
Architecture for Managing Incident Information (AMII)
Using Orbs (Ontology referential bases)
White Paper, to be submitted to a DHS BAA
Principal Investigator: Peter R. Stephenson
The Center for Regional and National Security, Eastern Michigan University
peter.Stephenson @ emich.edu
Co-Principal Investigator: Paul S. Prueitt
Research Professor, George Washington University
Paul @ ontologystream.com
Executive Summary. This White Paper reaches into an emerging paradigm in information science founded on a principled simplification of the application of computer science and on well-established principles in cognitive and behavioral science. A group of scientists and technologists propose a prototype. The prototype is to be based on existing and deployed systems, as well as on a synthesis of certain cutting-edge innovations in data organization and encoding. Our prototype may provide a unified incident command and decision support resource within the future National Incident Management System. No increase in hardware is envisioned. The management system will see an increased use of data regularity in context, formally constructed frameworks for organized and organizing patterns in information, and resilience due to self-sustaining modularity. Several layers of real-time agile abstraction of information can be shown to produce simple visual command, control and communication interfaces. The key is abstraction that occurs in real time and that simplifies the presentation of informational structure while retaining nuance.
Any critical real time
information environment is likely to have inconsistent and incomplete
information. Within this environment,
the emergency responder has a need to acquire, synthesize, communicate and
share critical information. In
addition, incident commanders will coordinate responses into specific types of
pre-structured incident scenarios.
Experience informs us that information management systems having pre-structured incident scenarios can be too brittle. Too strong a form of cognitive engineering will reduce flexibility and miss key common-sense insights that humans develop during the incident. Various studies detail Intelligence Community (IC) transition problems that have impacted all major software deployments. A strong form of cognitive engineering is often criticized [1]. The fog of confusion hampers the optimal flow of information, while individuals engage in a series of action-perception cycles immersed within an ebb and flow of information. Moving to the wrong scenario can lead to irreversible damage.
Once considerations related to behavioral science are addressed, i.e., the proper use of cognitive engineering, the most important thing is information. The most important information is the information being shared in real time using natural language. In hindsight, the improper use of information is always seen as the source of all, or almost all, errors.
Interfaces are required with legacy systems having information encoded into XML or OWL (the Web Ontology Language). Our prototypes are designed to be open with standard APIs, use XML, and be modular, scalable and evolutionary. Our data structures can be configured to be compatible with low-bandwidth wireless transmission. A simple interface with legacy databases will extract all information and create Orbs [2] that are compatible with situational information harvested from language expression. A soft form of cognitive engineering will help identify situational aspects, such as how to help a person in a crisis, or how to help a person who may be only partially engaged because the crisis seems remote.
The National Incident
Management System has local, state and federal levels. Our prototype has scalability, authentication and security features that allow flexible operation at each of these levels and provide multiple secure peer-to-peer channels between any first responder or incident commander and any other.
1 Technical Approach. We suggest a
revolutionary capability. Data is
organized into small compact informational structures, in a way very similar to
data compression methods. Rather than being merely focused on compression, these structures are created to recognize data invariance and produce higher orders of abstraction based on co-occurrence patterns. In our text understanding structures, the structures encode categories of word occurrences developed by software algorithms and human inspections.
Text is harvested for patterns of categories of words expressed in
messages. Categories are treated as
abstract elements. Symbols or icons can
be produced and used like terms in a natural language. The categorical abstractions, together with a human-in-the-loop reification process, produce a graphical representation of category [3] co-occurrence patterns modified by data mining and data synthesis processes.
The prototype is human-centric and yet relies on the computer,
electromagnetic spectrum, or telephone for transmission of data. The prototype is platform independent.
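The core mechanism described above, harvesting text for co-occurrence patterns of word categories, can be sketched in a few lines of Python. The word-to-category map below is a hypothetical stand-in for illustration only; in the prototype such categories would be developed by software algorithms and human inspection.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical word-to-category map (an assumption for this sketch);
# real categories come from algorithms plus human inspection.
CATEGORIES = {
    "fire": "hazard", "smoke": "hazard",
    "engine": "responder", "ambulance": "responder",
    "school": "location", "bridge": "location",
}

def category_cooccurrence(messages):
    """Count how often pairs of word categories co-occur in a message."""
    counts = defaultdict(int)
    for msg in messages:
        cats = {CATEGORIES[w] for w in msg.lower().split() if w in CATEGORIES}
        for pair in combinations(sorted(cats), 2):
            counts[pair] += 1
    return dict(counts)

msgs = ["Fire near the school", "Send an engine to the bridge"]
print(category_cooccurrence(msgs))
# {('hazard', 'location'): 1, ('location', 'responder'): 1}
```

The resulting category pairs, not the raw words, are what would be rendered as icons and graph structures for the incident commander.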
An iterative process between human inspection and data mining output produces robust and situational taxonomies and machine ontologies with classical logical inferences, similar to and compatible with OWL ontologies. The Orb “structured” ontology, however, has a four- or five-level abstraction hierarchy. Unlike classical Artificial Intelligence,
the patterns of category occurrence are converted into an icon representation
whose interpretation depends on human cognitive and anticipatory
responses. This dependency is seen by
users to be very natural. The structured
ontology is grounded in an abstraction framework developed through an enumeration
of aspects that scan the core invariances in a data stream. At the lowest level this enumeration is of
the patterns of code or message parts, measured with what is technically a bit
level n-gram window with variable size and a rule set for filling the
contents of the window. The technical
details here are shared by a number of innovators who have discussed these
details. Part of the uniqueness of our
proposal is that these individuals have formed an informal collaboration on the
issue of generative methodologies for the formation of structured
ontologies. Perhaps most interesting is
that the structured ontology can arise completely from real time input of
patterns measured in human expression and from the measurement of instruments
and devices.
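As a minimal illustration of the lowest-level measurement just described, the sketch below slides variable-size windows over a byte stream and keeps the patterns that recur. The window sizes and recurrence threshold are assumptions for illustration, not the actual rule set of the Orb encoding.

```python
from collections import Counter

def ngram_invariants(data, sizes=(2, 3), min_count=2):
    """Slide variable-size windows over a byte stream and keep the
    byte patterns that recur at least min_count times, a crude
    stand-in for measuring core invariance in a data stream."""
    counts = Counter()
    for n in sizes:
        for i in range(len(data) - n + 1):
            counts[data[i:i + n]] += 1
    return {gram: c for gram, c in counts.items() if c >= min_count}

print(ngram_invariants(b"abcabcab"))
```

The recurring patterns found this way would form the substructural layer over which the higher layers of abstraction are built.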
A.
Innovations
The
capability we are prototyping has four essential innovations:
1. Produces tunable aggregation into several layers of abstraction of themes and concepts expressed as graphical subject-matter indicators.
2. Produces tunable aggregation into several layers of abstraction of threat and attack events in cyber space.
3. At each layer of abstraction, provides tools that enable the manipulation of graph structures to produce event models and to project model consequences into possible outcomes.
4. At any layer of abstraction, creates a projection of graph structures into a well-written textual report, with footnotes and a drill-down capability.
Each of these innovations has been separately
developed. All four are well understood
by the core team. Two of the four,
Acappella Software and Readware, are commercially productized. The Mark 3 knowledge base and the Orb data
encoding are long-term research projects of Drs. Richard Ballard and Paul
Prueitt. The core team has knowledge of
the CoreTalk and CoreSystem innovation in structural encoding of data
regularity and icon programming language, developed by innovator Sandy
Klausner. The PI is the primary
innovator in the area of cyber attack and vulnerabilities taxonomy. In our prototype, the extension of his work is in the direction of measuring the overall health of a supporting information infrastructure throughout a critical incident.
The prototype has four types of input and four
types of outputs.
1. Natural language being communicated in real time between individuals.
2. Information on communication infrastructure, including geographic and equipment status data.
3. Elements of a tri-level ontology management capability expressing a model of the evolution of abstract models of events at several time scales.
4. Auxiliary data resources in XML.
The automation requires pre-existing artifacts
developed by human experts. Natural
language communication will feed into structured ontology at the (1) substructural, (2) event, and (3) environmental levels. These artifacts are designed, and staged, to be available when incidents occur. They are, however, flexible due to the ease with which the layers of abstraction build structure based on current inputs. Considerable
effort, however, is expended to develop these artifacts, which template
abstract elements occurring within the context of specific incident
models.
Some artifacts have the form of question
templates whose answers are either provided directly by a human or through a
process that produces an Orb construction.
Artifacts and ontologies are modular and atomic, thus providing a potential
range of pre-established services to any real time contingency. The ontologies are viewable as graphical
constructions with navigation and drill down and drill up capabilities. Services are made in an agile fashion
subject to human interpretation and manipulation in real time.
The written reports will use the existing and
deployed Acappella Software innovation.
Structured ontologies are subsetted and these subsets are input into artifacts developed using Acappella. Using these artifacts Acappella produces readable reports on the incident, and creates real-time written assessments of sub-events. The Acappella Software innovation can be demonstrated by contacting the company directly on the web.
B. Tunable concept and
theme aggregation
The prototype has several different standard
taxonomies and these are broken down into configurable modules linked to an
underlying structured ontology framework.
Metrics delineate a level of coherence present within messages when
interpreted with elements from this framework.
Concept and topic interpretations can be tuned to a specific type of
information space. From the harvesting
of discussions, a series of messages are grouped in discourse units. Messages and discourse units are parsed to
produce structured ontologies encoded as Orb structures. Once data sources are expressed as Orb structures, one can aggregate subsets and reason via abstraction to produce suggestive inferences, and create visualizations for incident commanders and first responders. These human-centric information production (HIP) functions can be accomplished with low bandwidth, memory, and processor requirements.
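The grouping of a message stream into discourse units can be sketched as follows. The time-gap rule used here is an illustrative assumption; the prototype would tune such grouping to the specific information space.

```python
def group_discourse_units(messages, max_gap=60):
    """Group (timestamp_seconds, text) messages into discourse units:
    consecutive messages within max_gap seconds share a unit.
    The max_gap rule is an assumption for this sketch."""
    units, current, last_t = [], [], None
    for t, text in messages:
        if last_t is not None and t - last_t > max_gap:
            units.append(current)
            current = []
        current.append(text)
        last_t = t
    if current:
        units.append(current)
    return units

stream = [(0, "smoke reported"), (30, "engine dispatched"), (200, "scene clear")]
print(group_discourse_units(stream))
# [['smoke reported', 'engine dispatched'], ['scene clear']]
```

Each resulting unit would then be parsed into Orb structures for the aggregation and visualization steps described above.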
C. Manipulation of Orb
constructions
The prototyping could demonstrate a demystified
information science complete with mathematical foundations, efficient and
scalable data encoding standards and a firm grounding in cognitive and behavioral
science. Orbs allow non-computer scientists to manipulate subject-matter indicators in a concept representational space that is independent of natural language. For text, there are two primary candidates: (1) Ballard’s Mark 3 structures or (2) Adi’s Readware framework
structures. For cyber attack space we
have the Stephenson Cyber Attack Taxonomy.
The manipulation of constructions can be
automated, following work by Sowa, with similarity analysis of small graph
representations of co-occurrence of regular patterns. Structured similarity analysis is performed on links and nodes and is elevated to statements about similarity and types of similarity between small cognitive graph constructions.
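A minimal sketch of such link-and-node similarity analysis, using Jaccard overlap as an assumed stand-in for the richer Sowa-style structured comparison:

```python
def graph_similarity(g1, g2):
    """Compare two small graphs, each given as a set of (node, node)
    links, by Jaccard overlap of their node sets and link sets.
    Jaccard is an illustrative choice, not the prototype's method."""
    n1 = {v for link in g1 for v in link}
    n2 = {v for link in g2 for v in link}
    node_sim = len(n1 & n2) / len(n1 | n2)
    link_sim = len(g1 & g2) / len(g1 | g2)
    return node_sim, link_sim

a = {("hazard", "location"), ("location", "responder")}
b = {("hazard", "location"), ("hazard", "responder")}
print(graph_similarity(a, b))
# (1.0, 0.3333333333333333)
```

Separate scores for nodes and links make it possible to state not just that two graph constructions are similar, but in what way.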
A formal mathematical theory exists. Part of the theory is called “differential
ontology” because of a formal correspondence between continuum mathematics and
discrete mathematics. The differential ontology framework provides a potential homology between specific continuum mathematical models of linguistic variation / basins of attraction and Orb constructions. Again, what is
surprising is that all of these capabilities are possible with very small
bandwidth and any processor.
Long term issues: Orb based knowledge
processors might soon come to parallel neural and immune models of memory,
awareness, selective attention and anticipation. This possibility indicates a
long-term future for Orb technologies.
The theory regarding language independent representation of pure
information is the subject of a discussion between the team members and those
scientists and innovators with whom we have the freedom to discuss. There are a number of open questions, and
the work is controversial in nature.
D.
Projection of Orb constructions as well written reports
Our architecture supports many practical capabilities; one of these is automatic report writing, using the Acappella Software and a designed interface to Orb-encoded structured ontology. As already mentioned, it is possible to
instrument a projection of subject matter indicators into a situationally
specific generated product produced by Acappella Software. Prueitt served as science advisor to
Acappella Software Inc in 2002 and understands the narrative generation
techniques very well. Specific information from aggregated Orb representation can be used to trigger answers to poll-type questions developed by human experts, conditioned on anticipated scenarios. The Acappella technology
follows from its narrative generation patent (2003) that produces an
interpretive natural language report that can further include footnotes to
specific text sections from the source data.
Narrative generation by the Acappella technology can be demonstrated; the key is having a poll-type informational source. This type of informational source is produced by the Orb constructions.
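The triggering of poll-type answers from an aggregated Orb representation can be sketched as below. The question texts and category keys are illustrative assumptions, not the expert-authored artifacts themselves.

```python
# Hypothetical expert-authored poll: (question, Orb category key).
POLL = [
    ("What is the primary hazard?", "hazard"),
    ("Which responders are engaged?", "responder"),
]

def answer_poll(indicators):
    """indicators: dict mapping a category to its most salient term,
    as might be read off an aggregated Orb representation."""
    return {question: indicators.get(cat, "unknown") for question, cat in POLL}

print(answer_poll({"hazard": "structure fire", "responder": "Engine 7"}))
```

A fielded record of this kind is exactly the poll-type informational source a narrative generator can turn into a readable report.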
2 Personnel and Performer Qualifications and Experience. The
development team includes several scientists, practitioners and researchers of
long (average 20 years plus) experience and successful development of the
technologies discussed in this white paper.
Over a decade of research by Prueitt et al. suggests strongly that the
technologies described will have a significant impact upon Unified Incident
Command and Decision Support systems at the local, state and national
levels. Several of the techniques
developed by the participants have been commercialized successfully and
currently are in general use.
3 Costs, Work and
Schedule
We anticipate the project will have a one-year
span for Phase One and an additional one-year span for Phase Two. The general output of Phase One is a proof
of concept prototype that, if accepted, will be developed into a production
system and deployed in Phase Two. The
anticipated top-level project plan is:
PHASE ONE
§ Quarter 1:
Specific requirements research and plan development. During this quarter a series of workshops
will be held to determine how, specifically, to apply the proposed technologies
to the requirements of individual incident command and support
communities. Proposals will already have undergone advanced benchmarking and feasibility tests. We are developing a collaboration with State of Michigan departments responsible for the human-factors interface to first responders.
§ Quarter 2: The proposed technologies will be tailored to the needs of the individual incident command communities and first responder communities identified in Quarter 1.
§ Quarter 3: A
series of workshops will be held with the same groups identified in Quarter 1
and the prototype implementations of the proposed technologies will be
demonstrated and critiqued. Real world
simulations will be offered and training requirements studied.
§ Quarter 4: The
prototype implementations of the proposed technologies will be updated to
reflect the results of the workshops in Quarter 3. Simulated scenario deployments with a small number of human
operators will be demonstrated to HSARPA.
A work plan for implementation and deployment of a production version of
the system will be prepared and submitted for approval.
PHASE TWO
§ Quarters 1 and 2: The
proposed system as approved in Phase One will be developed and deployed as a
production system. A proposed training
and rapid deployment plan will be developed.
Outcome metrics will be proposed.
§ Quarter 3: The
deployed system(s) will be tested in situ and any required debugging will be
performed. Training, deployment and
technology transition exercises will be performed and outcome metrics used to
evaluate critical factors.
§ Quarter 4: Final
system documentation and user training curriculum will be published. Full system production, acceptance and
deployment.
NOTES:
[1] The strong form of cognitive engineering and artificial intelligence is part of what we simplify as we step into a natural science paradigm that conforms less to the artificial intelligence disciplines.
[2] Technical note on concept-covers:
The
Ontology referential base provides a simple means to encode, in small computer memory allocations, natural language tokens, words or phrases, in co-occurrence patterns with associations to a set of 2319 primitive concepts (inventoried by
the Readware software product). The
concepts have been developed so as to provide a “cover” over what might be
called the set of all possible concepts [4].
Underlying this concept cover is a “substructural ontology” framework
having two symmetries and a power set defined from three semantic primitives {element,
domain, order} [5]. The Readware
framework has a three dimensional matrix with 2, 2, and 8 [6] elements in the
dimensions, creating a type of periodic table.
The nature of this table, or to be more precise the nature of the origin
of language, is a subject of various works on semantic primitives by
individuals like C. S. Peirce, Dmitri Pospelov, John Sowa and Richard
Ballard. The value of the emerging
theory of language origins is that, in theory and as demonstrated in some
existing prototypes, the mapping of concepts expressed in social discourse
leads one to anticipate behaviors of individuals. The same system is used to help develop a global model of the
information in a specific incident space and a structural model of how this
information is flowing within functional channels.
[3] We use the term “pattern” here in a
stochastic and categorical sense. The
language needed to talk about such patterns has been developed within the
papers written by Prueitt on categoricalAbstraction and eventChemistry.
[4] The cover is a real set of identified
concepts, but the set of all concepts is something that one cannot make a claim
to have enumerated. In standard taxonomies there are usually layers: the upper layer is a set of “broad terms” and the second layer is a set of terms that fall under the broad terms but are narrower. A more precise notion of a cover actually comes from mathematical topology.
[5] There are several descriptions of
primitives of this type. Perhaps one
can think about this in terms of sufficiency to make the concept cover
adequate. How primitives are derived
and how covers are derived is the essence of the information science we are
developing.
[6] 8 = 2^3. The power set over a set with n elements has 2^n elements.
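The arithmetic in notes [5] and [6] is easy to check directly; a minimal sketch:

```python
from itertools import chain, combinations

def power_set(s):
    """All subsets of s, returned as a list of sets."""
    items = list(s)
    return [set(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

primitives = {"element", "domain", "order"}
print(len(power_set(primitives)))  # 8, i.e. 2^3
print(2 * 2 * 8)                   # 32 cells in the 2 x 2 x 8 matrix
```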