Global Information Framework and Knowledge Management
Revised slightly April 16, 2006, with footnotes
A prototype
Public Document
Updated Friday, July 15, 2005, Version 9.8
Point of Contact:
Dr Paul S Prueitt, psp @ ontologystream.com
Behavioral Computational Neuroscience Group
Development Committee
A prototype
Position Paper
Why a roadmap is needed for semantic technology adoption
Section 1: Proof of concept
Section 2: Context and objectives
Section 3: Ontology architecture
Section 4: The ontology encoding innovation
Section 5: Informational convolution
Section 6: The minimal deployment
Section 7: Regularity in report generation
Section 8: Predictive Analysis Methodology
Section 9: A future anticipatory technology
Section 10: The Advisory Committee and Companies
Appendix A: Statement of Purpose
Appendix B: Project Outline
Appendix C: Semantic Science
Appendix D: Knowledge Sharing Foundation Core
Keywords
In 2005, everyone knows what a horse and buggy is
and what an automobile is. Each person
in our society knows the story of the emergence of the automobile manufacturing
business sector and the American love affair with the automobile.
By contrast, we do not know to what extent meaning might be
captured in a “semantic web” [1].
We have not experienced anything that informs us about what semantic technology
might become. Relevant issues in linguistics, social theory and the nature of
science are not well known in departments of computer science. Most of computer science is shaped by
engineering theory and scientific reductionism.
A roadmap is needed. One is provided here by a group of natural scientists and
mathematicians.
We propose a broad program to establish the intellectual
and technology foundation for a science of knowledge systems, and to integrate
and deliver to the marketplace information science based on the proper use of
what is called “machine encoded ontology”.
The first delivery of ontology-based technology
can be accomplished within a few months, given our previous work and the
existing technology components.
BCNGroup scientists recommend a demonstration
program that has three parts organized in two phases.
Phase 1:
1) Technology integration
Phase 2:
2) Advanced knowledge management certification
3) Ontology development
Our proposal addresses these parts one at a time.
The first part is proposed at a cost of $750,000 over a period of six
months.
The first part depends on our (already) having
completed a principled selection of advanced knowledge management systems,
semantic extraction systems and data persistence systems.
In each case, patents protect the underlying
technology and allow a science advisory board to develop an in-depth description
of each technology component. We have
developed a curriculum that exposits the philosophical principles on which the
software user interfaces of these systems depend.
This curriculum is being readied for delivery as knowledge management
certification and as textbooks designed for university courses.
Beyond the first deployment, the concept of a
knowledge sharing foundation [2]
is proposed and is being readied along the lines of a “Red Hat” type business model. The foundation is not, however, designed as a business. Rather, the knowledge sharing foundation is
designed as a cultural institution directed to found the knowledge sciences and
to develop curriculum that helps average Americans and individuals all over the
world.
A human-centric
information production [3]
capability is defined, and existing commercial software is identified. A distributed information system is
specified that will enable the real-time representation and sharing of human
knowledge about situations. A global
information framework is used as a human control interface over complex
ontology.
Example:
Aircraft landing at a specific airport will express behavioral patterns. An airport ontology and an aircraft landing
ontology are used to provide an interpretation of the behavioral patterns
expressed in each landing. Over time,
the observed behavioral patterns lead to early diagnosis of risks. A human looks at the patterns and makes
judgments based on personally held tacit knowledge. New concepts about the patterns are encoded as metadata. The patterns themselves are encoded as new
behavioral ontology reflecting the history of observations about aircraft
landing at a specific airport.
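The loop in this example (observe, count recurring patterns, let a human judge them, store the judgment as metadata) can be sketched in a few lines. This is a minimal illustration, assuming each landing arrives as a record of airport, flight and an already-assigned pattern label; the names `LandingObservation`, `BehavioralHistory`, `flag_for_review`, the airport code and the `min_count` threshold are all hypothetical.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LandingObservation:
    # Hypothetical record: one observed landing and the pattern label assigned to it.
    airport: str
    flight: str
    pattern: str


class BehavioralHistory:
    """Accumulates observed landing patterns per airport (illustrative only)."""

    def __init__(self):
        self.counts = defaultdict(Counter)  # airport -> Counter of pattern labels
        self.metadata = defaultdict(dict)   # airport -> {pattern: analyst's concept}

    def record(self, obs):
        self.counts[obs.airport][obs.pattern] += 1

    def flag_for_review(self, airport, min_count=3):
        # Patterns recurring at least min_count times are surfaced to a human;
        # the software does not decide what they mean.
        return sorted(p for p, n in self.counts[airport].items() if n >= min_count)

    def annotate(self, airport, pattern, concept):
        # The analyst's tacit judgment is stored as metadata on the pattern.
        self.metadata[airport][pattern] = concept


history = BehavioralHistory()
for flight in ["A1", "A2", "A3", "A4"]:
    history.record(LandingObservation("KXYZ", flight, "long-flare"))
history.record(LandingObservation("KXYZ", "A5", "go-around"))

flagged = history.flag_for_review("KXYZ")
print(flagged)  # ['long-flare']
history.annotate("KXYZ", "long-flare", "possible runway-length risk")
```

The threshold only decides what a human is shown; interpretation stays with the analyst, as the example above requires.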
Inputs come
from any reporting software system. An
example is US Customs and Border Protection reports on search and targeting
operations, or administrative rulings on tariff codes. Inputs can be developed from any event reporting mechanism,
whether written reports or reports that involve the manual development or
modification of ontology.
Outputs
include a computable and visualizable historical record about situations
reported. The record is expressed as
ontology and human visualization of this record is provided.
Visualization of
graphical structure requires human perception to evoke an experience of
knowledge about a specific situation, or an event space. Graph labels suggest meaning in much the
same way as sentences suggest meaning when humans compose sentences.

Figure 2: Visualization of concept indicators in a collection of fables
Language independence is not fully achieved, for reasons that have to do with differences between
natural languages. Each natural language has language-dependent
characteristics. In principle, however, our
technology establishes a foundation for using ontological models having
correspondences to sets of concept representations.
Ontological models are
envisioned as being a type of “interlingua”, not of words composed in grammar,
but as systems of signs that are interpreted by humans in various natural
language settings.
Ontological models
provide metadata that indicate where misunderstanding might occur;
thus ontological models provide a complex formalism to help in the translation
and transcription of meaning from one human language to another. Like mathematical models, ontologies are
useful as enablers of computations based on the structure of the defined sets
of concepts. In classical mathematical models, the concepts
are those related to field dynamics and to the conservation laws of
physics. The ontological models we are
developing are about more complex subjects, such as the intentions of a
determined enemy to bring elements of bioterrorism into the
Ontology extends
Hilbert mathematics from
deterministic systems to complex systems having uncertainty and
under-determined constraints.
Representing real advances in objective science, ontological models
provide a computational basis for the real time projection of human knowledge
within communities of practice. Medical
informatics and bioinformatics are two early fields in which ontological
modeling is demonstrating value. A
construction over conceptual representations of topics in social discourse has
been applied to medical literatures.
These constructions bring relevant information to medical research
communities. A bio-event defense
architecture has been outlined and proposed by members of the BCNGroup. Applications in pharmaceutical research and
development can be identified under non-disclosure agreements. These applications involve conceptual
modeling as well as direct modeling of the ontology involved in gene
expression.
Subject matter
indicators are represented in computable ontology constructions, including the
simplest, and yet most powerful, one: a concept label together with one to n
(n an undetermined integer) words, word stems or word phrases. These concept representations are encoded
in several knowledge base technologies as an n-tuple, with the concept label $a_0$ followed by the terms $a_1, \ldots, a_n$:
$$\langle a_0, a_1, \ldots, a_n \rangle$$
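As an illustration, the n-tuple above can be held as an ordinary Python tuple whose first element is the concept label and whose remaining elements are the indicator terms. The helpers `make_concept` and `indicates` are hypothetical, and `indicates` uses naive lower-cased substring matching; the semantic extraction systems named in this paper are far more sophisticated, so this is only a sketch of the data structure.

```python
def make_concept(label, *terms):
    """Build the n-tuple <a0, a1, ..., an>: a0 is the label, a1..an the terms."""
    if not terms:
        raise ValueError("a concept needs at least one word, stem or phrase")
    return (label, *terms)


def indicates(concept, text):
    """Return the indicator terms of `concept` that occur in `text`.

    Naive lower-cased substring matching, used here only for illustration.
    """
    lowered = text.lower()
    return [term for term in concept[1:] if term.lower() in lowered]


container_search = make_concept("container search", "inspect", "x-ray", "seal broken")
report = "Officers chose to X-ray the container after the seal broken flag."
print(indicates(container_search, report))  # ['x-ray', 'seal broken']
```

Substring matching lets a stem such as "inspect" also match "inspection", which loosely mirrors the word-stem case in the text.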
Our proposed
deployment takes these technologies and integrates them. We also add a data mining process in which fast
convolution transforms have both a mathematical formulation and
operational realizations as precise data retrieval methods.
Convolution operators
are computed quickly over the elements of a set of concept representations. The
convolution operator separates context and merges
ontology. One pass over an in-memory
data structure is sufficient.
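The paper does not define its convolution operators at this point. As a loosely related illustration only, the sketch below makes a single pass over an in-memory list of text units and aggregates how often pairs of concepts are indicated together. It is a plain co-occurrence count, not the authors' convolution formalism; the function name, the concept dictionary and the sample sentences are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(units, concepts):
    """One pass over text units, counting which concept labels co-occur.

    `units` is a list of strings (e.g. sentences or reports) and `concepts`
    maps a concept label to its indicator terms. A pair is counted once per
    unit in which both concepts are indicated (naive substring matching).
    """
    pairs = Counter()
    for unit in units:  # single pass over the in-memory data
        lowered = unit.lower()
        present = sorted(
            label for label, terms in concepts.items()
            if any(t.lower() in lowered for t in terms)
        )
        pairs.update(combinations(present, 2))
    return pairs


concepts = {
    "narcotics": ["narcotic", "cocaine"],
    "concealment": ["conceal", "hidden"],
    "shipping": ["container", "freight"],
}
units = [
    "Cocaine was concealed in a container.",
    "Freight records were clean.",
    "Narcotics hidden in luggage.",
]
counts = cooccurrence(units, concepts)
print(counts.most_common(1))  # [(('concealment', 'narcotics'), 2)]
```

Sorting the labels in each unit makes every pair canonical, so ("a", "b") and ("b", "a") are never counted separately.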

Figure 1: Co-occurrence patterns with “hash”
From these
technologies, sets of concept representations are developed and made accessible
to search and analysis. Several
semantic extraction tools are integrated with a commercially available RDF
repository [4]. The output of these semantic extraction
tools is a set of subject matter indicators, represented as RDF
statements. The basic set operations
are available as a formalism, and thus the standardization of these constructions
is a matter of public record.
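To make "subject matter indicators, represented as RDF statements" concrete, the sketch below serializes one concept label and its indicator terms as N-Triples lines using only the standard library. The namespace `http://example.org/gift#` and the predicate `indicatorTerm` are invented for illustration; only `rdfs:label` is a standard W3C term. A real deployment would use the repository's own vocabulary and an RDF library.

```python
RDFS_LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"
EX = "http://example.org/gift#"  # hypothetical namespace for this sketch


def literal(text):
    # Escape backslashes, double quotes and line breaks as N-Triples requires.
    escaped = (text.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", "\\n")
                   .replace("\r", "\\r"))
    return f'"{escaped}"'


def indicator_triples(concept_id, label, terms):
    """Return N-Triples lines for one subject matter indicator."""
    subject = f"<{EX}{concept_id}>"
    lines = [f"{subject} {RDFS_LABEL} {literal(label)} ."]
    for term in terms:
        lines.append(f"{subject} <{EX}indicatorTerm> {literal(term)} .")
    return lines


lines = indicator_triples("c42", "container search", ["x-ray", "seal broken"])
for line in lines:
    print(line)
```

Because each statement is one self-contained line, such output can be loaded into most RDF repositories or concatenated and diffed with ordinary text tools.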
It is vital to
recognize that the subject matter indicators are given final interpretations by
knowledgeable humans.
Our
Phase 1 proposal is to integrate five stand-alone COTS [5]
products (two semantic extraction systems, two knowledge management systems
and a taxonomy acquisition system) with an Open Source ontology editing tool
(Protégé), an Open Source document repository system, and a simple RDF
(Resource Description Framework) data repository.
Knowledge management systems address the need to
enhance the quality of reporting as well as to make managed vocabularies
available as an interface between ontology constructions and normal human
use. Semantic extraction systems
convert free-form text into structured metadata. Existing taxonomies are drawn from
across the world.
These products are to be configured using J2EE web
services and a server binding protocol based on a high-level scripting language
called Python.
An
advanced RDF repository is used to provide persistent storage for organized
sets of concept representations. Concept representations and organized
collections of these representations are convertible to standard ontology
representation languages. A formal
theory about co-occurrence patterns is used to express a category of
mathematical constructions called convolution operators.
The customer wants a
global information framework that:
1. Has a high degree of language independence.
2. Compresses data regularity, primarily co-occurrence patterns, into structurally organized concept representations.
3. Converts uncertain, sketchy, sometimes incorrect instance information into clear, concise and complete reports about a situation.
4. Provides a means to develop global synthesis over a large event space.
Ontological modeling
also provides new types of information technology features that are not
anticipated by the customer. For
example, the set of concept representations, together with the ontological model encoding
structures, allows nearly instantaneous access to past information without
relational database indexing.
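The contrast with relational indexing can be illustrated with ordinary hash tables: once concept representations are keyed by term and by concept label, retrieving past reports is a chain of dictionary lookups rather than an indexed query. The class `ConceptIndex`, its methods and the report identifiers are hypothetical, and this is not the OntologyStream encoding itself.

```python
from collections import defaultdict


class ConceptIndex:
    """Two hash tables: term -> concept labels, and concept label -> report ids."""

    def __init__(self):
        self.by_term = defaultdict(set)
        self.reports = defaultdict(list)

    def add_concept(self, label, terms):
        for term in terms:
            self.by_term[term.lower()].add(label)

    def add_report(self, report_id, labels):
        for label in labels:
            self.reports[label].append(report_id)

    def lookup(self, term):
        """Report ids for every concept the term indicates.

        Each step is a hash-table probe; .get() avoids creating empty entries.
        """
        found = []
        for label in sorted(self.by_term.get(term.lower(), ())):
            found.extend(self.reports.get(label, []))
        return found


index = ConceptIndex()
index.add_concept("container search", ["x-ray", "inspect"])
index.add_report("R-2005-001", ["container search"])
index.add_report("R-2005-007", ["container search"])
print(index.lookup("X-Ray"))  # ['R-2005-001', 'R-2005-007']
```

Expected lookup cost per key is constant regardless of how many reports are stored, which is the property the paragraph above appeals to.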
The BCNGroup has
specified a six-month technology integration and a fully functional beta site
deployment. The beta deployment will
serve as a prototype for additional deployments based on similar
principles.
Our technology delivers
means for deriving language independent situation and global event analysis
based on ontological models. The
software systems integrate semantic extraction in English, Arabic and
German. Other languages are possible
using the same techniques.
General principles
related to the differential ontology framework are laid out so that multiple
languages are integrated into constructed elements of a single explicit
ontology. The integrated system will
demonstrate features that are not available from any current semantic or
knowledge management system.
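One way to picture several languages "integrated into constructed elements of a single explicit ontology" is a lexicon in which language-specific terms point to shared, language-neutral concept identifiers. The identifiers and term lists below are invented examples, and real German or Arabic extraction would need stemming and morphological analysis that this sketch omits.

```python
# Hypothetical multilingual lexicon: (language, term) -> language-neutral concept id.
LEXICON = {
    ("en", "container"): "C:Container",
    ("de", "container"): "C:Container",
    ("ar", "حاوية"): "C:Container",
    ("en", "customs"): "C:Customs",
    ("de", "zoll"): "C:Customs",
    ("ar", "جمارك"): "C:Customs",
}


def concepts_in(language, tokens):
    """Map tokens of one language to the shared concept ids they indicate."""
    hits = []
    for token in tokens:
        concept = LEXICON.get((language, token.lower()))
        if concept and concept not in hits:
            hits.append(concept)
    return hits


print(concepts_in("de", ["Der", "Zoll", "prüft", "den", "Container"]))
# ['C:Customs', 'C:Container']
print(concepts_in("en", ["Customs", "inspected", "the", "container"]))
# ['C:Customs', 'C:Container']
```

Reports in different languages that yield the same concept ids can then be compared at the ontology level rather than word by word.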
Our general principles
are part of an emerging discipline related to the measurement of complex
systems and the use of formal ontology as a means to abstract knowledge about
situations that arise in a complex world.
Iterative modifications to visualized ontology lead to an adaptation of
ontological models of these situations.
The Global Information
Framework depends critically on having a user interface that allows any subject
matter expert to have visual access to situationally relevant concept
representations. Situational relevance requires a subsetting mechanism. Control over concept subsetting mechanisms
serves to focus the attention of the user on a subset of the elements of an
ontological model. Once elements are
identified by our selective attention mechanisms, these elements are extended
using ontological inferencing to produce a coherent view of what is known and
encoded within the representational space.
User input in each phase of this process is not merely supported; it
is required to achieve relevance and fidelity.
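The two steps in this paragraph, selecting a focus subset and then extending it by inference, can be sketched over a small graph of concept relations. Here "inference" is simply a bounded breadth-first traversal along stated relations, an assumption chosen for brevity; the relation names, the example concepts and the `hops` limit are hypothetical.

```python
from collections import deque

# Hypothetical ontology fragment: concept -> list of (relation, related concept).
RELATIONS = {
    "container": [("part_of", "shipment"), ("subject_to", "search")],
    "shipment": [("declared_under", "tariff code")],
    "search": [("governed_by", "security policy")],
    "tariff code": [],
    "security policy": [],
    "weather": [("affects", "shipment")],
}


def focus_subset(relations, keywords):
    """Step 1: keep only concepts whose names contain a user-supplied keyword."""
    return {c for c in relations if any(k in c for k in keywords)}


def extend(relations, seeds, hops=1):
    """Step 2: add concepts reachable from the seeds within `hops` relation steps."""
    seen = set(seeds)
    queue = deque((s, 0) for s in seeds)
    while queue:
        concept, depth = queue.popleft()
        if depth == hops:
            continue
        for _relation, target in relations.get(concept, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return seen


seeds = focus_subset(RELATIONS, ["contain"])
print(sorted(extend(RELATIONS, seeds, hops=1)))
# ['container', 'search', 'shipment']
```

Both the keywords and `hops` are left to the user, so the size of the coherent view stays under human control; "weather" is never pulled in because relations are followed only in their stated direction.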
This human-centric,
ontological model based, approach creates a distinct alternative to classical
expert systems and artificial intelligence approaches. Our alternative creates a higher dependency
on human involvement and requires that some humans accept responsibility over
decisions. Clearly, the cultural
barriers we, the BCNGroup scientists, have experienced have something to do
with the requirement that humans accept more responsibility and are subject to
rational outcome metrics. We have been
forced to take the position that artificial intelligence funding is wrong-minded,
both because of arguments from the natural sciences and because the effect
of artificial intelligence expenditures is to allow the consulting IT
industries to avoid taking complete responsibility for past, current or future performance
outcomes.
Relevant cognitive
neuroscience tells us that attentional focus evokes cognitive responses. This science also tells us a great deal
about how this attentional focus is managed by the brain system [6]. As the BCNGroup moves forward the
Results from cognitive
neuroscience have been used to design user interface elements that change the
visualization based on user commands and actions. These changes in visualized
state produce shifts in figure/ground perception.
As in physical and
engineering sciences, the results of collective intellectual work lead to
advances in science, including economic and biological science.
In the context of other types of collective work, such as in
financial services, intelligence analysis, fiduciary reporting, compliance
reporting, complex control, and biological science, the GIFT provides a means
to produce a type of collective intelligence.
Subject matter experts create this collective intelligence using our
software components.
Global information frameworks provide features that
are not available from any current semantic technology or knowledge management
system. It is in this sense that the
technology is a gift to our society.
GIFT was designed specifically to address global
analysis of US Customs and Border Protection selectivity of commodity shipments
for targeted examination of containers. However, this technology is applicable
to far more than the current critical problems in information technology
modernization efforts at Treasury, State, Department of Defense and Department
of Homeland Security. GIFT provides a principled ground in which to extend
formal models of natural event structure whose objects of investigation are by
nature complex. The gift has to be
accepted, however, and so far the revolutionary nature of the approach on which
these integrated technologies depend has been counterintuitive to mainstream
artificial intelligence and to the IT procurement process.
Over the past decade a revolutionary ground has been
prepared by scientists and technologists who felt that intelligence and military
activity required a new information science paradigm. We have faced an entrenched discipline and procurement
process. The individuals involved in
maintaining this process have been, so far, unwilling to even accept the
possibility of a paradigm shift.
So natural scientists have developed the “Second
School of Semantic Science”. The Second
School points out that the First School treats intelligence as if it were
merely a mechanism that can be decomposed into a set of fixed semantic states
and a first order logic defined on this set.
Natural science, and common sense, tell us that intelligence is not
properly characterized in this way.
The following have been our long-term design
objectives for Treasury and DoD:
· Improve the quality of analysis, and utility of complex intelligence products;
· Provide specific and tailored intelligence to enhance our ability to visualize the battlespaces, including the terrorism engagement space, and ensure total operational awareness;
· Improve the throughput and speed of delivery of National intelligence;
· Reduce or eliminate unnecessary redundancy and duplication in intelligence products;
· Strengthen information and production management and ensure policies, procedures, concept development, training, and technical-human engineering;
· Establish and integrate standards (based on mandated Department of Defense (DoD) community standards/architectures) for commonality, interoperability, and modernization in coordination with appropriate elements and activities;
· Explore and examine very advanced technology and concepts for future integration;
· Provide a thematic analysis as the basis for information warfare, both defensive and offensive activities.
Our proposed beta deployment demonstrates the
viability of a specific roadmap. The
roadmap starts where our industries are today and shows a specific path to the
design, development and deployment of next generation tools.
We have interoperability with W3C standards, but our
capabilities are forward-looking.
Perhaps the most critical contribution is a data encoding mechanism that
supports the development of collective intelligence and work products that can
be reused as models of complex phenomena.
Our proof of concept involves the deployment of a
prototype that is fully operational and is to be used in a critical context. This prototype can be deployed at any site
and requires only part of the deployment team to have high levels of security
clearance.
Generality: Nothing in this roadmap excludes the
development of GIFT deployments in bio-chemical engineering, banking,
manufacturing, publishing or any other complex human activity. The technology is considered to be more advanced than any existing
e-commerce system or any deployed knowledge management system. Several of our teaming corporations are
precisely those corporations regarded as having the leading-edge
deployed systems. Their participation
is offered at or below cost simply because the methods and capabilities of these
systems are under-appreciated due to the break they make from classical
artificial intelligence and expert system based IT deployments.
Section 3: Ontology architecture
We have available a
package of patented innovations in data encoding technology.

Figure 3: Distributed Ontology Management Architecture (d-OMA)
In several layers of our
existing software, data regularity in context is discovered using semantic
extraction techniques. The patterns
made from regularity are made explicit in the form of a set of concept
indicators. For us, R&D does not
mean “research and development”, because this term has been deemed of “no
value” in the IT industry or in government IT procurement circles. The political incorrectness of funding long-term
R&D stems from the failure of research and development using the
classical approaches.
For us, “R&D” is
research and discovery. The research,
in this context, is an individual investigation of some complex natural
phenomenon, such as the purchasing interests of an on-line shopper. The GIFT provides human-centric investigator
tools. Like microscopes and carpentry
tools, the GIFT tools do nothing by themselves. These tools become useful when they are used by skillful domain
experts.
What are observed by
the tools are conceptual structures in social discourse. What is constructed is a model of how these
structures sit within the various thematic expressions. The subject indicators have structural
relationships to individual natural language terms and patterns of term
occurrences.
Subject matter indicators are identified using several types of patented
semantic extraction processes. These
include two patented forms of conceptual aggregation based on n-gram measurement of text
over letters, stems, words or word phrases [9],
as well as the newly patented Probabilistic Latent Semantic Analysis (PLSA) [10].
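The patented aggregation methods are not described here, but the underlying idea of n-gram measurement can be illustrated generically: count overlapping letter n-grams in two texts and compare the resulting profiles with cosine similarity. The choice of n = 3, of cosine similarity and of the function names are assumptions for this sketch, not details of the cited patents.

```python
import math
from collections import Counter


def ngram_profile(text, n=3):
    """Count overlapping letter n-grams in lower-cased, whitespace-normalized text."""
    cleaned = " ".join(text.lower().split())
    return Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))


def cosine(p, q):
    """Cosine similarity between two n-gram count profiles (0.0 if either is empty)."""
    dot = sum(count * q[gram] for gram, count in p.items())
    norm = math.sqrt(sum(c * c for c in p.values())) * math.sqrt(sum(c * c for c in q.values()))
    return dot / norm if norm else 0.0


a = ngram_profile("container inspection")
b = ngram_profile("inspecting containers")
c = ngram_profile("weather forecast")
print(cosine(a, b) > cosine(a, c))  # True
```

Letter n-grams tolerate inflection ("inspection" versus "inspecting") without a stemmer, which is one reason measurements of this kind are used below the word level.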
Ontology constructions, no matter how they are developed, consist of
representations of concept schemas and their relationships. In GIFT, natural language terms, and
patterns of co-occurrence, provide ontology definition as sets of concepts,
with properties and attributes, organized with visual navigational aids.
A subsetting mechanism brings into visual focus all and only the relevant part of the
extensive ontology repositories persisted in the RDF repository and hash
tables. Human interfaces to shared
ontology repositories are designed to mimic the perceptual figure/ground
relationship observed by natural science to be the key mechanism involved in
individual action-perception cycles.
These human interfaces allow local manipulation and editing of sets of
concepts found to be relevant to individual analysis of specific events, such
as US Customs and Border Protection selecting a container for a search
procedure.
A local analysis by an individual occurs. Using the new software, this analysis occurs with
the greatest amount of flexibility. At
each of many sites, human analysts edit small details, modify underlying
assumptions, and otherwise examine how concepts identified locally might be
related to sets of concepts being maintained in global repositories. The concepts themselves are equipped with
metadata about how these concepts might be identified in text, and
co-occurrence relationships that concepts have with other concepts in the
repository.
A collective intelligence can be expected.
Subject matter experts enable a type of global analysis through local
manipulation. Local manipulation occurs
based on direct experience, but this experience is conditioned by the recent
global analysis. A real-time intelligence
response is therefore very likely.
Collective global analysis occurs because individual human interfaces to concept
repositories have selective attention mechanisms. BCNGroup scientists understand the physical and mental activity
involved in individual human action, cognition and perception cycles. This understanding is part of several
academic literatures, such as “cognitive engineering” and “evolutionary
psychology”. The focus of this science
is on the behavioral patterns of people and systems of living systems.
Collective global
analysis occurs within contexts that are implicitly (not necessarily
explicitly) structured by relationships that are established when many
individuals work with localized ontology. Individuals produce reports based on
the subsetting mechanisms that “retrieve” that part of the globally stored RDF
repository that is deemed relevant. As
local analysis produces reports, these reports themselves are subjected to
linguistic and ontological methods, where reconciliation of terminological and
viewpoint differences becomes critical.
SchemaLogic’s SchemaServer product will be interfaced with web services that
manage the specifications of a global terminology library. Terminological reconciliation is a current
capability provided by SchemaLogic Inc to a variety of commercial clients.
Ontological science
tells us that local manipulation of concepts by an individual within the
contingencies of the moment involves human tacit knowledge and can rapidly lead
to a deep understanding of a specific event in the context of larger issues and
concerns. However, human understanding
is both highly situational and strongly shaped by opinion. Any specific
understanding depends on individual(s) defining terms so that these fit within
a coherent view of the events occurring.
In key situations, a single common viewpoint is not possible, nor is a
single viewpoint always desirable.
The control of managed
vocabulary is essential to uniform work on enterprise-wide ontological
models. One key failure of the Tim
Berners

Figure 4: The production of scoped ontology with humans in the loop
A second knowledge
management system is a product from Acappella Software Inc. The regularity of responses to standard situations
can be studied, resulting in patterns of expression that are captured in
pre-existing textual snippets of expression.
This allows a patented process to assist in flexible report generation
having the 3Cs: clarity, completeness and consistency.
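The idea of pre-existing snippets keyed to standard situations can be illustrated with format templates plus a check that every required fact has been supplied before a report is emitted. This is a generic template sketch addressing only the completeness aspect, not the Acappella patented process; the snippet text, field names, container identifier and `generate_report` function are hypothetical.

```python
from string import Formatter

# Hypothetical snippets capturing standard responses to recurring situations.
SNIPPETS = {
    "seal_broken": "Container {container_id} arrived with a broken seal at {port}.",
    "xray_clear": "X-ray examination of {container_id} found no anomalies.",
}


def required_fields(snippet):
    """Names of the placeholders a snippet needs filled in."""
    return {name for _, name, _, _ in Formatter().parse(snippet) if name}


def generate_report(situations, facts):
    """Assemble a report from snippets; refuse if any required fact is missing."""
    missing = set()
    for situation in situations:
        missing |= required_fields(SNIPPETS[situation]) - set(facts)
    if missing:
        raise ValueError(f"incomplete report, missing: {sorted(missing)}")
    return " ".join(SNIPPETS[s].format(**facts) for s in situations)


print(generate_report(["seal_broken", "xray_clear"],
                      {"container_id": "CONT-001", "port": "Baltimore"}))
```

Because every report reuses the same vetted wording, phrasing stays uniform across analysts, while the missing-field check blocks incomplete reports.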
Both
knowledge management systems are tied together with standard knowledge
representational data encoding based on RDF (Resource Description Framework)
and Orbs (Ontological referential bases).
Ontology representations can be used within the difficult contexts of
uncertain information, shifts in context, and changes in the underlying
situation. In most cases, a human
analyst will easily alter interpretations and schema properties in real time to
accommodate these practical limitations.
The two commercial knowledge management systems
provide support for cultural transitions.
Section 4: The ontology encoding innovation
Scoped ontology
sits on an exceedingly simple data structure standard, developed and published
by OntologyStream Inc.
Using this encoding bypasses the well-known XML persistence and search limitations.
This data
structure is a topic taxonomy organized in a specific fashion, disclosed as a
matter of public information.
The differential ontology framework works in a specific fashion to create a
global information framework in which managed vocabulary and ontology are generated
and used as a knowledge management capability.
Several small
deployments have been completed. For
one of the state governments, a consultant/specialist created 216 concept
representations and organized them into the upper two layers of a differential
ontology framework. A prototype for a
large deployment in US Customs was developed but not deployed (as of May 2005). We are seeking a contractual means to deploy
based on the team agreements between ten leading, but small, innovative
knowledge technology corporations.
Non-deployment of the prototype is deemed by our group to be one
manifestation of profound incompetence by specific Lockheed Martin
management. A GAO investigation was
initiated in May 2005.
Situationally
focused models of specific events were considered as targeting software to be
used in a future modernized US Customs and Border Protection. Work stopped on this deployment as of March
2005, due to contracting issues [11]. However, the concept of scoped ontology has
now been demonstrated in the state (DHS) deployment and in a commercial
deployment (not disclosed). These are
small deployments, which act as a proof of product.
The upper layer
of the differential ontology framework is a set of universal abstractions, such
as abstractions about the flow of time. The middle layer contains domain-specific
concepts and utilities, such as security policies, concepts about how
containers are searched, or concepts about what is a commodity.
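A minimal sketch of the two layers described here, assuming each concept records its layer and the more general concept it specializes, and that every middle-layer concept must specialize something already defined. The class names, the `specializes` field and the example concepts are hypothetical; only the upper and middle layers named in this paragraph are modeled.

```python
from dataclasses import dataclass
from typing import Optional

UPPER, MIDDLE = "upper", "middle"


@dataclass
class Concept:
    name: str
    layer: str
    specializes: Optional[str] = None  # name of a more general concept, if any


class LayeredOntology:
    def __init__(self):
        self.concepts = {}

    def add(self, concept):
        # Assumed rule: a middle-layer concept must specialize a known concept.
        if concept.layer == MIDDLE and concept.specializes not in self.concepts:
            raise ValueError(f"{concept.name} must specialize a known concept")
        self.concepts[concept.name] = concept

    def lineage(self, name):
        """Walk from a concept up to its most general ancestor."""
        chain = []
        while name is not None:
            chain.append(name)
            name = self.concepts[name].specializes
        return chain


onto = LayeredOntology()
onto.add(Concept("temporal interval", UPPER))  # universal abstraction about time
onto.add(Concept("examination window", MIDDLE, specializes="temporal interval"))
print(onto.lineage("examination window"))  # ['examination window', 'temporal interval']
```

Tying every domain concept back to a universal abstraction is what lets different domain vocabularies be compared through the shared upper layer.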
In our small
state DHS project, several specific systemic risks were identified, leading to
corrections in risk management policies.
Differential
ontology is deployed within an action-oriented process model called AIPM (see
Section 7, Figure 8). Working from
event reports, semantic extraction activities are developed and data instances
are parsed to produce reporting triggers.
Triggers launch
processes that construct scoped ontology.
The development of small (5 to 20 topics) situationally specific scoped
ontology is the usual outcome from automatic scoping processes. These ontology representations can be used
for rapid communication of structured information and for building histories. We need the larger deployment to show how
ontology streaming might aid in global analysis and responses.
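The trigger-to-scoped-ontology step can be sketched as follows, assuming a trigger is a rule over parsed event-report fields and that a scoped ontology is simply the most strongly indicated topics, capped at 20 in line with the size range mentioned above. The trigger rule, the scoring by exact term counts, the topic dictionary and the function names are all assumptions made for illustration.

```python
from collections import Counter


def seal_trigger(event):
    """Hypothetical trigger: fire when a parsed report flags a broken seal."""
    return event.get("seal_status") == "broken"


def build_scoped_ontology(report_text, topics, max_topics=20):
    """Score topics by exact term counts and keep the strongest ones.

    Tokenization is plain whitespace splitting; punctuation is not stripped.
    """
    words = report_text.lower().split()
    scores = Counter()
    for topic, terms in topics.items():
        score = sum(words.count(t) for t in terms)
        if score:
            scores[topic] = score
    return [topic for topic, _ in scores.most_common(max_topics)]


def process(event, topics, triggers):
    """Run every trigger; if any fires, construct a scoped ontology for the event."""
    if any(trigger(event) for trigger in triggers):
        return build_scoped_ontology(event["text"], topics)
    return None


topics = {
    "tampering": ["seal", "broken", "tampered"],
    "origin": ["origin", "manifest"],
    "weather": ["storm"],
}
event = {
    "seal_status": "broken",
    "text": "Seal broken on arrival manifest lists origin as Rotterdam seal photographed",
}
print(process(event, topics, [seal_trigger]))  # ['tampering', 'origin']
```

Events that fire no trigger produce no scoped ontology at all, which keeps construction focused on situations someone has defined as worth attention.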
In some
domains, for example the Customs Harmonized Tariff Schedule, there may be hundreds
of thousands of concepts, but a small set of organizing principles that
generate categories over these topics.
The categories are suggested by algorithms, and then reified by human
analysts.
Event-specific
categories are developed as a means to visualize elements of event-space
phenomena. The event phenomena are then “understood” using the concepts in the
upper abstract and domain-specific ontology.

Figure 5: The GIFT architecture as of 2002
The full GIFT
architecture is being realized using a server glue language called Python. The key is to bring the required products
together in a work environment.
Figure 5 (first
seen in 2002) expresses our long-term interest in Visual Text (from Text
Analysis International Corporation Inc, TAI), semantic extraction and
schema logic (SchemaServer).
NLP++
is the language that TAI founder
Probabilistic
latent semantic analysis (PLSA), patented in 2005 by Recommind
Inc, is used to develop n-ary
representations of subject matter indicators.
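For readers unfamiliar with PLSA, the sketch below fits the aspect model P(w|d) = Σ_z P(z|d) P(w|z) to a small document-term count matrix by expectation-maximization, in pure Python. It follows the standard published formulation (Hofmann's aspect model), not Recommind's patented implementation; the number of aspects, the iteration count, the random initialization and the toy vocabulary are arbitrary choices for illustration.

```python
import math
import random


def plsa(counts, n_topics, iterations=50, seed=0):
    """Fit PLSA by EM. counts[d][w] is how often word w occurs in document d.

    Returns (p_z_d, p_w_z): P(z|d) for each document and P(w|z) for each topic.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalized(values):
        total = sum(values)
        if total == 0:
            return [1.0 / len(values)] * len(values)
        return [v / total for v in values]

    # Random, normalized starting distributions.
    p_z_d = [normalized([rng.random() for _ in range(n_topics)]) for _ in range(n_docs)]
    p_w_z = [normalized([rng.random() for _ in range(n_words)]) for _ in range(n_topics)]

    for _ in range(iterations):
        acc_w_z = [[0.0] * n_words for _ in range(n_topics)]
        acc_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                total = sum(joint)
                for z in range(n_topics):
                    # Accumulate expected counts n(d,w) * P(z|d,w) for the M-step.
                    expected = n * joint[z] / total
                    acc_w_z[z][w] += expected
                    acc_z_d[d][z] += expected
        # M-step: renormalize the expected counts.
        p_w_z = [normalized(row) for row in acc_w_z]
        p_z_d = [normalized(row) for row in acc_z_d]
    return p_z_d, p_w_z


def log_likelihood(counts, p_z_d, p_w_z):
    """Sum over d, w of n(d,w) * log P(w|d) under the fitted model."""
    ll = 0.0
    for d, row in enumerate(counts):
        for w, n in enumerate(row):
            if n:
                ll += n * math.log(sum(p_z_d[d][z] * p_w_z[z][w] for z in range(len(p_w_z))))
    return ll


vocab = ["seal", "tamper", "container", "tariff", "duty", "rate"]
counts = [[4, 3, 2, 0, 0, 0], [3, 4, 3, 0, 0, 0], [0, 0, 0, 4, 3, 3], [0, 0, 1, 3, 4, 4]]
p_z_d, p_w_z = plsa(counts, n_topics=2, iterations=100)
for z, dist in enumerate(p_w_z):
    top = sorted(range(len(vocab)), key=lambda w: dist[w], reverse=True)[:3]
    print(z, [vocab[w] for w in top])
```

Each aspect's word distribution P(w|z) plays the role of an n-ary subject matter indicator: a weighted set of terms rather than a single keyword.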
NdCore (Applied Technical Systems Inc), Readware (MITi Inc)
and SLIP analysis (OntologyStream Inc) are used to get different looks at the same
data. As the set of subject matter
indicators is developed, RDF-encoded concept representations are developed and
the NLP++ based software is then used to instrument the detection of these
concepts in text. The “two sides” of
the differential ontology framework are established.
The
SchemaServer product from SchemaLogic provides the knowledge management
features required to manage controlled vocabularies, and thus allows human
use of natural language to control the development and use of sets of concepts
(ontology).
Acappella
Software provides a product that helps to create clear, complete and concise
written reports (the 3Cs) in the first place.
The
development of our data layer has been in conjunction with our work on
extending some intellectual property for Applied Technical Systems, and is
discussed in a public document titled “Notational System for the Ontology
Referential Base (Orb)” [12].