Global Information Framework and

Knowledge Management

 

 

 Revised slightly

April 16, 2006

With footnotes

 

             

 

 

A prototype

 

Public Document

Updated Friday, July 15, 2005, Version 9.8

 

Point of Contact: Dr Paul S Prueitt,   psp  @  ontologystream.com

 

Behavioral Computational Neuroscience Group

Development Committee

 



Position Paper

 


Table of Contents

 

 

Why a roadmap is needed for semantic technology adoption

Executive overview

Section 1: Proof of concept

Section 2: Context and objectives

Section 3: Ontology architecture

Section 4: The ontology encoding innovation

Section 5: Informational convolution

Section 6: The minimal deployment

Section 7: Regularity in report generation

Section 8: Predictive Analysis Methodology

Section 9: A future anticipatory technology

Section 10: The Second School of Semantic Science

Advisory Committee and Companies

Appendix A: Statement of Purpose

Appendix B: Project Outline

Appendix C: Semantic Science

Appendix D: Knowledge Sharing Foundation Core

Keywords

Copyright 2005  BCNGroup

Why a roadmap is needed for semantic technology adoption

 

In 2005 everyone knows what a horse and buggy is and what an automobile is.  Each person in our society knows the story of the emergence of the automobile manufacturing business sector and the American love affair with the automobile. 

We do not know to what extent meaning might be captured in a “semantic web[1]”. We have not experienced anything that informs us about what semantic technology might become. Relevant issues in linguistics, social theory and the nature of science are not well known in departments of computer science.  Most of computer science is shaped by engineering theory and scientific reductionism.

A roadmap is needed.  One is provided here by a group of natural scientists and mathematicians.

We propose a broad program to establish the intellectual and technological foundation of a science of knowledge systems, and to integrate and deliver to the marketplace information science based on the proper use of what is called “machine encoded ontology”.

The first delivery of ontology-based technology can be accomplished within a few months, given our previous work and the existing technology components.


BCNGroup scientists recommend a demonstration program that has three parts organized in two phases.

Phase 1:

1) Technology integration

 

Phase 2:

2) Advanced knowledge management certification

3) Ontology development

 

Our proposal addresses these parts one at a time.  The first part is proposed at a cost of $750,000 over a period of six months.

The first part depends on our (already) having completed a principled selection of advanced knowledge management systems, semantic extraction systems and data persistence systems. 

In each case patents protect the underlying technology and allow a science advisory board to develop an in-depth description of each technology component.  We have developed curriculum that sets out the philosophical principles on which the software user interfaces depend.  This curriculum is being readied for delivery as knowledge management certification and as textbooks designed for university curriculum.

Beyond the first deployment, the concept of a knowledge sharing foundation [2] is proposed, and is being readied as a “Red Hat” type business model.  This is not, however, designed as a business.  Rather the knowledge sharing foundation is designed as a cultural institution directed to found the knowledge sciences and to develop curriculum that helps average Americans and individuals all over the world. 

 

 

 


Executive overview

 

A human-centric information production [3] capability is defined, and existing commercial software is identified.  A distributed information system is specified that will enable the real time representation and sharing of human knowledge about situations.  A global information framework is used as a human control interface over complex ontology.

Example: Aircraft landing at a specific airport will express behavioral patterns.  An airport ontology and an aircraft landing ontology are used to provide an interpretation of the behavioral patterns expressed in each landing.  Over time, the observed behavioral patterns lead to early diagnosis of risks.  A human looks at the patterns and makes judgments based on personally held tacit knowledge.  New concepts about the patterns are encoded as meta-data.  The patterns themselves are encoded as new behavioral ontology reflecting the history of observations about aircraft landing at a specific airport.

Inputs come from any reporting software system.  Examples are US Customs and Border Protection reports on search and targeting operations, or administrative rulings on tariff codes.  Inputs can be developed from any event reporting mechanism, whether written reports or reports that involve the manual development or modification of ontology.

Outputs include a computable and visualizable historical record about situations reported.  The record is expressed as ontology and human visualization of this record is provided. 

Visualization of graphical structure requires human perception to evoke an experience of knowledge about a specific situation, or an event space.  Graph labels suggest meaning in much the same way as sentences suggest meaning when humans compose sentences. 

Figure 2: Visualization of concept indicators in a collection of fables

Language independence is not fully achieved, for reasons that have to do with differences between natural languages. Each natural language has language-dependent characteristics.  In principle, our technology establishes a foundation for using ontological models having correspondences to sets of concept representations.

Ontological models are envisioned as being a type of “interlingua”, not of words composed in grammar, but as systems of signs that are interpreted by humans in various natural language settings. 

Ontological models provide metadata that indicate where possible misunderstanding might occur; thus ontological models provide a complex formalism to help in the translation and transcription of meaning from one human language to another.  Like mathematical models, ontologies are useful as enablers of computations based on the structure of the defined sets of concepts.  In mathematics the concepts are those related to field dynamics and to the conservation laws of physics.  The ontological models we are developing are about more complex subjects, such as the intentions of a determined enemy to bring elements of bioterrorism into the United States.

Ontology extends Hilbert mathematics from deterministic systems to complex systems having uncertainty and under-determined constraints.  Representing real advances in objective science, ontological models provide a computational basis for the real time projection of human knowledge within communities of practice.  Medical informatics and bioinformatics are demonstrating value in two early utilizations of ontological modeling.  A construction over conceptual representations of topics in social discourse has been applied to medical literatures.  These constructions bring relevant information to medical research communities.  Bio-event defense architecture has been outlined and proposed by members of the BCNGroup.  Application in pharmaceutical research and development can be identified under non-disclosure agreements.  These applications involve conceptual modeling as well as direct modeling of the ontology involved in gene expression. 

Subject matter indicators are represented in computable ontology constructions, including the simplest, and yet most powerful, one: a concept label together with one to n (n an undetermined integer) words, word stems or word phrases.  Each such construction is formally represented in several knowledge base technologies as an n-tuple.

< a0, a1, . . . , an >
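The n-tuple above can be illustrated with a minimal Python sketch. It assumes that a0 is the concept label and that a1 through an are the indicating words, stems or phrases. The class name `ConceptTuple`, its methods and the example values are hypothetical and are not taken from any of the products named in this paper.

```python
from typing import NamedTuple, Tuple


class ConceptTuple(NamedTuple):
    """Hypothetical n-tuple <a0, a1, ..., an>: a0 is the concept label,
    a1..an are words, word stems or word phrases that indicate it."""
    label: str
    indicators: Tuple[str, ...]

    def as_tuple(self) -> Tuple[str, ...]:
        # Flatten to the <a0, a1, ..., an> form used in the text.
        return (self.label, *self.indicators)

    def hits(self, text: str) -> int:
        # Count how many indicators occur in the text (case-insensitive
        # substring test; a real extractor would tokenize and stem).
        lowered = text.lower()
        return sum(1 for term in self.indicators if term.lower() in lowered)


# Illustrative values only; not taken from any deployed vocabulary.
container_search = ConceptTuple(
    label="container search",
    indicators=("container", "inspect", "seal number"),
)
```

Calling `container_search.hits(...)` on a report gives a crude count of indicator matches; the final interpretation of any match remains with a knowledgeable human.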

Our proposed deployment takes these technologies and integrates them.  We also add a data mining process in which fast convolution transforms have both a mathematical formulation and operational realizations as precise data retrieval methods.

Convolution operators are computed quickly over the elements of a set of concept representations. The convolution operator results in the separation of context and the merging of ontology.  One pass over an in-memory data structure is sufficient.

Figure 1: Co-occurrence patterns with “hash”
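The idea of a single pass over an in-memory, hash-based structure can be illustrated with plain co-occurrence counting. This is only a sketch of that idea: it is not the convolution operator referred to above, whose formal definition is not given in this section. The function name `cooccurrence_counts` and the sample records are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List


def cooccurrence_counts(records: Iterable[List[str]]) -> Dict[FrozenSet[str], int]:
    """Count, in one pass, how often two concept labels share a record."""
    counts: Dict[FrozenSet[str], int] = defaultdict(int)
    for labels in records:
        # sorted(set(...)) removes duplicates so each pair counts once per record.
        for a, b in combinations(sorted(set(labels)), 2):
            # A frozenset key makes the pair order-independent and hashable.
            counts[frozenset((a, b))] += 1
    return dict(counts)


# Hypothetical records: each list holds the concept labels found in one report.
records = [
    ["importer", "commodity", "port"],
    ["importer", "commodity"],
    ["port", "carrier"],
]
counts = cooccurrence_counts(records)
```

Each record is visited once, and every lookup or update is a hash table operation, which is the property the text relies on.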

From these technologies, sets of concept representations are developed and made accessible to search and analysis.  Several semantic extraction tools are integrated with a commercially available RDF repository [4].  The output of these semantic extraction tools is a set of subject matter indicators, represented as RDF statements.  The basic set operations are available as formalism, and thus the standardization of these constructions is a matter of public record.
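A minimal sketch of subject matter indicators held as RDF-style statements, with the basic set operations applied to the output of two extraction tools, might look as follows. The namespace, the predicate name `indicatedBy` and the sample terms are hypothetical, and the sketch uses plain Python sets rather than the API of any particular RDF repository.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical namespace and predicate, for illustration only.
NS = "http://example.org/gift#"


def indicator_triples(concept: str, terms: Set[str]) -> Set[Triple]:
    """Express one subject matter indicator as (subject, predicate, object) statements."""
    return {(NS + concept, NS + "indicatedBy", term) for term in terms}


def to_ntriples(triples: Set[Triple]) -> str:
    """Serialize in N-Triples syntax: IRIs in angle brackets, literals quoted.
    Assumes the terms contain no quotes or backslashes that need escaping."""
    lines = [f'<{s}> <{p}> "{o}" .' for s, p, o in sorted(triples)]
    return "\n".join(lines)


tool_a = indicator_triples("ContainerSearch", {"container", "inspect"})
tool_b = indicator_triples("ContainerSearch", {"container", "seal"})

merged = tool_a | tool_b   # union: everything either tool reported
agreed = tool_a & tool_b   # intersection: statements both tools reported
```

Because statements are plain tuples, union and intersection come directly from the set type; a human analyst would still decide which of the merged statements to keep.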

It is vital to recognize that the subject matter indicators are given final interpretations by knowledgeable humans. 

Our Phase 1 proposal is to integrate five stand-alone COTS[5] products (two semantic extraction systems, two knowledge management systems, and a taxonomy acquisition system) with an Open Source ontology editing tool (Protégé), an Open Source document repository system, and a simple RDF (Resource Description Framework) data repository.

Knowledge management systems address the need to enhance the quality of reporting as well as to make managed vocabularies available as an interface between ontology constructions and normal human use.  Semantic extraction systems convert free-form text into structured metadata.  Existing taxonomies are drawn from across the world.

These products are to be configured using J2EE web services and a server binding protocol based on Python, a high-level scripting language.

An advanced RDF repository is used to provide persistent storage for organized sets of concept representations. Concept representations and organized collections of these representations are convertible to standard ontology representation languages.  A formal theory about co-occurrence patterns is used to express a category of mathematical constructions called convolution operators. 


Section 1:  Proof of concept

 

The customer wants a global information framework that:

1. Has a high degree of language independence.

2. Compresses data regularity, primarily co-occurrence patterns, into structurally organized concept representations.

3. Converts uncertain, sketchy, sometimes incorrect instance information into clear, concise and complete reports about a situation.

4. Provides a means to develop global synthesis over a large event space.

 

Ontological modeling also provides new types of information technology features that are not anticipated by the customer.  For example, the set of concept representations, together with the ontological model encoding structures, allows access to past information instantaneously, without relational database indexing.

The BCNGroup has specified a six month technology integration and a fully functional beta site deployment.  The beta deployment will serve as a prototype for additional deployments based on similar principles.  

Our technology delivers means for deriving language independent situation and global event analysis based on ontological models.  The software systems integrate semantic extraction in English, Arabic and German.  Other languages are possible using the same techniques. 

General principles related to the differential ontology framework are laid out so that multiple languages are integrated into constructed elements of a single explicit ontology.  The integrated system will demonstrate features that are not available from any current semantic or knowledge management system. 

Our general principles are part of an emerging discipline related to the measurement of complex systems and the use of formal ontology as a means to abstract knowledge about situations that arise in a complex world.  Iterative modifications to visualized ontology lead to an adaptation of ontological models of these situations. 

The Global Information Framework depends critically on having a user interface that allows any subject matter expert to have visual access over situationally relevant concept representations. Situational relevance requires a subsetting mechanism.  Control over concept subsetting mechanisms serves to focus the attention of the user on part of the elements of an ontological model.  Once elements are identified by our selective attention mechanisms, these elements are extended using ontological inferencing to produce a coherent view of what is known and encoded within the representational space.  User input in each phase of this process is not merely supported; user input is required to achieve relevance and fidelity.
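The two steps described above, selecting seed elements and then extending them, can be sketched as a bounded traversal of stored relationships. Following links in this way is only a stand-in for ontological inferencing, which is not specified here. The `RELATED` table, the function name `subset` and the `max_hops` parameter are hypothetical.

```python
from collections import deque
from typing import Dict, Set

# Hypothetical ontology fragment: concept -> directly related concepts.
RELATED: Dict[str, Set[str]] = {
    "container": {"seal", "shipment"},
    "shipment": {"importer", "commodity"},
    "commodity": {"tariff code"},
    "aircraft": {"runway"},
}


def subset(seeds: Set[str], max_hops: int) -> Set[str]:
    """Start from user-selected seeds and extend along stored relationships."""
    selected = set(seeds)
    frontier = deque((seed, 0) for seed in seeds)
    while frontier:
        concept, hops = frontier.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the requested depth
        for neighbour in RELATED.get(concept, set()):
            if neighbour not in selected:
                selected.add(neighbour)
                frontier.append((neighbour, hops + 1))
    return selected
```

The user controls both the seeds and the depth, which keeps the human in the loop: a wider `max_hops` gives a broader view, a narrower one keeps attention on the immediate situation.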

This human-centric approach, based on ontological models, creates a distinct alternative to classical expert systems and artificial intelligence approaches.  Our alternative creates a higher dependency on human involvement and requires that some humans accept responsibility over decisions.  Clearly the cultural barriers we, the BCNGroup scientists, have experienced have something to do with the requirement that humans accept more responsibility and are subject to rational outcome metrics.  We have been forced to take the position that artificial intelligence funding is wrong-minded, both on the basis of arguments from the natural sciences and because the effect of artificial intelligence expenditures is to allow the consulting IT industries to avoid taking complete responsibility for past, current or future performance outcomes.

Relevant cognitive neuroscience tells us that attentional focus evokes cognitive responses.  This science also tells us a great deal about how this attentional focus is managed by the brain system [6].  As the BCNGroup moves the Second School forward (see Section 10), we will develop user profiles that use the Human-markup language standard [7] to bring elements of cognitive engineering into the interface design.  A simple control interface has been designed from existing text based and mouse based interfaces [8].  The beta deployment of this interface is planned within two months of funding ($750,000).

Results from cognitive neuroscience have been used to design user interface elements that change the visualization based on user commands and actions. These changes in visualized state produce shifts in figure/ground perception.  


Section 2:  Context and objectives

 

As in physical and engineering sciences, the results of collective intellectual work lead to advances in science, including economic and biological science. 

In the context of other types of collective work, such as financial services, intelligence analysis, fiduciary reporting, compliance reporting, complex control, and biological science, the GIFT provides a means to produce a type of collective intelligence.  Subject matter experts create this collective intelligence using our software components.

Global information frameworks provide features that are not available from any current semantic technology or knowledge management system.  It is in this sense that the technology is a gift to our society.

GIFT was designed specifically to address global analysis of US Customs and Border Protection selectivity of commodity shipments for targeted examination of containers. However, GIF technology is applicable to far more than the current critical problems in information technology modernization efforts at Treasury, State, Department of Defense and Department of Homeland Security. GIFT provides a principled ground on which to extend formal models of natural event structure whose objects of investigation are by nature complex.  The gift has to be accepted, however, and so far the revolutionary nature of the approach on which these integrated technologies depend has been counterintuitive to mainstream artificial intelligence and to the IT procurement process.

Over the past decade a revolutionary ground has been prepared by scientists and technologists who felt that intelligence and military activity required a new information science paradigm.  We have faced an entrenched discipline and procurement process.  The individuals involved in maintaining this process have been, so far, unwilling to even accept the possibility of a paradigm shift.

So natural scientists have developed the “Second School of Semantic Science”.  The Second School points out that the First School treats intelligence as if it is merely a mechanism that can be decomposed into a set of fixed semantic states and a first order logic defined on this set.  Natural science, and common sense, tell us that intelligence is not properly characterized in this way.

The following have been our long term design objectives for Treasury and DoD:

·         Improve the quality of analysis, and utility of complex intelligence products;

·         Provide specific and tailored intelligence to enhance our ability to visualize the battlespaces, including the terrorism engagement space, and ensure total operational awareness;

·         Improve the throughput and speed of delivery of National intelligence;

·         Reduce or eliminate unnecessary redundancy and duplication in intelligence products;

·         Strengthen information and production management and ensure policies, procedures, concept development, training, and technical-human engineering;

·         Establish and integrate standards (based on mandated Department of Defense (DoD) community standards/architectures) for commonality, interoperability, and modernization in coordination with appropriate elements and activities;

·         Explore and examine very advanced technology and concepts for future integration;

·         Provide a thematic analysis as the basis for information warfare, both defensive and offensive activities.

Our proposed beta deployment demonstrates the viability of a specific roadmap.  The roadmap starts where our industries are today and shows a specific path to the design, development and deployment of next generation tools.

We have interoperability with W3C standards, but our capabilities are forward looking.  Perhaps the most critical contribution is a data encoding mechanism that supports the development of collective intelligence and work products that can be re-used as models of complex phenomena.

Our proof of concept involves the deployment of a prototype that is fully operational and is to be used in a critical context.  This prototype can be deployed at any site and requires that only part of the deployment team have high levels of security clearance.

 

Generality:  Nothing in this roadmap excludes the development of GIFT deployments in bio-chemical engineering, banking, manufacturing, publishing or any other complex human activity.  The technology is considered to be more advanced than any existing e-commerce system or any deployed knowledge management system.  Several of our teaming corporations are precisely those corporations who are regarded as having the leading edge deployed systems.  Their participation is made at or below cost simply because the methods and capabilities of these systems are under-appreciated due to the break they make from classical artificial intelligence and expert system based IT deployments.


Section 3: Ontology architecture

 

We have available a package of patented innovations in data encoding technology. 

 

Figure 3: distributed Ontology Management Architecture (d-OMA)

 

In several layers of our existing software, data regularity in context is discovered using semantic extraction techniques.  The patterns made from regularity are made explicit in the form of a set of concept indicators.  For us, R&D does not mean “research and development”, because this term has been deemed of “no value” in the IT industry and in government IT procurement circles.  The political incorrectness of funding long term R&D stems from the failure of research and development using the classical approaches.

For us, “R&D” is research and discovery.  The research, in this context, is an individual investigation of some complex natural phenomenon, such as the purchasing interests of an on-line shopper.  The GIFT provides human-centric investigator tools.  Like microscopes and carpentry tools, the GIFT tools do nothing by themselves.  These tools become useful when they are used by skillful domain experts. 

What the tools observe are conceptual structures in social discourse.  What is constructed is a model of how these structures sit within the various thematic expressions.  The subject indicators have structural relationships to individual natural language terms and patterns of term occurrences.

Subject matter indicators are identified using several types of patented semantic extraction processes.  These include two forms of patented conceptual aggregation from letter, stem, word or word phrase n-gram measurement of text [9], as well as newly patented probabilistic Latent Semantic Analysis (PLSA) [10].
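The n-gram measurement mentioned above can be illustrated generically. The sketch below shows ordinary character and word n-gram counting only; it does not reproduce the patented aggregation processes or PLSA, and the function names are assumptions.

```python
from collections import Counter
from typing import List


def char_ngrams(text: str, n: int) -> Counter:
    """Count overlapping character n-grams in lower-cased text."""
    s = text.lower()
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def word_ngrams(text: str, n: int) -> Counter:
    """Count overlapping word n-grams (whitespace tokenization)."""
    words: List[str] = text.lower().split()
    return Counter(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
```

Frequent n-grams are candidates for subject matter indicators; whether a candidate actually indicates a concept is a judgment left to the analyst.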

Ontology constructions, no matter how they are developed, consist of representations of concept schemas and their relationships.  In GIFT, natural language terms, and patterns of co-occurrence, provide ontology definition as sets of concepts, with properties and attributes, organized with visual navigational aids.

A subsetting mechanism brings into visual focus all and only the relevant part of the extensive ontology repositories persisted in an RDF repository and hash tables.  Human interfaces to shared ontology repositories are designed to mimic the perceptual figure/ground relationship observed by natural science to be the key mechanism involved in individual action-perception cycles.  These human interfaces allow local manipulation and editing of sets of concepts found to be relevant to individual analysis of specific events, such as US Customs and Border Protection selecting a container for a search procedure.

A local analysis by an individual occurs. Using the new software, this analysis occurs with the greatest amount of flexibility.  At each of many sites, human analysts edit small details, modify underlying assumptions, and otherwise examine how concepts identified locally might be related to sets of concepts being maintained in global repositories.  The concepts themselves are equipped with metadata about how these concepts might be identified in text, and co-occurrence relationships that concepts have with other concepts in the repository. 

A collective intelligence can be expected.  Subject matter experts enable a type of global analysis due to local manipulation.  Local manipulation occurs based on direct experience, but this experience is conditioned by the recent global analysis.  Real time intelligence response is therefore very likely.

Collective global analysis occurs because individual human interfaces to concept repositories have selective attention mechanisms.  BCNGroup scientists understand the physical and mental activity involved in individual human action, cognition and perception cycles.  This understanding is part of several academic literatures, including “cognitive engineering” and “evolutionary psychology”.  The focus of this science is on the behavioral patterns of people and of systems of living systems.

Collective global analysis occurs within contexts that are implicitly (not necessarily explicitly) structured by relationships that are established when many individuals work with localized ontology. Individuals produce reports based on the subsetting mechanisms that “retrieve” that part of the globally stored RDF repository that is deemed relevant.  As local analysis produces reports, these reports themselves are subjected to linguistic and ontological methods in which reconciliation of terminological and viewpoint differences becomes critical.

SchemaLogic’s SchemaServer product will be interfaced with web services that manage the specifications of a global terminology library.  Terminological reconciliation is a current capability provided by SchemaLogic Inc to a variety of commercial clients.  

Ontological science tells us that local manipulation of concepts by an individual within the contingencies of the moment involves human tacit knowledge and can rapidly lead to a deep understanding of a specific event in the context of larger issues and concerns.  However, human understanding is both highly situational and strongly shaped by opinion. Any specific understanding depends on individual(s) defining terms so that these fit within a coherent view of the events occurring.  In key situations, a single common viewpoint is not possible, nor is a single viewpoint always desirable.

The control of managed vocabulary is essential to uniform work on enterprise wide ontological models.  One key failure of the W3C standards bodies led by Tim Berners-Lee is the absence of standard methods for the reconciliation of terminological differences.  Our system has several layers of methodology that are tied by first principles to the way our data is encoded into computer memory.  The data encoding specifications are simple, non-proprietary, and available to review and use.
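Terminological reconciliation against a managed vocabulary can be sketched as a lookup that maps variant terms to a preferred term and sets aside anything unknown for human review. This is a generic illustration under assumed names (`PREFERRED`, `reconcile`); it does not represent the SchemaServer product or its interfaces.

```python
from typing import Dict, List, Tuple

# Hypothetical managed vocabulary: variant term -> preferred term.
PREFERRED: Dict[str, str] = {
    "cbp": "US Customs and Border Protection",
    "customs": "US Customs and Border Protection",
    "hts": "Harmonized Tariff Schedule",
}


def reconcile(terms: List[str]) -> Tuple[List[str], List[str]]:
    """Map known variants to preferred terms; return unknown terms for human review."""
    resolved: List[str] = []
    unresolved: List[str] = []
    for term in terms:
        key = term.strip().lower()  # normalize case and surrounding whitespace
        if key in PREFERRED:
            resolved.append(PREFERRED[key])
        else:
            unresolved.append(term)
    return resolved, unresolved
```

The unresolved list is the point at which differing viewpoints surface: a person decides whether a new term is a variant of an existing one or a genuinely new concept.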

 

Figure 4: The production of scoped ontology with humans in the loop

 

A second knowledge management system is a product from Acappella Software Inc. The regularity of responses to standard situations can be studied, resulting in patterns of expression that are captured in pre-existing textual snippets.  This allows a patented process to assist in flexible report generation having the 3Cs: clarity, completeness and consistency.

Both knowledge management systems are tied together with standard knowledge representational data encoding based on RDF (Resource Description Framework) and Orbs (Ontological referential bases).  Ontology representations can be used within the difficult contexts of uncertain information, shifts in context, and changes in the underlying situation.  In most cases, a human analyst will easily alter interpretations and schema properties in real time to accommodate these practical limitations.

The two commercial knowledge management systems provide support for cultural transitions.


Section 4:  The ontology encoding innovation

 

Scoped ontology sits on an exceedingly simple data structure standard, developed and published by OntologyStream Inc.  This encoding bypasses the well-known XML persistence and search limitations.

This data structure is a topic taxonomy organized in a specific fashion, disclosed as a matter of public information.  The differential ontology framework works in a specific fashion to create a global information framework where managed vocabulary and ontology are generated and used as a knowledge management capability.

Several small deployments have been completed.  For one of the state governments, a consultant/specialist created 216 concept representations and organized them into the upper two layers of a differential ontology framework.  A prototype for a large deployment in US Customs was developed but not deployed (as of May 2005).  We are seeking a contractual means to deploy based on the team agreements between ten leading, but small, innovative knowledge technology corporations.  Non-deployment of the prototype is deemed by our group to be one manifestation of profound incompetence by specific Lockheed Martin management.  A GAO investigation was initiated in May 2005. 

Situationally focused models of specific events were considered as targeting software to be used in a future modernized US Customs and Border Protection.  Work stopped on this deployment as of March 2005, due to contracting issues [11].  However the concept of scoped ontology has now been demonstrated in the state (DHS) deployment and in a commercial deployment (not disclosed).   These are small deployments which act as a proof of product. 

The upper layer of the differential ontology framework is a set of universal abstractions, such as abstractions about the flow of time. The middle layer contains domain specific concepts and utilities such as security policies, concepts about how containers are searched, or concepts about what is a commodity.

In our small state DHS project, several specific systemic risks were identified, leading to corrections in risk management policies.

Differential ontology is deployed within an action-oriented process model called AIPM (see Section 7, Figure 8).  Working from event reports, semantic extraction activities are developed and data instances are parsed to produce reporting triggers.

Triggers launch processes that construct scoped ontology.  The development of small (5-20 topics) situationally specific scoped ontology is the usual outcome from automatic scoping processes.  These ontology representations can be used for rapid communication of structured information and for building histories.  We need the larger deployment to show how ontology streaming might aid in global analysis and responses.
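The path from an event report to a small scoped ontology can be sketched as a trigger table that brings concepts into scope when a term appears in the report. The trigger table, the function name `scoped_ontology` and the cap of 20 topics (the upper end of the range given above) are illustrative assumptions, not a description of the AIPM implementation.

```python
from typing import Dict, List, Set

# Hypothetical trigger table: a term found in a report -> concepts it brings into scope.
TRIGGERS: Dict[str, Set[str]] = {
    "seal": {"container", "seal integrity"},
    "manifest": {"shipment", "importer"},
    "tariff": {"commodity", "tariff code"},
}

MAX_TOPICS = 20  # upper end of the 5-20 topic range mentioned in the text


def scoped_ontology(report: str) -> List[str]:
    """Return a small, sorted list of concepts triggered by one event report."""
    words = set(report.lower().split())
    topics: Set[str] = set()
    for term, concepts in TRIGGERS.items():
        if term in words:
            topics |= concepts
    return sorted(topics)[:MAX_TOPICS]
```

A report that fires no trigger yields an empty scope, which is itself a signal that the trigger table, or the analyst, needs to look at the report.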

In some domains, for example the Customs Harmonized Tariff Schedule, there may be hundreds of thousands of concepts, but a small set of organizing principles that generate categories over these topics.  The categories are suggested by algorithms, and then reified by human analysts.

Event specific categories are developed as a means to visualize elements of event space phenomena. The event phenomenon is then “understood” using the concepts in upper abstract and domain specific ontology.

Figure 5: The GIFT architecture as of 2002

The full GIFT architecture is being realized using a server glue language called Python.  The key is to bring the required products together in a work environment. 

Figure 5 (first seen in 2002) expresses our long term interest in Visual Text (from Text Analysis International Corporation Inc (TAI)), semantic extraction, and SchemaLogic (SchemaServer).

NLP++ is the language that TAI founder Amnon Meyers developed and used to build a now patented “system for developing (rapidly) text analysis systems”.  Our scientists know that each targeted domain of text data elements has different structure to function relationships. Textual semantics, or meaning, is critically dependent on the specific domain of text data elements. The NLP++ language is used to instrument the focused measurement of function to structure relationships.

Probabilistic latent semantic analysis (PLSA), patented in 2005 by Recommind Inc, is used to develop n-ary representations of subject matter indicators.  NdCore (Applied Technical Systems Inc), Readware (MITi Inc) and SLIP analysis (OntologyStream Inc) are used to get different looks at the same data.  As the set of subject matter indicators is developed, RDF encoded concept representations are developed and the NLP++ based software is then used to instrument the detection of these concepts in text.  The “two sides” of the differential ontology framework are established.

The SchemaServer product from SchemaLogic provides the knowledge management features required to manage controlled vocabularies and thus to allow human use of natural language to control the development and use of sets of concepts (ontology).

Acappella Software provides a product that helps to create clear, complete and concise (the 3Cs) written reports in the first place.

The development of our data layer has been in conjunction with our work on extending some intellectual property for Applied Technical Systems, and is discussed in a public document titled “Notational System for the Ontology Referential Base (Orb)” [12].