Communication                                                         KSF-Report

 

Knowledge Sharing Core

 

Version: 8/23/2003

 

 

Statement of Capability

 

 

 

Links within this document

 

·        Overview

·        Data Sources

·        Use of Patents and Software Licenses

·        Educational Services

·        Note from the Natural Sciences

·        Summary

·        Appendix A – Knowledge Sharing Core System Diagram

·        Appendix B – Engineering support needed

·        Ontology Stream Inc Stock Distribution

 

 


 

Overview

A teaming agreement is being negotiated in public view.  As a team, we propose to use selected patented intellectual property and proprietary methodologies, as appropriate, from the following companies:

SAIC – will be asked to provide management support and a license for Latent Semantic Indexing (LSI)

 

Applied Technical Systems Inc – will be asked to supply the Continuous Connection Model data structure and NdCore conceptual rollup technology

 

Text Analysis International Inc – has agreed to supply rule formation and deep parsing capabilities, plus an existing TAI web crawler for finding and acquiring new information from web sites.

 

Schema Logic Inc – will be asked to supply SchemaServer, which provides logic between viewpoints or schemas.  SchemaServer uses a proprietary methodology to assist in reconciling the terminological usage of multiple controlled vocabularies.  However, SchemaLogic will be integrated by a team employed by OntologyStream, since SchemaLogic’s business model precludes investment in a relationship with DARPA or OSD.  A SchemaServer will be deployed on an OntologyStream server (a dedicated computer) for 24 months.  A dedicated knowledge engineer/knowledge management engineer will be employed by OntologyStream to use SchemaServer and to develop knowledge artifacts based on its principled use.  SchemaServer will NOT be integrated into the Knowledge Sharing Core but will serve as an external resource to it.

 

Entrieva Inc – will be asked to supply a competitive alternative to the NdCore conceptual rollup supplied by ATS, to allow benchmarking and comparison between radically different approaches to conceptual rollup (automated ontology construction).

 

ClearForest Inc – will be asked to supply a user-centric rule-making interface for harvesting data from the web.

 

Recommind Inc – will be asked to supply probabilistic Latent Semantic Indexing, which is expected to compete well with SAIC’s LSI (which is based on linear algebra).

 

OntologyStream Inc will provide consulting from a group of 12 scientists, including Dan Levine, Karl Pribram, Peter Kugler, Robert Shaw, Stephen Grossberg, Steven Newcomb, Art Murray, John Sowa, Lotfi Zadeh, Walter Freeman, Raymond Bradley, and Steven Kercel. OntologyStream will also provide scientific oversight and architecture for the development and testing of the First Knowledge Sharing Core Virtual Private Network and its capabilities.

 

CoreTalk Inc will provide consulting and design services using the Cubicon design language. 

 

Each of these companies has proven capability.  Our concept is to transfer lessons learned from military intelligence and procurement practices to the procurement of transformational technology, biodefense technology, the analysis of social discourse, and then to infrastructure protection intelligence (trucking and harbors) and finally to business and personal intelligence systems. 

 

Personal intelligence systems (for public use) are to be integrated into community-based, game-type environments having distance learning and scholarship archives.  These companies, and others as resources permit, will supply software licenses to OntologyStream Inc.  Ontology Stream Inc will manage emerging vertical markets that exploit these software licenses. 


 

Data Sources

 

In the history of text understanding systems research, what has always been critically needed is relevant data.  Most within our group judge the TREC, Message Understanding Conference (MUC), and TIPSTER projects conducted by DARPA, the CIA, and NIST to have been deeply flawed by precision-recall measures that are biased towards statistical methods.
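For reference, the precision-recall measures under criticism are the standard information-retrieval definitions.  The sketch below is a generic illustration of those definitions, not code from TREC, MUC, or TIPSTER:

```python
def precision_recall(retrieved, relevant):
    """Standard IR measures over a single query:
    precision = |retrieved ∩ relevant| / |retrieved|
    recall    = |retrieved ∩ relevant| / |relevant|"""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Example: 4 documents retrieved, 3 of them relevant, 6 relevant overall.
p, r = precision_recall(
    {"d1", "d2", "d3", "d4"},
    {"d1", "d2", "d3", "d5", "d6", "d7"},
)
# p = 3/4 = 0.75, r = 3/6 = 0.5
```

A system tuned only to these two numbers can score well on statistical co-occurrence while missing thematic structure, which is the bias the group objects to.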

 

However, a special open source data set is available for the analysis of Islamic social discourse.  A number of Islamic scholars are willing to participate in a project to convert this data source into an archive that has differential ontology and is focused on understanding what might be gleaned from the open source analysis of themes in social discourse. 

 

Web harvests from Islamic social groups in 14 countries have been used in a project funded by the NSC and called “J-39”.  An eight-month archive of web harvests has been developed at INSCOM.  This open source dataset is archived in an Oracle database at Object Sciences Inc (Alexandria, Virginia). 

 

The current analysis system developed and deployed by SAIC and Object Sciences Inc at INSCOM is judged to be insufficient to provide a measured fidelity between thematic analysis and Islamic memetic expression since June 2002.  The data set exists and can be reprocessed, and a true thematic analysis of Islamic public/social discourse stood up as a poll-type information system. 

 

Fable arithmetic:  A long-standing project to develop “fable arithmetic” has demonstrated the integrated working of text analysis on a small text corpus using NLP++, the CCM/I-RIB knowledge base, differential ontology, and logics on schema. We expect the trial deployments to occur in the fourth quarter of this year. 

 

Small collection of patent abstracts:  The BCNGroup is a not-for-profit corporation registered since 2000 in the Commonwealth of Virginia.  It has adopted the goal of mapping all patent disclosures related to knowledge science, and its small collection of patent abstracts represents a very difficult test for text understanding systems.

 


 

Use of Patents and Software Licenses

 

The notion of a Knowledge Sharing Foundation is that patents protect data mining, natural language processing, and decision support capability derived from linguistic-mathematical models.  Because of this protection, the full nature of the processes can be revealed in public and subjected to extended scholarly reviews. 

 

Information Production Systems:  The selected vendors and differential ontology have been judged sufficient to design, demonstrate, and deploy new and powerful paradigms that use machine-readable ontology to produce new information.  The re-design and integration process using the Cubicon iconic language will lead to additional patent applications filed jointly by OntologyStream and the participating vendors. 

 

 

Figure 1: Actionable Intelligence Process Model (AIPM)

 

The Knowledge Sharing Foundation concept produces a use-driven evaluation of any intelligence capability, or more generally any information production capability.  This evaluation is defined as part of the Actionable Intelligence Process Model (see Figure 1).  Network Centric Warfare regards the use of the AIPM as a conversion of the information-push paradigm into an information post-and-pull paradigm.  To achieve this capability, we will need transaction instrumentation and analytic reporting mechanisms so that use metrics drive the selection of capabilities added into the Knowledge Sharing Foundation. 

 

 


 

Educational Services

 

One purpose of public disclosure of underlying technology is to provide end users with a liberal understanding of the capabilities of this technology as an information production system.  Thus users in the military, intelligence, and medical professional communities will have reasonable access to a liberal understanding, from leading university professors, of what is and what is NOT possible with these patented processes. 

 

 

Figure 2: Educational services provided within the Knowledge Sharing Cores

 

Each of the following technologies will have distance learning and training modules developed by university faculty.

 

·         The simple and classical Perl/ANSI-C programming languages

·         NLP++ (a mature natural language processing programming language)

·         Contiguous Connection Model (or CCM) for data localization

·         In-memory Referential Information Base (I-RIB) for data encoding

·         Differential Ontology with classical (Dumais and Landauer) latent semantic indexing and probabilistic latent semantic indexing

·         The use of the Ontology Lens to provide visual control over the automated production of taxonomy and controlled vocabulary from free form text

·         SchemaLogic Inc’s knowledge management paradigm to provide reconciliation of terminological usage across multiple communities of practice.

·         Simple form of XML (XML basics) plus Topic Maps

·         OWL (Ontology Web Language with first order predicate logics and HyTime entailment description)


A system-wide understanding cannot be developed unless there is a high-quality distance learning system with certifications and perhaps university credit for participants. 


 

Note from the Natural Sciences

 

Computer science often does NOT account for what cognitive science calls an “epistemic gap” between the external world and the private world of introspection.  However, Topic Maps does make this distinction.  The distinction is made using the concepts of “computer addressable subject” and “non-computer addressable subject”.

 

The scientists at Ontology Stream Inc have maintained that this distinction is the same distinction made by bio-mathematician Robert Rosen in pointing out that formal mathematical (Hilbert mathematics) models of natural systems often confuse the formal model with the natural system. 

 

In engineering situations, that is acceptable. 

 

However, social and psychological systems have elements of complexity that are not reducible to formal models.  The error made in assuming the formal system can in fact be equal to the natural system is called the “category error”.  This category error is not understood by mainstream computer science, and current generation information systems are not designed with social and psychological science in mind.

 

The first deployment of the Knowledge Sharing Core will be as part of a Power to the Edge exercise in knowledge sharing and the measurement of shared awareness. 

 

This proposal is being made to the Office of the Secretary of Defense as a sole-source, unsolicited funding proposal.   The sole-source claim is due to the endorsement and participation of a number of scholars and deeply innovative companies.

 

Use of the Cubicon Language: (Additional discussion requires non-disclosure agreement.)

 


 

Summary

 

As a general principle, the first Knowledge Sharing Core licenses a specific group of patents and proprietary methodologies.  This intellectual property is integrated into a robust and scalable system having micro-transaction instrumentation within the software.  The instrumentation allows a full accounting of when and how the elements of the Knowledge Core are used.  A Knowledge Core is targeted at a specific domain, such as mapping terrorism activity, biodefense, or mapping the emergence of new trends in patent disclosure. 
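The micro-transaction instrumentation described above can be sketched as a thin wrapper that records when and how each element of a Knowledge Core is invoked.  All names here (the component label, the log structure) are hypothetical illustrations, not the actual instrumentation interface:

```python
import functools
import time

USAGE_LOG = []  # in a real system this would be a persistent, auditable store

def instrumented(component):
    """Decorator that appends a usage record (component, operation,
    timestamp, duration) every time a wrapped function is called."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.time()
            try:
                return fn(*args, **kwargs)
            finally:
                USAGE_LOG.append({
                    "component": component,
                    "operation": fn.__name__,
                    "when": start,
                    "duration": time.time() - start,
                })
        return wrapper
    return decorate

@instrumented("NdCore")          # hypothetical component label
def conceptual_rollup(terms):
    """Stand-in for a rollup step: deduplicate and order terms."""
    return sorted(set(terms))

conceptual_rollup(["ontology", "schema", "ontology"])
# USAGE_LOG now holds one record attributing this use to "NdCore"
```

Records of this shape are what would let use metrics, rather than vendor claims, drive the selection of capabilities retained in a Knowledge Core.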

 

As a specific Knowledge Core is developed, we assume that a few additional patents will be jointly filed by the development team, so that the entire system as a whole has protection and cannot be copied without compensating the owners of the integrated patents and other intellectual property.  This protection allows open disclosure.  Open disclosure is deemed essential, and non-agreement on this issue will exclude participants who would otherwise be included.

 

The policy of open and clear disclosure will make each Knowledge Core system subject to competition from new products (developed in competition with OntologyStream Inc) as well as from new Knowledge Cores developed by OntologyStream Inc.  This evolutionary pressure on the emergence of the knowledge technologies is what we are looking for. 

 


 

Appendix A:  Knowledge Sharing Core System Diagram

 

 

Figure 1 shows the classical separation of software into a

 

{ data layer, transaction layer, presentation layer }.

 

We feel that the presentation of Knowledge Cores can always be addressed using the topic map constructions, and that the Actionable Intelligence Process Model should always be used to place tools into proper context. 

 

 

 

Figure 1: One of the early Knowledge Sharing Core Diagrams

 

For example, shallow and deep linguistic parsing serve different purposes, and both should be available to a user who is attempting to extract concepts from text.  The TAI Visual IDE for natural language parsing is designed for language specialists and can be used in any natural language setting and against any set of linguistic variations, in support of thematic analysis of social discourse (broadly considered).  NLP++ assists end users in instrumenting the more obvious co-occurrence patterns and can be used in conjunction with the TAI Visual IDE to instrument very complex and yet highly precise real-time pattern recognition capability.
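The co-occurrence patterns mentioned above can be illustrated with a minimal sketch.  NLP++ itself is proprietary, so plain Python stands in for it here: count which unordered word pairs appear together in the same sentence.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(sentences):
    """Count unordered word pairs occurring within the same sentence.
    Words are lowercased and deduplicated per sentence; pairs are
    stored in sorted order so ("a", "b") and ("b", "a") coincide."""
    counts = Counter()
    for sentence in sentences:
        words = sorted(set(sentence.lower().split()))
        counts.update(combinations(words, 2))
    return counts

counts = cooccurrence([
    "themes in social discourse",
    "themes in public discourse",
])
# ("discourse", "themes") co-occurs in both sentences, count 2
```

Shallow statistics of this kind catch the obvious patterns; the deep parser is then reserved for the constructions that simple counting cannot resolve.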

 

The Semio (Entrieva Inc) semantic parser is a direct competitor to the NLP++ tagger.  However, the two systems work on radically different principles.  The task we have set our group to is to provide objective evaluation of the strengths and weaknesses of the two systems when compared in the social/cognitive science laboratory and as an integral part of military/intelligence exercises.  

 

A similar head-to-head competition exists between SAIC’s LSI and Recommind’s probabilistic LSI (PLSI).   The Ontology Lens (discovered and made public domain by Prueitt in 2002) and Differential Ontology (also discovered and made public domain by Prueitt in 2002) work with either LSI or PLSI. 
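For reference, classical LSI in the Dumais and Landauer sense reduces a term-document matrix with a truncated singular value decomposition, so that documents sharing latent structure land near each other even without exact term overlap.  The sketch below, using numpy on a toy matrix, illustrates only the general technique; it does not reproduce SAIC's patented implementation or Recommind's PLSI.

```python
import numpy as np

# Toy term-document matrix: rows = terms, columns = documents.
terms = ["ship", "boat", "ocean", "tree"]
A = np.array([
    [1, 0, 1, 0],   # ship
    [0, 1, 1, 0],   # boat
    [1, 1, 1, 0],   # ocean
    [0, 0, 0, 2],   # tree (appears twice in the last document)
], dtype=float)

# Truncated SVD: keep k latent dimensions.
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T   # documents in latent space

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Documents 0 and 2 share nautical vocabulary, so in the latent space
# they are far more similar to each other than either is to document 3.
nautical = cosine(doc_vecs[0], doc_vecs[2])
unrelated = cosine(doc_vecs[0], doc_vecs[3])
```

PLSI reaches a comparable latent representation through a probabilistic mixture model rather than linear algebra, which is why the two approaches make an instructive benchmark pair.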

 

Precise pattern recognition allows real time realignment of parsers and ontology services so that new and important linguistic variation is routed immediately to those who need to look for consequences relating to national security.  A simpler functionality is needed in responding to new patterns from ICD code analysis (syndromic surveillance) and in immediately viewing digital libraries (about medical science and about grid systems) from a new viewpoint.  A similar functionality is needed in mapping vulnerability and threats in trucking infrastructure and harbors. 

 

 

 

Figure 2:  Two application areas for the Knowledge Sharing Core

 

The Knowledge Core educational materials provide access to a liberal understanding of these patented tools, and of tools that are public domain. 

 

A presentation layer does need to be available to allow the direct use of tools in a native fashion, separated from the topic map presentation.  So we make a distinction between the technical presentation to the user and the human-computer interface, which is designed for knowledge work rather than around the limitations and capabilities of the patented processes.

 

Topic Maps are consistent with ontology encoded as XML.  What XML has difficulty with are the speed and scalability issues related to schema and to how the data is encoded into both executable memory and system files.  This is where we use CCM, schema logic, and ontology encoding based on Hilbert space mathematics.
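The consistency between Topic Maps and XML-encoded ontology can be shown with a schematic fragment: two topics and one association linking them, built with the Python standard library.  The element and attribute names below are illustrative only and do not claim conformance to the XTM DTD.

```python
import xml.etree.ElementTree as ET

tm = ET.Element("topicMap")

def add_topic(tm, topic_id, name):
    """Attach a topic with a human-readable base name."""
    topic = ET.SubElement(tm, "topic", id=topic_id)
    ET.SubElement(topic, "baseName").text = name
    return topic

add_topic(tm, "lsi", "Latent Semantic Indexing")
add_topic(tm, "plsi", "Probabilistic Latent Semantic Indexing")

# One association relating the two topics by reference.
assoc = ET.SubElement(tm, "association", type="competes-with")
ET.SubElement(assoc, "member", topicRef="#lsi")
ET.SubElement(assoc, "member", topicRef="#plsi")

xml_text = ET.tostring(tm, encoding="unicode")
```

A fragment of this shape parses and validates cheaply at small scale; it is exactly the per-element verbosity and schema handling that motivate CCM-style encodings once the archives grow large.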


 

Appendix B:  Engineering support needed

 

Founder of OntologyStream Inc, Dr. Paul S. Prueitt, will direct all aspects of the design, development, and testing of the First Knowledge Sharing Core Virtual Private Network. 

 

First notes on the First Knowledge Sharing Core Virtual Private Network will be kept in the public domain.  However, some work will be developed as potential new patents.  Additional patenting activities are sought and ARE NOT to be disallowed by contracting with the federal government.

Approximated engineering work that needs to be covered:

28K     Settle the DOF issues                                           2 months
28K     CCM and I-RIB integration                                       2 months (almost completed)
28K     CCM and Topic Maps work                                         2 months (half completed)
42K     TextAnalysis IDE work                                           3 months (1 month completed)
42K     Logic over schema and CCM work                                  3 months
28K     Integration of Semio tagger                                     2 months
28K     Integration of Recommind PLSI                                   2 months
28K     Integration of SAIC’s LSI engine                                2 months
56K     Use of Cubicon to express all processes in a common language    4 months

Approximated testing and training material development:

Development of distance learning curriculum on the selected Core technologies    12 months (will contract with 6 different scholars)

 

 

Total effort: 38 months

Total expenditures: 532K


 

Ontology Stream Inc Stock Distribution

 

Founder of OntologyStream Inc, Dr. Paul S. Prueitt, currently owns 100% of the OSI stock, with 3% allocated to existing friends-and-family investments.

 

An agreement has been made not to dilute the current issue of 100,000 shares except under an agreement by 65% of the stock ownership.  The reason 100,000 shares is the amount of stock Ontology Stream Inc has authorized is the cost of maintaining stock authorization, in yearly payments to the Commonwealth of Virginia, and the belief that it may be several years before any stock is actually distributed.

 

Stock ownership entitles the owner to dividends only.  A Board of Directors looks after the interests of the stockholders.  Resale of stock is done only by approval of the Board. 

 

20% of the stock is currently available to a venture capital firm provided terms can be established that meet the approval of the Ontology Stream Inc Board of Directors. 

 

We will consider an investment from In-Q-Tel. 

 

25% of the stock is reserved for distribution to scholars and innovators who the Board recognizes as having made a significant contribution to the founding of Ontology Stream. 

 

The Founder, Paul Stephen Prueitt, and his family reserve 30% of the stock.  This includes the 3% allocated to existing loans.

 

A reserve of 25% is made to exchange for services, stock, or cash from other corporations.  However, this reserve carries the limitation that a single outside company can own no more than 5% of the stock. 

 

Our June 1, 2003 valuation is set at $4,500,000, making each share worth $45.00.  Stock sales cannot occur except under Rule 504 of the Blue Sky laws. 

 

Back wages (as of June 1, 2003) are acknowledged to Paul Prueitt in the amount of $85,000 and to a third party for $25,000. 

 

Significant resources, $100K to $200K, should be expended to establish accounting and legal practices for Ontology Stream Inc.