Knowledge Sharing Core
Version: 8/23/2003
Links within this document
· Overview
· Use of Patents and Software Licenses
· Note from the Natural Sciences
· Summary
· Appendix A – Knowledge Sharing Core System Diagram
· Appendix B – Engineering support needed
· Ontology Stream Inc Stock Distribution
A teaming agreement is being negotiated in public view. As a team, we propose to use selected patented intellectual property and proprietary methodologies, as appropriate, from the following companies:
SAIC – will be asked to provide management support and a license for Latent Semantic Indexing.
Applied Technical Systems Inc – will be asked to supply the Continuous Connection Model data structure and NdCore conceptual rollup technology.
Text Analysis International Inc – has agreed to supply rule formation and deep parsing capabilities, plus an existing TAI web crawler for finding and acquiring new information from web sites.
Schema Logic Inc – will be asked to supply SchemaServer, which provides logic between viewpoints or schemas. SchemaServer uses a proprietary methodology to assist in reconciling the terminological usage of multiple controlled vocabularies. However, SchemaLogic will be integrated by a team employed by OntologyStream, since SchemaLogic's business model precludes investment in a relationship with DARPA or OSD. A SchemaServer will be deployed on an OntologyStream server (a dedicated computer) for 24 months. A dedicated knowledge engineer/knowledge management engineer will be employed by OntologyStream to use and develop knowledge artifacts based on a principled use of the SchemaServer. The SchemaServer will NOT be integrated into the Knowledge Sharing Core but will be an external resource to it.
Entrieva Inc – will be asked to supply a competitive alternative to the NdCore conceptual rollup supplied by ATS, to allow benchmarking and comparison between radically different approaches to conceptual rollup (automated ontology construction).
ClearForest Inc – will be asked to supply a user-centric rule-making interface for harvesting data from the web.
Recommind Inc – has a probabilistic Latent Semantic Indexing engine that we expect to compete well with SAIC's LSI (which is based on linear algebra).
OntologyStream Inc – will provide consulting from a group of 12 scientists, including Dan Levine, Karl Pribram, Peter Kugler, Robert Shaw, Stephen Grossberg, Steven Newcomb, Art Murray, John Sowa, Lotfi Zadeh, Walter Freeman, Raymond Bradley, and Steven Kercel. OntologyStream will also provide scientific oversight and architecture for the development and testing of the First Knowledge Sharing Core Virtual Private Network and its capabilities.
CoreTalk Inc – will provide consulting and design services using the Cubicon design language.
Each of these companies has proven capability. Our concept is to transfer lessons learned from military intelligence and procurement practices to the procurement of transformational technology, then to biodefense technology and the analysis of social discourse, then to infrastructure protection intelligence (trucking and harbors), and finally to business and personal intelligence systems.
Personal intelligence systems (for public use) are to be integrated into community-based, game-type environments having distance learning and scholarship archives. These companies, and others as resources permit, will supply software licenses to OntologyStream Inc. Ontology Stream Inc will manage emerging vertical markets that exploit these software licenses.
In the history of text understanding systems research, what has always been critically needed is relevant data. Most within our group judge the TREC, Message Understanding Conference (MUC), and TIPSTER projects conducted by DARPA, the CIA, and NIST to have been deeply flawed by precision-recall measures that are biased towards statistical methods.
However, a special Open Source data set is available for the analysis of Islamic social discourse. A number of Islamic scholars are willing to participate in a project to convert this data source into an archive that has a differential ontology and is focused on understanding what might be gleaned from open source analysis of themes in social discourse.
Web harvests from Islamic social groups in 14 countries have been used in a project funded by the NSC and called "J-39". An eight-month archive of web harvests has been developed at INSCOM. This Open Source dataset is archived in an Oracle database at Object Sciences Inc (Alexandria, Virginia).
The current analysis system, developed and deployed by SAIC and Object Sciences Inc at INSCOM, is judged insufficient to provide measured fidelity between thematic analysis and Islamic memetic expression since June 2002. The data set exists, however, and can be reprocessed so that a true thematic analysis of Islamic public/social discourse is stood up as a poll-type information system.
Fable arithmetic: A long-standing project to develop "fable arithmetic" has demonstrated the integrated working of text analysis on a small text corpus using NLP++, the CCM/I-RIB knowledge base, differential ontology, and logics on schema. We expect trial deployments to occur in the fourth quarter of this year.
Small collection of patent abstracts: The BCNGroup is a not-for-profit corporation registered since 2000 in the Commonwealth of Virginia. This small collection of patent abstracts represents a very difficult test for text understanding systems. The not-for-profit corporation has adopted the goal of mapping all patent disclosures related to the knowledge sciences.
The notion behind the Knowledge Sharing Foundation is that patents protect the data mining, natural language processing, and decision support capability derived from linguistic-mathematical models. Because of this protection, the full nature of the processes can be revealed in public and subjected to extended scholarly review.
Information Production Systems: The selected vendors and differential ontology have been judged sufficient to design, demonstrate, and deploy new and powerful paradigms that use machine-readable ontology to produce new information. The re-design and integration process using the Cubicon iconic language will lead to additional patent applications made jointly by OntologyStream and the participating vendors.
The Knowledge Sharing Foundation concept produces a use-driven evaluation of any intelligence capability or, more generally, any information production capability. This evaluation is defined as part of the Actionable Intelligence Process Model (see Figure 1). Network Centric Warfare regards the use of the AIPM as a conversion of an information push paradigm to an information post-and-pull paradigm. To achieve this capability, we will need transaction instrumentation and analytic reporting mechanisms so that use metrics drive the selection of capabilities added into the Knowledge Sharing Foundation.
One purpose of public disclosure of the underlying technology is to provide a liberal understanding, to end users, of the capabilities of this technology as an information production system. Thus users in the military, intelligence, and medical professional communities will have reasonable access to a liberal understanding, from leading university professors, of what is and what is NOT possible with these patented processes.
Figure 2: Educational services provided within the Knowledge Sharing Cores
Each of the following technologies will have distance learning and training modules developed by university faculty.
· The simple and classical Perl/ANSI-C programming languages
· NLP++ (a mature natural language processing programming language)
· Contiguous Connection Model (or CCM) for data localization
· In-memory Referential Information Base (I-RIB) for data encoding
· Differential Ontology with classical (Dumais and Landauer) latent semantic indexing and probabilistic latent semantic indexing
· The use of the Ontology Lens to provide visual control over the automated production of taxonomy and controlled vocabulary from free form text
· SchemaLogic Inc’s knowledge management paradigm to provide reconciliation of terminological usage across multiple communities of practice.
· Simple form of XML (XML basics) plus Topic Maps
· OWL (Ontology Web Language with first order predicate logics and HyTime entailment description)
A system-wide understanding cannot be developed unless there is a high-quality distance learning system with certifications and perhaps university credit for participants.
Computer science often does NOT account for what cognitive science calls an "epistemic gap" between the external world and the private world of introspection. However, Topic Maps does make this distinction. The distinction is made using the concepts of "computer addressable subject" and "non-computer addressable subject".
The scientists at Ontology Stream Inc have maintained that this distinction is the same one made by bio-mathematician Robert Rosen in pointing out that formal mathematical (Hilbert mathematics) models of natural systems often confuse the formal model with the natural system. In engineering situations that is acceptable.
However, social and psychological systems have elements of complexity that are not reducible to formal models. The error made in assuming the formal system can in fact be equal to the natural system is called the "category error". This category error is not understood by mainstream computer science, and current generation information systems are not designed with social and psychological science in mind.
The first deployment of the Knowledge Sharing Core will be as part of a Power to the Edge exercise in knowledge sharing and the measurement of shared awareness.
This proposal is being made to the Office of the Secretary of Defense as a sole-source, unsolicited funding proposal. The sole-source claim is due to the endorsement and participation of a number of scholars and deeply innovative companies.
Use of the Cubicon Language: (Additional discussion requires a non-disclosure agreement.)
As a general principle, the first Knowledge Sharing Core licenses a specific group of patents and proprietary methodologies. This intellectual property is integrated into a robust and scalable system having micro-transaction instrumentation within the software. The instrumentation allows a full accounting of when and how the elements of the Knowledge Core are used. A Knowledge Core is targeted at a specific domain, such as mapping terrorism activity, biodefense, or mapping the emergence of new trends in patent disclosure.
As a specific Knowledge Core is developed, we assume that a few additional patents will be jointly filed by the development team, so that the entire system as a whole has protection and cannot be copied without compensating the owners of the integrated patents and other intellectual property. This protection allows open disclosure. Open disclosure is deemed essential, and non-agreement on this issue will exclude participants who would otherwise be included.
The policy of open and clear disclosure will make each Knowledge Core system subject to competition from new products (developed in competition with OntologyStream Inc) as well as from new Knowledge Cores developed by OntologyStream Inc. This evolutionary pressure on the emergence of the knowledge technologies is what we are looking for.
Figure 1 shows the classical separation of software into a { data layer, transaction layer, presentation layer }. We feel that the presentation of Knowledge Cores can always be addressed using topic map constructions, and that the Actionable Intelligence Process Model should always be used to place tools into proper context.
Figure 1: One of the early Knowledge Sharing Core Diagrams
For example, shallow and deep linguistic parsing serve different purposes, and both should be available to a user who is attempting to extract concepts from text. The TAI Visual IDE for natural language parsing is designed for language specialists and can be used in any natural language setting, against any set of linguistic variations, in support of thematic analysis of social discourse (broadly considered). NLP++ assists end-users in instrumenting the more obvious co-occurrence patterns and can be used in conjunction with the TAI Visual IDE to instrument very complex and yet highly precise real-time pattern recognition capability.
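The kind of co-occurrence instrumentation mentioned above can be illustrated with a minimal sketch. The sentences and terms below are invented for illustration; the NLP++ and TAI tooling themselves are proprietary and not shown:

```python
from collections import Counter
from itertools import combinations

def cooccurrences(sentences):
    """Count how often pairs of distinct terms appear in the same sentence."""
    counts = Counter()
    for sent in sentences:
        terms = sorted(set(sent.lower().split()))
        for pair in combinations(terms, 2):
            counts[pair] += 1
    return counts

# Invented example sentences.
sents = [
    "schema logic reconciles vocabulary",
    "controlled vocabulary and schema drift",
    "parsers tag vocabulary in discourse",
]
counts = cooccurrences(sents)
# ("schema", "vocabulary") co-occurs in two of the three sentences
```

Real deployments would replace whitespace tokenization with the parser output and add a sliding window, but the counting step is the same.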
The Semio (Entrieva Inc) semantic parser is a direct competitor to the NLP++ tagger. However, the two systems work on radically different principles. The task we have set for our group is to provide an objective evaluation of the strengths and weaknesses of the two systems when compared in the social/cognitive science laboratory and as an integral part of military/intelligence exercises.
A similar head-to-head competition exists between SAIC's LSI and Recommind's probabilistic LSI (PLSI). The Ontology Lens (discovered and made public domain by Prueitt in 2002) and Differential Ontology (discovered and made public domain by Prueitt in 2002) work with either LSI or PLSI.
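For readers unfamiliar with the linear-algebra side of this comparison, classical LSI reduces a term-document matrix with a truncated singular value decomposition. The sketch below uses an invented toy matrix and is not the SAIC or Recommind implementation:

```python
import numpy as np

# Toy term-document matrix: rows are terms, columns are documents.
# The counts are invented for illustration only.
A = np.array([
    [2.0, 0.0, 1.0, 0.0],   # "ontology"
    [1.0, 1.0, 0.0, 0.0],   # "schema"
    [0.0, 2.0, 0.0, 1.0],   # "parser"
    [0.0, 0.0, 3.0, 1.0],   # "discourse"
])

# Classical LSI: keep only the k largest singular values of A.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T  # each row: a document in concept space

def cosine(u, v):
    """Cosine similarity between two document vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

sim = cosine(doc_vecs[0], doc_vecs[1])  # similarity of documents 0 and 1
```

PLSI reaches a comparable low-dimensional representation through a probabilistic mixture model rather than through this matrix factorization, which is why the two systems can be benchmarked on the same corpus.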
Precise pattern recognition allows real time realignment of parsers and ontology services so that new and important linguistic variation is routed immediately to those who need to look for consequences relating to national security. A simpler functionality is needed in responding to new patterns from ICD code analysis (syndromic surveillance) and in immediately viewing digital libraries (about medical science and about grid systems) from a new viewpoint. A similar functionality is needed in mapping vulnerability and threats in trucking infrastructure and harbors.
Figure 2: Two application areas for the Knowledge Sharing Core
The Knowledge Core educational materials provide access to a liberal understanding of these patented tools, and of tools that are public domain.
A presentation layer does need to be available to allow the direct use of tools in a native fashion, separated from the topic map presentation. So we make a distinction between the technical presentation to the user and the human-computer interface that is designed for knowledge work not related to the limitations and capabilities of patented processes.
Topic Maps are consistent with ontology encoded as XML. What XML has difficulty with is speed and scalability related to schema and to how the data is encoded into both executable memory and system files. This is where we use CCM, schema logic, and ontology encoding based on Hilbert space mathematics.
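The CCM and I-RIB encodings are proprietary and not shown here. As a generic illustration of why an in-memory encoding beats re-parsing XML on every query, one might index ontology terms as integers and store relations as adjacency lists; every name below is invented:

```python
# Invented miniature ontology: terms indexed as integers,
# with "is-a" relations stored as in-memory adjacency lists.
terms = ["vehicle", "truck", "harbor_crane"]
index = {name: i for i, name in enumerate(terms)}
is_a = {
    index["truck"]: [index["vehicle"]],
    index["harbor_crane"]: [index["vehicle"]],
}

def ancestors(name):
    """Collect every term reachable upward through is-a links."""
    seen, stack = set(), list(is_a.get(index[name], []))
    while stack:
        i = stack.pop()
        if i not in seen:
            seen.add(i)
            stack.extend(is_a.get(i, []))
    return {terms[i] for i in seen}
```

Lookups against such a structure are integer operations on resident data, while the XML serialization is kept only for interchange.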
The founder of OntologyStream Inc, Dr. Paul S. Prueitt, will direct all aspects of the design, development, and testing of the First Knowledge Sharing Core Virtual Private Network.
First notes on the First Knowledge Sharing Core Virtual Private Network will be kept in the public domain. However, some work will be developed as potential new patents. Additional patenting activities are sought and ARE NOT to be disallowed by contracting with the federal government.
$28K – Settle the DOF issues – 2 months
$28K – CCM and I-RIB integration – 2 months (almost completed)
$28K – CCM and Topic Maps work – 2 months (half completed)
$42K – Text Analysis IDE work – 3 months (1 month completed)
$42K – Logic over schema and CCM work – 3 months
$28K – Integration of Semio tagger – 2 months
$28K – Integration of Recommind PLSI – 2 months
$28K – Integration of SAIC's LSI engine – 2 months
$56K – Use of Cubicon to express all processes in a common language – 4 months
Development of distance learning curriculum on the selected Core technologies – 12 months (will contract with 6 different scholars)
Total effort: 38 months
Total expenditures: $532K
The founder of OntologyStream Inc, Dr. Paul S. Prueitt, currently owns 100% of the OSI stock, with 3% allocated to existing friends-and-family investments.
An agreement has been made not to dilute the current issue of 100,000 shares except under an agreement by 65% of the stock ownership. The reason 100,000 is the amount of stock Ontology Stream Inc has authorized is the cost of maintaining stock authorization, in yearly payments to the Commonwealth of Virginia, and the belief that it may be several years before any stock is actually distributed.
Stock ownership entitles the owner to dividends only. A Board of Directors looks after the interests of the stockholders. Resale of stock is done only with the approval of the Board.
20% of the stock is currently available to a venture capital firm, provided terms can be established that meet the approval of the Ontology Stream Inc Board of Directors.
We will consider an investment from In-Q-Tel.
25% of the stock is reserved for distribution to scholars and innovators who the Board recognizes as having made a significant contribution to the founding of Ontology Stream.
The Founder, Paul Stephen Prueitt, and his family reserve 30% of the stock. This includes the 3% allocated to existing loans.
A reserve of 25% is made to exchange for services, stock, or cash from other corporations. However, this reserve has the limitation that a single outside company can own no more than 5% of the stock.
Our June 1st, 2003 valuation is set at $4,500,000, making each of the 100,000 shares worth $45.00. Stock sales cannot occur except under Rule 504 of the Blue Sky laws.
Back wages (as of June 1, 2003) are acknowledged to Paul Prueitt in the amount of $85,000 and to a third party in the amount of $25,000.
Significant resources, $100K to $200K, should be expended to establish accounting and legal practice for Ontology Stream Inc.