
 4/22/2004 6:53 AM

 

Key questions on Common Upper Ontology

 

 

Communication from Paul Prueitt

 

 

The concepts of controlled vocabulary, formative ontology, emergent ontology, mutual induction and differential ontology are relevant to the discussion between Dean and John.

 


 

Breanna Anderson, co-founder of SchemaLogic Inc., has developed the concept of controlled vocabulary so that two things occur:

 

1)       The controlled vocabulary is subject to continual re-standardization through a specific and visible language-usage methodology, reinforced with training and software. The purpose of re-standardization is to identify when new terms are being used in natural discourse (such as in weblogs), and when disputes, or variation in contextual settings, cause single terms or phrases (or patterns of linguistic variation) to take on different interpreted meanings for individuals and within communities of practice.

2)       The controlled vocabulary is coupled to computational machinery, so that the benefit of a “weak” conceptual representation of subject matter, as graphs with nodes and links between nodes, can be achieved.
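
To make the phrase “weak” conceptual representation concrete, here is a minimal sketch that assumes the simplest possible structure: terms and concepts are nodes, and links are unlabeled, undirected “related to” edges stored as adjacency sets. The `WeakGraph` class and the sample terms are hypothetical illustrations, not SchemaLogic's data model.

```python
from collections import deque


class WeakGraph:
    """Undirected, unlabeled graph: nodes are terms or concepts, links mean 'related to'."""

    def __init__(self):
        self.links = {}  # node -> set of neighbouring nodes

    def link(self, a, b):
        # Add both nodes if needed and connect them in both directions.
        self.links.setdefault(a, set()).add(b)
        self.links.setdefault(b, set()).add(a)

    def related(self, start, max_hops=1):
        """Return nodes reachable from `start` within `max_hops` links (breadth-first)."""
        if start not in self.links:
            return set()
        seen = {start}
        frontier = deque([(start, 0)])
        while frontier:
            node, depth = frontier.popleft()
            if depth == max_hops:
                continue
            for nxt in self.links[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, depth + 1))
        seen.discard(start)
        return seen


g = WeakGraph()
g.link("invoice", "billing")
g.link("billing", "accounts receivable")
g.link("invoice", "purchase order")
print(sorted(g.related("invoice")))              # one hop
print(sorted(g.related("invoice", max_hops=2)))  # two hops
```

The representation is “weak” in the sense that a link says only that two nodes are related, not how; any richer semantics would have to be added later.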

 

These two aspects give the creation of taxonomy, controlled vocabulary and ontology two faces. One face is turned towards the machinery, where key words and patterns of linguistic variation provide back-of-the-book indexing and the reconciliation of differences over meaning.
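
A minimal sketch of this machine-facing side, assuming that each controlled-vocabulary term carries a hand-maintained list of variant surface forms: any variant found in a document is indexed under the preferred term, so variation in wording resolves to a single back-of-the-book entry. The vocabulary, the document texts and the case-insensitive whole-phrase matching are illustrative assumptions, not a description of any particular product.

```python
import re

# Hypothetical controlled vocabulary: preferred term -> variant surface forms.
VOCABULARY = {
    "spectrum license": ["spectrum license", "spectrum licence", "frequency license"],
    "broadband": ["broadband", "high-speed internet"],
}


def build_index(documents, vocabulary):
    """Map each preferred term to the sorted ids of documents that mention any variant."""
    index = {term: set() for term in vocabulary}
    for doc_id, text in documents.items():
        lowered = text.lower()
        for term, variants in vocabulary.items():
            for variant in variants:
                # Case-insensitive whole-phrase match on word boundaries.
                if re.search(r"\b" + re.escape(variant) + r"\b", lowered):
                    index[term].add(doc_id)
                    break
    return {term: sorted(ids) for term, ids in index.items()}


docs = {
    "d1": "The Frequency License auction closed early.",
    "d2": "Rural high-speed internet access remains limited.",
    "d3": "Broadband and spectrum licence reform.",
}
print(build_index(docs, VOCABULARY))
```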

 

The other face is turned towards a natural social community that needs a common word usage to ground distributed conversations and work products. In discussions within complex business settings, word-usage schema control and the important logic-over-schemas technology will measure and mediate the natural discussion. The control has to be “self-control”, not something imposed by a standards committee. Certainly, if an IT standards committee is agnostic to the social and psychological sciences, then the committee’s need to be involved will simply harm the productivity of the social discourse.

 

Side Note: As John Sowa has described, super hype that is not grounded in reality has sometimes been used to sell Semantic Web buzzwords. If this occurs in the context of creating artificial control over social discourse, then legitimate research and development will not be funded. I share John’s concern.

 

Text mining and natural language processing technology is then put in place to provide an external measure of the usage of terms, phrases and patterns of co-occurrence. WordNet, linguistic services and ontology services can all be used here in reasonable ways.
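
One simple external measure of this kind is a count of how often vocabulary terms occur and co-occur. The sketch below is a minimal version that makes three assumptions: naive regular-expression tokenization, a fixed set of single-word terms, and the sentence as the co-occurrence window. A real system would add proper tokenization, phrase detection and resources such as WordNet, none of which is shown here.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence(text, terms):
    """Count term frequency and sentence-level co-occurrence of unordered term pairs."""
    term_counts = Counter()
    pair_counts = Counter()
    # Naive sentence split on runs of . ! ? followed by optional whitespace.
    for sentence in re.split(r"[.!?]+\s*", text.lower()):
        tokens = re.findall(r"[a-z]+", sentence)
        present = sorted(set(tokens) & terms)
        term_counts.update(t for t in tokens if t in terms)
        # Each unordered pair of distinct terms in the sentence counts once.
        pair_counts.update(combinations(present, 2))
    return term_counts, pair_counts


text = "The ontology links each concept. A concept has a term. The term names the concept."
terms = {"ontology", "concept", "term"}
freq, pairs = cooccurrence(text, terms)
print(freq.most_common())
print(pairs.most_common())
```

Tracking these counts over time, and across communities, gives the kind of external measure of usage that the vocabulary can be compared against.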

 

Natural language use within communities of practice has a natural reinforcement of anticipatory mechanisms, both as part of the individual experience of knowledge and as part of the social generation and use of language.

 

When there is a variance between how individuals want to use language and how the controlled vocabulary (and auxiliary resources such as machine-represented ontology and schema structure) is defined, there is a culture issue. When a culture issue arises, specific processes are used to adjust the regularity of the controlled vocabulary at precisely those points where reconciliation is needed.
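
One way to locate the points where reconciliation is needed is to compare, for each vocabulary term, the related terms recorded in the controlled vocabulary with the terms it actually co-occurs with in community discourse, and to flag terms whose overlap is low. The sketch below assumes Jaccard similarity as the overlap measure and a threshold of 0.5; both are illustrative choices, and `reconciliation_points` is a hypothetical name, not part of any method described here.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (defined as 1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def reconciliation_points(defined, observed, threshold=0.5):
    """Return terms whose observed usage context diverges from their defined context.

    defined:  term -> set of related terms recorded in the controlled vocabulary
    observed: term -> set of terms it co-occurs with in community discourse
    """
    flagged = {}
    for term, related in defined.items():
        score = jaccard(related, observed.get(term, set()))
        if score < threshold:
            flagged[term] = round(score, 2)
    return flagged


defined = {
    "account": {"customer", "billing"},
    "ticket": {"incident", "support"},
}
observed = {
    "account": {"customer", "billing", "invoice"},  # close to the definition
    "ticket": {"event", "seat", "support"},         # usage has drifted
}
print(reconciliation_points(defined, observed))
```

The flagged terms are candidates for a reconciliation discussion; the sketch decides nothing about how the vocabulary should change.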

 

The vocabulary usage becomes anticipated and used (as if it were a natural language). To be useful rather than inhibitory, the system of knowledge representation is left open. If formative and differential ontology construction is instrumented to measure, in real time, the linguistic variation being used by the community, then a process for introducing new terms, phrases and patterns is found to be useful.
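
A minimal sketch of such instrumentation, under stated assumptions: discourse arrives in batches of documents, candidate terms are single lowercase words, and a word not yet in the controlled vocabulary is proposed once it appears in at least `min_docs` separate documents of the current batch. The function name, the stopword list and the threshold are hypothetical. The output is a ranked list for human review rather than an automatic change to the vocabulary, so the decision to adopt a term stays with the community.

```python
import re
from collections import Counter


def candidate_terms(documents, vocabulary, stopwords, min_docs=2):
    """Propose words not yet in the vocabulary that appear in at least `min_docs` documents."""
    doc_freq = Counter()
    for text in documents:
        # Distinct words of two or more letters (hyphens allowed) in this document.
        words = set(re.findall(r"[a-z][a-z\-]+", text.lower()))
        doc_freq.update(words - vocabulary - stopwords)
    # Most widespread first; ties broken alphabetically for stable output.
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(word, n) for word, n in ranked if n >= min_docs]


batch = [
    "Our folksonomy tags drift from the taxonomy.",
    "Is a folksonomy a kind of taxonomy?",
    "Weblog readers prefer folksonomy tags.",
]
vocabulary = {"taxonomy", "ontology", "weblog"}
stopwords = {"the", "a", "of", "is", "our", "from"}
print(candidate_terms(batch, vocabulary, stopwords))
```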

 

I do not know the details of Debbie McGuinness’s consulting work. The limitations of fixed vocabularies are well known to her, and I would expect the method that Dean described and attributed to her to weaken the “control” of human social discourse while providing a machine-computable ground.

 

As Dean said:

 

In the situations in which you can gain some form of consensus about a common vocabulary, and you can build this into a "starter ontology" that others extend (by adding new terms and classes), you reap many gains; interoperability becomes (relatively) easy, because you can map terms to themselves.   Knowledge acquisition and gathering becomes (relatively) easy, because you have a set of terms that people can use to describe what they know/want to express.  These things make for a very attractive environment.

 

Here Debbie is not concerned with computer format issues. She is directly addressing the interface between human social discourse and computational forms that map to Protégé, Cyc, OWL or Topic Maps.

 

The starter ontology can be a core set of graph representations of a set of concepts and of some part of the known relationships between those concepts.
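
The “starter ontology” in Dean's description can be sketched as a small subclass graph that two communities extend independently. Because both extensions leave the starter classes unchanged, those classes can be mapped to themselves, while classes that each community added are set aside for explicit mapping. The representation (a dict from class to parent class), the class names and the functions `extend` and `identity_mapping` are assumptions for illustration; this is not Protégé, Cyc, OWL or Topic Maps syntax.

```python
# Hypothetical starter ontology: class -> parent class (None marks the root).
STARTER = {
    "Thing": None,
    "Agent": "Thing",
    "Document": "Thing",
}


def extend(base, additions):
    """Return a new ontology with added classes; existing classes may not be redefined."""
    for cls, parent in additions.items():
        if cls in base:
            raise ValueError(f"{cls!r} is already defined in the base ontology")
        if parent not in base and parent not in additions:
            raise ValueError(f"unknown parent {parent!r} for {cls!r}")
    merged = dict(base)
    merged.update(additions)
    return merged


def identity_mapping(a, b):
    """Split classes into those both ontologies share unchanged (mapped to themselves) and the rest."""
    shared = {cls: cls for cls in a.keys() & b.keys() if a[cls] == b[cls]}
    unmatched = sorted((a.keys() | b.keys()) - set(shared))
    return shared, unmatched


legal = extend(STARTER, {"Regulation": "Document", "Agency": "Agent"})
business = extend(STARTER, {"Invoice": "Document", "Vendor": "Agent"})
shared, unmatched = identity_mapping(legal, business)
print(sorted(shared))  # classes that map to themselves
print(unmatched)       # classes needing explicit mapping or review
```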

 

The controlled vocabulary is then mapped, perhaps even using an associative memory (as in artificial neural network associative memory), to provide a weak and flexible guess at the concept representation in the ontology when human language is being used in its normal expressive forms.
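
As a hedged illustration, the sketch below uses a linear (correlation-matrix) associative memory, one of the simplest neural-network-style memories. Each stored pair of a phrase and an ontology concept adds the outer product of a one-hot concept vector and a binary word vector to a weight matrix W; recall computes W·x for a new phrase. With binary vectors, W reduces to word-by-concept counts, so it is stored here as nested dicts. The class name, the phrases and the concept names are hypothetical, and no stopword filtering is done; the result is a ranked, weak guess, not a definitive classification.

```python
import re
from collections import defaultdict


def words(text):
    """Binary input vector, represented as the set of lowercase words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


class AssociativeMemory:
    """Linear associator: weights[concept][word] accumulates Hebbian outer products."""

    def __init__(self):
        self.weights = defaultdict(lambda: defaultdict(float))

    def store(self, phrase, concept):
        # Outer-product update for a one-hot concept vector and a binary word vector.
        for w in words(phrase):
            self.weights[concept][w] += 1.0

    def recall(self, phrase):
        """Return (concept, activation) pairs for W·x, strongest first, zeros dropped."""
        x = words(phrase)
        scores = {c: sum(row.get(w, 0.0) for w in x) for c, row in self.weights.items()}
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return [(c, s) for c, s in ranked if s > 0]


mem = AssociativeMemory()
mem.store("late payment of an invoice", "BillingDispute")
mem.store("customer disputes the invoice amount", "BillingDispute")
mem.store("password reset request", "AccessRequest")
mem.store("request access to the shared drive", "AccessRequest")
print(mem.recall("they dispute an invoice"))
print(mem.recall("request the invoice"))  # ambiguous: two equally weak guesses
```

The ambiguous query shows why the guess is “weak and flexible”: the memory reports competing concepts instead of forcing a single answer.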

 

Late last year I developed an architecture for a Fixed Upper Taxonomy to serve as an external, automated back-of-the-book indexical for the FCC. This project ran into the classic cultural resistance to solving problems in ways that would make the internal work of government agencies transparent (to themselves as well as to the public).

 

The premature closure of this work was due simply to the control of the internal discussion over taxonomy and subject indicators by IT folks who have no clue what the linguistic issues are and do not care. In most cases, software vendors are in business to sell software: not just any software, but the software that they own.

 

The problem is that software vendors’ private interests are allowed to dominate the public interest in an environment where confusion over precisely the issue of controlled vocabulary is maintained. Making it clear that there are alternatives, which is what Breanna Anderson has done and what Debbie McGuinness is doing, makes no difference if buzzwords are used as swords to prevent socially needed change from occurring. It is precisely at this point that one needs to talk about Nash economics and business practices.

 

In summary: the controlled vocabulary can be weakly controlled, as it is with the SchemaLogic software. The control has to be organic and originate from within the social discourse, not from the needs of standards committees.