
 

Saturday, January 07, 2006

 

Challenge problem →

 

New discussion about signal pathways

and complex ontology

 

[350] ← parallel discussion in “national debate” bead thread

 

This is part of a discussion that will be moved to a Wiki page soon.

 

 

Ontological modeling with category theory and convolutions

 

 

[347] ← Judith Rosen’s recent communication

 

Alan,

 

Two things may be quite different and yet the same, so how the real world "forms" a natural category is a subject of investigation.

 

For example, are two hydrogen atoms the "same" thing?  What about a protein filament that has metastable states corresponding to conformational structure (the way the protein is folded in 3-space)?

 

The way that some parts of the ontological modeling community talk about this is with the words "interpretation" and "viewpoint".

 

When one uses any controlled vocabulary (a controlled vocabulary is a set of identifiers) there are interpretations and viewpoints hard wired, unless a mechanism is available to escape from the enforced meaning. 

 

So the meaning of a protein ID in a controlled vocabulary might be hard wired to the beta rather than the alpha metastable state of that collection of atoms.  (I am just making up language here.)  Suppose that a physical process is used to measure the atomic composition of proteins, and that this measurement can be represented as a vector, or "vector profile", consisting of a 1 if an atom type is present and a 0 if not.  This happens when a microarray measurement of the proteins in a sample is made.

 

So the vector is placed into a database with annotations. 

 

The problem here is that the two metastable states, alpha and beta, are both measured, by the physical measuring device, as the "same thing", and the assignment of the ID can enforce the view that the alpha state is always the same as the beta state.  The co-occurrence of proteins in the environment is not always complete, and this latent pattern analysis leads to further categorization errors in the original data. 
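
A minimal sketch of this collapse, under loud assumptions: the atom types, the two states, and the ID "PROT-0001" are all invented for illustration and do not come from any real database or instrument.

```python
# Hypothetical atom types, invented for illustration only.
ATOM_TYPES = ["C", "H", "N", "O", "S"]


def vector_profile(atoms_present):
    """Return a 0/1 vector: 1 if the atom type is present, 0 if not."""
    return tuple(1 if atom in atoms_present else 0 for atom in ATOM_TYPES)


# Two metastable states of the same collection of atoms: the fold differs,
# the atomic composition does not.
alpha_state = {"fold": "alpha", "atoms": {"C", "H", "N", "O"}}
beta_state = {"fold": "beta", "atoms": {"C", "H", "N", "O"}}

alpha_profile = vector_profile(alpha_state["atoms"])
beta_profile = vector_profile(beta_state["atoms"])

# A controlled vocabulary keyed on the measured profile assigns one ID
# (the ID string is hypothetical).
controlled_vocabulary = {alpha_profile: "PROT-0001"}

alpha_id = controlled_vocabulary.get(alpha_profile)
beta_id = controlled_vocabulary.get(beta_profile)

# Both states receive the same ID, so the fold distinction is lost.
print(alpha_profile, alpha_id, beta_id)
```

Nothing in the stored profile or the ID records the fold, so any later analysis of the annotated vectors inherits the view that the alpha state is the beta state.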

 

“Gene Expression Pattern Analysis via Latent Variable Models Coupled with Topographic Clustering”, Chang et al., in Genomics & Informatics Vol. 1(1) 32-38, Sept 2003, is a clear example from the huge research literature on latent pattern analysis.

 

 

Alan, you mentioned in your recent note that OWL has a retrieval (OWL inference) that uses shared types, shared properties... and flexible combinations.  Have you seen this OWL specification work in more than one instance?  Do you know “HOW” this spec works or will work in theory?

 

You state:

 

"There are other ways of establishing

correspondences between things in the SW, e.g. shared types, shared

properties, and relatively flexible combinations of such defined in the

OWL spec.

 

A case where the identification will be more difficult in our domain

will be when talking about complexes. There are not as many databases

of complexes as there are of small molecules, and the nature of

identity is not as well established. Nonetheless we can build the

foundations of attacking this problem (of identifying complexes) by

taking advantage of methods of identifying proteins uniquely."

 

<end quote>

 

I ask this because, of course, given a graph structure with nodes and links associated with property types, slot types, filler types, etc...  one has a mathematical framework for what I call "convolutional transforms".

 

I think the language "convolutional transforms" is not used often but I do see it.  The convolutional operator is like normal calculus integration.... one looks at each element in the set of concepts, or set of relationships, or set of whatever.  QAT and QSAR can be supported by fast convolution transforms. 

 

As one looks at each element of a set, a small rule base is made the object of convolution.  Several of the intelligence community vendors have made the convolution over rule base (a set of rules) very fast using silicon.

 

What is significant is that this "double convolutional process" is like integrating over two variables... BUT where at each step the sets may be allowed to change conditionally.  This fact leads into Richard Ballard’s information theory (extending Shannon) to conjecture about informational invariance in n-ary space.

 

Example:

 

If the element from the set over which the convolution is occurring is "protein x", then look to see if any proteins in the same sample are one of those in the q category of proteins.

 

When finished with the rule set for "that" element, then move the program pointer to the next element.
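
A rough sketch of this pass in Python, with assumptions stated loudly: the function names, the sample, and the membership of the "q category" are invented for illustration, and this is only one reading of the procedure, not the Orb encoding or any vendor's implementation. "Convolution" here means the element-by-element pass described in this note, not the signal-processing operation.

```python
def convolve(sample, rules):
    """Visit each element of the sample in order and apply every rule to it."""
    findings = []
    for element in sample:      # outer pass: the set of elements
        for rule in rules:      # inner pass: the small rule base
            result = rule(element, sample)
            if result is not None:
                findings.append(result)
        # rule set finished for this element; move on to the next one
    return findings


# Hypothetical membership of the "q category" of proteins.
Q_CATEGORY = {"protein a", "protein b"}


def rule_x_with_q(element, sample):
    """If the element is 'protein x', report co-occurring q-category proteins."""
    if element != "protein x":
        return None
    partners = [p for p in sample if p != element and p in Q_CATEGORY]
    return (element, partners) if partners else None


sample = ["protein x", "protein a", "protein c"]  # ordered, so a list
findings = convolve(sample, [rule_x_with_q])
print(findings)  # [('protein x', ['protein a'])]
```

This sketch keeps both the sample and the rule base fixed during the pass. Letting either set change conditionally at each step, as described above, would require the loop to re-read the sets after every rule application.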

 

Of course "ordering" is key, as well as computer encoding and access; this is why I use the Orb (Ontology referential base) formalism:

 

http://www.bcngroup.org/area2/KSF/Notation/notation.htm

 

The Orb formalism and its native encoding may be the optimal means of managing controlled vocabularies.

 

Orb is a "key-less" hash table (whatever that might mean).  (smiles)

 

****

 

 

I am not writing all this to promote my work, as my work is far from being usable...  but I am laying out what I see as the general issues that you address in the quote I cite above.  You have put faith in the idea that something theoretically possible in OWL might be delivered in a usable form, scalable to very large bioinformatics databases (after integration).

 

I see category theory as almost entirely missing from the OWL technology and from discussions between W3C activists.  I also see the overuse of the term "inference" when all that is being suggested is a type of (interesting) complicated retrieval.

 

The retrieval's complicated nature has a tendency to "sit on its pointers" in actual deployments, hence the intractability of even non-complex but complicated queries (OWL inference).

 

Terminology reconciliation may need to be left completely open until "compute time", and thus a second problem is occurring in the fixed meaning for each URI and each namespace.

 

These URIs (within the context of a namespace) may be structurally incorrect for managing active scientific investigations of cell and gene expression (or social/business expression).

 

The issue is that the situational nature of any human communicative act would create the need for a new namespace...  leading to intractability problems if one wants to fully turn the investigation over to an artificial intelligence.

 

I will stop here, and must see how my paradigm is defined.

 

 

Dr Paul Prueitt (mathematics and quantum cognitive neuroscience)

The Taos Research Institute

Taos New Mexico