
 

Saturday, January 07, 2006

 

Challenge problem →

 

New discussion about signal pathways and complex ontology

 

 

Ballard’s discussion about n-aries → moved to [125]

 

 

This is part of a discussion that will be moved to a Wiki page soon.

 

 

From

 

http://biopaxwiki.org/cgi-bin/moin.cgi/StatesProposal#head-0e1cbc9b09f146ec17421371ad7e1a8a90aa6084

 

User Requirements

We need to be able to:

1.        Identify identical molecules/states. For instance, a database that has defined a physicalEntityInState (e.g. phosphorylated p53 in the cytosol) should be able to exchange it with another database and have it recognized as the same state. Note that identity cannot be defined absolutely, because it can change with the level of granularity of the model; it is a conclusion drawn by the curator of the model. Although one can attempt to calculate identity without states, doing so is neither straightforward nor convenient, and might not always be feasible. Requested by PATIKA, cPATH, Cytoscape, PVS.

2.        Represent state variables formally, and identify identical variables. At a minimum, this requires listing all possible modification types and associating a controlled vocabulary term with each one. A sub-ontology is also a possibility. We consider the following variables:

a.        Cellular Location: Primarily cellular compartments, but this can be extended to special points in the cell such as the axon hillock, or even chromatin. Location can also have an attachment aspect: a cytoplasmic protein can be attached to a membrane.

b.        Chemical Modification: This is by far the most diverse and common variable. PTMs, RNA splicing, and DNA methylation all fall into this class.

c.         Conformational Change: All other structural changes within the molecule, such as conformational changes in a protein or DNA.

d.        Complex Member: We want to address complexes and their members individually.

Requested by PATIKA, cPATH, Cytoscape, PVS.
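
As a rough illustration of the two user requirements above, each state variable could be stored as a typed value carrying a controlled vocabulary term, with a state being an unordered set of such values. This is only a sketch under stated assumptions: the names `StateVariable`, `EntityState`, and `make_state`, the `kind` strings, and the placeholder site label are hypothetical and are not part of BioPAX. Structural equality is also just one possible notion of identity; as noted above, identity is ultimately a curator's conclusion.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Hypothetical variable kinds, mirroring items (a)-(d) above.
KINDS = {
    "cellular_location",
    "chemical_modification",
    "conformational_change",
    "complex_member",
}


@dataclass(frozen=True)
class StateVariable:
    kind: str                   # one of KINDS
    term: str                   # controlled vocabulary term (placeholder strings here)
    site: Optional[str] = None  # e.g. a modification site, when relevant

    def __post_init__(self):
        if self.kind not in KINDS:
            raise ValueError(f"unknown state variable kind: {self.kind}")


@dataclass(frozen=True)
class EntityState:
    entity: str                          # identifier of the physical entity
    variables: FrozenSet[StateVariable]  # a set, so listing order never matters


def make_state(entity, *variables):
    """Build a state from any number of state variables."""
    return EntityState(entity, frozenset(variables))


# Two databases describing phosphorylated p53 in the cytosol,
# listing the variables in a different order ("site-A" is a placeholder).
db_a = make_state(
    "p53",
    StateVariable("chemical_modification", "phosphorylation", "site-A"),
    StateVariable("cellular_location", "cytosol"),
)
db_b = make_state(
    "p53",
    StateVariable("cellular_location", "cytosol"),
    StateVariable("chemical_modification", "phosphorylation", "site-A"),
)

same_state = db_a == db_b  # compares entity and variable sets
```

Because both classes are frozen, states are hashable and can be used directly as dictionary keys or set members when merging records from several databases.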

Data Model Requirements

1. Track different molecules with the same path of synthesis. BioPAX already provides this, but any improvement must not break this requirement.

2. Minimize data duplication when defining molecules/states. Duplicated data is considered bad for a knowledge representation system, because modifications are harder to make and it is easier to introduce inconsistencies. Requested by PATIKA.

3. Specify which variables are possible, given a type of molecule. This has two aspects:

a.        Only a small subset of the state variable space is actually possible for a physical entity; for example, only specific sites on proteins can receive specific chemical modifications. Although it is technically possible to specify all possible state variables at the physical entity level and then let states be defined only within this variable space, in practice this data does not exist for all proteins. We can still adopt an "add on the go" approach, though. Requested by PATIKA.

b.        Some variables can only be applied to certain types of entities. For example, phosphorylation is only valid for proteins. Requested by PATIKA.

4. Quickly identify identical molecules/states.
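
The requirements to restrict variables by molecule type and to identify identical states quickly could be sketched as follows. Everything here is an assumption made for illustration: the table `ALLOWED_BY_TYPE` and its contents, the helper names, and the placeholder site labels are hypothetical and are not taken from BioPAX or from any of the requesting tools.

```python
from collections import defaultdict

# Aspect (b): modification terms permitted per entity type (illustrative values only).
ALLOWED_BY_TYPE = {
    "protein": {"phosphorylation", "methylation"},
    "dna": {"methylation"},
}

# Aspect (a): known (term, site) pairs per entity, filled in "on the go".
known_sites = defaultdict(set)


def register_site(entity, term, site):
    """Declare that `entity` can carry `term` at `site`."""
    known_sites[entity].add((term, site))


def check_modification(entity, entity_type, term, site):
    """Return True only if the term fits the entity type and the site was declared."""
    if term not in ALLOWED_BY_TYPE.get(entity_type, set()):
        return False  # variable not applicable to this type of molecule
    return (term, site) in known_sites[entity]


def state_key(entity, modifications, location):
    """Canonical, hashable key; sorting makes it independent of listing order."""
    return (entity, tuple(sorted(modifications)), location)


# Quick identification: a dictionary keyed by state_key groups identical states.
index = {}
register_site("p53", "phosphorylation", "site-A")
key = state_key("p53", [("phosphorylation", "site-A")], "cytosol")
index.setdefault(key, []).append("record from database A")
index.setdefault(key, []).append("record from database B")
```

Keeping the allowed variables in one table and deriving a key from the state, rather than storing a copy of the state per record, is also one way to limit the data duplication the requirements warn against.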