Saturday, January 07, 2006
New discussion about signal pathways and complex ontology
Ballard’s discussion about n-aries → moved to [125]
This is part of a discussion that will be moved to a Wiki page soon.
From http://biopaxwiki.org/cgi-bin/moin.cgi/StatesProposal#head-0e1cbc9b09f146ec17421371ad7e1a8a90aa6084
We need to be able to:
1. Identify identical molecules/states. For instance, we would like to know
that an existing database that has defined a physicalEntityInState (e.g.
phosphorylated p53 in the cytosol) can exchange it with another database. Note
that identity cannot be defined absolutely, as it can change depending on the
level of granularity of the model. It is a conclusion drawn by the curator of
the model. Although one can attempt to calculate identity without states, this
is neither straightforward nor convenient, and might not always be feasible.
Requested by PATIKA, cPATH, Cytoscape, PVS.
2. Represent state variables formally, and identify identical variables. This
requires, at a minimum, listing all possible modification types and associating
a controlled vocabulary term with each one. A sub-ontology is also a
possibility. We consider the following variables:
a. Cellular Location: Basically cellular compartments, but this can also be
extended to special points in the cell such as the axon hillock, or even
chromatin. Location can also have an attachment aspect: a cytoplasmic protein
can be attached to a membrane.
b. Chemical Modification: This is by far the most diverse and common variable.
PTMs, RNA splicing, and DNA methylation all fall into this class.
c. Conformational Change: All other structural changes within the molecule,
such as conformational changes in protein or DNA.
d. Complex Member: We want to address complexes and their members individually.
Requested by PATIKA, cPATH, Cytoscape, PVS.
3. Track different molecules with the same path of synthesis. This is already
provided by BioPAX; however, any improvement should not break this requirement.
4. Minimize data duplication when defining molecules/states. Duplicated data is
considered bad for a knowledge representation system, as modifications are
harder to make and it is easier to introduce inconsistencies. Requested by
PATIKA.
5. Specify which variables are possible, given a type of molecule. This has two
aspects:
a. Only a small subset of the actual state variable space is possible for
physical entities; for example, only specific sites on proteins can receive
specific chemical modifications. Although it is technically possible to specify
all possible state variables at the physical entity level, and then let states
be defined only in this variable space, in practice this data does not exist
for all proteins. We can still adopt an "add on the go" approach, though.
Requested by PATIKA.
b. Some variables can only be applied to certain types of entities. For
example, phosphorylation is only valid for proteins. Requested by PATIKA.
6. Quickly identify identical molecules/states.
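To make the requirements above more concrete, here is a minimal Python sketch of one way they might fit together. It is not the BioPAX data model: the class names (VariableKind, StateVariable, EntityInState), the ALLOWED_MODIFICATIONS table, and the vocabulary strings are all hypothetical and chosen only for illustration. The sketch assumes that two states count as identical when they refer to the same entity and carry the same set of state variables, which is only one possible level of granularity a curator could pick.

```python
from dataclasses import dataclass
from enum import Enum


class VariableKind(Enum):
    """The four state-variable classes considered in the proposal."""
    CELLULAR_LOCATION = "cellular location"
    CHEMICAL_MODIFICATION = "chemical modification"
    CONFORMATIONAL_CHANGE = "conformational change"
    COMPLEX_MEMBER = "complex member"


@dataclass(frozen=True)
class StateVariable:
    kind: VariableKind
    term: str        # controlled-vocabulary term (hypothetical strings here)
    site: str = ""   # optional site on the molecule, e.g. a residue


@dataclass(frozen=True)
class EntityInState:
    entity: str           # reference to one shared entity definition (no duplication)
    entity_type: str      # e.g. "protein"
    variables: frozenset  # frozenset of StateVariable, so order never matters


# Hypothetical constraint table: which chemical modifications each entity
# type may carry. It can grow over time ("add on the go").
ALLOWED_MODIFICATIONS = {
    "protein": {"phosphorylation"},
    "dna": {"methylation"},
}


def is_valid(state):
    """Check that every chemical modification is allowed for the entity type."""
    allowed = ALLOWED_MODIFICATIONS.get(state.entity_type, set())
    return all(
        v.term in allowed
        for v in state.variables
        if v.kind is VariableKind.CHEMICAL_MODIFICATION
    )


# The same state as it might be recorded by two different databases.
phospho = StateVariable(VariableKind.CHEMICAL_MODIFICATION, "phosphorylation")
cytosol = StateVariable(VariableKind.CELLULAR_LOCATION, "cytosol")
from_db_a = EntityInState("p53", "protein", frozenset([phospho, cytosol]))
from_db_b = EntityInState("p53", "protein", frozenset([cytosol, phospho]))

print(from_db_a == from_db_b)  # True: same entity, same set of variables
```

Because both dataclasses are frozen, states are hashable, so a dictionary or set keyed by EntityInState gives a constant-time lookup for "have we seen this state before?", which is one way to read the "quickly identify" requirement. Keeping only a reference to the entity in each state, rather than a copy of its definition, is meant to reflect the requirement to minimize data duplication.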