Pathway work/Queries/Trif query use case
Albert - Feb/2010
As per the last Skype discussion, I have been looking over this bit of the TLR4 pathway on spreadsheet lines 50 and 51:
| Process | Realizes | (Processual) part_of |
|---|---|---|
| rlps2-tram | change in location | rmend |
| rlps2-tram + trif -> rlps2-tt | tirdb of trif and tirdb of tram part of rlps2-tram | tirdbp that has location eendos |
It helps me to have all of the relevant links at hand, so here is what I am looking at:
- http://pfam.sanger.ac.uk/family/PF01582
- http://gowiki.tamu.edu/wiki/index.php/Category:GO:0071523_!_TIR_domain-mediated_complex_assembly
- http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0006898
- http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0070976
- http://pir.georgetown.edu/cgi-bin/pro/entry_pro?id=000001749&retrieve.x=0&retrieve.y=0
- http://pir.georgetown.edu/cgi-bin/pro/entry_pro?id=000001750&retrieve.x=0&retrieve.y=0
The use case was to develop a query to find macromolecules other than trif that could bind to rlps2-tram in this step of the pathway. Presumably, this query would be looking for things similar to trif via ontological relationships established from a variety of sources. Several questions/comments:
(1) Am I right in saying macromolecule, since the entity doing the job of trif could be either a protein OR a protein domain? This would impact the SELECT clause of a SPARQL query.
Anna Maria: I am not sure. A protein domain would always be part of some protein; it could not exist by itself. I think we will query for protein interactions only.
Lindsay: do we need to design specific queries, one for protein domains, one for proteins, one for macromolecules, etc.? or could we design (and is there benefit to designing) a general query that would work at all levels? in other words, if a macromolecule has a particular protein as part, and the protein has a particular protein domain as part, ...
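A minimal sketch of what a "general query that works at all levels" could look like, assuming parthood is recorded as simple (whole, part) pairs. The has_part assertions below are hypothetical and only loosely based on the names in the spreadsheet excerpt; the function name is also invented for illustration. The idea is that a single transitive walk over has_part reaches proteins and protein domains alike, so no per-level query is needed.

```python
from collections import defaultdict

# Hypothetical has_part assertions (whole, part); illustrative only,
# not taken from the actual spreadsheet or any ontology.
HAS_PART = [
    ("rlps2-tt", "trif"),
    ("rlps2-tt", "rlps2-tram"),
    ("rlps2-tram", "tram"),
    ("trif", "tir_domain_of_trif"),
    ("tram", "tir_domain_of_tram"),
]


def all_parts(whole, has_part=HAS_PART):
    """Return every direct or indirect part of `whole` (transitive closure)."""
    children = defaultdict(list)
    for w, p in has_part:
        children[w].append(p)

    seen, stack = set(), [whole]
    while stack:
        node = stack.pop()
        for part in children[node]:
            if part not in seen:
                seen.add(part)
                stack.append(part)
    return seen
```

Calling `all_parts("rlps2-tt")` on this toy data returns the complex's proteins and their domains in one set, which is the behaviour a level-independent query would need.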
(2) Would a more detailed examination of receptor-mediated endocytosis allow us to say something more specific than 'change in location' in the realizes column of line 50? Perhaps the location has to change to ‘eendos’? Was this more general description chosen because we don’t have an established ontological representation of locations (sites) of the cell? This would impact the FROM and WHERE clauses of a SPARQL query.
Anna Maria: You are right. "Change in location" was used only to highlight that we were moving from one compartment to another, since we don't have a good ontological representation of how things move in the cell. I used eendos in line 51; I could add "peripheral cytoplasm CCO:C0000489" in line 50.
Lindsay: i think we do have an established representation of locations within the cell in the GO cellular component ontology. not all cellular components can serve as locations, but some can.
(3) [EXCITING] The realizes column of line 51 is what Lindsay and I have begun calling complementary dispositions, where two functions (dispositions) are realized by a single process. I think all binding processes might be formalized using complementary dispositions (and the reciprocal disposition partners that bear them), which is nice b/c it gives us a hook into looking for other things that might bind.
Lindsay: yes. i think if we can record complementary dispositions at the level of protein domains, then we can infer upwards to proteins and macromolecules. going the other way is harder (impossible?).
However the function tirdb (TIR domain binding) seems to be a bit uninformative… I don’t know if I am expressing this right, but I think to say that tir has_function tir domain binding is like saying Albert has_function AlbertLikeFunctions…yes, I have AlbertLikeFunctions, but what is important is the qualities that confer this function and the structure of realizations of these functions. So I think for any meaningful queries, we are going to need to have more information than the GO definition of tirdb: Interacting selectively and non-covalently with a Toll-Interleukin receptor (TIR) domain of a protein. The TIR domain is an intracellular 200 residue domain that is found in the Toll protein, the interleukin-1 receptor (IL-1R), and MyD88; it contains three highly-conserved regions, and mediates protein-protein interactions between the Toll-like receptors (TLRs) and signal-transduction components. [source: GOC:mah, InterPro:IPR000157]
Anna Maria: I think there was a misunderstanding. I did not say "TIR has_function TIR domain binding" but "TRIF has_function TIR domain binding". I think one of the things missing from the definition I sent to GO, which could be important for the query, is which other domains TIR can interact with. This information is in Pfam. I am not sure whether we could link to it directly to get the information; otherwise I was thinking of organizing it myself.
Lindsay: I think it would be great to have this information. automating a way to pull it from Pfam would be preferable, i would imagine. we could manually curate it for a few to get a sense of what the problems are, but if we automate it then we have the hope that this will scale to large networks of proteins.
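Lindsay's point under (3), that dispositions recorded at the level of protein domains can be inferred upwards to proteins and macromolecules, can be sketched roughly as below. Everything here is an assumption made for illustration: the domain names, the part_of links, the single-parent simplification, and the rule that a whole counts as a bearer whenever one of its parts bears the disposition.

```python
# Hypothetical domain-level assertions: domain -> disposition it bears.
DOMAIN_DISPOSITION = {
    "tir_domain_of_trif": "tirdb",
    "tir_domain_of_tram": "tirdb",
}

# Hypothetical part_of assertions: part -> whole.
# Simplification: each part has at most one whole.
PART_OF = {
    "tir_domain_of_trif": "trif",
    "tir_domain_of_tram": "tram",
    "tram": "rlps2-tram",
}


def inferred_bearers(disposition):
    """Return domains bearing `disposition` plus every whole containing one."""
    bearers = set()
    for domain, disp in DOMAIN_DISPOSITION.items():
        if disp != disposition:
            continue
        node = domain
        # Climb the part_of chain, marking each enclosing whole as a bearer.
        while node is not None:
            bearers.add(node)
            node = PART_OF.get(node)
    return bearers
```

The upward direction is a simple climb along part_of. The reverse is not recoverable from this data: knowing only that trif bears tirdb does not say which of its parts is responsible, which matches the "harder (impossible?)" remark.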
(4) This is just speculation on my part, but it seems that the 'more information' we may need for this is something that could be expressed in a biochemical ontology…or some other resource that burrows down into what gives TIR the functions it has (ChEBI?).
Anna Maria: The TIR domain binding function of TRIF is given by its particular amino acid sequence in a specific region (the TIR domain).
Lindsay: this would indeed be very useful. this would allow us to make predictions about which SNPs are likely to impact function.
(5) Please note, if SPARQL queries do nothing more than bring these disparate pieces of information together, this would still be a very useful tool, b/c it would enable someone like me who has no training in biology to find all of the relevant resources, defs, and references quickly.
Lindsay: i think this might be part of what neurocommons does. they may not at this stage include the types of information we are working with, but i think their overall goal is to bring disparate pieces of information together.
However, (even wilder speculation) since we are going for the holier grail of identifying trif-like proteins, maybe we should start discussing rule-based models of how and where trif-like-things bind…short of doing a whole ontology for each macromolecule. Lindsay, would your Prolog rule-based microenvironment model be suitable for filling in/inferring the extra bit of information needed for this part of the query? Alan, would a port of a Prolog model into a rule language like SWRL play nice with SPARQL queries?
Lindsay: i think not, for two reasons. one being: i didn't (and don't) know anything about prolog, so i doubt what i made was useful for anything other than me thinking out loud. the other reason is, the rules i was trying to write were intended to capture the consequences of cells responding to particular signals in their microenvironment. i was representing things like
microenvironment X -> cell behavior Y
the events on the inside of the cell were not specified. ultimately, that is what we want though, we want to make predictions about what a cell will do when presented with particular sets of signals, so i think it would be great if we could link the pathway stuff to the cellular level stuff. i think my prolog stuff might be useful for helping me show you what i was trying to do, but i don't think we will be able to actually use it for anything.
I hope none of the above reveals too much disorganization of thought on my part or too much of a tendency towards an everything-and-the-kitchen-sink approach to this use case, but I think it would be nice to have some information for queries flowing from other ontologies, some from logical inferences, and some from on-the-fly computations using traditional approaches (DiffEq models of protein-protein interactions). Feel free to quell my speculation though.
Anna Maria: I am writing as someone ignorant of SPARQL and rule-based models, but from my biological point of view I think that "trif-like things" should be identified by the fact that they have the SAME TIR BINDING DOMAIN FUNCTION and that they can be in the SAME LOCATION as TRIF. This would be hugely informative for biology. Maybe we should organize another call to discuss it.
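Anna Maria's criterion (same TIR domain binding function, and able to be in the same location as TRIF) can be written down as a simple filter. This is only a sketch under stated assumptions: the candidate names and their annotations are invented, and placing trif in eendos is read off the location of the binding process on line 51 rather than asserted anywhere directly.

```python
# Hypothetical annotations. The candidates are invented; trif's location is
# assumed from the location of the binding process (eendos) on line 51.
ANNOTATIONS = {
    "trif": {"functions": {"tirdb"}, "locations": {"eendos"}},
    "candidate_a": {"functions": {"tirdb"}, "locations": {"eendos", "cytosol"}},
    "candidate_b": {"functions": {"tirdb"}, "locations": {"nucleus"}},
    "candidate_c": {"functions": {"kinase_activity"}, "locations": {"eendos"}},
}


def like(reference, function, annotations=ANNOTATIONS):
    """Return names sharing `function` and at least one location with `reference`."""
    ref_locations = annotations[reference]["locations"]
    return sorted(
        name
        for name, ann in annotations.items()
        if name != reference
        and function in ann["functions"]
        and ann["locations"] & ref_locations  # non-empty overlap in location
    )
```

On this toy data `like("trif", "tirdb")` keeps only `candidate_a`: `candidate_b` has the function but no shared location, and `candidate_c` shares the location but not the function.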

