Rationale behind protein shape prediction projects

Dimitris Hatzopoulos Joined: 5 Jan 06 Posts: 336 Credit: 80,939 RAC: 0	Message 10468 - Posted: 4 Feb 2006, 20:42:03 UTC
About the predicted structures, could you comment on the issues raised in the "Is HPF (using Rosetta) project totaly worthless?" thread at UD's forums:
TestPilot: Just wondering. There is a PDB database out there. It contains information about the 3D structures of different proteins. So far it has accumulated info on >30 000 proteins. Human DNA is believed to contain around 30 000 - 40 000 genes, so it is 30 000 - 40 000 structures we need to know. Currently more than 5 000 structures are deposited a year, and that number grows substantially each year. Check the statistics of that database. Even if only half of the structures in the PDB belong to the human domain, we will know almost all 3D structures a few years from now, and the most important ones must already be in there. And those data are way more accurate than the results of this project. Furthermore, if someone needed a structure that is not in the PDB, he would most likely use a newer version of Rosetta, which should produce more accurate results: Rosetta is under development and the quality of its predictions grows (from CASP to CASP at least ;-) ). So one way or another the results of this project would be obsolete and outdated in 2-4 years' time. What is the point of the project? Or am I missing something? PDB - Protein Data Bank - http://www.rcsb.org/pdb/
Further debate:
"They (PDB) stopped taking mathematical models (MM) in their database like 10 years ago, and they deleted those models from the database. But they still count MM on the stats page - anyway, there were not that many MM submitted."
That's correct, I didn't notice that on the first visit. But there must still be many of the 33000 known proteins counted solely as MMs (just sum up the total transmitted files per year). A second point is that the database holds ALL sorts of proteins, not only human ones.
"Nope. After the real shape of a protein has been determined, the calculated shape would be useless."
Again untrue, because of two things: 1) The true shape is determined from crystalline proteins. This shape might differ from the protein's usual state in liquid solution in the human body, which can only be calculated as far as I know. 2) When you check the deviation of the calculated model from the measured one, you can improve the calculation software, which gives you better, more correct models and a better understanding of the kinetics and other effects of protein folding.
"To understand that you need to understand how protein folding prediction works. Basically Rosetta (or any other protein folding prediction software) generates thousands (millions, billions - depending on available computer power) of possible protein 3D structures. After that it applies a "measurement function" to those shapes. The important part of that measurement function is how stable a particular structure is. The structures with the better scores are marked as predictions."
As far as I understand how Rosetta works, it uses known kinetic and energetic models on an atomistic scale (or, in the real case, in an approximate way for whole amino acid functional groups). From these an initial "energy field" is created. Then the protein is folded by a small amount in all possible directions, and the difference between the initial and the resulting "energy field", due to the interactions of the various functional groups of the protein, is calculated. The best solution(s) is taken as the new initial field and the process repeats. At some point no further improvement of the overall Gibbs energy is possible; it is then minimized. This structure(s) is then taken as the calculated protein structure. At least that is the classical ab initio approach to structure calculations. Rosetta most certainly makes some assumptions and simplifications, which are then partly corrected by the statistical approach of the project.
Best UFO Resources Wikipedia R@h How-To: Join Distributed Computing projects that benefit humanity
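For readers who want to see the "fold a little in every direction, keep the best, repeat until nothing improves" loop from the post above in concrete form, here is a minimal Python sketch. It is only an illustration of that greedy scheme, not Rosetta's actual method: the chain is reduced to a short list of angles, and `toy_energy`, `neighbours` and `greedy_minimize` are hypothetical names invented for this example, with an arbitrary made-up energy function.

```python
import math


def toy_energy(angles):
    """Hypothetical stand-in for a scoring function; NOT Rosetta's energy."""
    # Local term: each angle prefers to sit near 60 degrees (pi/3 radians).
    local = sum(1.0 - math.cos(a - math.pi / 3) for a in angles)
    # Coupling term: neighbouring angles prefer to be similar.
    coupling = sum(0.1 * (a - b) ** 2 for a, b in zip(angles, angles[1:]))
    return local + coupling


def neighbours(angles, step):
    """Yield every conformation reachable by nudging one angle by +/- step."""
    for i in range(len(angles)):
        for delta in (-step, step):
            moved = list(angles)
            moved[i] += delta
            yield moved


def greedy_minimize(angles, step=0.05, max_iters=10_000):
    """Repeatedly take the best single small move until none lowers the energy."""
    current = list(angles)
    current_e = toy_energy(current)
    for _ in range(max_iters):
        best = min(neighbours(current, step), key=toy_energy)
        best_e = toy_energy(best)
        if best_e >= current_e:
            break  # no small move improves the energy: a (local) minimum
        current, current_e = best, best_e
    return current, current_e


start = [0.0, 2.0, -1.0, 3.0]
final, final_e = greedy_minimize(start)
print(f"start energy {toy_energy(start):.3f} -> final energy {final_e:.3f}")
```

Note that a purely greedy search like this stops at the first local minimum it reaches, which is one reason why running many independent trajectories from different starting points, as a distributed project can, is useful.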

Dimitris Hatzopoulos Joined: 5 Jan 06 Posts: 336 Credit: 80,939 RAC: 0	Message 10685 - Posted: 12 Feb 2006, 2:27:50 UTC
So, would it be correct to say that in the short term (a couple of years), Rosetta software will be useful, as applied via projects like HPF1/2, to determine the 3D structures of existing proteins not yet in the PDB? But once the task of determining 3D structures experimentally (X-ray, NMR) for most proteins existing in nature (i.e. all except those too big / difficult to study experimentally via NMR) is finished, Rosetta would basically be a tool for designing "artificial" proteins, as per the DARPA project Protein Design Processes?
"Today what is considered protein design is in reality the redesign of an existing protein. The Protein Design Processes (PDP) Program changes the paradigm by beginning with an understanding of the binding and chemical reaction that is to be expressed; designing an active site that is compatible with the initial, transition, and final state chemistry; and then embedding the resulting structure in a scaffold. To accomplish this, DARPA is investing in the development of new tools in diverse areas such as topology, optimization, the calculation of ab initio potentials, synthetic chemistry, and informatics leading to the ability to design proteins to order. At the end of this program, researchers expect to be able to design a new complex protein, within 24 hours, that will inactivate a pathogenic organism."

Dimitris Hatzopoulos Joined: 5 Jan 06 Posts: 336 Credit: 80,939 RAC: 0	Message 10792 - Posted: 15 Feb 2006, 22:27:57 UTC Last modified: 15 Feb 2006, 22:34:16 UTC
Bump :-) Does anyone know the answer to the question in the previous post? I.e. what happens once all (or almost all) proteins' 3D structures are in the PDB? Once all proteins' shapes have been solved experimentally, there will be no need for projects like HPF, which right now use the Rosetta software to determine shapes mathematically. So at that point in time, the Rosetta software will mostly (only?) be used to design new proteins, right? Any estimate of how long it will take for all proteins to get into the PDB experimentally?

Hoelder1in Joined: 30 Sep 05 Posts: 169 Credit: 3,915,947 RAC: 0	Message 10804 - Posted: 16 Feb 2006, 8:25:40 UTC - in response to Message 10800. Last modified: 16 Feb 2006, 8:37:17 UTC
"So at that point in time, the R sw will mostly (only?) be used to design new proteins, right?"
The "Research Overview" page also mentions "protein-protein interactions" as one of Rosetta's capabilities, which could be put to good use once the PDB has been filled. Assuming that there are something like 10^5 human proteins, there would be roughly 10^10/2 potentially interacting protein pairs, not to mention protein-DNA interactions and interactions with other non-protein molecules. Well, I am not sure how many of those 10^10/2 potentially interacting pairs, based on their shapes and chemical properties, actually do interact, but generally speaking, I believe the time when scientists will be out of work because nature (or even just biology - which largely consists of protein interactions) has been "finished" won't come any time soon. ;-)
...and thanks again to Vanita for her Science FAQ, which I just came across today (I usually only go to the homepage to look for news updates)!
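As a quick check of the pair estimate in the post above: the number of unordered pairs among n proteins is n(n-1)/2, which for n = 10^5 is very close to 10^10/2. A small Python sketch, taking the 10^5 figure as the post's own assumption rather than a measured value:

```python
import math

n_proteins = 10**5  # protein count assumed in the post, not a measured value

# Unordered pairs of distinct proteins: n * (n - 1) / 2
pairs = math.comb(n_proteins, 2)
estimate = 10**10 // 2  # the post's round figure

print(pairs)             # 4999950000
print(pairs / estimate)  # ratio to the round figure, just under 1
```

This counts only pairs of distinct proteins; self-interactions, protein-DNA pairs and complexes of three or more partners would come on top.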

Hoelder1in Joined: 30 Sep 05 Posts: 169 Credit: 3,915,947 RAC: 0	Message 11372 - Posted: 25 Feb 2006, 6:15:02 UTC - in response to Message 10804. Last modified: 25 Feb 2006, 6:16:41 UTC