Protein Folding Algorithms
Research Summary
1001 optimal PDB structure alignments: integer programming methods for finding the maximum contact map overlap
Protein structure comparison is a fundamental problem for structural genomics, with applications to drug design, fold prediction, protein clustering, and evolutionary studies. Despite its importance, there are very few rigorous methods and widely accepted similarity measures known for this problem. In this paper we describe the last few years of developments on the study of an emerging measure, the contact map overlap (CMO), for protein structure comparison. A contact map is a list of pairs of residues which lie in three-dimensional proximity in the protein's native fold.
The Sequence of the Human Genome
A 2.91-billion base pair (bp) consensus sequence of the euchromatic portion of the human genome was generated by the whole-genome shotgun sequencing method. The 14.8-billion bp DNA sequence was generated over 9 months from 27,271,853 high-quality sequence reads (5.11-fold coverage of the genome) from both ends of plasmid clones made from the DNA of five individuals. Two assembly strategies, a whole-genome assembly and a regional chromosome assembly, were used, each combining sequence data from Celera and the publicly funded genome effort.
Visualization Challenges for a New Cyberpharmaceutical Computing Paradigm
In recent years, an explosion in data has been profoundly changing the field of biology and creating the need for new areas of expertise, particularly in the handling of data. One vital area that has so far received insufficient attention is how to communicate the large quantities of diverse and complex information that is being generated.
Frequencies of amino acid strings in globular protein sequences indicate suppression of blocks of consecutive hydrophobic residues
Patterns of hydrophobic and hydrophilic residues play a major role in protein folding and function. Long, predominantly hydrophobic strings of 20–22 amino acids each are associated with transmembrane helices and have been used to identify such sequences. Much less attention has been paid to hydrophobic sequences within globular proteins. In prior work on computer simulations of the competition between on-pathway folding and off-pathway aggregate formation, we found that long sequences of consecutive hydrophobic residues promoted aggregation within the model, even controlling for overall hydrophobic content.
...a very nice step forward in the computerology of proteins.
Ken Dill
Folding proteins fast (Science 1995)
The subject of chaos is characterized by an abundance of quantitative data, an unending supply of beautiful pictures, and a shortage of rigorous theorems. Rigorous theorems are the best way to give a subject intellectual depth and precision. Until you can prove rigorous theorems, you do not fully understand the meaning of your concepts.
Freeman Dyson
Birds and frogs (Notices of the American Mathematical Society 2009)
The most vitally characteristic fact about mathematics is, in my opinion, its quite peculiar relationship to the natural sciences ... In modern empirical sciences it has become more and more a major criterion of success whether they have become accessible to the mathematical method or to the near-mathematical methods of physics. Indeed, throughout the natural sciences an unbroken chain of successive pseudomorphoses, all of them pressing toward mathematics, and almost identified with the idea of scientific progress, has become more and more evident. Biology becomes increasingly pervaded by chemistry and physics, chemistry by experimental and theoretical physics, and physics by very mathematical forms of theoretical physics.
There is a quite peculiar duplicity in the nature of mathematics. One has to realize this duplicity, to accept it, and to assimilate it into one's thinking on the subject. This double face is the face of mathematics, and I do not believe that any simplified, unitarian view of the thing is possible without sacrificing the essence.
John von Neumann
The mathematician (The Works of the Mind 1947)
Protein folding is a fascinating cross-disciplinary field that attracts scientists with different backgrounds and scientific cultures. They bring to the protein folding field the models and the way of thinking that are accepted of their respective background fields. Such diversity of scientific cultures is a great virtue of the protein folding field, in which physics, chemistry, biology, and mathematics meet. It is important for our cross-disciplinary field to discuss with balance both strong points and limitations of different approaches.
Eugene Shakhnovich
Modeling protein folding: the beauty and power of simplicity (Folding and Design 1996)
The protein folding problem is three different problems: the folding code – the thermodynamic question of how a native structure results from the interatomic forces acting on an amino acid sequence; protein structure prediction – the computational problem of how to predict the native structure of a protein from its amino acid sequence; and the folding speed (Levinthal’s paradox) – the kinetic question of how a protein can fold so fast... Current knowledge of the folding codes is sufficient to guide the successful design of new proteins and new materials. Current computer algorithms are now predicting the native structures of small simple proteins remarkably accurately, contributing to drug discovery and proteomics. Even once intractable Levinthal puzzle now seems to have a very simple answer...
Ken Dill
The protein folding problem: when will it be solved? (Current Opinion in Structural Biology 2007)
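The kinetic question in the quote above (Levinthal's paradox) is at heart a counting argument, and a few lines of arithmetic make it concrete. The following is a minimal sketch only: the chain length, the number of conformations per residue, and the sampling rate are illustrative assumptions chosen for this example and do not come from the quoted article.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def exhaustive_search_time(n_residues, states_per_residue, samples_per_second):
    """Return (number of conformations, seconds needed to try each one once)."""
    # Each residue is assumed to pick its state independently of the others.
    conformations = states_per_residue ** n_residues
    seconds = conformations / samples_per_second
    return conformations, seconds


if __name__ == "__main__":
    # Assumed, illustrative values: 100 residues, 3 states each,
    # and 10^13 conformations sampled per second.
    conformations, seconds = exhaustive_search_time(100, 3, 1e13)
    print(f"conformations: {conformations:.3e}")
    print(f"exhaustive search: {seconds / SECONDS_PER_YEAR:.3e} years")
```

Under these assumed numbers a blind search would take on the order of 10^27 years, which is why folding cannot proceed by exhaustive enumeration and why the speed of real folding calls for an explanation.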
The failure of protein-folding processes, both within cells (in vivo) and within test tubes or industrial vats (in vitro), causes serious difficulties both for biomedical research and for the biotechnology industry. Protein chains that fail to fold properly aggregate into an insoluble and inactive state... There is increased recognition that some human diseases are associated with aberrations or defects in protein chain folding. These include Alzheimer’s and Huntington’s and cystic fibrosis.
Jonathan King
Protein folding and misfolding (American Scientist 2002)
Understanding the mechanism of protein folding is often called the “second half” of genetics. Computational approaches have been instrumental in the efforts. Simplified models have been applied to understand the physical principles governing the folding processes and will continue to play important roles in the endeavor.
Peter Kollman
Protein folding: from lattice to all-atom (IBM Systems Journal 2001)
We must emphasize a statement which I am sure you have heard before, but which must be repeated again and again. It is that the sciences do not try to explain, they hardly even try to interpret, they mainly make models. By a model, I mean a mathematical construct which, with the addition of certain verbal interpretations, describes the observed phenomena. Furthermore, it must satisfy certain aesthetic criteria, that is, in relation to how much it describes, it must be rather simple. Since one cannot tell exactly how 'simple' simple is... Simplicity is largely a matter of historical background, of previous conditioning, of antecedents, of customary procedures, and it is very much a function of what is explained by it.
John von Neumann
Method in physical sciences (The Unity of Knowledge 1955)
It seems remarkable that so simple a model based on time-averaged forces can account for the stability and folding of a molecule as complicated as a protein. Looking at known protein conformations closely, one is struck by the precise geometry of the interatomic contacts that stabilize the molecule: all possible interior hydrogen bonds are well formed, and many of the nonpolar side chains interlock to form a close-packed interior. ... [T]he forces responsible for this precise geometry... cause the chain to fold into the approximate shape rapidly and without having to pass through many local minima... Although calculating the energy of the all-atom molecule would be time-consuming, one would have the great advantage of starting close to the right conformation... The general concept of using a simple model... when the detailed forces are too complicated has many potential applications... Such a hierarchical approach might eventually lead to an understanding and simulation of very complicated biological assembly processes.
Michael Levitt and Arieh Warshel
Computer simulation of protein folding (Nature 1975)
Folding is an intrinsically statistical phenomenon and no conclusion can be derived from a single folding or unfolding trajectory. ... Lattice and other simplified analytical models are the statistical mechanician’s contribution to the protein folding... their intimate connection with statistical mechanics... is very important as it often allows us to compare simulation with statistical-mechanical analytical theories.
Eugene Shakhnovich
Modeling protein folding: the beauty and power of simplicity (Folding and Design 1996)
The central question addressed in this review is this: Is there some clever algorithm, yet to be invented, that can find the global minimum of a protein’s potential-energy function reliably and reasonably quickly? Or is there something intrinsic to the problem that prevents such a solution from existing?... Is there an approximation algorithm for global potential-energy minimization?... To our knowledge, the possible existence of an approximation algorithm for protein structure prediction has not been addressed... Such an approximation algorithm might be of significant practical use in protein structure prediction, because exactness is not a central issue.
Martin Karplus
Computational complexity, protein structure prediction, and the Levinthal paradox (The Protein Folding Problem and Tertiary Structure Prediction 1994)
It is the mark of an instructed mind to rest satisfied with the degree of precision which the nature of the subjects permits and not seek an exactness where only an approximation of the truth is possible.
Aristotle
Nicomachean Ethics (319 BC)
The exactness of mathematics is well illustrated by proofs of impossibility. When asserting that doubling the cube... is impossible, the statement does not merely refer to a temporary limitation of human ability to perform this feat. It goes far beyond this, for it proclaims that never, no matter what, will anybody ever be able to [double the cube]. No other science, or for that matter no other discipline of human endeavor, can even contemplate anything of such finality.
Mark Kac and Stan Ulam
(1968)
For a quarter of a century now, NP-completeness has been computer science’s favorite paradigm, fad, punching bag, buzzword, alibi, and intellectual export... pervasive and contagious.
Christos Papadimitriou
(1995)
The true elegance of this consequence of natural selection was dramatized by the ribonuclease work since the refolding of this molecule after full denaturation by reductive cleavage of its four disulfide bonds... required that only 1 of 105 possible pairings of eight sulfhydryl groups to form four disulfide linkages take place... to establish... the “thermodynamic hypothesis.” This hypothesis states that the three-dimensional structure of a native protein in its normal physiological milieu (solvent, pH, ionic strength, presence of other components such as metal ions or prosthetic groups, temperature, and other) is the one in which the Gibbs free energy of the whole system is lowest; that is, that the native conformation is determined by its totality of interatomic interactions and hence by the amino acid sequence, in a given environment.
Christian Anfinsen
Principles that govern the folding of protein chains (Science 1973)
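The number of disulfide pairings in the quote above is a small combinatorial fact: eight sulfhydryl groups can be split into four unordered pairs in 7 × 5 × 3 × 1 = 105 ways. A minimal sketch that checks this both by formula and by brute-force enumeration follows; the function names and the integer labels for the groups are hypothetical, introduced only for this illustration.

```python
from math import factorial


def count_pairings(n_groups):
    """Number of ways to split n_groups items into unordered pairs."""
    if n_groups % 2:
        raise ValueError("need an even number of groups")
    k = n_groups // 2
    # n! orderings, divided by 2 per pair (order within a pair)
    # and by k! (order of the pairs themselves).
    return factorial(n_groups) // (2 ** k * factorial(k))


def enumerate_pairings(items):
    """Yield every complete pairing of items as a list of 2-tuples."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    # Pair the first item with each possible partner, then recurse on the rest.
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in enumerate_pairings(remaining):
            yield [(first, partner)] + tail


if __name__ == "__main__":
    groups = list(range(1, 9))  # hypothetical labels for the eight groups
    print(count_pairings(len(groups)))
    print(sum(1 for _ in enumerate_pairings(groups)))
```

Both counts agree, so a randomly re-oxidized chain would hit the native set of four linkages with probability 1/105 if all pairings were equally likely; the observed refolding to the native state is what the thermodynamic hypothesis explains.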
Relevant Papers
- The Sequence of the Human Genome
  2001. J. Craig Venter, Mark D. Adams, Eugene W. Myers, Peter W. Li, Richard J. Mural, Granger G. Sutton, Hamilton O. Smith, Mark Yandell, Cheryl A. Evans, Robert A. Holt, Jeannine D. Gocayne, Peter Amanatides, Richard M. Ballew, Daniel H. Huson, Jennifer Russo Wortman, Qing Zhang, Chinnappa D. Kodira, Xiangqun H. Zheng, Lin Chen, Marian Skupski, Gangadharan Subramanian, Paul D. Thomas, Jinghui Zhang, George L. Gabor Miklos, Catherine Nelson, Samuel Broder, Andrew G. Clark, Joe Nadeau, Victor A. McKusick, Norton Zinder, Arnold J. Levine, Richard J. Roberts, Mel Simon, Carolyn Slayman, Michael Hunkapiller, Randall Bolanos, Arthur Delcher, Ian Dew, Daniel Fasulo, Michael Flanigan, Liliana Florea, Aaron Halpern, Sridhar Hannenhalli, Saul Kravitz, Samuel Levy, Clark Mobarry, Knut Reinert, Karin Remington, Jane Abu-Threideh, Ellen Beasley, Kendra Biddick, Vivien Bonazzi, Rhonda Brandon, Michele Cargill, Ishwar Chandramouliswaran, Rosane Charlab, Kabir Chaturvedi, Zuoming Deng, Valentina Di Francesco, Patrick Dunn, Karen Eilbeck, Carlos Evangelista, Andrei E. Gabrielian, Weiniu Gan, Wangmao Ge, Fangcheng Gong, Zhiping Gu, Ping Guan, Thomas J. Heiman, Maureen E. Higgins, Rui-Ru Ji, Zhaoxi Ke, Karen A. Ketchum, Zhongwu Lai, Yiding Lei, Zhenya Li, Jiayin Li, Yong Liang, Xiaoying Lin, Fu Lu, Gennady V. Merkulov, Natalia Milshina, Helen M. Moore, Ashwinikumar K. Naik, Vaibhav A. Narayan, Beena Neelam, Deborah Nusskern, Douglas B. Rusch, Steven Salzberg, Wei Shao, Bixiong Shue, Jingtao Sun, Zhen Yuan Wang, Aihui Wang, Xin Wang, Jian Wang, Ming-Hui Wei, Ron Wides, Chunlin Xiao, Chunhua Yan, et al.
- Visualization challenges for a new cyberpharmaceutical computing paradigm
  2001. Russell J. Turner, Kabir Chaturvedi, Nathan J. Edwards, Daniel Fasulo, Aaron L. Halpern, Daniel H. Huson, Oliver Kohlbacher, Jason R. Miller, Knut Reinert, Karin A. Remington, Russell Schwartz, Brian Walenz, Shibu Yooseph, Sorin Istrail
- Insights Into the Association of Partially Folded Chains Derived From Lattice Simulation Models
  1999. Russell Schwartz, Sorin Istrail, Jonathan King
  Sandia Labs
- Prediction of Energetic Tiles Self-Assembly
  1999. Ross A. Lippert, Sorin Istrail, Alan Hurd
  Sandia Labs
- Mathematics of Self-Assembly
  1999. John H. Conway, Sorin Istrail
  Sandia Labs
- Branch-and-Bound LP-algorithms for Protein Structure Alignment Based on Contact Map Overlap
  1999. Robert Carr, Giuseppe Lancia, Sorin Istrail
  Sandia Labs
- Crystallographical universal approximation: A complexity theory of protein folding algorithms for crystal lattices
  1995. William E. Hart, Sorin Istrail
  Sandia Labs