Bioinformatica t7-protein structure


Protein Structure


<ul><li> FBW, 27-11-2012, Wim Van Criekinge </li> <li> Course contents: Bioinformatics. NO CLASS </li> <li> Biobix: Applied Bioinformatics Research. Thesis topics. Ongoing research: biomarker prediction / methylation; metabonomics; peptidomics; translational biotechnology (text mining); structural genomics; miRNA prediction / target prediction; exploring genomic dark matter (junk mining). Collaboration with various institutes. Ambition to publish in peer-reviewed journals. </li> <li> The reason for bioinformatics to exist? An empirical finding: if two biological sequences are sufficiently similar, almost invariably they have similar biological functions and are descended from a common ancestor. This means that (i) function is encoded into sequence, so the sequence provides the syntax, and (ii) there is redundancy in the encoding: many positions in the sequence may be changed without perceptible changes in the function, so the semantics of the encoding is robust. </li> <li> Protein Structure: Introduction; Why?; How do proteins fold?; Levels of protein structure (0, 1, 2, 3, 4); X-ray / NMR; The Protein Data Bank (PDB); Protein Modeling; Bioinformatics &amp; Proteomics; Weblems </li> <li> Why protein structure? 
Proteins perform a variety of tasks in living cells. Each protein adopts a particular fold that determines its function. The 3D structure of a protein can bring into close proximity residues that are far apart in the amino acid sequence. Catalytic site: the "business end" of the molecule. </li> <li> Rationale for understanding protein structure and function. Protein sequence (large numbers of sequences, including whole genomes) leads, through structure determination or structure prediction, to protein structure (three-dimensional; complicated; mediates function). Protein structure leads, through homology, rational mutagenesis, biochemical analysis and model studies, to protein function. Applications: rational drug design and treatment of disease; protein and genetic engineering; building networks to model cellular pathways; studying organismal function and evolution. </li> <li> About the use of protein models (Peitch). Structure is preserved under evolution when sequence is not. Interpreting the impact of mutations/SNPs and conserved residues on protein function; potential link to disease. Function? Biochemical: the chemical interactions occurring in a protein. Biological: the role within the cell. Phenotypic: the role in the organism. Gene Ontology functional classification! Prioritisation of residues to mutate to determine protein function. Providing hints for protein function: catalytic mechanisms of enzymes often require key residues to be close together in 3D space (protein-ligand complexes, rational drug design, putative interaction interfaces). </li> <li> MIS-SENSE MUTATION, e.g. 
Sickle Cell Anaemia. Cause: defective haemoglobin due to a mutation in the β-globin gene. Symptoms: severe anaemia and death in the homozygote. </li> <li> Normal β-globin (146 amino acids): val - his - leu - thr - pro - glu - glu - ... (positions 1 to 7). Normal gene versus mutant gene at amino acid 6: DNA CTC versus CAC; mRNA GAG versus GUG; product Glu versus Val. Mutant β-globin: val - his - leu - thr - pro - val - glu - ... </li> <li> Protein Conformation. Christian Anfinsen: studies on reversible denaturation; sequence specifies conformation. Chaperones and disulfide interchange enzymes are involved but do not control the final state; they provide an environment in which a misfolded protein can refold. Structure implies function: the amino acid sequence encodes the protein's structural information. </li> <li> How does a protein fold? By itself: Anfinsen developed what he called his "thermodynamic hypothesis" of protein folding to explain the native conformation of amino acid structures. He theorized that the native or natural conformation occurs because this particular shape is thermodynamically the most stable in the intracellular environment. That is, the protein takes this shape as a result of the constraints of the peptide bonds as modified by the other chemical and physical properties of the amino acids. To test this hypothesis, Anfinsen unfolded the RNase enzyme under extreme chemical conditions and observed that the enzyme's amino acid structure refolded spontaneously back into its original form when he returned the chemical environment to natural cellular conditions. "The native conformation is determined by the totality of interatomic interactions and hence by the amino acid sequence, in a given environment." </li> <li> Protein Structure: Introduction; Why?; How do proteins fold?; 
Levels of protein structure (0, 1, 2, 3, 4); X-ray / NMR; The Protein Data Bank (PDB); Protein Modeling; Bioinformatics &amp; Proteomics; Weblems </li> <li> The Basics. Proteins are linear heteropolymers consisting of one or more polypeptide chains. Below about 40 residues the term peptide is frequently used. A certain number of residues is necessary to perform a particular biochemical function, and around 40-50 residues appears to be the lower limit for a functional domain size. Protein sizes range from this lower limit to several hundred residues in multi-functional proteins. The three-dimensional shapes (folds) adopted vary enormously. Experimental methods: X-ray crystallography, NMR (nuclear magnetic resonance), electron microscopy. Ab initio calculations. </li> <li> Levels of protein structure. Zeroth: amino acid composition (proteomics, %cysteine, %glycine) </li> <li> Amino Acid Residues. The basic structure of an α-amino acid is quite simple. R denotes any one of the 20 possible side chains (see table below). The Cα-atom has 4 different ligands (the H is omitted in the drawing) and is thus chiral. An easy trick to remember the correct L-form is the CORN rule: when the Cα-atom is viewed with the H in front, the residues read "CO-R-N" in a clockwise direction. </li> <li> Levels of protein structure. Primary: this is simply the order of covalent linkages along the polypeptide chain, i.e. the sequence itself </li> <li> Backbone Torsion Angles </li> <li> Levels of protein structure. Secondary: local organization of the protein backbone: alpha-helix, beta-strand (strands assemble into beta-sheets), turn and interconnecting loop. 
</li> <li> Ramachandran / Phi-Psi Plot </li> <li> The alpha-helix </li> <li> A Practical Approach: Interpretation. Residues with hydrophobic properties conserved at i, i+2, i+4, separated by unconserved or hydrophilic residues, suggest surface beta-strands. A short run of hydrophobic amino acids (4 residues) suggests a buried beta-strand. Pairs of conserved hydrophobic amino acids separated by pairs of unconserved or hydrophilic residues suggest an alpha-helix with one face packing in the protein core. The same holds for an i, i+3, i+4, i+7 pattern of conserved hydrophobic residues. </li> <li> Beta-sheets </li> <li> Topologies of Beta-sheets </li> <li> Secondary structure prediction? </li> <li> Secondary structure prediction: CHOU-FASMAN. Chou, P.Y. and Fasman, G.D. (1974). Conformational parameters for amino acids in helical, β-sheet, and random coil regions calculated from proteins. Biochemistry 13, 211-221. Chou, P.Y. and Fasman, G.D. (1974). Prediction of protein conformation. Biochemistry 13, 222-245. </li> <li> Secondary structure prediction: CHOU-FASMAN. Method: assign a set of prediction values to each residue, based on statistical analysis of 15 proteins, then apply a simple algorithm to those numbers. </li> <li> Secondary structure prediction: CHOU-FASMAN. Calculation of preference parameters. For each of the 20 residues and each secondary structure (α-helix, β-sheet and β-turn): P = log(observed counts / expected counts) + 1.0. Preference parameter &gt; 1.0: the residue has a preference for that secondary structure. Preference parameter = 1.0: the residue neither prefers nor dislikes that secondary structure. Preference parameter &lt; 1.0: the residue dislikes that secondary structure. 
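The preference-parameter formula can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the counts in the usage line are made up, and the base of the logarithm is not stated on the slide, so base 10 is assumed here.

```python
import math

def preference_parameter(observed, expected, log=math.log10):
    """Return P = log(observed / expected) + 1.0 for one residue and one state.

    The logarithm base is an assumption (base 10); pass another function to change it.
    """
    # Both counts must be positive for the logarithm to be defined.
    if not (observed > 0 and expected > 0):
        raise ValueError("observed and expected counts must be positive")
    return log(observed / expected) + 1.0

def interpret(p, tolerance=1e-9):
    """Map P onto the three cases: above, equal to, or below 1.0."""
    if abs(p - 1.0) > tolerance:
        return "preference" if p > 1.0 else "dislike"
    return "no preference"

# Hypothetical counts, for illustration only.
print(interpret(preference_parameter(30, 20)))
```

Whatever base is chosen, P exceeds 1.0 exactly when the observed count exceeds the expected count, so the three-way interpretation does not depend on that assumption.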
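</li> <li> Secondary structure prediction: CHOU-FASMAN. Preference parameters.
<table>
<tr><th>Residue</th><th>P(a)</th><th>P(b)</th><th>P(t)</th><th>f(i)</th><th>f(i+1)</th><th>f(i+2)</th><th>f(i+3)</th></tr>
<tr><td>Ala</td><td>1.45</td><td>0.97</td><td>0.57</td><td>0.049</td><td>0.049</td><td>0.034</td><td>0.029</td></tr>
<tr><td>Arg</td><td>0.79</td><td>0.90</td><td>1.00</td><td>0.051</td><td>0.127</td><td>0.025</td><td>0.101</td></tr>
<tr><td>Asn</td><td>0.73</td><td>0.65</td><td>1.68</td><td>0.101</td><td>0.086</td><td>0.216</td><td>0.065</td></tr>
<tr><td>Asp</td><td>0.98</td><td>0.80</td><td>1.26</td><td>0.137</td><td>0.088</td><td>0.069</td><td>0.059</td></tr>
<tr><td>Cys</td><td>0.77</td><td>1.30</td><td>1.17</td><td>0.089</td><td>0.022</td><td>0.111</td><td>0.089</td></tr>
<tr><td>Gln</td><td>1.17</td><td>1.23</td><td>0.56</td><td>0.050</td><td>0.089</td><td>0.030</td><td>0.089</td></tr>
<tr><td>Glu</td><td>1.53</td><td>0.26</td><td>0.44</td><td>0.011</td><td>0.032</td><td>0.053</td><td>0.021</td></tr>
<tr><td>Gly</td><td>0.53</td><td>0.81</td><td>1.68</td><td>0.104</td><td>0.090</td><td>0.158</td><td>0.113</td></tr>
<tr><td>His</td><td>1.24</td><td>0.71</td><td>0.69</td><td>0.083</td><td>0.050</td><td>0.033</td><td>0.033</td></tr>
<tr><td>Ile</td><td>1.00</td><td>1.60</td><td>0.58</td><td>0.068</td><td>0.034</td><td>0.017</td><td>0.051</td></tr>
<tr><td>Leu</td><td>1.34</td><td>1.22</td><td>0.53</td><td>0.038</td><td>0.019</td><td>0.032</td><td>0.051</td></tr>
<tr><td>Lys</td><td>1.07</td><td>0.74</td><td>1.01</td><td>0.060</td><td>0.080</td><td>0.067</td><td>0.073</td></tr>
<tr><td>Met</td><td>1.20</td><td>1.67</td><td>0.67</td><td>0.070</td><td>0.070</td><td>0.036</td><td>0.070</td></tr>
<tr><td>Phe</td><td>1.12</td><td>1.28</td><td>0.71</td><td>0.031</td><td>0.047</td><td>0.063</td><td>0.063</td></tr>
<tr><td>Pro</td><td>0.59</td><td>0.62</td><td>1.54</td><td>0.074</td><td>0.272</td><td>0.012</td><td>0.062</td></tr>
<tr><td>Ser</td><td>0.79</td><td>0.72</td><td>1.56</td><td>0.100</td><td>0.095</td><td>0.095</td><td>0.104</td></tr>
<tr><td>Thr</td><td>0.82</td><td>1.20</td><td>1.00</td><td>0.062</td><td>0.093</td><td>0.056</td><td>0.068</td></tr>
<tr><td>Trp</td><td>1.14</td><td>1.19</td><td>1.11</td><td>0.045</td><td>0.000</td><td>0.045</td><td>0.205</td></tr>
<tr><td>Tyr</td><td>0.61</td><td>1.29</td><td>1.25</td><td>0.136</td><td>0.025</td><td>0.110</td><td>0.102</td></tr>
<tr><td>Val</td><td>1.14</td><td>1.65</td><td>0.30</td><td>0.023</td><td>0.029</td><td>0.011</td><td>0.029</td></tr>
</table>
</li> <li> Secondary structure prediction: CHOU-FASMAN. Applying the algorithm. The thresholds 100 and 105 below correspond to 1.00 and 1.05 on the scale of the preference-parameter table.
<ol>
<li> Assign parameters to each residue. </li>
<li> Identify regions where 4 out of 6 residues have P(a) &gt; 100: α-helix. Extend the helix in both directions until four contiguous residues have an average P(a) &lt; 100. If the average P(a) &gt; P(b): α-helix. </li>
<li> Repeat this procedure to locate all of the helical regions. </li>
<li> Identify regions where 3 out of 5 residues have P(b) &gt; 100: β-sheet. Extend the sheet in both directions until four contiguous residues have an average P(b) &lt; 100. If the average P(b) &gt; 105 and P(b) &gt; P(a): β-sheet. </li>
<li> Rest: P(a) &gt; P(b): α-helix. P(b) &gt; P(a): β-sheet. </li>
<li> To identify a bend at residue number i, calculate the following value: p(t) = f(i) &times; f(i+1) &times; f(i+2) &times; f(i+3). If (1) p(t) &gt; 0.000075; (2) the average P(t) &gt; 1.00 in the tetrapeptide; and (3) the averages for the tetrapeptide obey P(a) &lt; P(t) &gt; P(b): β-turn. </li>
</ol>
</li> <li> Secondary structure prediction: CHOU-FASMAN. Successful method? 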
19 proteins evaluated: successful in locating 88% of helical and 95% of β regions; correctly predicting 80% of helical and 86% of β-sheet residues. The accuracy of predicting the three conformational states (helix, β, and coil) for all residues is 77%. Chou &amp; Fasman: successful method. After 1974: improvement of preference parameters. </li> <li> Sander-Schneider: Evolution of overall structure. Naturally...</li></ul>
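As an illustration of the Chou-Fasman steps listed earlier, the following Python sketch covers only the nucleation tests (4 of 6 residues for a helix, 3 of 5 for a strand) and the β-turn test at a given position. It is a simplified sketch, not the full published procedure: segment extension and overlap resolution are left out. The function names and one-letter residue codes are assumptions introduced here, the values are copied from the preference-parameter table in this deck, and the threshold 1.00 is assumed to correspond to 100 in the algorithm steps.

```python
# residue: (P(a), P(b), P(t), f(i), f(i+1), f(i+2), f(i+3)), copied from the deck's table
PARAMS = {
    "A": (1.45, 0.97, 0.57, 0.049, 0.049, 0.034, 0.029),
    "R": (0.79, 0.90, 1.00, 0.051, 0.127, 0.025, 0.101),
    "N": (0.73, 0.65, 1.68, 0.101, 0.086, 0.216, 0.065),
    "D": (0.98, 0.80, 1.26, 0.137, 0.088, 0.069, 0.059),
    "C": (0.77, 1.30, 1.17, 0.089, 0.022, 0.111, 0.089),
    "Q": (1.17, 1.23, 0.56, 0.050, 0.089, 0.030, 0.089),
    "E": (1.53, 0.26, 0.44, 0.011, 0.032, 0.053, 0.021),
    "G": (0.53, 0.81, 1.68, 0.104, 0.090, 0.158, 0.113),
    "H": (1.24, 0.71, 0.69, 0.083, 0.050, 0.033, 0.033),
    "I": (1.00, 1.60, 0.58, 0.068, 0.034, 0.017, 0.051),
    "L": (1.34, 1.22, 0.53, 0.038, 0.019, 0.032, 0.051),
    "K": (1.07, 0.74, 1.01, 0.060, 0.080, 0.067, 0.073),
    "M": (1.20, 1.67, 0.67, 0.070, 0.070, 0.036, 0.070),
    "F": (1.12, 1.28, 0.71, 0.031, 0.047, 0.063, 0.063),
    "P": (0.59, 0.62, 1.54, 0.074, 0.272, 0.012, 0.062),
    "S": (0.79, 0.72, 1.56, 0.100, 0.095, 0.095, 0.104),
    "T": (0.82, 1.20, 1.00, 0.062, 0.093, 0.056, 0.068),
    "W": (1.14, 1.19, 1.11, 0.045, 0.000, 0.045, 0.205),
    "Y": (0.61, 1.29, 1.25, 0.136, 0.025, 0.110, 0.102),
    "V": (1.14, 1.65, 0.30, 0.023, 0.029, 0.011, 0.029),
}

def nucleation_sites(seq, column, window, needed):
    """Start indices of windows in which at least `needed` residues have P > 1.00."""
    sites = []
    for start in range(len(seq) - window + 1):
        hits = sum(1 for aa in seq[start:start + window] if PARAMS[aa][column] > 1.00)
        if hits >= needed:
            sites.append(start)
    return sites

def helix_nuclei(seq):
    """Step 2, nucleation only: 4 out of 6 residues with P(a) above threshold."""
    return nucleation_sites(seq, column=0, window=6, needed=4)

def strand_nuclei(seq):
    """Step 4, nucleation only: 3 out of 5 residues with P(b) above threshold."""
    return nucleation_sites(seq, column=1, window=5, needed=3)

def is_turn(seq, i):
    """Step 6: apply the three turn conditions to the tetrapeptide starting at i."""
    tetra = seq[i:i + 4]
    if len(tetra) != 4:
        return False
    p_t = 1.0
    for k, aa in enumerate(tetra):
        p_t *= PARAMS[aa][3 + k]  # f(i), f(i+1), f(i+2), f(i+3) in order

    def average(column):
        return sum(PARAMS[aa][column] for aa in tetra) / 4

    return (p_t > 0.000075 and average(2) > 1.00
            and average(2) > average(0) and average(2) > average(1))

print(helix_nuclei("AEALEAGPGP"), strand_nuclei("VIVIVGPGP"), is_turn("NPGS", 0))
```

Keeping all seven values per residue in one tuple mirrors the table row by row, which makes it easy to check the numbers against the slide.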