Protein Sequence Determination Report (Assessment)


Analyzing proteins, and especially determining the correct sequence of amino acids, is a complex but not impossible task thanks to the various methodologies available today. For example, 2D gel electrophoresis helps estimate a protein's molecular mass and isoelectric point, while tandem mass spectrometry is a tool for finding the exact peptide sequence using enumerative scoring of observed peaks (Yilmaz et al., 2022). If performed manually, determining the sequence is a relatively challenging process; thus, many researchers nowadays prefer to use software and artificial intelligence (Yilmaz et al., 2022). Still, the older manual method allows for a relatively accurate sequence prediction.

Based on the information provided in this question, the peptide sequence is likely PVYNCSAHDNCSD from N-terminus to C-terminus. Since the m/z ratio of the doubly charged parent ion was determined to be 708.4, the mass of the peptide should be about 1416.8; if the masses of the proton and deuterium ion are subtracted, the mass is 1410.8. The first amino acid can be determined by subtracting the mass of the highest fragment peak from the peptide mass: 1410.8 – 1314.64 = 96.16, which may correspond to proline. The rest of the sequence was determined by subtracting the masses of pairs of peaks that are not adjacent to each other, starting from the right side of the provided mass spectrum and alternating between the C-terminal and N-terminal fragment series. The resulting differences were then matched to the corresponding amino acids in the table. The calculations and the determined amino acids are shown below, from the right to the left side of the spectrum:

C: 1314.64 – 1267.65 = 115.03 (Aspartic acid – D)
N: 1267.65 – 1168.53 = 99.12 (Valine – V)
C: 1128.57 – 1039.49 = 89.08 (Serine – S)
N: 1108.52 – 942.49 = 166.03 (Tyrosine – Y)
C: 892.42 – 786.40 = 106.02 (Cysteine – C)
N: 843.42 – 729.36 = 114.06 (Asparagine – N)
C: 685.36 – 571.29 = 114.07 (Asparagine – N)
N: 625.31 – 522.29 = 103.02 (Cysteine – C)
C: 472.22 – 356.83 = 115.39 (Aspartic acid – D)
N: 375.22 – 286.14 = 89.08 (Serine – S)
C: 356.83 – 246.18 = 110.65 (Histidine – H)
N: 286.14 – 215.10 = 71.04 (Alanine – A)
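
The lookup step used in these calculations can be sketched in Python. This is only an illustrative sketch, not the procedure used for the report: the function name `assign_residue` and the 0.1 Da tolerance are assumptions, the residue masses are standard monoisotopic values, and only a few of the peak pairs listed above are used as input. Differences that fall outside the tolerance return `None` rather than a forced match.

```python
# Standard monoisotopic residue masses in Da (Leu and Ile are isobaric).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L/I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def assign_residue(high_peak, low_peak, tolerance=0.1):
    """Return the residue whose mass is closest to the peak difference.

    The tolerance (in Da) is an assumed value; None is returned when no
    residue mass lies within it.
    """
    delta = high_peak - low_peak
    name, mass = min(RESIDUE_MASS.items(), key=lambda item: abs(item[1] - delta))
    return name if abs(mass - delta) <= tolerance else None


# A few (higher peak, lower peak) pairs taken from the list above.
peak_pairs = [
    (1267.65, 1168.53),
    (843.42, 729.36),
    (625.31, 522.29),
    (286.14, 215.10),
]
assignments = [assign_residue(high, low) for high, low in peak_pairs]
print(assignments)
```

A tighter or looser tolerance changes which differences receive an assignment, which is one reason manual assignments from a printed spectrum can disagree with software output.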

Overall, based on the calculations above, the final sequence of the peptide fragment analyzed in the mass spectrometer is PVYNCSAHDNCSD. However, it is only a fragment of the entire protein, which was digested with trypsin and then run through the mass spectrometer in several steps involving fragmentation and degradation. This is a normal part of proteomic research, since sequencing an entire protein directly is still problematic; hence, tandem mass spectrometry is utilized for that purpose (Chong & Leong, 2012). When a peptide enters the mass spectrometer, it is ionized and fragmented along the backbone, forming multiple pieces with various m/z ratios that allow the sequence of the peptide to be determined (Chong & Leong, 2012). Since this peptide sequence was determined without the use of software, errors at individual positions are likely. Still, the calculated mass of the obtained sequence is 1424, which is close to the mass derived from the doubly charged parent ion.
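
The mass check for the proposed sequence can be reproduced with a short Python sketch. It assumes standard monoisotopic residue masses, one water molecule added for the free termini, and protons as the only charge carriers; the function names `peptide_mass` and `expected_mz` are illustrative and do not come from the cited sources.

```python
# Standard monoisotopic residue masses (Da) for the residues in the sequence.
RESIDUE_MASS = {
    "P": 97.05276, "V": 99.06841, "Y": 163.06333, "N": 114.04293,
    "C": 103.00919, "S": 87.03203, "A": 71.03711, "H": 137.05891,
    "D": 115.02694,
}
WATER = 18.01056   # H2O added for the free N- and C-termini
PROTON = 1.00728   # mass of one charge-carrying proton


def peptide_mass(sequence):
    """Monoisotopic mass of the neutral peptide."""
    return sum(RESIDUE_MASS[residue] for residue in sequence) + WATER


def expected_mz(mass, charge):
    """m/z of the peptide carrying `charge` protons."""
    return (mass + charge * PROTON) / charge


mass = peptide_mass("PVYNCSAHDNCSD")
doubly_charged = expected_mz(mass, 2)
print(f"{mass:.2f} Da, [M+2H]2+ at m/z {doubly_charged:.2f}")
```

The neutral mass evaluates to roughly 1423.5 Da, in line with the value of 1424 quoted above, and the computed m/z can be compared with the observed parent ion at 708.4.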

The original protein sequence can be determined by the top-down, middle-down, or bottom-up approach. In the top-down method, the protein sequence is identified using the intact protein. In the latter two, the original protein is extracted from the gel and exposed to a digestive enzyme such as trypsin, and the sequences of the resulting peptides are then determined using mass spectrometry (Fisher Analytics, 2022). The difference between the middle-down and bottom-up techniques is in the size of the resulting peptide fragments, which are larger in the former (Fisher Analytics, 2022). After all peptide fragments have been identified, they are compared to the known sequences in the available proteome databases (Fisher Analytics, 2022). Therefore, to determine the entire sequence of the protein in this question, it is necessary to have the sequences of all the peptides into which the polypeptide was degraded by trypsin digestion. The advantage of this method is its relative accuracy for known proteins. The apparent drawback is that it cannot be applied when the protein is novel and its sequence is not present in the database. Thus, the de novo sequencing method should be used in such situations.
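
The digestion and database-matching steps of the bottom-up approach can be illustrated with a minimal Python sketch. It assumes the commonly used trypsin rule (cleavage after lysine or arginine unless the next residue is proline); the protein sequences and the tiny "database" are hypothetical and exist only to show the idea.

```python
def trypsin_digest(protein):
    """Split a protein after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal remainder
    return peptides


def matching_proteins(observed_peptides, database):
    """Names of database proteins whose digest contains every observed peptide."""
    return [
        name for name, sequence in database.items()
        if set(observed_peptides) <= set(trypsin_digest(sequence))
    ]


# Hypothetical sequences used purely for illustration.
database = {
    "protein_A": "AAKPGGRCCKDD",
    "protein_B": "MSTRLLKAAG",
}
peptides = trypsin_digest(database["protein_A"])
hits = matching_proteins(["CCK", "DD"], database)
print(peptides, hits)
```

Real search engines match measured spectra rather than exact peptide strings, so this sketch only mirrors the logic of comparing digested fragments against known sequences.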

Protein sequence determination is an essential component of fundamental research. Studying proteins is as critical as understanding DNA mutations because protein misfolding, which may result in various diseases, is not always a straightforward outcome of a genetic alteration. Studying normal and abnormal protein structures and sequences is therefore necessary to understand the phenotypic abnormalities found in different disorders, and it is crucial to master the various methodologies used for the characterization of short peptides, oligopeptides, polypeptides, and proteins. The molecular size of a protein can be identified using simple gel electrophoresis, but determining its amino acid sequence requires tandem mass spectrometry. Thanks to advancements in biotechnology and bioinformatics, this process has become faster and more efficient than in the early days of sequencing, when every individual amino acid had to be cleaved by chemical degradation.

References

Chong, K. F., & Leong, H. W. (2012). Tutorial on de novo peptide sequencing using MS/MS mass spectrometry. Journal of Bioinformatics and Computational Biology, 10(06), 1-39.

Fisher Analytics. (2022). Protein and peptide sequencing. Web.

Yilmaz, M., Fondrie, W., Bittremieux, W., Oh, S., & Noble, W. S. (2022). De novo mass spectrometry peptide sequencing with a transformer model. BioRxiv, 1-12. Web.