Introduction
The human genome has accumulated changes over many generations. Because the nucleotide sequence carries the information that governs the cell, understanding the genetic background has many applications and potential benefits. Bioinformatics is therefore a crucial area of science: it helps to determine and analyze biological sequences (DNA, RNA and protein) and to build uniform databases that store this vast amount of information (Wani et al., 2018). The Human Genome Project and, after it, the Human Variome Project both relied on bioinformatics techniques to achieve significant results. This report summarizes and analyzes the role of bioinformatics today and applies bioinformatics tools to a given sequence that codes for aspartoacylase.
Human Genome Sequencing
DNA sequencing is the process of determining the order of nucleotides in a DNA molecule. The first method in the Human Genome Project to make a significant contribution to modern science was shotgun sequencing (Wani et al., 2018). In this approach the whole nucleotide sequence is broken randomly into short fragments, each fragment is sequenced, and the reads are assembled according to the regions where they overlap. The individual fragments are read by chain-termination (Sanger) sequencing, which relies on dideoxyribonucleoside triphosphates: once one of them is incorporated, DNA synthesis terminates because the missing 3’-OH group prevents the next nucleotide from being added. Because the fragmentation is random, the genome must be sequenced several times over (redundant coverage) to guarantee that the reads overlap sufficiently to re-establish the initial sequence.
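The assembly step described above can be illustrated with a minimal sketch. It assumes error-free reads and a toy sequence without repeats, and the function names (overlap, greedy_assemble) and the example reads are hypothetical; production assemblers are considerably more elaborate.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 when no overlap of at least min_len bases exists.
    """
    max_len = min(len(a), len(b))
    for n in range(max_len, min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> str:
    """Greedy overlap assembly: repeatedly merge the best-overlapping pair."""
    # Simplification: drop duplicate reads and reads contained in another read.
    frags = list(dict.fromkeys(fragments))
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]

    while len(frags) > 1:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no overlaps left: coverage is too low to join the reads
        merged = frags[best_i] + frags[best_j][best_n:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)

    # Return the longest contig that could be reconstructed.
    return max(frags, key=len)


# Hypothetical reads covering a 15-base toy sequence.
reads = ["ATGGCGTA", "GCGTACGTT", "ACGTTAGC"]
contig = greedy_assemble(reads)
```

If the middle read is removed, the two remaining reads no longer overlap and the sketch cannot join them, which is why redundant coverage is needed.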
The DNA sequencing approach in use today is next-generation sequencing (NGS). Unlike shotgun sequencing, NGS encompasses multiple sequencing technologies (Kumar and Chordia, 2017). It also allows many DNA regions across the entire human genome to be sequenced in parallel, and it generates accurate, high-quality results that are reliable for genome analysis.
Role of Bioinformatics
As mentioned above, identifying the composition and sequence of biological chain molecules is vital for understanding the nature of organisms and for identifying diseases. Bioinformatics helps to collect the necessary data, organize it into large databases and make it accessible to researchers. In addition, bioinformatics helps scientists track changes in molecules and in regulation within organisms in response to particular conditions. As a result, contemporary studies can include multiple genes in a single analysis and draw more comprehensive conclusions.
In the modern world, bioinformatics has diverse roles in various areas of biological advancement. The increasing generation of biological data, such as DNA and mRNA sequences, has led to the development of the field of genomics (Kumar and Chordia, 2017). Numerous bioinformatics tools have been created for the genomic analysis of genes and their expression profiles in order to identify their molecular structures and cellular functions (Kumar and Chordia, 2017). Bioinformatics also plays a significant role in proteomics, where it aids in the analysis of cellular activities and functions. Since the central dogma states that genes are expressed to produce proteins, bioinformatics offers critical tools for proteomics. The massive accumulation of protein data has boosted the growth of proteomics, which determines protein-protein interactions, expression profiles, protein structures and enzyme activities (Kumar and Chordia, 2017). Bioinformatics is likewise applied in transcriptomics, which is the analysis of the expression profiles of genes in various cells. The study of mRNA and total RNA provides important information about the activity and functions of cells, tissues and organs.
Cheminformatics applies bioinformatics to the analysis of chemicals for drug discovery. Here, bioinformatics allows chemical compounds to be indexed, stored, retrieved and analyzed within a short time and at minimal cost. Since drug design and drug discovery are long and complex processes, the use of bioinformatics has shortened and eased research activities (Kumar and Chordia, 2017). Computer-aided drug design applies high-throughput bioinformatics tools that analyze huge volumes of data and propose candidate drugs for clinical studies. A topical example is the use of bioinformatics methods in developing COVID-19 vaccines.
Bioinformatics also enables the evolutionary relationships of organisms to be determined from their genomes. For example, phylogenetic trees provide a way of comparing organisms and determining whether they are closely or distantly related.
In agriculture, bioinformatics tools are applied to crop improvement and animal breeding. Comparative genomics identifies genes of interest in plants and animals so that experiments aimed at creating hybrids can be designed effectively. For instance, random-amplified polymorphic DNA (RAPD), simple sequence repeat (SSR), amplified fragment length polymorphism (AFLP) and restriction fragment length polymorphism (RFLP) are molecular markers that play a critical role in the identification of genes and in the breeding process (Wani et al., 2018). In forensic science, bioinformatics provides accurate analysis of DNA fingerprinting data and aids in the resolution of criminal cases. Other applications of bioinformatics include the analysis of biodiversity, the development of biofuels, biodefence interventions and the creation of microbes that degrade waste materials.
Discoveries of the Human Genome Project
The Human Genome Project led to the discovery of genetic variation between people and highlighted the need to customize medicine. To build on the findings of the Human Genome Project, the Human Variome Project, the HapMap Project and the 1000 Genomes Project focused on identifying and characterizing single nucleotide variations in humans. Further examination of genomic variation has resulted in the discovery of cancer markers by The Cancer Genome Atlas and the International Cancer Genome Consortium (ICGC) (Campbell et al., 2020). Additionally, the Human Genome Project has eased the diagnosis of diseases using molecular markers. For example, genetic diseases such as cystic fibrosis (CF), hemophilia A and polycythemia vera (PV) can now be screened for and diagnosed in populations.
Aim
The aim of the report is to apply bioinformatics tools in the analysis of the cloned nucleotide sequence that codes for aspartoacylase.
Results of Sequence Analysis
Gene Structure
The analysis of gene structure indicates that the cloned sequence comprises a coding sequence flanked by non-coding untranslated regions (UTRs) at the 5’ and 3’ ends. Sequence analysis commenced with the identification of the open reading frame of the cloned nucleotide sequence using ORF Finder. The results (Appendix 1) indicate that the cloned sequence has 1434 nucleotides and that its longest open reading frame (frame +2) spans 957 bases, from the 143rd base to the 1099th base. The identified reading frame codes for aspartoacylase (ASPA), a protein of 318 amino acids. The cloned sequence therefore carries a 142 bp 5’ untranslated region and a 335 bp 3’ untranslated region that flank the ASPA coding sequence.
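The coordinate arithmetic behind these figures can be checked with a short sketch. It assumes 1-based, inclusive coordinates and that the reported end position includes the stop codon (957 / 3 = 319 codons for a 318-residue protein). The function names are hypothetical, and the scanner covers only the forward strand, so it is a simplified stand-in for ORF Finder rather than a reimplementation of it.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_forward_orf(seq: str):
    """Return (start, end, frame) of the longest ATG-to-stop ORF.

    Only the three forward frames are scanned. Coordinates are 1-based and
    inclusive, and the stop codon is counted in the ORF. Returns None when
    no complete ORF is found.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                candidate = (start + 1, i + 3, frame + 1)
                if best is None or candidate[1] - candidate[0] > best[1] - best[0]:
                    best = candidate
                start = None
    return best


def orf_summary(start: int, end: int, total_length: int) -> dict:
    """Derive ORF length, protein length, frame and flanking lengths."""
    orf_length = end - start + 1
    return {
        "orf_length_nt": orf_length,
        "amino_acids": orf_length // 3 - 1,  # assumes the stop codon is included
        "frame": (start - 1) % 3 + 1,
        "upstream_nt": start - 1,
        "downstream_nt": total_length - end,
    }


# Coordinates reported in the text: ORF from base 143 to base 1099 of 1434.
summary = orf_summary(143, 1099, 1434)
# summary == {"orf_length_nt": 957, "amino_acids": 318, "frame": 2,
#             "upstream_nt": 142, "downstream_nt": 335}
```

The derived values agree with the reported ORF length, protein length, reading frame and flanking lengths.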
Similarity Search
The Basic Local Alignment Search Tool for nucleotides (BLASTn) was used to find similar nucleotide sequences in the database of the National Center for Biotechnology Information (NCBI). The BLAST results (Appendix 2) identify the cloned sequence as human aspartoacylase (ASPA) (Accession S67156.1): the alignment has 100% identity, 100% query coverage (1434 bases), a total score of 2649, a maximum score of 2649 and an expect value (E-value) of 0.
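As a rough illustration of what the identity and coverage figures measure, the sketch below computes both from a pair of already aligned sequences. The function names and the toy alignment are hypothetical, and the calculation is a simplification that does not reproduce BLAST scoring or E-value statistics.

```python
def percent_identity(aligned_query: str, aligned_subject: str) -> float:
    """Percentage of alignment columns in which both sequences carry the same base."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(
        q == s and q != "-" for q, s in zip(aligned_query, aligned_subject)
    )
    return 100.0 * matches / len(aligned_query)


def query_coverage(aligned_query: str, query_length: int) -> float:
    """Percentage of the full query that takes part in the alignment."""
    aligned_bases = sum(base != "-" for base in aligned_query)
    return 100.0 * aligned_bases / query_length


# Toy alignment with one gap in the query ("-" marks a gap).
identity = percent_identity("ACGT-A", "ACGTTA")   # 5 matching columns out of 6
coverage = query_coverage("ACGT-A", query_length=10)  # 5 of 10 query bases aligned
```

An alignment such as the one reported here, in which every one of the 1434 query bases matches the subject, gives 100% on both measures.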
Phylogenetic Tree
The evolutionary relationship between the cloned sequence and known sequences with a percent identity greater than 85% was determined using BLAST Tree View. The phylogenetic tree was generated with the neighbor-joining method, a maximum sequence difference of 0.75 and sequence titles as labels. The resulting tree (Appendix 3) indicates that the cloned sequence has a distant relationship with ASPA of Macaca fascicularis (crab-eating macaque), a close relationship with ASPA of Pongo abelii (orangutan) and a very close relationship with ASPA of Homo sapiens (humans). Therefore, the analysis of the evolutionary relationship reveals that the cloned sequence belongs to the human form of ASPA.
Genomic Location
The human genome assembly places the ASPA gene on chromosome 17 (NC_000017.11), at position 13.2 of the short arm (17p13.2), at genomic coordinates 3473646 to 3503405 (HUGO Gene Nomenclature Committee, 2020). ASPA is expressed in the brain, where it aids in the metabolism of N-acetyl-L-aspartic acid.
Associated Disease
Deficiency in ASPA is associated with a neurological disorder of the brain called Canavan disease. The ASPA gene codes for aspartoacylase, an enzyme that catalyzes the hydrolysis of N-acetyl-L-aspartic acid (NAA) in the brain to acetate and aspartate (HUGO Gene Nomenclature Committee, 2020). Mutations in ASPA cause the enzyme to lose activity, resulting in the accumulation of NAA in the brain and the development of Canavan disease (Von Jonquieres et al., 2018). Missense mutations, for example, prevent the catabolism of NAA and cause it to accumulate to toxic levels in the brain, where neurological abnormalities ensue.
Protein Structure Prediction
The physicochemical properties of the protein encoded by the open reading frame were determined using Protparam (Swiss Institute of Bioinformatics, 2020a). The results show that the ASPA gene codes for a protein of 318 amino acids with a predicted molecular weight of 36.279 kDa, a theoretical isoelectric point of 5.96, an aliphatic index of 89.25 and a grand average of hydropathicity of -0.291 (Appendix 4). InterPro was used to scan for protein families associated with the translated open reading frame of the cloned sequence (European Bioinformatics Institute, 2020). According to InterPro, the protein belongs to the succinylglutamate desuccinylase/aspartoacylase family (IPR007036) and the aspartoacylase family (IPR016708), with the molecular function of hydrolase activity acting on carbon-nitrogen and ester bonds (Appendix 5).
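Two of the reported properties follow from simple formulas, sketched below under stated assumptions. The grand average of hydropathicity is taken as the mean Kyte-Doolittle hydropathy value over all residues, and the aliphatic index as X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)), where X is the mole percent of each residue. The function names and test peptides are hypothetical; the ASPA sequence itself is not reproduced here, so the sketch does not recompute the values in Appendix 4.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein: str) -> float:
    """Grand average of hydropathicity: mean hydropathy value per residue."""
    protein = protein.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in protein) / len(protein)


def aliphatic_index(protein: str) -> float:
    """Aliphatic index from the mole percent of Ala, Val, Ile and Leu."""
    protein = protein.upper()
    n = len(protein)

    def mole_percent(aa: str) -> float:
        return 100.0 * protein.count(aa) / n

    return (
        mole_percent("A")
        + 2.9 * mole_percent("V")
        + 3.9 * (mole_percent("I") + mole_percent("L"))
    )


# Hypothetical toy peptides, used only to exercise the formulas.
toy_gravy = gravy("AR")                # (1.8 - 4.5) / 2
toy_aliphatic = aliphatic_index("AVIL")  # 25 + 2.9 * 25 + 3.9 * 50
```

A negative grand average of hydropathicity, such as the -0.291 reported for ASPA, means that hydrophilic residues outweigh hydrophobic ones on this scale.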
The structure of the ASPA protein was predicted using SWISS-MODEL, which employs a homology approach to identify templates with high levels of similarity (Swiss Institute of Bioinformatics, 2020b). The predicted structure shows that the protein encoded by the cloned gene closely resembles the aspartoacylase template (4mxu.1A), with a quaternary structure quality estimate (QSQE) of 1.00, a percent identity of 98.68%, a global model quality estimate (GMQE) of 0.98, a coverage of 0.98 (amino acid residues 6-318) and an X-ray resolution of 2.6 Å (Appendix 6). Figure 1 below shows the predicted 3-dimensional model of the ASPA protein.
Discussion of Results & Review
The analysis of the cloned sequence identified an open reading frame of 957 nucleotides that starts at the 143rd base and ends at the 1099th base. This open reading frame forms a coding sequence for a protein with 318 amino acid residues. Moreover, the coding sequence is flanked by a 142 bp 5’ untranslated region and a 335 bp 3’ untranslated region. These findings on the gene structure are consistent with the literature, which reports that the protein encoded by ASPA has 318 amino acid residues (HUGO Gene Nomenclature Committee, 2020; Matalon, Delgado and Michals-Matalon, 2018). Thus, ORF Finder identified the open reading frame and the flanking untranslated regions accurately.
The similarity search using BLASTn showed that the cloned sequence of 1434 nucleotides has 100% sequence identity and coverage with human ASPA. These results reveal that the cloned sequence originated from humans and covers the entire coding sequence of the gene together with its flanking untranslated regions. According to Bannerman et al. (2018), ASPA is a gene found in mammals whose function is to maintain the white matter by regulating the amount of NAA in the brain. In humans, the gene is located on the short arm of chromosome 17 at position 13.2 (17p13.2). The expression of ASPA is high in the brain, where the enzyme catabolizes NAA into aspartate and acetate and thereby influences the function of the central nervous system (HUGO Gene Nomenclature Committee, 2020; Von Jonquieres et al., 2018). The site of expression points to the role of ASPA in conserving the white matter of the brain and promoting neural functions.
Phylogenetic analysis indicated that the cloned sequence has a very close relationship with human ASPA, a close relationship with ASPA of Pongo abelii and a more distant relationship with ASPA of Macaca fascicularis. This finding indicates that the cloned sequence was obtained from humans and corresponds to the ASPA gene, which regulates the catabolism of NAA. Von Jonquieres et al. (2018) explain that the ASPA gene codes for a critical enzyme in humans that optimizes neural functions by degrading NAA and preventing its accumulation in the brain. In primates and mice, knockdown of the ASPA gene resulted in the accumulation of NAA and poor fetal development (Bannerman et al., 2018). Hence, the phylogenetic analysis shows that the ASPA sequences of humans and other primates are closely related, with the cloned sequence closest to the human form.
The expression level of the ASPA gene is associated with the occurrence of Canavan disease. Poor fetal development, seizures, vacuolization of the central nervous system, macrocephaly and hypomyelination are some of the characteristics of Canavan disease (Von Jonquieres et al., 2018). The ASPA gene codes for aspartoacylase, a key enzyme that converts N-acetyl-aspartic acid (NAA) to acetate and aspartate in the brain and thereby conserves the white matter (HUGO Gene Nomenclature Committee, 2020). A deficiency in aspartoacylase causes the accumulation of NAA in the brain, leading to the degradation of the white matter and the emergence of Canavan disease (Von Jonquieres et al., 2018). Mutations in the ASPA gene cause this deficiency by diminishing the activity of aspartoacylase in the catabolism of NAA. Single nucleotide mutations associated with Canavan disease that cause a complete loss of activity are p.Y231X, p.E285A and p.A305E (Matalon, Delgado and Michals-Matalon, 2018). Canavan disease is inherited in an autosomal recessive manner: affected offspring receive one mutated copy of ASPA from each parent, and the parents are typically heterozygous carriers.
The analysis of the ASPA protein shows that it is of moderate size: it consists of 318 amino acids and has a predicted molecular weight of approximately 36 kDa. The negative grand average of hydropathicity indicates that the protein is hydrophilic and soluble in water (Swiss Institute of Bioinformatics, 2020a). Moreover, the slightly acidic theoretical isoelectric point implies that the protein carries no net charge at a pH of 5.96, where it would focus in a pH-gradient gel. The function prediction indicated that the protein encoded by the cloned sequence has hydrolase activity that breaks carbon-nitrogen and ester bonds (HUGO Gene Nomenclature Committee, 2020). The homology modeling generated a valid 3-dimensional structure of ASPA from an aspartoacylase template (4mxu.1A) with 98% coverage and 98.68% identity. Overall, both BLAST and the homology modeling identified the cloned sequence as human aspartoacylase.
References
Bannerman, P. et al. (2018) ‘Brain nat8l knockdown suppresses spongiform leukodystrophy in an aspartoacylase-deficient Canavan disease mouse model’, Molecular Therapy, 26(3), pp. 793-800.
Campbell, P. J. et al. (2020) ‘Pan-cancer analysis of whole genomes’, Nature, 578, pp. 82-93.
European Bioinformatics Institute. (2020) InterPro: classification of protein families.
HUGO Gene Nomenclature Committee. (2020) Symbol report for ASPA.
Kumar, A. and Chordia, N. (2017) ‘Role of bioinformatics in biotechnology’, Research & Reviews in BioSciences, 12(1), pp. 1-6.
Matalon, R., Delgado, L. and Michals-Matalon, K. (2018) Canavan Disease: aspartoacylase deficiency.
Swiss Institute of Bioinformatics. (2020a) Expasy: Protparam.
Swiss Institute of Bioinformatics. (2020b) Expasy: SWISS Model.
Von Jonquieres, G. et al. (2018) ‘Uncoupling N-acetyl aspartate from brain pathology: implications for Canavan disease gene therapy’, Acta Neuropathologica, 135, pp. 95-113.
Wani, M. Y. et al. (2018) ‘Advances and applications of bioinformatics in various fields of life’, International Journal of Fauna and Biological Studies, 5(2), pp. 3-10.