In-Depth Biological Research Essay

Table of Contents

Phylogenetic Tree Analysis
Genome Size Analysis
GC Content Analysis
Contig Number Analysis
Coding Sequences Analysis
Species Analysis
Metabolism Analysis
Plasmid Genes
Reference List

Phylogenetic Tree Analysis

This study examined whether the RNA nucleotide sequences Gr1 and Gr2, issued by the supervisor, are evolutionarily related to a 16S rRNA gene sequence prepared beforehand. The 16S rRNA gene sequence was obtained from the open Nucleotide database of the National Center for Biotechnology Information (NCBI), which makes it possible to find the required sequence within the complete genome of the desired genus. The phylogenetic tree shown in Figure 1 was constructed with MEGA X software from a comparison of nucleotide sequences coding for 16S rRNA.

The analysis of this tree demonstrates that there is a direct genetic relationship between the analyzed Gr1 sample and the Rhodococcus genus. In particular, the sequence of Gr1 is closest to the sequence of Rhodococcus degradans strain CCM 4446, with which it shares a clade. Gr1 also remains similar to the other strains on the tree, although the similarity decreases with each node that separates them.

Given that Gr1 was most likely isolated from an unknown organism at Newcastle, it can be argued that this organism is evolutionarily very close to Rhodococcus degradans strain CCM 4446. Furthermore, given the research carried out by Švec et al. (2015), it can be assumed that this specimen is a previously undescribed species. At the same time, the Gr2 nucleotide sequence showed no similarity to the Rhodococcus genus and could not be aligned with its genes.
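The kind of pairwise comparison that underlies a tree of this type can be illustrated with a minimal Python sketch. It assumes two sequences that are already aligned to the same length and computes the proportion of differing sites (the p-distance). The function name and the toy sequences are hypothetical, and this is not a description of how MEGA X computes distances internally.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore gap columns
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return mismatches / compared


# Toy aligned fragments (hypothetical, not real 16S rRNA data)
print(p_distance("ACGT", "ACGA"))
```

A smaller p-distance corresponds to sequences that sit closer together on the tree.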

*Fig. 1. The phylogenetic tree for Gr1 and Rhodococcus genus based on comparison of 16S rRNA gene (created by the author).*

Genome Size Analysis

In-depth biological research of nucleotide sequences is not limited to the construction of a phylogenetic tree and also requires analysis of genome size. In this coursework, genome size is understood as the number of base pairs that make up the DNA (Paten et al., 2017). At the same time, the full size of a prokaryotic genome includes not only the main DNA molecule located in the nucleoid, but also the nucleotide sequences carried on plasmids and in integrated bacteriophages (Raes et al., 2007). No direct correlation between genome size and the degree of development of the organism has been found (Sela, Wolf and Koonin, 2016).

Nevertheless, by comparing the genome size or weight of two potentially close bacterial strains, geneticists may conclude that the strains are related or that the processes occurring within them are similar. For ease of analysis, the genome sizes of Gr1 and of several species identified using the phylogenetic tree were measured in megabases. Fig. 2 shows the quantitative similarity found between Gr1 and Rhodococcus qingshengii JCM 15477. In addition, Fig. 1 confirms the phylogenetic similarity between the two organisms under study.

*Fig. 2. The comparison of genome sizes in megabases (created by the author).*

GC Content Analysis

GC-content analysis answers several critical questions concerning the stability of prokaryotes and their evolutionary proximity. DNA consists of four nucleotides that differ from each other in the type of nitrogenous base (Ouellette et al., 2018). Three hydrogen bonds are formed between the purine base guanine and the pyrimidine base cytosine, which accounts for the greater stability of the GC pair in comparison with the AT pair.

Thus, the higher the GC content of DNA, the higher its resistance to thermal denaturation (Chowdhury et al., 2018). At the same time, the relative content of GC pairs in DNA is a stable characteristic of prokaryotes, which depends neither on age, nor on culturing conditions, nor on individual gene rearrangements in the chromosome. Given that the natural content of guanine and cytosine ranges from 25 to 75 percent, Fig. 3 shows that sample Gr1 falls within the natural range (Yamada and Komagata, 1972). In addition, a similar pattern is characteristic of all the organisms listed, except for Rhodococcus globerulus WS3306 and Rhodococcus globerulus NBRC 14531, which additionally emphasizes the evolutionary similarity of these organisms.
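The GC-content calculation itself is simple arithmetic, which the following Python sketch illustrates. The function names and the example sequence are hypothetical, and the 25 to 75 percent bounds are taken from the range quoted above. Ambiguous bases such as N are ignored here, which is one possible convention rather than a fixed rule.

```python
def gc_content(sequence: str) -> float:
    """Return the GC content of a DNA sequence as a percentage.

    Only unambiguous bases (A, C, G, T) are counted.
    """
    counted = [base for base in sequence.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A, C, G or T bases")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


def within_natural_range(gc_percent: float, low: float = 25.0, high: float = 75.0) -> bool:
    """Check a GC percentage against the 25-75 percent range cited in the text."""
    return low <= gc_percent <= high


# Hypothetical fragment, not real Gr1 data
value = gc_content("GGCCAATT")
print(value, within_natural_range(value))
```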

*Fig. 3. Comparison of GC content between species (created by the author).*

Contig Number Analysis

The RAST research database provides annotated data on the genetic properties of some prokaryotes and eukaryotes. Using this database, the number of contigs obtained for the organisms under study was determined. A contig is a consensus sequence of nucleic acids that represents the actual DNA or RNA sequence with high accuracy (Jean, Radulescu and Rusu, 2017).

In addition, when assembling a genome (or transcriptome), it is essential to consider the length of the nucleotide sequence as well. For the Gr1 variant under study, the most appropriate choice is the overlap-layout-consensus (OLC) algorithm, which comprises three stages of assembly: a search for overlapping reads, a search for contigs, and multiple alignment. The number of contigs allows assumptions to be made about the completeness of the genome of a given species. Following this reasoning and analyzing Fig. 4, it can be concluded that a complete scaffold is assembled for Rhodococcus baikonurensis JCM 18801, Rhodococcus qingshengii JCM 15477 and Gr1.
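The first OLC stage, the search for overlapping reads, can be sketched in Python as follows. This is a simplified illustration that only checks exact suffix-prefix matches between two reads. Real assemblers tolerate sequencing errors and handle many reads at once. The function names, the minimum overlap length and the toy reads are assumptions made for the example.

```python
from typing import Optional


def overlap_length(left: str, right: str, min_length: int = 3) -> int:
    """Length of the longest suffix of `left` that exactly matches a prefix of `right`.

    Returns 0 when no overlap of at least `min_length` bases exists.
    """
    max_len = min(len(left), len(right))
    for length in range(max_len, min_length - 1, -1):
        if left.endswith(right[:length]):
            return length
    return 0


def merge_reads(left: str, right: str, min_length: int = 3) -> Optional[str]:
    """Merge two reads into one longer sequence if they overlap, else return None."""
    n = overlap_length(left, right, min_length)
    if n == 0:
        return None
    return left + right[n:]


# Toy reads (hypothetical): the last three bases of the first read
# match the first three bases of the second read.
print(merge_reads("ATGCGT", "CGTTAA"))
```

Repeating such merges over all overlapping reads yields contigs, so fewer and longer contigs generally indicate a more complete assembly.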

*Fig. 4. Number of contigs in different species/strains (created by the author).*

Coding Sequences Analysis

The coding sequence is the central structural and functional unit of the gene, containing the nucleotide triplets that code for the amino acid sequence (Walport and Schofield, 2018). It begins with a start codon and ends with a stop codon. Research aimed at analyzing nucleotide sequences is mainly focused on finding the number of coding sequences. This number can be interpreted in terms of both genetics and epigenetics. Genetically, the number of coding sequences in DNA shows how many fragments of nucleic acid are capable of directing the synthesis of a protein (Walport and Schofield, 2018).

It can be logically assumed that the higher the number of coding sites, the more complex the biochemistry of the organism, since such an individual has more proteins to synthesize. However, this statement must be reconciled with the principles of epigenetics, which state that coding sites of DNA can work together to synthesize a single polypeptide (Ouellette et al., 2018). Thus, the number of genes generally affects the degree of complexity of the organism (except in plants), but when multiple species are compared, the number of coding sequences has little or no effect on the degree of organization.

Fig. 5 shows that the largest number of protein-coding sequences is found in Rhodococcus qingshengii JCM 15477. For Gr1, this number does not differ much from the theoretical average, which may indicate the absence of any highly specialized metabolic features.
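The start-codon-to-stop-codon definition can be illustrated with a short Python sketch that scans the three forward reading frames of a DNA string. It assumes ATG as the only start codon and the standard stop codons TAA, TAG and TGA. Bacterial gene finders also consider alternative start codons and the reverse strand, so this is a simplified illustration with a hypothetical function name and toy input.

```python
from typing import List, Tuple

START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2) -> List[Tuple[int, int]]:
    """Return (start, end) positions of open reading frames on the forward strand.

    `end` is exclusive and includes the stop codon; `min_codons` counts
    the start and stop codons as well.
    """
    seq = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i  # open a candidate coding sequence
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # close it and look for the next start codon
    return orfs


# Toy sequence (hypothetical): ATG AAA TAG sits in the third reading frame
print(find_orfs("CCATGAAATAGCC"))
```

Counting the entries returned for a whole genome gives a rough analogue of the number of coding sequences compared in this section.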

*Fig. 5. Number of coding sequences in different species/strains (created by the author).*

Species Analysis

As sequencing technologies develop and become cheaper, attempts are being made to replace traditional labor-intensive methods, such as DNA-DNA hybridization, with approaches based on the analysis of complete genome sequences. One such approach is the calculation of the average nucleotide identity (ANI) for a pair of compared genomes (Figueras et al., 2014). The hypothesis that Gr1 belongs to a new species was tested by comparing the nucleotide sequence of Gr1 with the genomes of other known species. If the nucleotide identity is less than 95 percent, the genome is taken to represent a new species (Figueras et al., 2014). The analysis of Figure 7, where the nucleotide sequences of the twelve closest species or strains of Rhodococcus taken from NCBI GenBank were compared in pairs, shows that Gr1 most likely belongs to a new species. Furthermore, RAST, modified with SEED, allows the novelty of the species to be evaluated. Using this tool, it can be shown that only 20 percent of the Gr1 genome is described in the database, which leads to the assumption that this species is novel.
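The decision rule based on the 95 percent ANI threshold can be expressed as a minimal Python sketch. The reference names and ANI values below are placeholders invented for illustration and are not the values from Figure 7. The function names are likewise hypothetical.

```python
from typing import Dict, Tuple

# Species threshold quoted in the text (Figueras et al., 2014)
ANI_SPECIES_THRESHOLD = 95.0


def best_match(ani_values: Dict[str, float]) -> Tuple[str, float]:
    """Return the reference genome with the highest ANI to the query."""
    name = max(ani_values, key=ani_values.get)
    return name, ani_values[name]


def is_candidate_new_species(ani_values: Dict[str, float],
                             threshold: float = ANI_SPECIES_THRESHOLD) -> bool:
    """True when even the closest reference genome falls below the threshold."""
    _, top_ani = best_match(ani_values)
    return top_ani < threshold


# Placeholder ANI percentages for a query genome against two references
example = {"reference_A": 88.2, "reference_B": 93.7}
print(best_match(example), is_candidate_new_species(example))
```

Only the closest reference matters for the decision: a single genome at or above the threshold is enough to place the query in an existing species.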

*Fig. 6. Part of the nucleotide sequence Gr1 detected in the RAST/SEED (created by the author).*

*Fig. 7. ANI-analysis data (created by the author).*

It is essential to clarify that there are other methods for assigning organisms to species on the basis of their sequences. In particular, a method was proposed that calculates the average identity of all homologous protein sequences in a pair of compared genomes (AAI). It is believed that this method can be successfully used to compare the genomes of less related organisms (Haley et al., 2010). For a genome to be assigned “new” status, the AAI test must show a match of less than 96 percent (Haley et al., 2010). Figure 8 confirms the novelty of the species with the nucleotide sequence Gr1.

*Fig. 8. AAI-analysis data (created by the author).*

Metabolism Analysis

Metabolism is the set of biochemical reactions that occur in a living organism to support life. These processes allow organisms to grow and reproduce, preserve their structures and respond to environmental influences (Ceniceros et al., 2017). Biologically active substances such as proteins, enzymes, carbohydrates, fats and vitamins play a vital role in metabolism. The central dogma of molecular biology postulates that coding sequences of DNA direct the synthesis of proteins, including enzymes. This is the main reason why it is important to study which specific proteins the genome of an organism encodes.

Figure 9 shows the quantitative relationships between genes responsible for metabolism at different levels. For example, according to Kumar et al. (2020), the largest category, shown in light green, includes 504 genes and is responsible for amino acid metabolism. There are also categories responsible for carbohydrates (395 genes, pink region) and for lipids and terpenoids (294 genes, purple region).

*Fig. 9. Diagram of the presence of responsible genes in Gr1 (created by the author).*

*Table 1. Data on the presence of genes responsible for the synthesis of the substances (created by the author).*

Bacteriocin

Most Gram-positive and Gram-negative prokaryotes produce protein molecules called bacteriocins as a result of their vital activity (Zacharof and Lovitt, 2012). Bacteriocins are a large family of peptides secreted by bacteria that have antimicrobial activity and act against other strains of the same species or closely related species (Dobson et al., 2012). In prokaryotes, bacteriocin is a source of branched-chain fatty acids (BCFA) (Dobson et al., 2012). The production of fatty acids regulates the pH of the internal content and ensures colonization resistance. In addition, the functions of BCFA include the formation of the lipid layer of membranes and compensation for the lack of unsaturated fatty acids. The genetic mechanisms responsible for the synthesis of bacteriocins are generally similar in species with a percentage similarity above 70 percent, so it can be assumed that the species carrying Gr1 can synthesize this protein.

NRPS

Non-ribosomal peptides are synthesized by non-ribosomal peptide synthetases (NRPS), which, unlike ribosomes, do not require mRNA. It is essential to clarify that non-ribosomal peptides are often antibiotics, cytostatics and immunosuppressants (Carrano et al., 2001). Thus, the synthetases promote the formation in the prokaryotic cell of such biomolecules as heterobactin A and S2, erythrochelin, coelichelin, rifamorpholin A, B, C, D and E, SF2575, polyketide, plipastatins and lipstatin (Stachelhaus and Marahiel, 1995). Genes for the synthesis of non-ribosomal peptides are usually organized into a single operon in bacteria.

Heterobactins

Two NRPS clusters detected in the Rhodococcus genome have high homology with the genes responsible for the synthesis of heterobactin siderophores. As a tripeptide derivative, heterobactin A is responsible for controlling the concentration of iron ions (Carrano et al., 2001). Moreover, modification of the aromatic rings with sulfonyl groups results in the formation of heterobactin S2 (Bosello et al., 2013). Both heterobactins are very likely to be found in the genomes of the species, so their presence in Gr1 is possible.

Ectoine

Ectoine (C6H10N2O2), a cyclic amino acid, can be detected in several bacteria (Jebbar et al., 1992). Thanks to ectoine, organisms can survive under stressful conditions: its functions include the regulation of temperature and osmotic pressure (Bergmann et al., 2013). In addition, this amino acid is capable of carrying some elements such as glycine betaine (Cánovas et al., 1998). Ectoine is present in all twelve species with a percentage of more than 70% in each species, so it can be argued that it is also present in Gr1.

Erythrochelin

Siderophores are low-molecular-weight substances of different chemical structures that are synthesized by many microorganisms and bind iron effectively. One such siderophore, erythrochelin, leaves the bacterial cell, captures iron and is then absorbed again by the microorganism (Lazos et al., 2010). Erythrochelin is present in only nine of the twelve species, which suggests that genes responsible for this metabolic pathway may be found in Gr1.

Isorenieratene

Rhodococcus species show the potential for biosynthesis of the metabolic product isorenieratene. It has been noted that isorenieratene exhibits increased activity in the presence of iron ions (Chen et al., 2018). Moreover, according to Tholl (2006), isorenieratene can provide resistance to oxidation, as well as participate in the formation of oxygen and hydroxyl radicals. Isorenieratene is found in almost all the studied species, so finding the responsible genes in Gr1 is likely.

Coelichelin

Coelichelin is a new peptide natural product produced by Streptomyces (Lautru et al., 2005). It can be activated to control iron concentration (Challis, 2008). Coelichelin is present in only seven species, as shown in Table 1, with percentages below 30% in all seven, so it is unlikely to be present in Gr1.

Rifamorpholins

Amycolatopsis sp. HCa4 produces five antibiotics, described by Xiao et al. (2017) as rifamorpholin A, B, C, D and E. The rifamorpholin heteropolycycles are distinguished not only by the chemical arrangement of their radicals and functional groups but also by their physical and chemical properties (Xiao et al., 2017). The antiSMASH analysis indicates the presence of the five rifamorpholins in nine of the twelve species, but with low probability. Thus, the presence of rifamorpholin A, B, C, D and E in Gr1 is unlikely.

SF2575 and SF2768

SF2575 is a powerful tetracycline antibiotic produced by bacteria of the genus Streptomyces (Pickens et al., 2009). This antibiotic was found in ten of the twelve studied species, so it can be said with a certain degree of confidence that SF2575 is characteristic of the genus Rhodococcus and of Gr1 in particular.

The diisonitrile antibiotic SF2768 was described by Wang et al. (2017) as the product of an NRPS gene cluster in Streptomyces thioluteus. SF2768 is both an antibiotic and an antifungal agent. SF2768 has been detected in nine of the studied species, so there is a possibility of finding it in Gr1 as well.

Polyketide

Polyketides are polycarbonyl secondary metabolites formed in the cells of bacteria, fungi, animals and plants. Biosynthesis of polyketides is carried out by polymerization of simple building blocks, acetyl and propionyl groups, and resembles the synthesis of fatty acids. The most important groups of polyketides are antibiotics and toxins (Jenke-Kodama et al., 2005). The probability of finding polyketides in the genus Rhodococcus is less than 6 percent, so the presence of such a gene in Gr1 is unlikely.

Plipastatin and lomofungin

The lipopeptide antibiotic plipastatin, a product of Bacillus subtilis (Gao et al., 2017), is detected only in Gr1 according to antiSMASH data. This fact calls into question the production of this biomolecule by the Rhodococcus genus as a whole. Lomofungin is another antibiotic synthesized by prokaryotes: this biomolecule is produced by streptomycetes. Its mechanism of action is probably related to the inhibition of the synthesis of nucleic acids and proteins (Fraser and Creanor, 1975). That is why lomofungin is effective against bacteria and fungi. It was not found in the other species, so even the 26% value for Gr1 is no guarantee of its presence.

Plasmid Genes

A study of the Rhodococcus erythropolis BG43 genome reveals two clusters responsible for the synthesis of hydrolase and oxygenase enzymes (Müller et al., 2015). In addition, there is good evidence that the presence of these genes in Rhodococcus plasmids has adverse effects on human health (McLeod et al., 2006). These plasmids do not have long nucleotide chains (Müller et al., 2015). Comparative analysis using antiSMASH and RAST shows that these genes could not be found in the Gr1 genome.

Plasmid gene	Length	Identities
AqdA1	430	42%
AqdA2	309	37%

*Table 2. Information on plasmid genes and identity determination (created by the author).*

Reference List

Bergmann, S. et al. (2013) ‘Membrane fluidity of halophilic ectoine-secreting bacteria related to osmotic and thermal treatment’, Bioprocess and Biosystems Engineering, 36(12), pp. 1829-1841.

Bosello, M. et al. (2013) ‘Structural characterization of the heterobactin siderophores from Rhodococcus erythropolis PR4 and elucidation of their biosynthetic machinery’, Journal of Natural Products, 76(12), pp. 2282-2290.

Cánovas, D. et al. (1998) ‘Characterization of the genes for the biosynthesis of the compatible solute ectoine in the moderately halophilic bacterium Halomonas elongata DSM 3043’, Systematic and Applied Microbiology, 21(4), pp. 487-497.

Carrano, C. et al. (2001) ‘Heterobactins: a new class of siderophores from Rhodococcus erythropolis IGTS8 containing both hydroxamate and catecholate donor groups’, Biometals, 14(1), pp. 119-125.

Ceniceros, A. et al. (2017) ‘Genome-based exploration of the specialized metabolic capacities of the genus Rhodococcus’, BMC Genomics, 18(1), pp. 593-609.

Chen, Y. et al. (2018) ‘Identification of microbial carotenoids and isoprenoid quinones from Rhodococcus sp. B7740 and its stability in the presence of iron in model gastric conditions’, Food Chemistry, 240, pp. 204-211.

Chowdhury, K. et al. (2018) ‘Presence of a consensus DNA motif at nearby DNA sequence of the mutation susceptible CG nucleotides’, Gene, 639(1), pp. 85-95.

Dobson, A. et al. (2012) ‘Bacteriocin production: a probiotic trait?’, Applied and Environmental Microbiology, 78(1), pp. 1-6.

Figueras, M. et al. (2014) ‘Taxonomic affiliation of new genomes should be verified using average nucleotide identity and multilocus phylogenetic analysis’, Genome Announcements, 2(6), pp. 1-14.

Fraser, R.S. and Creanor, J. (1975) ‘The mechanism of inhibition of ribonucleic acid synthesis by 8-hydroxyquinoline and the antibiotic lomofungin’, Biochemical Journal, 147(3), pp. 401-410.

Gao, L. et al. (2017) ‘Plipastatin and surfactin coproduction by Bacillus subtilis pB2-L and their effects on microorganisms’, Antonie Van Leeuwenhoek, 110(8), pp. 1007-1018.

Haley, B. et al. (2010) ‘Comparative genomic analysis reveals evidence of two novel Vibrio species closely related to V. cholerae’, BMC Microbiology, 10(1), pp. 154-163.

Jean, G., Radulescu, A. and Rusu, I. (2017) ‘The contig assembly problem and its algorithmic solutions’, in Elloumi, M. (ed.) Algorithms for Next-Generation Sequencing Data. Cham, Switzerland: Springer, pp. 267-298.

Jebbar, M. et al. (1992) ‘Osmoprotection of Escherichia coli by ectoine: uptake and accumulation characteristics’, Journal of Bacteriology, 174(15), pp. 5027-5035.

Jenke-Kodama, H. et al. (2005) ‘Evolutionary implications of bacterial polyketide synthases’, Molecular Biology and Evolution, 22(10), pp. 2027-2039.

Kumar, S. et al. (2020) ‘Data on genome sequencing, assembly, annotation and genomic analysis of Rhodococcus rhodochrous strain SPC17 isolated from Lonar Lake’, Data in Brief, 29(1), pp. 105336-105340.

Lautru, S. et al. (2005) ‘Discovery of a new peptide natural product by Streptomyces coelicolor genome mining’, Nature Chemical Biology, 1(5), pp. 265-269.

Lazos, O. et al. (2010) ‘Biosynthesis of the putative siderophore erythrochelin requires unprecedented crosstalk between separate nonribosomal peptide gene clusters’, Chemistry & Biology, 17(2), pp. 160-173.

McLeod, M. et al. (2006) ‘The complete genome of Rhodococcus sp. RHA1 provides insights into a catabolic powerhouse’, Proceedings of the National Academy of Sciences, 103(42), pp. 15582-15587.

Müller, C. et al. (2015) ‘Rhodococcus erythropolis BG43 genes mediating Pseudomonas aeruginosa quinolone signal degradation and virulence factor attenuation’, Applied and Environmental Microbiology, 81(22), pp. 7720-7729.

Nakao, M. et al. (2016) ‘Synthesis of erythrochelin: a hydroxamate-type siderophore from Saccharopolyspora erythraea’, Synthesis, 48(23), pp. 4149-4154.

Ouellette, M. et al. (2018) ‘Characterizing the DNA methyltransferases of Haloferax volcanii via bioinformatics, gene deletion, and SMRT sequencing’, Genes, 9(3), pp. 129-152.

Paten, B. et al. (2017) ‘Genome graphs and the evolution of genome inference’, Genome Research, 27(5), pp. 665-676.

Pickens, L. et al. (2009) ‘Biochemical analysis of the biosynthetic pathway of an anticancer tetracycline SF2575’, Journal of the American Chemical Society, 131(48), pp. 17677-17689.

Raes, J. et al. (2007) ‘Prediction of effective genome size in metagenomic samples’, Genome Biology, 8(1), pp. 1-11.

Sela, I., Wolf, Y.I. and Koonin, E.V. (2016) ‘Theory of prokaryotic genome evolution’, Proceedings of the National Academy of Sciences, 113(41), pp. 11399-11407.

Stachelhaus, T. and Marahiel, M.A. (1995) ‘Modular structure of genes encoding multifunctional peptide synthetases required for non-ribosomal peptide synthesis’, FEMS Microbiology Letters, 125(1), pp. 3-14.

Švec, P. et al. (2015) ‘Classification of strain CCM 4446T as Rhodococcus degradans sp. nov.’, International Journal of Systematic and Evolutionary Microbiology, 65(12), pp. 4381-4387.

Tholl, D. (2006) ‘Terpene synthases and the regulation, diversity and biological roles of terpene metabolism’, Current Opinion in Plant Biology, 9(3), pp. 297-304.

Yamada, K. and Komagata, K. (1972) ‘Taxonomic studies on coryneform bacteria’, The Journal of General and Applied Microbiology, 18(6), pp. 417-431.

Walport, L.J. and Schofield, C.J. (2018) ‘Adventures in defining roles of oxygenases in the regulation of protein biosynthesis’, The Chemical Record, 18(12), pp. 1760-1781.

Wang, L. et al. (2017) ‘Diisonitrile natural product SF2768 functions as a chalkophore that mediates copper acquisition in Streptomyces thioluteus’, ACS Chemical Biology, 12(12), pp. 3067-3075.

Xiao, Y.S. et al. (2017) ‘Rifamorpholines A–E, potential antibiotics from locust-associated actinobacteria Amycolatopsis sp. Hca4’, Organic & Biomolecular Chemistry, 15(18), pp. 3909-3916.

Zacharof, M.P. and Lovitt, R.W. (2012) ‘Bacteriocins produced by lactic acid bacteria: a review article’, APCBEE Procedia, 2(1), pp. 50-56.