Abstract
Bioinformatics has revolutionized how scientists analyze and interpret genetic data. This review highlights the fundamental role of bioinformatics tools in understanding genomic data. It surveys the software and algorithms available for processing, analyzing, and interpreting genetic data, and addresses their relevance to identifying genes and genetic variations, predicting protein structures, and conducting evolutionary and phylogenetic studies. It also discusses the challenges faced in bioinformatics, including integrating data from different sources, standardization, and interpreting results. The article covers sequence alignment and sequencing data cleaning, both of which are crucial when working with genetic datasets. Such discussions matter because bioinformatics tools are constantly evolving, requiring researchers to continuously update their knowledge and skills.