Application of K-Means Clustering Statistical Model in DNA Base Frequency Distribution Analysis

Authors

  • Luthfie Budie Fakultas Sains dan Teknologi, Ilmu Komputer, Universitas Islam Negeri Sumatera Utara, Indonesia Author
  • Syaputra Ervian Teknik Informatika, Universitas Sinar Husni Medan, Indonesia Author

Keywords:

Bioinformatics, Statistics, Genomics

Abstract

Rapid advances in genomics have generated large and complex datasets, and analysing them effectively requires reliable statistical models. Such models are essential to many areas of genomics, including evolutionary analysis, Genome-Wide Association Studies (GWAS), transcriptomics, and the reconstruction of gene regulatory networks. With them, researchers can identify genetic variants associated with diseases, analyse gene expression patterns, and predict phenotypes from genetic data. Interpreting genomic data also means dealing with problems such as noise, high dimensionality, and multiple testing, which is why machine learning techniques for classification, prediction, and clustering are used; one such method is K-Means Clustering. This algorithm groups genomic data by the similarity of statistical characteristics of DNA sequences, such as base frequency distributions. This improves our understanding of genetic mechanisms and of how they can be applied in clinical practice. This article shows how important statistical models are in genomics.
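
As a rough illustration of the idea described in the abstract, the sketch below clusters DNA sequences by their base frequency distributions using a plain K-Means (Lloyd's algorithm) loop. It is a minimal sketch, not the authors' implementation: the function names, the farthest-point initialisation, and the four toy sequences are all assumptions made for this example.

```python
from collections import Counter

BASES = "ACGT"


def base_frequencies(seq):
    """Return the relative frequencies of A, C, G and T in a DNA sequence."""
    counts = Counter(seq.upper())
    total = sum(counts[b] for b in BASES)
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return [counts[b] / total for b in BASES]


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, max_iter=100):
    """Lloyd's algorithm with a deterministic farthest-point initialisation
    (an assumption of this sketch; random initialisation is more common)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Start from the first point, then repeatedly add the point that is
    # farthest from its nearest already-chosen centroid.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(list(farthest))

    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical toy data: two AT-rich and two GC-rich sequences.
sequences = ["ATATATTAAT", "AATTATGTTA", "GCGCGGCCGC", "CCGGCGCAGC"]
features = [base_frequencies(s) for s in sequences]
labels, centroids = k_means(features, k=2)
print(labels)
```

With this toy input, the two AT-rich sequences end up in one cluster and the two GC-rich sequences in the other. Real genomic data would normally call for a tested library implementation and a principled choice of the number of clusters.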

Published

2025-07-31