When experience matters: Developing new probabilistic Bayesian cluster validity indices


Principal Investigator


Project details

Start date: 25/04/2023

End date: 24/04/2025


Abstract

Cluster analysis is one of the best-known unsupervised learning tools in statistics and machine learning, used for splitting observations into groups with similar characteristics. Researchers apply it to problems in diverse fields, ranging from social science to astronomy. Various cluster validity indices are used for evaluating clustering results, and one of their main purposes is to find the optimal, unknown number of clusters. Recently, [Wiroonsri21] (https://github.com/nwiroonsri/NCvalid/) introduced a new correlation-based cluster validity index that can accurately detect the optimal number of clusters and always provides information about sub-optimal numbers of clusters. This allows the user to rank the sub-optimal options at hand. However, this index is compatible only with k-means and hierarchical clustering, and it is based solely on data, with no prior experience involved. In this work, we propose a new fuzzy cluster validity index (FNCI) and a new Bayesian cluster validity index (BNCI), both based on the NC index (NCI) introduced by [Wiroonsri21]. FNCI will be compatible with fuzzy c-means, the EM algorithm, and other more modern methods that provide probabilistic memberships. For BNCI, we will place a Dirichlet prior on the candidate numbers of clusters, whose parameters the user can set based on experience in their own context. The posterior distribution will remain a Dirichlet distribution, with parameters updated according to the data. The final optimal number of clusters will therefore be based on both the data and prior knowledge or experience. Besides defining FNCI and BNCI and analyzing their mathematical properties, we plan to apply them to a Thai population dataset, using sociological knowledge to adjust the parameters of BNCI. We also plan to cluster distinct graphs using their graph-theoretic features and to apply the indices to select a final number of clusters.
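The conjugate update mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the BNCI definition, which the abstract does not specify. It assumes the generic Dirichlet update in which non-negative, data-driven weights are added to the prior parameters. The function names, the candidate list, and all numeric values are hypothetical and chosen only for illustration.

```python
import numpy as np


def dirichlet_posterior(prior_alpha, evidence):
    """Generic conjugate update: posterior parameters = prior parameters + evidence.

    `evidence` is a hypothetical vector of non-negative data-driven weights,
    one per candidate number of clusters (an assumption, not the BNCI formula).
    """
    prior_alpha = np.asarray(prior_alpha, dtype=float)
    evidence = np.asarray(evidence, dtype=float)
    if prior_alpha.shape != evidence.shape:
        raise ValueError("prior_alpha and evidence must have the same shape")
    if np.any(prior_alpha <= 0) or np.any(evidence < 0):
        raise ValueError("prior parameters must be > 0 and evidence >= 0")
    return prior_alpha + evidence


def dirichlet_mean(alpha):
    """Mean of a Dirichlet distribution: alpha_k / sum(alpha)."""
    alpha = np.asarray(alpha, dtype=float)
    return alpha / alpha.sum()


# Hypothetical candidate numbers of clusters.
candidates = [2, 3, 4, 5]

# Hypothetical prior: the user's experience favors 3 clusters.
prior_alpha = [1.0, 4.0, 1.0, 1.0]

# Hypothetical data-driven weights that favor 4 clusters.
evidence = [0.5, 2.0, 6.0, 1.5]

posterior_alpha = dirichlet_posterior(prior_alpha, evidence)
posterior_mean = dirichlet_mean(posterior_alpha)

# Rank candidates from most to least supported under the posterior mean.
ranking = [candidates[i] for i in np.argsort(-posterior_mean)]
print(ranking)
```

With these illustrative numbers, the data outweigh the prior preference for 3 clusters, so 4 ranks first and 3 second. A larger prior parameter for 3 would reverse that order, which is the sense in which the final choice depends on both data and experience.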


Keywords

  • Bayesian
  • Cluster Validity Index
  • Correlation
  • Probabilistic Machine Learning
  • Statistical Learning


Strategic Research Themes



Last updated on 2025-08-07 at 14:10