When experience matters: Developing new probabilistic Bayesian cluster validity indices
Project details
Start date: 25/04/2023
End date: 24/04/2025
Abstract
Cluster analysis is one of the most well-known unsupervised learning tools in statistics and machine learning, used for splitting observations into groups with similar characteristics. Researchers apply it to solve problems in diverse fields, ranging from social science to astronomy. Various cluster validity indices are used for evaluating clustering results. One of the main objectives of using these indices is to find the optimal, unknown number of clusters. Recently, [Wiroonsri21] (https://github.com/nwiroonsri/NCvalid/) introduced a new correlation-based cluster validity index which can accurately detect the optimal number of clusters and always provides information about sub-optimal numbers of clusters. This allows the user to rank several sub-optimal options on hand. However, this index is compatible only with k-means and hierarchical clustering, and it is based solely on data, without any prior experience involved. In this work, we propose a new fuzzy cluster validity index (FNCI) and a new Bayesian cluster validity index (BNCI) based on the NC index (NCI) introduced by [Wiroonsri21]. FNCI will be compatible with fuzzy c-means, the EM algorithm, and other more modern methods that provide probabilistic memberships. For BNCI, we will use a Dirichlet prior over the candidate numbers of clusters, whose parameters the user can set based on experience in their own context. The posterior distribution will remain a Dirichlet distribution, with parameters updated according to the data. Therefore, the final optimal number of clusters will be based on both the data and prior knowledge or experience. Besides defining FNCI and BNCI and analyzing their mathematical properties, we plan to apply them to a Thai population dataset, using sociological knowledge to adjust the parameters of BNCI. Moreover, we also plan to cluster distinct graphs using their graph-theoretic features and apply the indices to select a final number of clusters.
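The statement that the posterior remains Dirichlet can be illustrated with a minimal sketch of the standard Dirichlet conjugate update, in which posterior parameters are the prior parameters plus evidence counts. This is not the BNCI definition: the abstract does not say how the data enter the update, so the conversion of index scores into pseudo-counts, the evidence weight `n`, all numeric values, and the function names `dirichlet_posterior` and `dirichlet_mean` are assumptions made purely for illustration.

```python
def dirichlet_posterior(prior_alpha, pseudo_counts):
    """Standard conjugate update: posterior alpha = prior alpha + counts."""
    if len(prior_alpha) != len(pseudo_counts):
        raise ValueError("prior and counts must have the same length")
    if any(a <= 0 for a in prior_alpha):
        raise ValueError("Dirichlet parameters must be positive")
    if any(c < 0 for c in pseudo_counts):
        raise ValueError("counts must be non-negative")
    return [a + c for a, c in zip(prior_alpha, pseudo_counts)]


def dirichlet_mean(alpha):
    """Mean of a Dirichlet distribution: alpha_i / sum(alpha)."""
    total = sum(alpha)
    return [a / total for a in alpha]


# Candidate numbers of clusters under consideration (hypothetical).
candidate_k = [2, 3, 4, 5]

# Hypothetical prior set from experience: the user believes k = 3 is most likely.
prior_alpha = [1.0, 4.0, 2.0, 1.0]

# Hypothetical data-driven scores, one per candidate, normalized to sum to 1.
# ASSUMPTION: how BNCI turns index values into evidence is not given in the source.
index_scores = [0.10, 0.30, 0.50, 0.10]

# Hypothetical evidence weight: how strongly the data count against the prior.
n = 20
pseudo_counts = [n * s for s in index_scores]

post_alpha = dirichlet_posterior(prior_alpha, pseudo_counts)
prior_mean = dirichlet_mean(prior_alpha)
post_mean = dirichlet_mean(post_alpha)

# Pick the candidate with the largest posterior mean weight.
best_k = candidate_k[post_mean.index(max(post_mean))]

for k, p0, p1 in zip(candidate_k, prior_mean, post_mean):
    print(f"k={k}: prior mean {p0:.3f}, posterior mean {p1:.3f}")
print("selected k:", best_k)
```

In this toy setting the prior favors k = 3 while the data favor k = 4, and the selected value depends on the evidence weight `n`: a small `n` lets the user's experience dominate, while a large `n` lets the data dominate.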
Keywords
- Bayesian
- Cluster Validity Index
- Correlation
- Probabilistic Machine Learning
- Statistical Learning