Efficient variation-based feature selection for medical data classification
Conference proceedings article
Publication Details
Author list: Fong S., Liang J., Siu S.W.I., Chan J.H.
Publisher: American Scientific Publishers
Publication year: 2015
Volume number: 5
Issue number: 5
Start page: 1093
End page: 1098
Number of pages: 6
ISSN: 2156-7018
eISSN: 2156-7026
Languages: English-Great Britain (EN-GB)
Abstract
Medical data collected from sophisticated and sometimes different instruments are described by a large number of feature variables and a historical archive of patients' records, together known as a multivariate medical dataset. In biomedical data mining, a classification model is often built upon such a dataset to predict which particular type of disease a new record belongs to. One of the challenges in inferring a classification model with good prediction accuracy is selecting the relevant features that contribute the most predictive power. Many feature selection techniques have been proposed and studied in the past, but none has so far been established as the best. In this paper, an efficient feature selection method called Clustering Coefficients of Variation (CCV) is applied. CCV is based on a very simple variance-bias principle, which balances model training between generalization and over-fitting. In a computer simulation experiment, eleven medical datasets with a substantially large number of features are tested with CCV in comparison to four popular feature selection techniques. The results show that CCV outperformed them on all averaged performance measures and in speed. Given the simplicity of its design, CCV is anticipated to be a useful feature selection method for classifying medical data, especially datasets characterized by many features. Copyright © 2015 American Scientific Publishers. All rights reserved.
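The abstract does not spell out the CCV procedure, so the following is only a minimal sketch of the underlying idea of variation-based feature ranking, not the authors' algorithm. It scores each feature by its coefficient of variation (standard deviation divided by the absolute mean) and keeps the highest-scoring ones. The function names, the toy data, the top-k selection rule, and the handling of zero-mean features are all assumptions made for illustration; the clustering step implied by the method's name is not reproduced here.

```python
import statistics


def coefficient_of_variation(values):
    """Return population std / |mean| for a sequence of numbers.

    Assumption: a zero-mean feature is scored 0.0 to avoid division by
    zero. This is a simplification for the sketch, not part of CCV.
    """
    mean = statistics.fmean(values)
    if mean == 0:
        return 0.0
    return statistics.pstdev(values) / abs(mean)


def rank_features_by_cv(rows, feature_names):
    """Rank features from most to least variable by coefficient of variation."""
    columns = list(zip(*rows))  # transpose: one tuple of values per feature
    scores = {
        name: coefficient_of_variation(column)
        for name, column in zip(feature_names, columns)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def select_top_k(rows, feature_names, k):
    """Keep the k features with the highest coefficient of variation."""
    ranked = rank_features_by_cv(rows, feature_names)
    return [name for name, _ in ranked[:k]]


# Hypothetical toy records: three patients, three features.
rows = [
    [5.0, 100.0, 1.0],
    [5.1, 150.0, 1.0],
    [4.9, 50.0, 1.0],
]
names = ["f_stable", "f_variable", "f_constant"]

selected = select_top_k(rows, names, k=2)
print(selected)
```

Because the coefficient of variation is dimensionless, features measured in different units can be compared on one scale, which is one plausible reason a variation-based score suits data gathered from heterogeneous instruments. A constant feature scores zero and is dropped first.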
Keywords
Medical Datasets