K.H. Vani1, J. Rathika2
Abstract: Feature selection and classification have long been used in domains such as business, media, and medicine. Mining and classifying data is difficult when the data are high-dimensional, and the complexity of big data makes it challenging to arrive at a standard feature selection approach. Irrelevant and redundant features increase the computational cost of classification algorithms and degrade their results. Most existing classification algorithms take all features as input, yet not all features are useful to the classifier, which leads to subpar results. Hence, feature selection needs to be optimized before classification is performed. In this paper, a Local Search based Genetic Algorithm for Feature Selection (LSGNFS) is proposed for classifying health big data. The genetic algorithm is modified to perform a local search. Using the local search strategy, computed correlation information is used to identify distinct and significant input features. The aim is to direct the search so that newly generated feature subsets are fine-tuned using features that carry general and specific qualities. This limits the redundant information that LSGNFS retains by supplying only the required features. The performance of LSGNFS is analyzed using the standard data mining metrics accuracy and F-measure on three health big data sets: (i) a coronary heart disease dataset, (ii) a diabetes disease dataset, and (iii) a bronchial tuberculosis disease dataset. The results indicate that LSGNFS performs better than the existing classifier and is well suited to classification in big data.
Keywords: Big Data, Classification, Feature Selection, Genetic Algorithm, Health.
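To make the idea in the abstract concrete, the following is a minimal sketch of a genetic algorithm whose offspring are refined by a correlation-guided local search. It is an illustration under stated assumptions, not the LSGNFS procedure itself: the abstract does not specify the operators, so the function names (`correlation_scores`, `fitness`, `local_search`, `ga_feature_selection`), the nearest-centroid fitness with a size penalty, the single swap move, and all parameter values are hypothetical choices. Labels are assumed to be binary (0/1), with both classes present in the training split.

```python
import numpy as np

def correlation_scores(X, y):
    """Absolute Pearson correlation between each feature and the label."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    denom[denom == 0] = 1.0  # constant columns end up with score 0
    return np.abs(Xc.T @ yc) / denom

def fitness(mask, X_tr, y_tr, X_va, y_va, penalty=0.005):
    """Assumed fitness: nearest-centroid validation accuracy minus a
    small penalty per selected feature (not taken from the paper)."""
    if not mask.any():
        return 0.0
    A, B = X_tr[:, mask], X_va[:, mask]
    c0, c1 = A[y_tr == 0].mean(axis=0), A[y_tr == 1].mean(axis=0)
    closer_to_1 = np.linalg.norm(B - c1, axis=1) < np.linalg.norm(B - c0, axis=1)
    return float((closer_to_1.astype(int) == y_va).mean() - penalty * mask.sum())

def local_search(mask, scores, evaluate):
    """Assumed local move: swap the weakest selected feature for the
    strongest unselected one, and keep the swap only if fitness improves."""
    best, best_fit = mask, evaluate(mask)
    on, off = np.flatnonzero(mask), np.flatnonzero(~mask)
    if on.size and off.size:
        cand = mask.copy()
        cand[on[np.argmin(scores[on])]] = False
        cand[off[np.argmax(scores[off])]] = True
        cand_fit = evaluate(cand)
        if cand_fit > best_fit:
            best, best_fit = cand, cand_fit
    return best, best_fit

def ga_feature_selection(X_tr, y_tr, X_va, y_va, pop_size=20,
                         generations=15, p_mut=0.05, seed=0):
    """Genetic algorithm over boolean feature masks with elitism."""
    rng = np.random.default_rng(seed)
    n = X_tr.shape[1]
    scores = correlation_scores(X_tr, y_tr)
    evaluate = lambda m: fitness(m, X_tr, y_tr, X_va, y_va)
    pop = rng.random((pop_size, n)) < 0.5
    fits = np.array([evaluate(m) for m in pop])
    for _ in range(generations):
        # Elitism: carry the best mask over unchanged.
        children = [pop[np.argmax(fits)].copy()]
        child_fits = [fits.max()]
        while len(children) < pop_size:
            # Two binary tournaments pick the parents.
            i, j, k, l = rng.integers(0, pop_size, size=4)
            p1 = pop[i] if fits[i] >= fits[j] else pop[j]
            p2 = pop[k] if fits[k] >= fits[l] else pop[l]
            child = np.where(rng.random(n) < 0.5, p1, p2)  # uniform crossover
            child ^= rng.random(n) < p_mut                 # bit-flip mutation
            child, f = local_search(child, scores, evaluate)
            children.append(child)
            child_fits.append(f)
        pop, fits = np.array(children), np.array(child_fits)
    best = int(np.argmax(fits))
    return pop[best], float(fits[best])
```

Calling `ga_feature_selection(X_train, y_train, X_val, y_val)` returns a boolean mask over the feature columns together with its fitness. The size penalty is one simple way to discourage redundant features; a correlation-based redundancy term between selected features would be another reasonable choice.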