Feature selection using Decomposed Mutual Information Maximization
Macedo, Francisco; Valadas, Rui; Carrasquinha, E.; Oliveira, M. Rosário; Pacheco, António
Neurocomputing, 513 (2022), 215-232
Feature selection has long been recognised as an important preprocessing technique for reducing dimensionality and improving the performance of regression and classification tasks. The class of sequential forward feature selection methods based on Mutual Information (MI) is widely used in practice, mainly due to its computational efficiency and independence from the specific classifier. A recent work introduced a theoretical framework for this class of methods, explaining the existing proposals as approximations to an optimal target objective function. This framework made clear the advantages and drawbacks of each proposal. Methods that account for the redundancy of candidate features through a maximisation function and consider the so-called complementarity effect are among the best ones. However, they still penalise complementarity, which is an important drawback.
This paper proposes the Decomposed Mutual Information Maximisation (DMIM) method, which retains the good theoretical properties of the best methods proposed so far but avoids penalising complementarity by applying the maximisation separately to the inter-feature and class-relevant redundancies. DMIM was extensively evaluated and compared with other methods, both theoretically and empirically, using two synthetic scenarios and 20 publicly available real datasets with specific classifiers. Our results show that DMIM achieves better classification performance than the remaining MI-based forward feature selection methods.
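The abstract does not spell out the DMIM criterion itself, but the general scheme it discusses, sequential forward feature selection driven by mutual information, can be sketched as follows. This is a minimal illustration, not the paper's method: the scoring rule used here (relevance minus the maximum redundancy with any already-selected feature) is only in the spirit of the maximisation-based methods the abstract mentions, and all function names are my own.

```python
import numpy as np
from collections import Counter

def mutual_info(x, y):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in pxy.items():
        p_ab = c / n
        mi += p_ab * np.log(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi

def forward_select(X, y, k):
    """Greedy forward selection of k feature indices.

    Score for a candidate = relevance I(Xc; Y) minus the maximum
    redundancy I(Xc; Xs) over already-selected features Xs.
    Illustrative only; DMIM's actual decomposed criterion differs.
    """
    selected = []
    while len(selected) < k:
        best, best_score = None, -np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            relevance = mutual_info(X[:, j], y)
            redundancy = max((mutual_info(X[:, j], X[:, s]) for s in selected),
                             default=0.0)
            score = relevance - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected

# Toy usage: column 0 equals the class, column 1 is independent of it,
# column 2 is a redundant copy of column 0.
y = np.array([0, 0, 1, 1, 0, 0, 1, 1])
X = np.column_stack([y, np.array([0, 1, 0, 1, 0, 1, 0, 1]), y])
print(forward_select(X, y, 2))  # → [0, 1]: the redundant copy (2) is skipped
```

Note how the max over selected features penalises the redundant copy (feature 2) enough that the independent feature 1 is preferred at the second step; the abstract's point is that a single max of this kind also penalises complementary features, which DMIM avoids by decomposing the redundancy term.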