Working towards a typology of indices of agreement for clustering evaluation
Thursday 13th December 2018, 14:00 (Room P3.10, Mathematics Building)
Margarida G. M. S. Cardoso, Instituto Universitário de Lisboa (ISCTE-IUL), Business Research Unit (BRU-IUL)
Indices of agreement (IA) are commonly used to evaluate the stability of a clustering solution or its agreement with ground truth, that is, internal and external validation of the same solution, respectively.
IA provide different measures of the agreement between two partitions of the same data set and are computed from contingency table data. Despite their frequent use in clustering evaluation, there are still open issues regarding the specific thresholds that each index should reach before a given degree of agreement between the partitions can be concluded.
To gain new insights into the indices' behavior that may help improve clustering evaluation, 14 paired indices of agreement are analyzed in diverse experimental scenarios, with balanced or unbalanced clusters that are poorly, moderately or well separated. The observed values of the paired indices are all based on a cross-classification table counting the pairs of observations that both partitions agree to join and/or separate in the clusters. The IADJUST method is used to learn about the behavior of the indices under the hypothesis that agreement between partitions occurs by chance (H0). It relies on the generation of contingency tables under H0: it is a simulation-based procedure that makes it possible to correct any index of agreement by deducting agreement by chance, overcoming previous limitations of analytical or approximate approaches (Amorim and Cardoso, 2015).
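As a rough illustration of the two ideas above, the following Python sketch derives the pair counts from the contingency table of two partitions, computes one paired index (the Rand index, chosen here only as a familiar example), and corrects it for chance by simulation. It is not the IADJUST implementation: the abstract does not say how tables are generated under H0, so the sketch assumes that H0 can be mimicked by randomly permuting one label vector, which keeps both marginals fixed. The function names, the number of simulations and the maximum index value of 1 are likewise assumptions.

```python
import numpy as np


def pair_counts(labels_a, labels_b):
    """Count pairs of observations by how the two partitions treat them.

    Returns (both, only_a, only_b, neither): pairs joined by both partitions,
    joined only by A, joined only by B, and separated by both.
    """
    labels_a = np.asarray(labels_a)
    labels_b = np.asarray(labels_b)
    n = labels_a.size

    # Contingency table of the two partitions.
    _, ia = np.unique(labels_a, return_inverse=True)
    _, ib = np.unique(labels_b, return_inverse=True)
    table = np.zeros((ia.max() + 1, ib.max() + 1), dtype=np.int64)
    np.add.at(table, (ia, ib), 1)

    def comb2(x):
        # Number of unordered pairs among x items.
        return x * (x - 1) // 2

    total = comb2(n)
    both = comb2(table).sum()
    joined_a = comb2(table.sum(axis=1)).sum()
    joined_b = comb2(table.sum(axis=0)).sum()
    only_a = joined_a - both
    only_b = joined_b - both
    neither = total - both - only_a - only_b
    return int(both), int(only_a), int(only_b), int(neither)


def rand_index(labels_a, labels_b):
    """Proportion of pairs on which the two partitions agree."""
    both, only_a, only_b, neither = pair_counts(labels_a, labels_b)
    return (both + neither) / (both + only_a + only_b + neither)


def simulated_adjustment(labels_a, labels_b, index=rand_index,
                         n_sim=500, seed=0):
    """Correct an index for chance using a simulated null distribution.

    Assumption: H0 is mimicked by permuting labels_b, which fixes the
    marginals of the contingency table. The maximum index value is taken
    to be 1, which holds for the Rand index but not for every index.
    """
    rng = np.random.default_rng(seed)
    labels_b = np.asarray(labels_b)
    observed = index(labels_a, labels_b)
    null = np.array([index(labels_a, rng.permutation(labels_b))
                     for _ in range(n_sim)])
    expected = null.mean()
    adjusted = (observed - expected) / (1.0 - expected)
    return observed, expected, adjusted
```

Calling `simulated_adjustment(a, b)` on two label vectors of equal length returns the observed index, its simulated mean under H0 and the chance-corrected value. Any other paired index written as a function of the four pair counts could be passed through the `index` argument.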
The results suggest a preliminary typology of paired indices of agreement based on their distributional characteristics under H0. Inter-scenario symbolic data, consisting of location, dispersion and shape measures of the IA distributions under H0, are used to build this typology.
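To make "location, dispersion and shape measures" concrete, the sketch below summarizes a vector of simulated index values obtained under H0. The abstract does not list the measures actually used, so the choice here (mean, median, standard deviation, interquartile range, moment-based skewness and excess kurtosis) is an assumption, as is the function name `summarize_null`.

```python
import numpy as np


def summarize_null(values):
    """Summarize a simulated null distribution of an index of agreement.

    The measures chosen are illustrative assumptions, not those of the study.
    The input must not be constant, otherwise the shape measures are undefined.
    """
    x = np.asarray(values, dtype=float)
    mean = x.mean()
    q1, median, q3 = np.percentile(x, [25, 50, 75])

    # Standardize with the population standard deviation so that the
    # shape measures are the usual moment-based coefficients.
    z = (x - mean) / x.std()
    return {
        "mean": mean,                       # location
        "median": median,                   # location
        "std": x.std(ddof=1),               # dispersion (sample sd)
        "iqr": q3 - q1,                     # dispersion
        "skewness": np.mean(z ** 3),        # shape
        "excess_kurtosis": np.mean(z ** 4) - 3.0,  # shape
    }
```

Applying this function to the null distribution of each index in each scenario would give one row of summary measures per index and scenario, a possible starting point for comparing indices across scenarios.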
Amorim, M. J., & Cardoso, M. G. (2015). Comparing clustering solutions: The use of adjusted paired indices. Intelligent Data Analysis, 19(6), 1275-1296.
Joint work with Maria José Amorim (Department of Mathematics of ISEL, Lisbon, Portugal).