Study of Discretization Techniques in the context of Distributed Genetic Classifiers

Authors

  1. Paper abstract
  2. Discretized data sets

    5-fold cross-validation (5fcv) partitions for the 42 data sets used in the paper, each discretized with every method presented in the paper.

  3. Discretized data sets with SMOTE

    5-fold cross-validation (5fcv) partitions for the 42 data sets used in the paper, each discretized with every method presented in the paper and balanced using the SMOTE procedure.

  4. Data sets indexes

    Results of the indexes presented in the paper for each of the data sets.

  5. Algorithm results

    Results obtained for Distributed and Non-distributed algorithms with all the discretizers used in the paper

  6. Study of the Discretization Techniques and Non-Distributed Classifiers

    Brief description of the results obtained for a set of Non-distributed classifiers with all the discretizers used in the paper

  7. Discretizer results

    Results obtained for all the discretizers with distributed classifiers used in the paper and a set of Non-distributed classifiers.

1. Abstract

Since most real-world applications of classification learning involve continuous-valued attributes, the discretization process is known to be one of the most important pre-processing or data reduction steps. Properly addressing the discretization process is an important issue with a great impact on classification algorithm performance. In this paper, in the context of data mining and classification tasks, the interrelationships among the three elements that make up the 3-tuple [data set, discretizer, classification algorithm] are analysed. We studied the impact of the discretization process on both datasets and the classification task. We are especially concerned with distributed genetic algorithms and the development of a method based on data complexity measures to find the right discretizer with a previously chosen competent classifier for the problem at hand. Despite the specific and concrete results obtained in this research, from the empirical evidence we could extend the scope of the no free lunch theorem to the discretization process, because with no prior information there is no better discretizer for all types of data and the use of one or another depends on the dataset complexity and the features of the classification algorithm.

2. Discretized data sets

We considered forty-two data sets from the UCI repository with different ratios between the minority and majority classes, from low imbalance to highly imbalanced. Multiclass data sets were modified to obtain two-class imbalanced problems, so that the union of one or more classes became the positive class and the union of one or more of the remaining classes was labelled as the negative class. Table 1 and Table 2 summarize, respectively, the imbalanced data sets and the data sets balanced with SMOTE that were employed in the paper. For each data set they show the number of attributes (#Atts.), split into real, integer and nominal attributes (R/I/N), the number of examples (#Ex.), the imbalance ratio (IR), and one column for each discretizer. Both tables are ordered by the IR column, from low to high imbalance.
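The two-class construction and the IR column can be illustrated with a minimal sketch. It assumes the usual definition of IR as the majority-class count divided by the minority-class count, which the page itself does not spell out. The function names and the toy labels are hypothetical and are not taken from any of the 42 data sets.

```python
from collections import Counter


def binarize(labels, positive_classes, negative_classes):
    """Relabel a multiclass problem as positive vs. negative.

    Examples whose class is in neither group are left out, since the
    negative class is the union of only some of the remaining classes.
    """
    positive, negative = set(positive_classes), set(negative_classes)
    binary = []
    for y in labels:
        if y in positive:
            binary.append("positive")
        elif y in negative:
            binary.append("negative")
    return binary


def imbalance_ratio(binary_labels):
    """Majority-class count divided by minority-class count (assumed definition)."""
    counts = Counter(binary_labels)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    return max(counts.values()) / min(counts.values())


# Toy labels, invented for illustration only.
labels = ["a"] * 6 + ["b"] * 3 + ["c"] * 1 + ["d"] * 2
binary = binarize(labels, positive_classes={"c"}, negative_classes={"a", "b"})
print(imbalance_ratio(binary))  # 9 negatives against 1 positive
```

Under this definition an IR close to 1 means nearly balanced classes, and larger values mean a rarer positive class.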

Table 1

Data set #Atts. (R/I/N) #Ex. IR CADD CAIM ChiMerge Chi2Merge MDLP ID3 USD
glass1 9 (9/0/0) 214 1.82 CADD_glass1.zip CAIM_glass1.zip ChiMerge_glass1.zip Chi2Merge_glass1.zip MDLP_glass1.zip ID3_glass1.zip USD_glass1.zip
pima 8 (8/0/0) 768 1.87 CADD_pima.zip CAIM_pima.zip ChiMerge_pima.zip Chi2Merge_pima.zip MDLP_pima.zip ID3_pima.zip USD_pima.zip
glass0 9 (9/0/0) 214 2.06 CADD_glass0.zip CAIM_glass0.zip ChiMerge_glass0.zip Chi2Merge_glass0.zip MDLP_glass0.zip ID3_glass0.zip USD_glass0.zip
yeast1 8 (8/0/0) 1484 2.46 CADD_yeast1.zip CAIM_yeast1.zip ChiMerge_yeast1.zip Chi2Merge_yeast1.zip MDLP_yeast1.zip ID3_yeast1.zip USD_yeast1.zip
haberman 3 (0/3/0) 306 2.78 CADD_haberman.zip CAIM_haberman.zip ChiMerge_haberman.zip Chi2Merge_haberman.zip MDLP_haberman.zip ID3_haberman.zip USD_haberman.zip
vehicle1 18 (0/18/0) 846 2.9 CADD_vehicle1.zip CAIM_vehicle1.zip ChiMerge_vehicle1.zip Chi2Merge_vehicle1.zip MDLP_vehicle1.zip ID3_vehicle1.zip USD_vehicle1.zip
vehicle3 18 (0/18/0) 846 2.99 CADD_vehicle3.zip CAIM_vehicle3.zip ChiMerge_vehicle3.zip Chi2Merge_vehicle3.zip MDLP_vehicle3.zip ID3_vehicle3.zip USD_vehicle3.zip
vehicle0 18 (0/18/0) 846 3.25 CADD_vehicle0.zip CAIM_vehicle0.zip ChiMerge_vehicle0.zip Chi2Merge_vehicle0.zip MDLP_vehicle0.zip ID3_vehicle0.zip USD_vehicle0.zip
ecoli1 7 (7/0/0) 336 3.36 CADD_ecoli1.zip CAIM_ecoli1.zip ChiMerge_ecoli1.zip Chi2Merge_ecoli1.zip MDLP_ecoli1.zip ID3_ecoli1.zip USD_ecoli1.zip
ecoli2 7 (7/0/0) 336 5.46 CADD_ecoli2.zip CAIM_ecoli2.zip ChiMerge_ecoli2.zip Chi2Merge_ecoli2.zip MDLP_ecoli2.zip ID3_ecoli2.zip USD_ecoli2.zip
yeast3 8 (8/0/0) 1484 8.1 CADD_yeast3.zip CAIM_yeast3.zip ChiMerge_yeast3.zip Chi2Merge_yeast3.zip MDLP_yeast3.zip ID3_yeast3.zip USD_yeast3.zip
ecoli3 7 (7/0/0) 336 8.6 CADD_ecoli3.zip CAIM_ecoli3.zip ChiMerge_ecoli3.zip Chi2Merge_ecoli3.zip MDLP_ecoli3.zip ID3_ecoli3.zip USD_ecoli3.zip
page-blocks0 10 (4/6/0) 5472 8.79 CADD_page-blocks0.zip CAIM_page-blocks0.zip ChiMerge_page-blocks0.zip Chi2Merge_page-blocks0.zip MDLP_page-blocks0.zip ID3_page-blocks0.zip USD_page-blocks0.zip
ecoli-0-3-4vs5 7 (7/0/0) 200 9 CADD_ecoli-0-3-4vs5.zip CAIM_ecoli-0-3-4vs5.zip ChiMerge_ecoli-0-3-4vs5.zip Chi2Merge_ecoli-0-3-4vs5.zip MDLP_ecoli-0-3-4vs5.zip ID3_ecoli-0-3-4vs5.zip USD_ecoli-0-3-4vs5.zip
yeast-2vs4 8 (8/0/0) 514 9.08 CADD_yeast-2vs4.zip CAIM_yeast-2vs4.zip ChiMerge_yeast-2vs4.zip Chi2Merge_yeast-2vs4.zip MDLP_yeast-2vs4.zip ID3_yeast-2vs4.zip USD_yeast-2vs4.zip
ecoli-0-6-7vs3-5 7 (7/0/0) 222 9.09 CADD_ecoli-0-6-7vs3-5.zip CAIM_ecoli-0-6-7vs3-5.zip ChiMerge_ecoli-0-6-7vs3-5.zip Chi2Merge_ecoli-0-6-7vs3-5.zip MDLP_ecoli-0-6-7vs3-5.zip ID3_ecoli-0-6-7vs3-5.zip USD_ecoli-0-6-7vs3-5.zip
ecoli-0-2-3-4vs5 7 (7/0/0) 202 9.1 CADD_ecoli-0-2-3-4vs5.zip CAIM_ecoli-0-2-3-4vs5.zip ChiMerge_ecoli-0-2-3-4vs5.zip Chi2Merge_ecoli-0-2-3-4vs5.zip MDLP_ecoli-0-2-3-4vs5.zip ID3_ecoli-0-2-3-4vs5.zip USD_ecoli-0-2-3-4vs5.zip
yeast-0-3-5-9vs7-8 8 (8/0/0) 506 9.12 CADD_yeast-0-3-5-9vs7-8.zip CAIM_yeast-0-3-5-9vs7-8.zip ChiMerge_yeast-0-3-5-9vs7-8.zip Chi2Merge_yeast-0-3-5-9vs7-8.zip MDLP_yeast-0-3-5-9vs7-8.zip ID3_yeast-0-3-5-9vs7-8.zip USD_yeast-0-3-5-9vs7-8.zip
yeast-0-2-5-6vs3-7-8-9 8 (8/0/0) 1004 9.14 CADD_yeast-0-2-5-6vs3-7-8-9.zip CAIM_yeast-0-2-5-6vs3-7-8-9.zip ChiMerge_yeast-0-2-5-6vs3-7-8-9.zip Chi2Merge_yeast-0-2-5-6vs3-7-8-9.zip MDLP_yeast-0-2-5-6vs3-7-8-9.zip ID3_yeast-0-2-5-6vs3-7-8-9.zip USD_yeast-0-2-5-6vs3-7-8-9.zip
yeast-0-2-5-7-9vs3-6-8 8 (8/0/0) 1004 9.14 CADD_yeast-0-2-5-7-9vs3-6-8.zip CAIM_yeast-0-2-5-7-9vs3-6-8.zip ChiMerge_yeast-0-2-5-7-9vs3-6-8.zip Chi2Merge_yeast-0-2-5-7-9vs3-6-8.zip MDLP_yeast-0-2-5-7-9vs3-6-8.zip ID3_yeast-0-2-5-7-9vs3-6-8.zip USD_yeast-0-2-5-7-9vs3-6-8.zip
ecoli-0-4-6vs5 6 (6/0/0) 203 9.15 CADD_ecoli-0-4-6vs5.zip CAIM_ecoli-0-4-6vs5.zip ChiMerge_ecoli-0-4-6vs5.zip Chi2Merge_ecoli-0-4-6vs5.zip MDLP_ecoli-0-4-6vs5.zip ID3_ecoli-0-4-6vs5.zip USD_ecoli-0-4-6vs5.zip
ecoli-0-1vs2-3-5 7 (7/0/0) 244 9.17 CADD_ecoli-0-1vs2-3-5.zip CAIM_ecoli-0-1vs2-3-5.zip ChiMerge_ecoli-0-1vs2-3-5.zip Chi2Merge_ecoli-0-1vs2-3-5.zip MDLP_ecoli-0-1vs2-3-5.zip ID3_ecoli-0-1vs2-3-5.zip USD_ecoli-0-1vs2-3-5.zip
ecoli-0-2-6-7vs3-5 7 (7/0/0) 224 9.18 CADD_ecoli-0-2-6-7vs3-5.zip CAIM_ecoli-0-2-6-7vs3-5.zip ChiMerge_ecoli-0-2-6-7vs3-5.zip Chi2Merge_ecoli-0-2-6-7vs3-5.zip MDLP_ecoli-0-2-6-7vs3-5.zip ID3_ecoli-0-2-6-7vs3-5.zip USD_ecoli-0-2-6-7vs3-5.zip
ecoli-0-3-4-6vs5 7 (7/0/0) 205 9.25 CADD_ecoli-0-3-4-6vs5.zip CAIM_ecoli-0-3-4-6vs5.zip ChiMerge_ecoli-0-3-4-6vs5.zip Chi2Merge_ecoli-0-3-4-6vs5.zip MDLP_ecoli-0-3-4-6vs5.zip ID3_ecoli-0-3-4-6vs5.zip USD_ecoli-0-3-4-6vs5.zip
ecoli-0-3-4-7vs5-6 7 (7/0/0) 257 9.28 CADD_ecoli-0-3-4-7vs5-6.zip CAIM_ecoli-0-3-4-7vs5-6.zip ChiMerge_ecoli-0-3-4-7vs5-6.zip Chi2Merge_ecoli-0-3-4-7vs5-6.zip MDLP_ecoli-0-3-4-7vs5-6.zip ID3_ecoli-0-3-4-7vs5-6.zip USD_ecoli-0-3-4-7vs5-6.zip
yeast-0-5-6-7-9vs4 8 (8/0/0) 528 9.35 CADD_yeast-0-5-6-7-9vs4.zip CAIM_yeast-0-5-6-7-9vs4.zip ChiMerge_yeast-0-5-6-7-9vs4.zip Chi2Merge_yeast-0-5-6-7-9vs4.zip MDLP_yeast-0-5-6-7-9vs4.zip ID3_yeast-0-5-6-7-9vs4.zip USD_yeast-0-5-6-7-9vs4.zip
vowel0 13 (10/3/0) 988 9.98 CADD_vowel0.zip CAIM_vowel0.zip ChiMerge_vowel0.zip Chi2Merge_vowel0.zip MDLP_vowel0.zip ID3_vowel0.zip USD_vowel0.zip
ecoli-0-1-4-7vs2-3-5-6 7 (7/0/0) 336 10.59 CADD_ecoli-0-1-4-7vs2-3-5-6.zip CAIM_ecoli-0-1-4-7vs2-3-5-6.zip ChiMerge_ecoli-0-1-4-7vs2-3-5-6.zip Chi2Merge_ecoli-0-1-4-7vs2-3-5-6.zip MDLP_ecoli-0-1-4-7vs2-3-5-6.zip ID3_ecoli-0-1-4-7vs2-3-5-6.zip USD_ecoli-0-1-4-7vs2-3-5-6.zip
glass2 9 (9/0/0) 214 11.59 CADD_glass2.zip CAIM_glass2.zip ChiMerge_glass2.zip Chi2Merge_glass2.zip MDLP_glass2.zip ID3_glass2.zip USD_glass2.zip
ecoli-0-1-4-7vs5-6 6 (6/0/0) 332 12.28 CADD_ecoli-0-1-4-7vs5-6.zip CAIM_ecoli-0-1-4-7vs5-6.zip ChiMerge_ecoli-0-1-4-7vs5-6.zip Chi2Merge_ecoli-0-1-4-7vs5-6.zip MDLP_ecoli-0-1-4-7vs5-6.zip ID3_ecoli-0-1-4-7vs5-6.zip USD_ecoli-0-1-4-7vs5-6.zip
ecoli-0-1-4-6vs5 6 (6/0/0) 280 13 CADD_ecoli-0-1-4-6vs5.zip CAIM_ecoli-0-1-4-6vs5.zip ChiMerge_ecoli-0-1-4-6vs5.zip Chi2Merge_ecoli-0-1-4-6vs5.zip MDLP_ecoli-0-1-4-6vs5.zip ID3_ecoli-0-1-4-6vs5.zip USD_ecoli-0-1-4-6vs5.zip
glass4 9 (9/0/0) 214 15.47 CADD_glass4.zip CAIM_glass4.zip ChiMerge_glass4.zip Chi2Merge_glass4.zip MDLP_glass4.zip ID3_glass4.zip USD_glass4.zip
ecoli4 7 (7/0/0) 336 15.8 CADD_ecoli4.zip CAIM_ecoli4.zip ChiMerge_ecoli4.zip Chi2Merge_ecoli4.zip MDLP_ecoli4.zip ID3_ecoli4.zip USD_ecoli4.zip
page-blocks-1-3vs4 10 (4/6/0) 472 15.86 CADD_page-blocks-1-3vs4.zip CAIM_page-blocks-1-3vs4.zip ChiMerge_page-blocks-1-3vs4.zip Chi2Merge_page-blocks-1-3vs4.zip MDLP_page-blocks-1-3vs4.zip ID3_page-blocks-1-3vs4.zip USD_page-blocks-1-3vs4.zip
abalone9-18 8 (7/0/1) 731 16.4 CADD_abalone9-18.zip CAIM_abalone9-18.zip ChiMerge_abalone9-18.zip Chi2Merge_abalone9-18.zip MDLP_abalone9-18.zip ID3_abalone9-18.zip USD_abalone9-18.zip
glass-0-1-6vs5 9 (9/0/0) 184 19.44 CADD_glass-0-1-6vs5.zip CAIM_glass-0-1-6vs5.zip ChiMerge_glass-0-1-6vs5.zip Chi2Merge_glass-0-1-6vs5.zip MDLP_glass-0-1-6vs5.zip ID3_glass-0-1-6vs5.zip USD_glass-0-1-6vs5.zip
glass5 9 (9/0/0) 214 22.78 CADD_glass5.zip CAIM_glass5.zip ChiMerge_glass5.zip Chi2Merge_glass5.zip MDLP_glass5.zip ID3_glass5.zip USD_glass5.zip
yeast-2vs8 8 (8/0/0) 482 23.1 CADD_yeast-2vs8.zip CAIM_yeast-2vs8.zip ChiMerge_yeast-2vs8.zip Chi2Merge_yeast-2vs8.zip MDLP_yeast-2vs8.zip ID3_yeast-2vs8.zip USD_yeast-2vs8.zip
yeast4 8 (8/0/0) 1484 28.1 CADD_yeast4.zip CAIM_yeast4.zip ChiMerge_yeast4.zip Chi2Merge_yeast4.zip MDLP_yeast4.zip ID3_yeast4.zip USD_yeast4.zip
yeast5 8 (8/0/0) 1484 32.73 CADD_yeast5.zip CAIM_yeast5.zip ChiMerge_yeast5.zip Chi2Merge_yeast5.zip MDLP_yeast5.zip ID3_yeast5.zip USD_yeast5.zip
ecoli-0-1-3-7vs2-6 7 (7/0/0) 281 39.14 CADD_ecoli-0-1-3-7vs2-6.zip CAIM_ecoli-0-1-3-7vs2-6.zip ChiMerge_ecoli-0-1-3-7vs2-6.zip Chi2Merge_ecoli-0-1-3-7vs2-6.zip MDLP_ecoli-0-1-3-7vs2-6.zip ID3_ecoli-0-1-3-7vs2-6.zip USD_ecoli-0-1-3-7vs2-6.zip
yeast6 8 (8/0/0) 1484 41.4 CADD_yeast6.zip CAIM_yeast6.zip ChiMerge_yeast6.zip Chi2Merge_yeast6.zip MDLP_yeast6.zip ID3_yeast6.zip USD_yeast6.zip
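
Each archive above holds 5fcv (5-fold cross-validation) partitions. The page does not say how the folds were drawn. The sketch below assumes stratified folds, a common choice for imbalanced data because it keeps minority examples in every fold. The function names and the toy labels are hypothetical.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Return k lists of example indices with class proportions roughly preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the examples of each class to the folds in round-robin order.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def train_test_partitions(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) once per fold."""
    folds = stratified_folds(labels, k, seed)
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, test


# Toy labels, invented for illustration: 20 negatives and 5 positives.
labels = ["neg"] * 20 + ["pos"] * 5
for train, test in train_test_partitions(labels):
    print(len(train), len(test), sum(labels[i] == "pos" for i in test))
```

With this dealing scheme every test fold in the toy example receives exactly one positive example.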

3. Discretized data sets with SMOTE

Table 2

Data set #Atts. (R/I/N) #Ex. IR CADD CAIM ChiMerge Chi2Merge MDLP ID3 USD
glass1 9 (9/0/0) 214 1.82 CADD_SMOTE_glass1.zip CAIM_SMOTE_glass1.zip ChiMerge_SMOTE_glass1.zip Chi2Merge_SMOTE_glass1.zip MDLP_SMOTE_glass1.zip ID3_SMOTE_glass1.zip USD_SMOTE_glass1.zip
pima 8 (8/0/0) 768 1.87 CADD_SMOTE_pima.zip CAIM_SMOTE_pima.zip ChiMerge_SMOTE_pima.zip Chi2Merge_SMOTE_pima.zip MDLP_SMOTE_pima.zip ID3_SMOTE_pima.zip USD_SMOTE_pima.zip
glass0 9 (9/0/0) 214 2.06 CADD_SMOTE_glass0.zip CAIM_SMOTE_glass0.zip ChiMerge_SMOTE_glass0.zip Chi2Merge_SMOTE_glass0.zip MDLP_SMOTE_glass0.zip ID3_SMOTE_glass0.zip USD_SMOTE_glass0.zip
yeast1 8 (8/0/0) 1484 2.46 CADD_SMOTE_yeast1.zip CAIM_SMOTE_yeast1.zip ChiMerge_SMOTE_yeast1.zip Chi2Merge_SMOTE_yeast1.zip MDLP_SMOTE_yeast1.zip ID3_SMOTE_yeast1.zip USD_SMOTE_yeast1.zip
haberman 3 (0/3/0) 306 2.78 CADD_SMOTE_haberman.zip CAIM_SMOTE_haberman.zip ChiMerge_SMOTE_haberman.zip Chi2Merge_SMOTE_haberman.zip MDLP_SMOTE_haberman.zip ID3_SMOTE_haberman.zip USD_SMOTE_haberman.zip
vehicle1 18 (0/18/0) 846 2.9 CADD_SMOTE_vehicle1.zip CAIM_SMOTE_vehicle1.zip ChiMerge_SMOTE_vehicle1.zip Chi2Merge_SMOTE_vehicle1.zip MDLP_SMOTE_vehicle1.zip ID3_SMOTE_vehicle1.zip USD_SMOTE_vehicle1.zip
vehicle3 18 (0/18/0) 846 2.99 CADD_SMOTE_vehicle3.zip CAIM_SMOTE_vehicle3.zip ChiMerge_SMOTE_vehicle3.zip Chi2Merge_SMOTE_vehicle3.zip MDLP_SMOTE_vehicle3.zip ID3_SMOTE_vehicle3.zip USD_SMOTE_vehicle3.zip
vehicle0 18 (0/18/0) 846 3.25 CADD_SMOTE_vehicle0.zip CAIM_SMOTE_vehicle0.zip ChiMerge_SMOTE_vehicle0.zip Chi2Merge_SMOTE_vehicle0.zip MDLP_SMOTE_vehicle0.zip ID3_SMOTE_vehicle0.zip USD_SMOTE_vehicle0.zip
ecoli1 7 (7/0/0) 336 3.36 CADD_SMOTE_ecoli1.zip CAIM_SMOTE_ecoli1.zip ChiMerge_SMOTE_ecoli1.zip Chi2Merge_SMOTE_ecoli1.zip MDLP_SMOTE_ecoli1.zip ID3_SMOTE_ecoli1.zip USD_SMOTE_ecoli1.zip
ecoli2 7 (7/0/0) 336 5.46 CADD_SMOTE_ecoli2.zip CAIM_SMOTE_ecoli2.zip ChiMerge_SMOTE_ecoli2.zip Chi2Merge_SMOTE_ecoli2.zip MDLP_SMOTE_ecoli2.zip ID3_SMOTE_ecoli2.zip USD_SMOTE_ecoli2.zip
yeast3 8 (8/0/0) 1484 8.1 CADD_SMOTE_yeast3.zip CAIM_SMOTE_yeast3.zip ChiMerge_SMOTE_yeast3.zip Chi2Merge_SMOTE_yeast3.zip MDLP_SMOTE_yeast3.zip ID3_SMOTE_yeast3.zip USD_SMOTE_yeast3.zip
ecoli3 7 (7/0/0) 336 8.6 CADD_SMOTE_ecoli3.zip CAIM_SMOTE_ecoli3.zip ChiMerge_SMOTE_ecoli3.zip Chi2Merge_SMOTE_ecoli3.zip MDLP_SMOTE_ecoli3.zip ID3_SMOTE_ecoli3.zip USD_SMOTE_ecoli3.zip
page-blocks0 10 (4/6/0) 5472 8.79 CADD_SMOTE_page-blocks0.zip CAIM_SMOTE_page-blocks0.zip ChiMerge_SMOTE_page-blocks0.zip Chi2Merge_SMOTE_page-blocks0.zip MDLP_SMOTE_page-blocks0.zip ID3_SMOTE_page-blocks0.zip USD_SMOTE_page-blocks0.zip
ecoli-0-3-4vs5 7 (7/0/0) 200 9 CADD_SMOTE_ecoli-0-3-4vs5.zip CAIM_SMOTE_ecoli-0-3-4vs5.zip ChiMerge_SMOTE_ecoli-0-3-4vs5.zip Chi2Merge_SMOTE_ecoli-0-3-4vs5.zip MDLP_SMOTE_ecoli-0-3-4vs5.zip ID3_SMOTE_ecoli-0-3-4vs5.zip USD_SMOTE_ecoli-0-3-4vs5.zip
yeast-2vs4 8 (8/0/0) 514 9.08 CADD_SMOTE_yeast-2vs4.zip CAIM_SMOTE_yeast-2vs4.zip ChiMerge_SMOTE_yeast-2vs4.zip Chi2Merge_SMOTE_yeast-2vs4.zip MDLP_SMOTE_yeast-2vs4.zip ID3_SMOTE_yeast-2vs4.zip USD_SMOTE_yeast-2vs4.zip
ecoli-0-6-7vs3-5 7 (7/0/0) 222 9.09 CADD_SMOTE_ecoli-0-6-7vs3-5.zip CAIM_SMOTE_ecoli-0-6-7vs3-5.zip ChiMerge_SMOTE_ecoli-0-6-7vs3-5.zip Chi2Merge_SMOTE_ecoli-0-6-7vs3-5.zip MDLP_SMOTE_ecoli-0-6-7vs3-5.zip ID3_SMOTE_ecoli-0-6-7vs3-5.zip USD_SMOTE_ecoli-0-6-7vs3-5.zip
ecoli-0-2-3-4vs5 7 (7/0/0) 202 9.1 CADD_SMOTE_ecoli-0-2-3-4vs5.zip CAIM_SMOTE_ecoli-0-2-3-4vs5.zip ChiMerge_SMOTE_ecoli-0-2-3-4vs5.zip Chi2Merge_SMOTE_ecoli-0-2-3-4vs5.zip MDLP_SMOTE_ecoli-0-2-3-4vs5.zip ID3_SMOTE_ecoli-0-2-3-4vs5.zip USD_SMOTE_ecoli-0-2-3-4vs5.zip
yeast-0-3-5-9vs7-8 8 (8/0/0) 506 9.12 CADD_SMOTE_yeast-0-3-5-9vs7-8.zip CAIM_SMOTE_yeast-0-3-5-9vs7-8.zip ChiMerge_SMOTE_yeast-0-3-5-9vs7-8.zip Chi2Merge_SMOTE_yeast-0-3-5-9vs7-8.zip MDLP_SMOTE_yeast-0-3-5-9vs7-8.zip ID3_SMOTE_yeast-0-3-5-9vs7-8.zip USD_SMOTE_yeast-0-3-5-9vs7-8.zip
yeast-0-2-5-6vs3-7-8-9 8 (8/0/0) 1004 9.14 CADD_SMOTE_yeast-0-2-5-6vs3-7-8-9.zip CAIM_SMOTE_yeast-0-2-5-6vs3-7-8-9.zip ChiMerge_SMOTE_yeast-0-2-5-6vs3-7-8-9.zip Chi2Merge_SMOTE_yeast-0-2-5-6vs3-7-8-9.zip MDLP_SMOTE_yeast-0-2-5-6vs3-7-8-9.zip ID3_SMOTE_yeast-0-2-5-6vs3-7-8-9.zip USD_SMOTE_yeast-0-2-5-6vs3-7-8-9.zip
yeast-0-2-5-7-9vs3-6-8 8 (8/0/0) 1004 9.14 CADD_SMOTE_yeast-0-2-5-7-9vs3-6-8.zip CAIM_SMOTE_yeast-0-2-5-7-9vs3-6-8.zip ChiMerge_SMOTE_yeast-0-2-5-7-9vs3-6-8.zip Chi2Merge_SMOTE_yeast-0-2-5-7-9vs3-6-8.zip MDLP_SMOTE_yeast-0-2-5-7-9vs3-6-8.zip ID3_SMOTE_yeast-0-2-5-7-9vs3-6-8.zip USD_SMOTE_yeast-0-2-5-7-9vs3-6-8.zip
ecoli-0-4-6vs5 6 (6/0/0) 203 9.15 CADD_SMOTE_ecoli-0-4-6vs5.zip CAIM_SMOTE_ecoli-0-4-6vs5.zip ChiMerge_SMOTE_ecoli-0-4-6vs5.zip Chi2Merge_SMOTE_ecoli-0-4-6vs5.zip MDLP_SMOTE_ecoli-0-4-6vs5.zip ID3_SMOTE_ecoli-0-4-6vs5.zip USD_SMOTE_ecoli-0-4-6vs5.zip
ecoli-0-1vs2-3-5 7 (7/0/0) 244 9.17 CADD_SMOTE_ecoli-0-1vs2-3-5.zip CAIM_SMOTE_ecoli-0-1vs2-3-5.zip ChiMerge_SMOTE_ecoli-0-1vs2-3-5.zip Chi2Merge_SMOTE_ecoli-0-1vs2-3-5.zip MDLP_SMOTE_ecoli-0-1vs2-3-5.zip ID3_SMOTE_ecoli-0-1vs2-3-5.zip USD_SMOTE_ecoli-0-1vs2-3-5.zip
ecoli-0-2-6-7vs3-5 7 (7/0/0) 224 9.18 CADD_SMOTE_ecoli-0-2-6-7vs3-5.zip CAIM_SMOTE_ecoli-0-2-6-7vs3-5.zip ChiMerge_SMOTE_ecoli-0-2-6-7vs3-5.zip Chi2Merge_SMOTE_ecoli-0-2-6-7vs3-5.zip MDLP_SMOTE_ecoli-0-2-6-7vs3-5.zip ID3_SMOTE_ecoli-0-2-6-7vs3-5.zip USD_SMOTE_ecoli-0-2-6-7vs3-5.zip
ecoli-0-3-4-6vs5 7 (7/0/0) 205 9.25 CADD_SMOTE_ecoli-0-3-4-6vs5.zip CAIM_SMOTE_ecoli-0-3-4-6vs5.zip ChiMerge_SMOTE_ecoli-0-3-4-6vs5.zip Chi2Merge_SMOTE_ecoli-0-3-4-6vs5.zip MDLP_SMOTE_ecoli-0-3-4-6vs5.zip ID3_SMOTE_ecoli-0-3-4-6vs5.zip USD_SMOTE_ecoli-0-3-4-6vs5.zip
ecoli-0-3-4-7vs5-6 7 (7/0/0) 257 9.28 CADD_SMOTE_ecoli-0-3-4-7vs5-6.zip CAIM_SMOTE_ecoli-0-3-4-7vs5-6.zip ChiMerge_SMOTE_ecoli-0-3-4-7vs5-6.zip Chi2Merge_SMOTE_ecoli-0-3-4-7vs5-6.zip MDLP_SMOTE_ecoli-0-3-4-7vs5-6.zip ID3_SMOTE_ecoli-0-3-4-7vs5-6.zip USD_SMOTE_ecoli-0-3-4-7vs5-6.zip
yeast-0-5-6-7-9vs4 8 (8/0/0) 528 9.35 CADD_SMOTE_yeast-0-5-6-7-9vs4.zip CAIM_SMOTE_yeast-0-5-6-7-9vs4.zip ChiMerge_SMOTE_yeast-0-5-6-7-9vs4.zip Chi2Merge_SMOTE_yeast-0-5-6-7-9vs4.zip MDLP_SMOTE_yeast-0-5-6-7-9vs4.zip ID3_SMOTE_yeast-0-5-6-7-9vs4.zip USD_SMOTE_yeast-0-5-6-7-9vs4.zip
vowel0 13 (10/3/0) 988 9.98 CADD_SMOTE_vowel0.zip CAIM_SMOTE_vowel0.zip ChiMerge_SMOTE_vowel0.zip Chi2Merge_SMOTE_vowel0.zip MDLP_SMOTE_vowel0.zip ID3_SMOTE_vowel0.zip USD_SMOTE_vowel0.zip
ecoli-0-1-4-7vs2-3-5-6 7 (7/0/0) 336 10.59 CADD_SMOTE_ecoli-0-1-4-7vs2-3-5-6.zip CAIM_SMOTE_ecoli-0-1-4-7vs2-3-5-6.zip ChiMerge_SMOTE_ecoli-0-1-4-7vs2-3-5-6.zip Chi2Merge_SMOTE_ecoli-0-1-4-7vs2-3-5-6.zip MDLP_SMOTE_ecoli-0-1-4-7vs2-3-5-6.zip ID3_SMOTE_ecoli-0-1-4-7vs2-3-5-6.zip USD_SMOTE_ecoli-0-1-4-7vs2-3-5-6.zip
glass2 9 (9/0/0) 214 11.59 CADD_SMOTE_glass2.zip CAIM_SMOTE_glass2.zip ChiMerge_SMOTE_glass2.zip Chi2Merge_SMOTE_glass2.zip MDLP_SMOTE_glass2.zip ID3_SMOTE_glass2.zip USD_SMOTE_glass2.zip
ecoli-0-1-4-7vs5-6 6 (6/0/0) 332 12.28 CADD_SMOTE_ecoli-0-1-4-7vs5-6.zip CAIM_SMOTE_ecoli-0-1-4-7vs5-6.zip ChiMerge_SMOTE_ecoli-0-1-4-7vs5-6.zip Chi2Merge_SMOTE_ecoli-0-1-4-7vs5-6.zip MDLP_SMOTE_ecoli-0-1-4-7vs5-6.zip ID3_SMOTE_ecoli-0-1-4-7vs5-6.zip USD_SMOTE_ecoli-0-1-4-7vs5-6.zip
ecoli-0-1-4-6vs5 6 (6/0/0) 280 13 CADD_SMOTE_ecoli-0-1-4-6vs5.zip CAIM_SMOTE_ecoli-0-1-4-6vs5.zip ChiMerge_SMOTE_ecoli-0-1-4-6vs5.zip Chi2Merge_SMOTE_ecoli-0-1-4-6vs5.zip MDLP_SMOTE_ecoli-0-1-4-6vs5.zip ID3_SMOTE_ecoli-0-1-4-6vs5.zip USD_SMOTE_ecoli-0-1-4-6vs5.zip
glass4 9 (9/0/0) 214 15.47 CADD_SMOTE_glass4.zip CAIM_SMOTE_glass4.zip ChiMerge_SMOTE_glass4.zip Chi2Merge_SMOTE_glass4.zip MDLP_SMOTE_glass4.zip ID3_SMOTE_glass4.zip USD_SMOTE_glass4.zip
ecoli4 7 (7/0/0) 336 15.8 CADD_SMOTE_ecoli4.zip CAIM_SMOTE_ecoli4.zip ChiMerge_SMOTE_ecoli4.zip Chi2Merge_SMOTE_ecoli4.zip MDLP_SMOTE_ecoli4.zip ID3_SMOTE_ecoli4.zip USD_SMOTE_ecoli4.zip
page-blocks-1-3vs4 10 (4/6/0) 472 15.86 CADD_SMOTE_page-blocks-1-3vs4.zip CAIM_SMOTE_page-blocks-1-3vs4.zip ChiMerge_SMOTE_page-blocks-1-3vs4.zip Chi2Merge_SMOTE_page-blocks-1-3vs4.zip MDLP_SMOTE_page-blocks-1-3vs4.zip ID3_SMOTE_page-blocks-1-3vs4.zip USD_SMOTE_page-blocks-1-3vs4.zip
abalone9-18 8 (7/0/1) 731 16.4 CADD_SMOTE_abalone9-18.zip CAIM_SMOTE_abalone9-18.zip ChiMerge_SMOTE_abalone9-18.zip Chi2Merge_SMOTE_abalone9-18.zip MDLP_SMOTE_abalone9-18.zip ID3_SMOTE_abalone9-18.zip USD_SMOTE_abalone9-18.zip
glass-0-1-6vs5 9 (9/0/0) 184 19.44 CADD_SMOTE_glass-0-1-6vs5.zip CAIM_SMOTE_glass-0-1-6vs5.zip ChiMerge_SMOTE_glass-0-1-6vs5.zip Chi2Merge_SMOTE_glass-0-1-6vs5.zip MDLP_SMOTE_glass-0-1-6vs5.zip ID3_SMOTE_glass-0-1-6vs5.zip USD_SMOTE_glass-0-1-6vs5.zip
glass5 9 (9/0/0) 214 22.78 CADD_SMOTE_glass5.zip CAIM_SMOTE_glass5.zip ChiMerge_SMOTE_glass5.zip Chi2Merge_SMOTE_glass5.zip MDLP_SMOTE_glass5.zip ID3_SMOTE_glass5.zip USD_SMOTE_glass5.zip
yeast-2vs8 8 (8/0/0) 482 23.1 CADD_SMOTE_yeast-2vs8.zip CAIM_SMOTE_yeast-2vs8.zip ChiMerge_SMOTE_yeast-2vs8.zip Chi2Merge_SMOTE_yeast-2vs8.zip MDLP_SMOTE_yeast-2vs8.zip ID3_SMOTE_yeast-2vs8.zip USD_SMOTE_yeast-2vs8.zip
yeast4 8 (8/0/0) 1484 28.1 CADD_SMOTE_yeast4.zip CAIM_SMOTE_yeast4.zip ChiMerge_SMOTE_yeast4.zip Chi2Merge_SMOTE_yeast4.zip MDLP_SMOTE_yeast4.zip ID3_SMOTE_yeast4.zip USD_SMOTE_yeast4.zip
yeast5 8 (8/0/0) 1484 32.73 CADD_SMOTE_yeast5.zip CAIM_SMOTE_yeast5.zip ChiMerge_SMOTE_yeast5.zip Chi2Merge_SMOTE_yeast5.zip MDLP_SMOTE_yeast5.zip ID3_SMOTE_yeast5.zip USD_SMOTE_yeast5.zip
ecoli-0-1-3-7vs2-6 7 (7/0/0) 281 39.14 CADD_SMOTE_ecoli-0-1-3-7vs2-6.zip CAIM_SMOTE_ecoli-0-1-3-7vs2-6.zip ChiMerge_SMOTE_ecoli-0-1-3-7vs2-6.zip Chi2Merge_SMOTE_ecoli-0-1-3-7vs2-6.zip MDLP_SMOTE_ecoli-0-1-3-7vs2-6.zip ID3_SMOTE_ecoli-0-1-3-7vs2-6.zip USD_SMOTE_ecoli-0-1-3-7vs2-6.zip
yeast6 8 (8/0/0) 1484 41.4 CADD_SMOTE_yeast6.zip CAIM_SMOTE_yeast6.zip ChiMerge_SMOTE_yeast6.zip Chi2Merge_SMOTE_yeast6.zip MDLP_SMOTE_yeast6.zip ID3_SMOTE_yeast6.zip USD_SMOTE_yeast6.zip
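
The archives in Table 2 were balanced with SMOTE, which creates synthetic minority examples by interpolating between a minority example and one of its nearest minority neighbours. The following is a simplified sketch of that idea and not the implementation used for the paper: it picks base examples at random instead of iterating over all of them, and the function name, parameters and toy points are hypothetical.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points on segments joining minority examples
    to one of their k nearest minority neighbours (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority examples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of base, excluding itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Toy minority examples with two continuous attributes, invented for illustration.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
for point in smote(minority, n_synthetic=3, k=2):
    print(point)
```

Because every synthetic point lies between two existing minority examples, the new examples stay inside the region the minority class already occupies.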

4. Data sets Indexes

Results of the indexes presented in the paper for each of the data sets. Each csv file contains the results obtained with the original imbalanced data set and the balanced data set using SMOTE. These results are presented for all the discretizers presented in the paper.

These results were obtained while discretizing the data sets used in the paper. The indexes NP1, N4, N2, F4 and AP1 aim to characterize the different data sets. In particular, we compute the indexes before and after the discretization process to measure the changes that each discretizer introduces in each data set. Each csv file presents the results with and without SMOTE, so that the influence of this well-known preprocessing step can be studied. The last column of the table gives a summary of the mean value and standard deviation of every index with each discretizer over all data sets.

Table 3

F4 N2 N4 NP1 AP1 Indexes Means
F4.csv N2.csv N4.csv NP1.csv AP1.csv indexes_mean.csv
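
The summary in indexes_mean.csv reports a mean and a standard deviation per discretizer. A minimal sketch of that aggregation follows. The layout of the csv files is not described here, so the sketch works on an in-memory dictionary. The discretizer names and numbers are invented, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def summarize(index_values):
    """Map each discretizer to (mean, sample standard deviation) over data sets."""
    return {
        name: (mean(values), stdev(values))
        for name, values in index_values.items()
    }


# Hypothetical values of one index on three data sets; not results from the paper.
index_by_discretizer = {
    "disc_A": [0.40, 0.50, 0.60],
    "disc_B": [0.30, 0.50, 0.70],
}
for name, (m, s) in summarize(index_by_discretizer).items():
    print(f"{name}: mean={m:.2f} std={s:.2f}")
```

Two discretizers can share the same mean while differing in spread, which is why the deviation is reported next to the mean.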

5. Algorithm results

In the following table we provide a csv file with the results obtained for each algorithm with all the discretizers presented in the paper. We also provide a tex file and a pdf file with a set of post hoc procedures. There are two sets of results: one for the original imbalanced data sets and one for the data sets after applying SMOTE.

Table 4

Results Results with SMOTE
Algorithm Results with discretizers Post hoc procedure Results Results with discretizers Post hoc procedure Results
C45Rules C45Rules C45Rules C45Rules C45Rules C45Rules C45Rules
COGIN COGIN COGIN COGIN COGIN COGIN COGIN
EDGAR EDGAR EDGAR EDGAR EDGAR EDGAR EDGAR
GAssist GAssist GAssist GAssist GAssist GAssist GAssist
Oblique-DT Oblique-DT Oblique-DT Oblique-DT Oblique-DT Oblique-DT Oblique-DT
OCEC OCEC OCEC OCEC OCEC OCEC OCEC
REGAL REGAL REGAL REGAL REGAL REGAL REGAL
REGAL-TC REGAL-TC REGAL-TC REGAL-TC REGAL-TC REGAL-TC REGAL-TC
Ripper Ripper Ripper Ripper Ripper Ripper Ripper
SIA SIA SIA SIA SIA SIA SIA
UCS UCS UCS UCS UCS UCS UCS
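
This page does not say which post hoc procedures the tex and pdf files contain. Procedures of this kind usually start from the average rank of each method across data sets, so the sketch below illustrates only that ranking step. The function name, method names and scores are hypothetical.

```python
def average_ranks(scores):
    """scores maps each method to a list with one score per data set (higher is better).

    Returns the mean rank of each method, where rank 1 is the best and
    tied methods share the mean of the ranks they span.
    """
    methods = list(scores)
    n_sets = len(next(iter(scores.values())))
    totals = {m: 0.0 for m in methods}
    for d in range(n_sets):
        ordered = sorted(methods, key=lambda m: scores[m][d], reverse=True)
        pos = 0
        while pos < len(ordered):
            end = pos
            # Extend the group while the next method has the same score.
            while end + 1 < len(ordered) and scores[ordered[end + 1]][d] == scores[ordered[pos]][d]:
                end += 1
            shared = (pos + 1 + end + 1) / 2  # mean of ranks pos+1 .. end+1
            for m in ordered[pos:end + 1]:
                totals[m] += shared
            pos = end + 1
    return {m: totals[m] / n_sets for m in methods}


# Invented accuracies on three data sets; not results from the paper.
scores = {"A": [0.9, 0.8, 0.7], "B": [0.8, 0.8, 0.9], "C": [0.7, 0.6, 0.8]}
print(average_ranks(scores))
```

A lower average rank means the method was more often among the best across data sets.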

6. Study of the Discretization Techniques and Non-Distributed Classifiers

Next, we present the outcome and analysis for the rest of the algorithms, supported for the sake of comparison by a graphical representation of accuracy versus number of rules. The objective is to check whether the discretizers behave as they do with the distributed algorithms. Finally, we present some conclusions drawn in the light of the results obtained for the non-distributed classifiers.
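
One way to read an accuracy versus number-of-rules plot is to look for the configurations that no other configuration beats on both axes at once. The sketch below computes that non-dominated set. It is only an illustrative reading aid, not an analysis performed in the paper, and the names and values are invented.

```python
def non_dominated(results):
    """results maps a name to (accuracy, n_rules).

    Keep the names for which no other entry is at least as accurate with
    at most as many rules, and strictly better on one of the two.
    """
    def dominates(a, b):
        return a[0] >= b[0] and a[1] <= b[1] and (a[0] > b[0] or a[1] < b[1])

    return sorted(
        name
        for name, point in results.items()
        if not any(dominates(other, point) for o, other in results.items() if o != name)
    )


# Hypothetical (accuracy, number of rules) pairs; not results from the paper.
results = {"disc_X": (0.90, 30), "disc_Y": (0.85, 10), "disc_Z": (0.80, 20)}
print(non_dominated(results))
```

In the toy data, disc_Z drops out because disc_Y is both more accurate and uses fewer rules, while disc_X and disc_Y represent different trade-offs between accuracy and interpretability.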

Results

Results obtained with C45Rules

Figure 1 shows the results obtained for C4.5 with the imbalanced and balanced data sets. When SMOTE is not applied, C4.5 without discretization outperforms all the discretizers, obtaining the best accuracy and interpretability. Applying SMOTE, however, greatly improves the results obtained with ChiMerge and CAIM, which, together with C4.5 without a discretization step, obtain the best global results.

Figure 1

C45Rules

Results obtained with COGIN

COGIN is an algorithm that does not deal with continuous values. Figure 2 shows the results obtained for all the discretizers. ChiMerge and CAIM are the discretizers that obtain the best results with both imbalanced and balanced data. In this case there are differences depending on whether SMOTE is used: applying it improves accuracy without increasing the number of rules. The worst results are obtained by ID3 and CADD, the former producing a high number of rules with the imbalanced data.

Figure 2

COGIN

Results obtained with GAssist

As in the case of C4.5, GAssist obtains the best results when it deals with raw data, outperforming all the discretizers. When SMOTE is applied, however, ChiMerge and CAIM once again obtain results closer to those of GAssist, and MDLP and Chi2Merge also obtain good results. ID3 and CADD are the worst discretizers when GAssist works with imbalanced data. USD tends to obtain values similar to those of ID3 when the balancing process is applied.

Figure 3

GAssist

Results obtained with Oblique-DT

Oblique-DT obtains the best results with imbalanced data, and only ID3 reaches similar results. With SMOTE, however, ChiMerge and CAIM obtain the best results, the former achieving the best performance with the lowest number of rules. Oblique-DT obtains results similar to those of USD, MDLP and ID3. Chi2Merge, although it obtains good accuracy in this case, also produces a high number of rules. Whether the data are balanced or imbalanced, CADD is the discretizer with the lowest number of rules, but at the expense of poor accuracy.

Figure 4

Oblique-DT

Results obtained with OCEC

Figure 5 shows the results obtained for OCEC. Unlike in other cases, OCEC does not deal with continuous values, so these results can help in choosing a good discretizer to use with it. With the imbalanced data, ChiMerge, CAIM and MDLP are the discretizers that obtain the best results, and ID3 obtains the worst. When SMOTE is applied, ChiMerge and CAIM obtain the best results; MDLP and Chi2Merge reach good accuracy but with a greater number of rules. As in previous results, CADD gets worse with SMOTE and is the discretizer with the lowest performance. For OCEC, the results obtained with ChiMerge and CAIM are very similar whether or not SMOTE is used.

Figure 5

OCEC

Results obtained with Ripper

RIPPER obtains the best results regardless of the use of SMOTE. In this case, unlike the previous ones, applying SMOTE moves the results of the discretizers further away from those of RIPPER, whose performance increases greatly. ChiMerge and CAIM are the closest with the raw data, but once the data are balanced Chi2Merge is the discretizer that comes closest to the results of RIPPER. Here too, CADD worsens its results when SMOTE is applied.

Figure 6

Ripper

Results obtained with SIA

In the case of SIA with the imbalanced data, the original method is outperformed when a discretizer is used; here Chi2Merge and ChiMerge obtain the best results. We want to highlight that SIA with ID3 or USD obtains a much smaller number of rules than SIA with its built-in discretizer. When SMOTE is applied, the results of SIA are the most accurate but involve a high number of rules, whereas Chi2Merge comes close in accuracy with a low number of rules. We can also see that MDLP is the discretizer that improves most with SMOTE, outperforming all the discretizers except Chi2Merge.

Figure 7

SIA

Results obtained with UCS

Figure 8 shows the results obtained for UCS. With the imbalanced data sets, UCS is outperformed when it uses a discretizer, and the best results are obtained by ID3; CAIM and ChiMerge also obtain good results. When the data are balanced using SMOTE, all the discretizers except CADD improve their results. It is worth highlighting that balancing the data lets UCS obtain good results, close to those of the USD discretizer, which obtains the best results in this case.

Figure 8

UCS

Discussion and concluding remarks about the study

We divide the results obtained into three groups depending on the type of algorithm (non-evolutionary algorithms, evolutionary algorithms and GCCL algorithms) in order to compare the behavior of the discretizers with the results obtained with the distributed algorithms.

As global conclusions of this complementary study, the following aspects can be highlighted

7. Discretizer results

For the sake of cross-reference, we provide the results obtained for each discretizer with all the distributed and non-distributed algorithms. Each csv file contains the results obtained with the original imbalanced data set and with the data set balanced using SMOTE. We also provide a tex file and a pdf file with a set of post hoc procedures.

Table 5

Results Results with SMOTE
Discretizer Results with algorithms Post hoc procedure Results Results with algorithms Post hoc procedure Results
CADD CADD CADD CADD CADD CADD CADD
CAIM CAIM CAIM CAIM CAIM CAIM CAIM
ChiMerge ChiMerge ChiMerge ChiMerge ChiMerge ChiMerge ChiMerge
Chi2Merge Chi2Merge Chi2Merge Chi2Merge Chi2Merge Chi2Merge Chi2Merge
MDLP MDLP MDLP MDLP MDLP MDLP MDLP
ID3 ID3 ID3 ID3 ID3 ID3 ID3
USD USD USD USD USD USD USD

Sistemas Inteligentes y Minería de Datos

Escuela Técnica Superior de Ingeniería

email: simd_at_dti.uhu.es

Universidad de Huelva
