Provide a brief description and an example of each of the following clustering methods:
- Partitioning methods.
- Hierarchical methods.
- Density-based methods.
- Grid-based methods.
Load the soybean diagnosis data set in Weka (found in Weka-3.6/data/soybean.arff), then perform the following:
- Build a decision tree by selecting J48 as the classifier with 10-fold cross-validation. Then fill out the following table:
Correctly Classified Instances | |
Incorrectly Classified Instances | |
Kappa statistic | |
Mean absolute error | |
Root mean squared error | |
Relative absolute error | |
Root relative squared error | |
Total Number of Instances | |
- Build a Naïve Bayes classifier with 10-fold cross-validation. Then fill out the following table:
Correctly Classified Instances | |
Incorrectly Classified Instances | |
Kappa statistic | |
Mean absolute error | |
Root mean squared error | |
Relative absolute error | |
Root relative squared error | |
Total Number of Instances | |
- Compare the results of the previous two parts (a and b). Which algorithm gives the better result, and why?
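As background for reading the tables above, the following is a minimal Python sketch of how the correctly and incorrectly classified counts and the Kappa statistic can be derived from a confusion matrix, using the standard Cohen's kappa definition. The function name `summarize` and the two-class counts are made up for illustration; they are not Weka code and not soybean results.

```python
def summarize(confusion):
    """Return (correct, incorrect, kappa) for a square confusion matrix.

    confusion[i][j] counts instances of actual class i predicted as class j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Correct predictions lie on the diagonal.
    correct = sum(confusion[i][i] for i in range(n_classes))
    observed = correct / total
    # Chance agreement: for each class, (row total * column total) / total^2.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(n_classes)
    ) / (total * total)
    kappa = (observed - expected) / (1 - expected)
    return correct, total - correct, kappa


# Hypothetical two-class counts, chosen only to exercise the function.
example = [[40, 10],
           [5, 45]]
correct, incorrect, kappa = summarize(example)
print(correct, incorrect, round(kappa, 4))
```

A kappa near 1 indicates agreement well beyond chance, while a value near 0 indicates agreement no better than chance, which can help when comparing two classifiers whose raw accuracies are close.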
Constructing a classifier and evaluating its accuracy on a dataset requires partitioning the labeled data into a training set and a test set. Explain three main methods used for such partitioning.
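The question does not say which three methods it has in mind. As one possible illustration, the sketch below shows three commonly cited schemes (holdout, k-fold cross-validation, and bootstrap) operating on instance indices only. The function names, the default test fraction, and the seeds are assumptions made for this example.

```python
import random


def holdout(n, test_fraction=1 / 3, seed=0):
    """Shuffle indices once and split them into one train and one test set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(n * (1 - test_fraction)))
    return idx[:cut], idx[cut:]


def k_fold(n, k=10, seed=0):
    """Return k (train, test) pairs; each index is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, folds[i]))
    return splits


def bootstrap(n, seed=0):
    """Sample n training indices with replacement; unsampled ones form the test set."""
    rng = random.Random(seed)
    train = [rng.randrange(n) for _ in range(n)]
    chosen = set(train)
    test = [i for i in range(n) if i not in chosen]
    return train, test


train, test = holdout(30)
print(len(train), len(test))
```

Holdout uses each instance in one role only, k-fold rotates the test role across all instances, and bootstrap allows repeated instances in the training set.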
Explain why cross-validation is used in both supervised learning (classification) and unsupervised learning (clustering).