Explain why cross-validation is used in both supervised learning (classification) and unsupervised learning (clustering).

Provide a brief description and examples of each of the following methods of clustering:

  • Partitioning methods.
  • Hierarchical methods.
  • Density-based methods.
  • Grid-based methods.

 Load the soybean diagnosis data set in Weka (found in Weka-3.6/data/soybean.arff), then perform the following:

  • Build a decision tree by selecting J48 as the classifier with 10-fold cross-validation. Then fill out the following table:
Correctly Classified Instances  
Incorrectly Classified Instances  
Kappa statistic  
Mean absolute error  
Root mean squared error  
Relative absolute error  
Root relative squared error  
Total Number of Instances  
  • Build a Naïve Bayes classifier with 10-fold cross-validation. Then fill out the following table:
Correctly Classified Instances  
Incorrectly Classified Instances  
Kappa statistic  
Mean absolute error  
Root mean squared error  
Relative absolute error  
Root relative squared error  
Total Number of Instances  
  • Compare the results of the previous two parts (the J48 decision tree and the Naïve Bayes classifier). Which algorithm gives the better result, and why?
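
For reference, the 10-fold cross-validation requested in the two parts above can be sketched in plain Python as follows. This is only an illustrative sketch of the general idea, not Weka's implementation: the function name k_fold_indices, the seed, and the instance count of 103 are hypothetical, and the sketch does not stratify folds by class as a tool may do.

```python
import random


def k_fold_indices(n_instances, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Hypothetical helper for illustration only; it is not part of Weka.
    """
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most 1.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        # Every instance outside fold i is used for training in round i.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# Assumed data set size, chosen only to show uneven fold sizes.
for round_no, (train_idx, test_idx) in enumerate(k_fold_indices(103, k=10), 1):
    print(round_no, len(train_idx), len(test_idx))
```

Each instance is used for testing exactly once and for training in the other nine rounds, and the reported statistics are aggregated over all ten rounds.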

Building a classifier and evaluating its accuracy on a dataset requires partitioning the labeled data into a training set and a test set. Explain three main methods used for such partitioning.
