Coding Exercise 2: K–Means Clustering
(Adapted from Dr. Rogel–Salazar)
In Monday’s class, we learned about K–means clustering. In this coding exercise, you will build a K–means clustering model in Python. To complete the exercise successfully, read the instructions carefully and follow the code step by step. When you are done, save your code, graphs, and results in the same format as the first exercise, and submit the document as a PDF.
Before we start, here is an easy–to–follow recipe for all k–means models:
1. Decide how many clusters you want, i.e. choose your “k”
2. Randomly choose an initial centroid for each of the k clusters
3. Calculate the distance from every observation to each of the k centroids
4. Assign observations to the closest centroid
5. Find the new location of the centroid by taking the mean of all the observations in each cluster
6. Repeat steps 3–5 until the centroids no longer change position
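The recipe above can be sketched in a few lines of Python. This is only a minimal illustration, not the code you will use in the exercise. It assumes NumPy is installed, measures distance with the Euclidean norm, and picks the initial centroids by sampling k observations from the data (one common choice among several). The function name `kmeans` and the parameters `max_iter` and `seed` are hypothetical names introduced here for illustration.

```python
import numpy as np


def kmeans(X, k, max_iter=100, seed=0):
    """Minimal k-means sketch following the six-step recipe (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Step 2: use k distinct observations as initial centroids (an assumption;
    # other initialisation schemes exist)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(max_iter):
        # Step 3: Euclidean distance from every observation to every centroid,
        # giving an array of shape (n_observations, k)
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        # Step 4: assign each observation to its closest centroid
        labels = distances.argmin(axis=1)
        # Step 5: move each centroid to the mean of its assigned observations
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                new_centroids[j] = members.mean(axis=0)
        # Step 6: stop once the centroids no longer change position
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels


# Toy data (made up for this sketch): two well-separated 2-D blobs
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(0, 0.5, size=(50, 2)),
    rng.normal(5, 0.5, size=(50, 2)),
])
centroids, labels = kmeans(X, k=2)  # step 1: we chose k = 2
print(centroids)
print(labels)
```

The `max_iter` cap is a safeguard so the loop always terminates. On well-separated data like the toy example, the centroids typically stop moving after a handful of iterations.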