Data Preparation
Examine all data attributes and identify the issues present in the data. For each
issue that you identify, choose and perform the actions needed to address it. Note that you
must apply the same actions to both the training and the test data. At the end of this
phase, you will have two data sets: one for training and one for the final testing task. Your marks for this
task will depend on how well you identify the issues and address them. Below is the list of data
preparation issues that you need to address:
• Identify and remove irrelevant attributes.
• Detect and handle missing entries.
• Detect and handle duplicates (both instances and attributes).
• Select suitable data types for attributes.
• Perform data transformation (such as scaling/standardization) if needed.
• Perform other data preparation operations (optional; bonus marks will be awarded for
novel ideas).
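As a rough illustration of how several of the listed operations could be applied consistently to both data sets, here is a minimal sketch using pandas. It is not a prescribed solution. The function name prepare, its parameters (irrelevant, target, categorical), the choice of median imputation and the choice of standardization are all assumptions made for the example. One further assumption is that fill values and scaling statistics are computed from the training data and then reused on the test data, so that both sets receive the same transformation.

```python
import pandas as pd


def prepare(train, test, irrelevant, target, categorical=()):
    """Hypothetical helper: apply the same preparation steps to both sets."""
    # Irrelevant attributes: drop the same columns from both sets.
    train = train.drop(columns=list(irrelevant))
    test = test.drop(columns=list(irrelevant))

    # Duplicate attributes: columns whose values repeat an earlier column.
    dup_cols = train.columns[train.T.duplicated()].tolist()
    train = train.drop(columns=dup_cols)
    test = test.drop(columns=dup_cols)

    # Duplicate instances: removed from the training set (assumed choice).
    train = train.drop_duplicates().reset_index(drop=True)

    # Numeric predictors, excluding the class label.
    numeric = train.drop(columns=[target]).select_dtypes(include="number").columns

    # Missing entries: fill numeric gaps with the training medians (assumed choice).
    medians = train[numeric].median()
    train[numeric] = train[numeric].fillna(medians)
    test[numeric] = test[numeric].fillna(medians)

    # Transformation: standardize with training mean and standard deviation.
    mean = train[numeric].mean()
    std = train[numeric].std(ddof=0).replace(0, 1)  # guard against constant columns
    train[numeric] = (train[numeric] - mean) / std
    test[numeric] = (test[numeric] - mean) / std

    # Data types: store nominal attributes as categoricals with shared categories.
    # Test values never seen in training become missing under this scheme.
    for col in categorical:
        dtype = pd.CategoricalDtype(categories=train[col].dropna().unique())
        train[col] = train[col].astype(dtype)
        test[col] = test[col].astype(dtype)

    return train, test
```

Each step here stands in for a decision that the report would still need to justify with evidence from the actual data; for instance, dropping rows with missing entries may be more appropriate than imputation for some attributes.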
For each of the above issues, your report should:
• Describe the relevant issue in your own words and explain why it is important to address it. Your
explanation must consider the classification task that you will undertake subsequently.
• Demonstrate clearly that such an issue exists in the data with suitable illustration/evidence.
• Clearly state and explain your choice of action to address such an issue.
• Demonstrate convincingly that your action has addressed the issue satisfactorily.
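One possible way to produce the before-and-after evidence asked for above is to compute the same small set of counts on the data before and after each action. The sketch below assumes pandas; the function names issue_summary and compare, and the particular counts chosen, are hypothetical and only meant as a starting point.

```python
import pandas as pd


def issue_summary(df: pd.DataFrame) -> dict:
    """Hypothetical helper: count a few common data quality issues."""
    return {
        "rows": len(df),
        "missing_cells": int(df.isna().sum().sum()),
        "duplicate_rows": int(df.duplicated().sum()),
        # Columns with a single distinct value carry no information for a classifier.
        "constant_columns": int((df.nunique(dropna=False) <= 1).sum()),
    }


def compare(before: pd.DataFrame, after: pd.DataFrame) -> pd.DataFrame:
    """Place the two summaries side by side for inclusion in a report."""
    return pd.DataFrame({
        "before": issue_summary(before),
        "after": issue_summary(after),
    })
```

Calling compare(raw_train, clean_train), where raw_train and clean_train are placeholder names for the data before and after an action, gives a small table that can be quoted directly as evidence; plots such as histograms before and after scaling would complement it.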