This is your chance to design and/or evaluate a ‘predictive model’ of your own choice for a real-world application. The application and data are up to you, but a wide range of recommended datasets for machine learning problems is available in the UCI Machine Learning Repository [1] (Most Popular Data Sets, hits since 2007) and in the challenges, datasets and analytics contributions on Kaggle [2]; you can also check the course’s Blackboard page for further datasets. For this coursework, your design choices and your approach to evaluating a machine learning solution (or predictive model) are key: you can, but do not need to, implement a model, write code or collect data yourself. You should identify a real problem or need, frame a solution, and provide an analytical evaluation that justifies your choice of learning algorithm for your predictive model.
Your report (in the form of a discussion paper) should cover the following elements:
- Discuss the machine learning problem posed by your chosen application: identify the problem, the requirements for a predictive model, and its impact.
- Describe and analyse a dataset and its characteristics: size, representation and attributes.
- Discuss whether bivariate or multivariate analysis is more suitable for your predictive model.
- Choose or apply one or more learning algorithms and identify the category of each: supervised, unsupervised or semi-supervised.
- Analytically or experimentally evaluate your chosen machine learning solution in terms of its suitability and cost, and apply an error evaluation metric to justify your choice, e.g., classification accuracy for classification problems, or MSE and/or R^2 (R squared) for regression models.
- Choose a learning algorithm that you consider less suitable for your predictive model and justify your reasons for rejecting it.
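As a rough illustration of the error metrics named in the list above, the following minimal Python sketch computes classification accuracy, MSE and R^2 from lists of true and predicted values, using only the standard language. The function names (`accuracy`, `mse`, `r_squared`) and the toy data are hypothetical and chosen only for illustration; libraries such as scikit-learn offer equivalent ready-made functions.

```python
# Minimal, dependency-free sketch of common error metrics (illustrative only).

def _check(y_true, y_pred):
    """Ensure both sequences are non-empty and of equal length."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

def accuracy(y_true, y_pred):
    """Classification accuracy: fraction of predicted labels equal to the true labels."""
    _check(y_true, y_pred)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def mse(y_true, y_pred):
    """Mean squared error: average of the squared residuals (y - y_hat)^2."""
    _check(y_true, y_pred)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    _check(y_true, y_pred)
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean) ** 2 for t in y_true)               # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are identical")
    return 1 - ss_res / ss_tot

if __name__ == "__main__":
    # Hypothetical toy data, not taken from any real dataset.
    labels_true = ["cat", "dog", "dog", "cat"]
    labels_pred = ["cat", "dog", "cat", "cat"]
    print(accuracy(labels_true, labels_pred))  # 0.75

    y_true = [3.0, 5.0, 7.0, 9.0]
    y_pred = [2.5, 5.0, 7.5, 9.0]
    print(mse(y_true, y_pred))                 # 0.125
    print(f"{r_squared(y_true, y_pred):.3f}")  # 0.975
```

Lower MSE and an R^2 closer to 1 indicate a closer regression fit, while higher accuracy indicates more correct classifications; scores for the chosen and the rejected algorithm are only comparable when computed on the same held-out test data.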
Datasets can be found here: https://www.kaggle.com/datasets