BIA-652 Final Project Description (2023S)
In this project, you will apply the skills you have learned throughout the semester to analyze real-world data from a supermarket, focusing on checkout operations and customer behavior. Please read the PDF file that describes the dataset very carefully. The project is based on two datasets, transaction data and cashier operations data, each split into multiple periods. You may need to merge these datasets, aggregate the data, and create new variables. The supermarket manager has provided a list of research questions to guide your analysis:
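A minimal sketch of the preparation step, assuming each period has been loaded as a list of row dicts and that the transaction data carries a workstation id plus begin and end timestamps. Only "Begin Date Time" and GroupID appear in this description; the names `WorkstationID`, `End Date Time`, `TransactionSeconds`, `TS_FORMAT`, `concat_periods`, and `enrich` are hypothetical, as is the timestamp format, so check them against the dataset PDF. It stacks the periods, derives transaction time in seconds, and attaches each workstation's GroupID with left-join semantics.

```python
from datetime import datetime

# Hypothetical timestamp format; confirm it against the dataset PDF.
TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def concat_periods(*periods):
    """Stack the per-period files (each a list of row dicts) into one list."""
    rows = []
    for period in periods:
        rows.extend(period)
    return rows


def enrich(transactions, group_by_workstation):
    """Add TransactionSeconds and GroupID to each transaction row.

    group_by_workstation maps a (hypothetical) WorkstationID to its GroupID
    (1 = service, 8 = self-service). Transactions whose workstation is
    missing from the mapping get GroupID None, as in a left join.
    """
    enriched = []
    for row in transactions:
        begin = datetime.strptime(row["Begin Date Time"], TS_FORMAT)
        end = datetime.strptime(row["End Date Time"], TS_FORMAT)
        new_row = dict(row)  # copy so the input rows are not mutated
        new_row["TransactionSeconds"] = (end - begin).total_seconds()
        new_row["GroupID"] = group_by_workstation.get(row["WorkstationID"])
        enriched.append(new_row)
    return enriched


if __name__ == "__main__":
    period1 = [{"WorkstationID": 3, "Begin Date Time": "2023-01-02 09:00:00",
                "End Date Time": "2023-01-02 09:01:30"}]
    period2 = [{"WorkstationID": 7, "Begin Date Time": "2023-01-09 18:15:00",
                "End Date Time": "2023-01-09 18:17:00"}]
    for r in enrich(concat_periods(period1, period2), {3: 1, 7: 8}):
        print(r["WorkstationID"], r["GroupID"], r["TransactionSeconds"])
```

In practice a data-frame library (for example pandas `concat` and `merge`) does the same job in fewer lines; the plain-Python version just makes the join logic explicit.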
- What is the average transaction time for each checkout type: service (Work station GroupID = 1) vs. self-service (Work station GroupID = 8)?
- How does the payment method (cash vs. card) impact transaction time?
- How does the average transaction time change with the basket size? Is there a non-linear relationship between these two variables?
- What are the peak hours and days for transactions at the supermarket? Are there any patterns or trends?
- Develop a regression model to predict transaction time using at least the following variables: basket size (Art Num), payment method, and checkout type. Use the model to answer the following question: how do break times and their durations affect the transaction time of the transactions that follow a break?
- Create a new variable representing the time of day (morning, afternoon, evening, and night) based on the Begin Date Time. How does the use of each payment method (cash vs. card) vary across the times of day?
- Build a logistic regression model to predict the probability of a customer choosing self-service based on factors such as time of day, day of the week, basket size, and transaction value. Which factors are the most significant predictors of choosing self-service over cashier service? Are consumers more likely to use self-service checkouts during peak hours than during regular hours?
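Several of the questions above reduce to a derived time-of-day variable plus group-wise summaries (mean transaction time per checkout type, share of cash payments per time of day, and so on). A minimal sketch in plain Python: the hour cut-offs in `time_of_day` are an assumption, since the project leaves the boundaries to you, and the field names `Payment`, `Begin`, and `TransactionSeconds` as well as the helpers `mean_by` and `share_by` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def time_of_day(ts: datetime) -> str:
    """Bucket a timestamp into a time-of-day label.

    Assumed cut-offs (not given in the assignment):
    morning 06-12, afternoon 12-18, evening 18-22, night otherwise.
    """
    hour = ts.hour
    if 6 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    if 18 <= hour < 22:
        return "evening"
    return "night"


def mean_by(rows, key, value):
    """Average row[value] within each distinct row[key] (a group-by mean)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {k: mean(vals) for k, vals in groups.items()}


def share_by(rows, key, category, level):
    """Fraction of rows in each row[key] group whose row[category] == level."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for row in rows:
        totals[row[key]] += 1
        if row[category] == level:
            hits[row[key]] += 1
    return {k: hits[k] / totals[k] for k in totals}


if __name__ == "__main__":
    rows = [
        {"GroupID": 1, "Payment": "cash", "TransactionSeconds": 90.0,
         "Begin": datetime(2023, 1, 2, 9, 0)},
        {"GroupID": 8, "Payment": "card", "TransactionSeconds": 120.0,
         "Begin": datetime(2023, 1, 2, 19, 30)},
        {"GroupID": 1, "Payment": "card", "TransactionSeconds": 60.0,
         "Begin": datetime(2023, 1, 2, 10, 15)},
    ]
    for row in rows:
        row["TimeOfDay"] = time_of_day(row["Begin"])
    print(mean_by(rows, "GroupID", "TransactionSeconds"))  # {1: 75.0, 8: 120.0}
    print(share_by(rows, "TimeOfDay", "Payment", "cash"))
```

The same `mean_by` call with an hour or weekday key gives a first look at peak hours and days before any modeling.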
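For the break question, if the cashier operations data does not already give break start and end times for the transaction that follows each break, one possible proxy is the idle gap at the same workstation. The sketch below is an illustration under assumptions, not the required method: `add_idle_gap`, the fields `WorkstationID`, `Begin`, `End`, `GapSeconds`, and `AfterBreak`, and the 5-minute `break_threshold` are all hypothetical. Rows are sorted per workstation, the gap runs from the previous transaction's end to the current one's begin, and long gaps are flagged so that `GapSeconds` and `AfterBreak` can enter the regression as predictors.

```python
from datetime import datetime, timedelta


def add_idle_gap(rows, break_threshold=timedelta(minutes=5)):
    """Annotate each transaction with the idle time since the previous one.

    Rows are ordered by (WorkstationID, Begin). The gap is the previous
    transaction's End to this transaction's Begin at the same workstation;
    gaps at or above break_threshold (an assumed cut-off) are flagged as
    breaks. The first transaction at each workstation has no predecessor,
    so its gap is None.
    """
    ordered = sorted(rows, key=lambda r: (r["WorkstationID"], r["Begin"]))
    out = []
    prev = None
    for row in ordered:
        new_row = dict(row)  # copy so the input rows are not mutated
        if prev is not None and prev["WorkstationID"] == row["WorkstationID"]:
            gap = row["Begin"] - prev["End"]
            new_row["GapSeconds"] = gap.total_seconds()
            new_row["AfterBreak"] = gap >= break_threshold
        else:
            new_row["GapSeconds"] = None
            new_row["AfterBreak"] = False
        out.append(new_row)
        prev = row
    return out


if __name__ == "__main__":
    t0 = datetime(2023, 1, 2, 9, 0)
    rows = [
        {"WorkstationID": 3, "Begin": t0, "End": t0 + timedelta(seconds=90)},
        {"WorkstationID": 3, "Begin": t0 + timedelta(minutes=20),
         "End": t0 + timedelta(minutes=22)},
    ]
    for r in add_idle_gap(rows):
        print(r["GapSeconds"], r["AfterBreak"])
```

An idle gap can also reflect a lack of customers rather than a break, so any threshold should be checked against the break records described in the dataset PDF.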