Download the data set Health Care Cost per Employee.csv. In this data set you will
find data on small to mid-sized local businesses and their health care costs (in
thousands; it's actually a bunch of other benefit costs as well, but just pretend it's
health care costs and ignore why the numbers seem so high). There are two
variables. The first is the number of employees that a company has, which
ranges from a single employee up to about 100 employees. The second variable
represents the average cost in benefits per employee.
You’ll notice if you make a scatter plot that the average benefit cost for companies
with a small number of employees is quite high (imagine paying single payer health
insurance for a few people and their families in addition to insuring them at work, etc.).
You are tasked with the following:
1. Develop a model for estimating the average or expected average cost of
benefits based on the number of employees a company has. If you develop a
parametric model, please provide the model. If you develop a nonparametric
model, please graphically represent your model overlaid on top of
a scatterplot of the data. In either case please document how you arrived at
your final model.
2. Create a 95% confidence interval for E(avg. cost|55 employees). That is,
compute a 95% confidence interval for the average cost of benefits per
employee for all companies that have 55 employees.
3. Create a 95% prediction interval for the avg. cost at a single company with
55 employees (an individual value, not the expected value from part 2).
4. Add your results from parts 2 and 3 to your scatterplot.
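
The difference between parts 2 and 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the expected solution: it assumes a parametric model that is linear in 1/employees (one of several transformations you might try), it uses made-up numbers rather than the real CSV, and the helper names fit_line and intervals_at are hypothetical. The t critical value is passed in by hand (2.365 is the 0.975 quantile for 7 degrees of freedom); with real data you would look it up for n - 2 degrees of freedom.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x.

    Returns (b0, b1, s, sxx, xbar), where s is the residual standard error.
    """
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    return b0, b1, s, sxx, xbar


def intervals_at(x0, x, y, t_crit):
    """Fitted value, confidence interval, and prediction interval at x0."""
    n = len(x)
    b0, b1, s, sxx, xbar = fit_line(x, y)
    fit = b0 + b1 * x0
    leverage = 1 / n + (x0 - xbar) ** 2 / sxx
    # Confidence interval: uncertainty in the estimated mean response only.
    ci_half = t_crit * s * math.sqrt(leverage)
    # Prediction interval: adds the variance of one new observation (the "1 +").
    pi_half = t_crit * s * math.sqrt(1 + leverage)
    return fit, (fit - ci_half, fit + ci_half), (fit - pi_half, fit + pi_half)


# Made-up illustrative data, NOT the values in the assignment's CSV.
employees = [1, 2, 5, 10, 20, 40, 60, 80, 100]
avg_cost = [95.0, 52.0, 24.5, 15.2, 9.8, 7.4, 6.9, 6.1, 6.3]

# Assumed transformation: regress avg_cost on 1 / employees.
z = [1 / e for e in employees]

T_CRIT = 2.365  # 0.975 quantile of t with n - 2 = 7 degrees of freedom

fit, ci, pi = intervals_at(1 / 55, z, avg_cost, T_CRIT)
print("fitted avg. cost at 55 employees:", round(fit, 2))
print("95% confidence interval:", tuple(round(v, 2) for v in ci))
print("95% prediction interval:", tuple(round(v, 2) for v in pi))
```

The prediction interval is always wider than the confidence interval at the same point, because it covers where a single new company's value may fall rather than where the mean of all such companies lies. For part 4, both pairs of endpoints can be drawn as vertical segments at 55 employees on the scatterplot.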