Statistics
Problems not marked with [SPSS] should be done by hand, as problems similar to those could be on the Midterm and/or Final. (Of course, feel free to check your work using SPSS!)
Problems marked with [SPSS] are intended to be done with SPSS. For these problems, please attach the output file.
- A study was conducted investigating the long-term prognosis of children who have suffered an acute episode of bacterial meningitis, an inflammation of the membranes enclosing the brain and spinal cord. Listed below are the times to the onset of seizure for 13 children who took part in the study. In months, the measurements are:
0.10 0.25 0.50 4 12 12 24 24 31 36 42 55 96
Find the following numerical summary measures of the data:
- Mean
- Median
- Mode
- Range
- Interquartile Range
- Standard Deviation
- How many standard deviations away from the mean is a child whose time to the onset of seizure was 50 months? (Note: for the purpose of this problem, please assume that the population standard deviation is the same as the sample standard deviation.)
- What proportion of children have a time to the onset of seizure of 50 or more months?
- What proportion of children have a time to the onset of seizure between the mean and 50 months?
- Calculate a 95% confidence interval around the mean assuming that the data are normally distributed with a known population variance of 20.
- Calculate a 95% confidence interval around the mean assuming that the data are normally distributed with an unknown population variance.
- Calculate a 99% confidence interval around the mean assuming that the data are normally distributed with an unknown population variance.
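For checking hand calculations on the seizure data without SPSS, the following is a minimal Python sketch using only the standard library. Python is not part of the assignment, and every variable name here is an illustrative assumption. The quartiles use the (n + 1)p position rule, so the IQR may differ from a textbook that uses another convention. The t critical values are typed in from a standard t table for 12 degrees of freedom rather than computed.

```python
import math
import statistics
from statistics import NormalDist

# Times to the onset of seizure in months, copied from the problem statement.
times = [0.10, 0.25, 0.50, 4, 12, 12, 24, 24, 31, 36, 42, 55, 96]
n = len(times)

mean = statistics.mean(times)
median = statistics.median(times)
modes = statistics.multimode(times)   # every value tied for most frequent
data_range = max(times) - min(times)
sd = statistics.stdev(times)          # sample SD (divides by n - 1)

# Quartiles at position (n + 1) * p; other conventions give other values.
q1, _, q3 = statistics.quantiles(times, n=4, method="exclusive")
iqr = q3 - q1

# Distance of a 50-month observation from the mean, in SD units.
z_50 = (50 - mean) / sd


def interval(center, critical_value, standard_error):
    """Return (lower, upper) for center +/- critical_value * standard_error."""
    half_width = critical_value * standard_error
    return (center - half_width, center + half_width)


# Known population variance of 20: use the standard normal quantile.
z_crit = NormalDist().inv_cdf(0.975)
ci95_known = interval(mean, z_crit, math.sqrt(20 / n))

# Unknown variance: t quantiles for n - 1 = 12 degrees of freedom,
# assumed here from a printed t table (two-sided 95% and 99%).
T_975_DF12 = 2.179
T_995_DF12 = 3.055
se = sd / math.sqrt(n)
ci95_unknown = interval(mean, T_975_DF12, se)
ci99_unknown = interval(mean, T_995_DF12, se)

print("mean", mean, "median", median, "modes", modes)
print("range", data_range, "IQR", iqr, "SD", sd, "z(50)", z_50)
print("95% CI, variance 20:", ci95_known)
print("95% CI, t:", ci95_unknown)
print("99% CI, t:", ci99_unknown)
```

The same `interval` helper serves all three confidence intervals; only the critical value and the standard error change between the known-variance and unknown-variance cases.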
- [SPSS] A study was conducted comparing female adolescents who suffer from bulimia to healthy females with similar body compositions and levels of physical activity. The file sav contains measures of daily caloric intake, recorded in kilocalories per kilogram, for samples of adolescents from each group.
- Find the median daily caloric intake for both the bulimic adolescents and the healthy ones.
- Compute the IQR for each group.
- Construct box-and-whisker plots for each group.
- Describe the shape of the observed distribution for each group. Do you think that the sampled data come from a population with a normal distribution? Why or why not?
- Describe the qualitative differences between the two groups based on the box-and-whisker plots. (For example, which average is higher? Which group has more variability? Are there outlying values in either group?)
- [SPSS] The declared concentrations of nicotine in milligrams for 35 brands of Canadian cigarettes are saved under the variable name nicotine in the file sav.
- Find the mean and median concentrations of nicotine.
- Produce a histogram of the nicotine measurements. Describe the shape of the observed distribution. Do you think that the sampled data come from a population with a normal distribution? Why or why not?
- Which number do you think provides the best measure of central tendency for these concentrations, the mean or the median? Why?
- [SPSS] The data set sav contains information for the sample of 100 low birth weight infants born in Boston, Massachusetts. This data set contains information on the infants, including systolic blood pressure (SBP), gender, and gestational age of the infant, as well as APGAR score at 5 minutes, toxemia diagnosis for mother and germinal matrix hemorrhage.
- Run descriptive statistics in SPSS on all numeric variables, including all possible dispersion statistics, as well as skewness and kurtosis. Attach the output file.
- Use SPSS to provide a 95% confidence interval around the mean and show the quartiles (25th, 50th, 75th percentiles) for each numeric variable. Attach the output file (can be all one file).
- Create frequency tables in SPSS of all categorical variables. Attach the output file (can be all one file).
- Create a cross-tabulation table of toxemia diagnosis for mother and germinal matrix hemorrhage, including the expected frequencies and column percentages. Attach the output file (can be all one file).
- [SPSS] The data set sav contains data examining the mean pulse rate of students taking a midterm for PM 510. Two TAs each measured the pulse rate of 10 students taking the midterm in the class after 1 hour. Each TA selected 10 students at random. Let 𝜇 represent the true (population) mean pulse rate of the students taking the PM 510 midterm.
- Calculate the 90% confidence interval for 𝜇 based on the data collected by the 1st TA.
- Calculate the 90% confidence interval for 𝜇 based on the data collected by the 2nd TA.
- Interpret the confidence intervals.
- Compare the two confidence intervals. Give some possible reasons why they are different.
- A library wants to determine the effectiveness of its summer literacy program among low-income children. Because surveying the large number of students in the program would require too many resources, the library staff interviews 30 randomly chosen children among the low-income program attendees. The 30 sampled children are given a reading test before and after the program.
- Describe the population of this study.
- The difference in the reading test scores (after – before) has mean = 10 and SD = 4. Assuming the score differences are normally distributed, what percent of the children showed any improvement (difference > 0) in reading ability?
- What percent of children improved by more than 15 points?
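The two reading-score questions reduce to upper-tail areas of a normal distribution with mean 10 and SD 4. A minimal Python sketch for checking a hand calculation from the z table follows; it assumes Python's standard-library `statistics.NormalDist`, and the variable names are illustrative only.

```python
from statistics import NormalDist

# Score differences (after - before), assumed normal with mean 10 and SD 4
# as stated in the problem.
diff = NormalDist(mu=10, sigma=4)

# Standardized cutoffs, as they would be looked up in a z table.
z_zero = (0 - diff.mean) / diff.stdev
z_fifteen = (15 - diff.mean) / diff.stdev

# Upper-tail areas: P(difference > 0) and P(difference > 15).
p_any_improvement = 1 - diff.cdf(0)
p_more_than_15 = 1 - diff.cdf(15)

print(f"z for 0: {z_zero}, percent improved: {100 * p_any_improvement:.2f}")
print(f"z for 15: {z_fifteen}, percent above 15: {100 * p_more_than_15:.2f}")
```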
- [SPSS] Consider the standard normal distribution with mean μ = 0 and standard deviation σ = 1. Use SPSS (with a blank dataset) to calculate the following probabilities and cutoff values. Provide the answer to each question and attach the output file.
- What is the probability that an outcome z is < -2.05?
- What is the probability that an outcome z is > 1.82?
- What is the probability that an outcome z is > -1.82?
- What is the probability that an outcome z is between -2.28 and 1.92?
- What value of z cuts off the upper 30% of the standard normal distribution?
- What value of z cuts off the lower 8% of the standard normal distribution?
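As a cross-check on the SPSS output for the standard normal questions, the same quantities can be sketched in Python with the standard-library `statistics.NormalDist`. This is an illustrative alternative rather than the SPSS procedure the problem asks for, and the variable names are assumptions.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Lower-tail area comes straight from the CDF.
p_below_neg_2_05 = std_normal.cdf(-2.05)

# Upper-tail areas are 1 minus the CDF.
p_above_1_82 = 1 - std_normal.cdf(1.82)
p_above_neg_1_82 = 1 - std_normal.cdf(-1.82)

# Area between two points is the difference of the CDF values.
p_between = std_normal.cdf(1.92) - std_normal.cdf(-2.28)

# Cutoffs use the inverse CDF: the upper 30% starts at the 70th
# percentile, and the lower 8% ends at the 8th percentile.
z_upper_30 = std_normal.inv_cdf(0.70)
z_lower_8 = std_normal.inv_cdf(0.08)

for label, value in [
    ("P(z < -2.05)", p_below_neg_2_05),
    ("P(z > 1.82)", p_above_1_82),
    ("P(z > -1.82)", p_above_neg_1_82),
    ("P(-2.28 < z < 1.92)", p_between),
    ("z cutting off upper 30%", z_upper_30),
    ("z cutting off lower 8%", z_lower_8),
]:
    print(f"{label}: {value:.4f}")
```

By symmetry, the second and third probabilities should sum to 1, which gives a quick sanity check on either tool's output.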