Descriptive statistics
- Descriptive statistics - Exercises
- Study Guide for Unit 1
- Supplementary Video Resources
- descriptive statistics: the set of methods used to organize, display, and describe data so that coherent, relevant information about the problem, objective, or decision emerges.
- Lecture notes hold ideas and examples; will they be too long?
- [[215eTextErrata_2022.pdf]]
Statistics and Basic terms
Descriptive statistics (p. 1) - collection of methods for organizing, displaying, and describing data by using tables, graphs, and summary measures; needed because many data sets are large in their original form
- to construct tables and graph data
- to calculate numerical summary measures, e.g. averages
Inferential statistics - collection of methods helping make decisions about a population based on sample results; units 4-6
- Estimation and Tests of Hypotheses for One Population
- Tests of Hypotheses for Two or More Populations
- Bivariate Analysis
Probability - gives a measurement of the likelihood that a certain outcome will occur; acts as the link between descriptive statistics and inferential statistics, and is used to make statements about the occurrence or nonoccurrence of an event under uncertain conditions
Element (a member; specific subject or object included in a sample or population, e.g. a patient, doctor, hospital, a disease, etc.)
Variable (a characteristic under study that assumes different values for different elements, e.g. total wealth, household incomes; often denoted by x, y, z)
Observation (the measurement, the value of a variable for an element, e.g. the total wealth of Warren Buffett was $72.7 billion)
Data set (data; a collection of observations or measurements on one or more variables)
- e.g. the list of the total wealth of the eight richest persons
Population (the collection of all elements whose characteristics are being studied)
Statistics has 2 meanings
- numerical facts, e.g. family income.
- the science (field or discipline of study) of collecting, analyzing, presenting, and interpreting data, and of making decisions based on such analyses; has 2 aspects, theoretical and applied
Theoretical statistics (or mathematical statistics, deals with the development, derivation, and proof of statistical theorems, formulas, rules, and laws)
Applied statistics (the focus of the textbook; involves the application of theorems, formulas, rules, and laws to solve real-world problems; how to think statistically and make educated guesses, i.e. more reliable decisions made by using statistical methods)
- has 2 types - Descriptive statistics and Inferential statistics.
Types of variables and the nature of statistical data
Types of Variables, p. 6
- Quantitative variable - a variable that can be measured numerically
- e.g. income, height, gross sales.
- the data collected are called quantitative data
- Discrete variable (a variable that assumes countable values), e.g.
- number of cars owned by a family
- Continuous variable - a variable that can assume any numerical value over a certain interval or intervals, e.g.
- height of a family member cannot be counted as it's measured on a continuous scale;
- the time taken to complete an examination;
- any variable that involves money and can assume a large number of values
- Qualitative variable - a variable that cannot assume a numerical value but can be classified into 2 or more nonnumeric categories; the data collected are called qualitative data; e.g.
- the status of an undergraduate college student
Classified based on the time over which the data are collected
- Cross-section data - data collected on different elements at the same point in time or for the same period of time; e.g.
- the total wealth of the world's 8 richest persons in 2015
- Time-series data - data collected on the same element for the same variable at different points in time or for different periods of time; e.g.
- average tuition in dollars at 4-year public institutions in 5 periods
Population, sampling, design of experiments, and summation notation
Population and sample, p. 10
- Population, or target population, consists of all elements (individuals, items, or objects) whose characteristics are being studied.
- Sample is a portion of population selected for study.
Census and sample survey
- A sample survey collects information from the elements of a sample, while a census collects information on all elements of the target population
Sampling with replacement - a selected element would be put back before the next element is selected, so that the population remains the same number of items upon each selection;
- e.g. rolling a die many times.
Sampling without replacement - occurs when the selected element is not replaced in the population, so that an item wouldn't be selected twice.
- @@ for inferences derived to be more reliable for decision-making on the corresponding population
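The difference between the two sampling schemes can be sketched in Python. This is only a minimal illustration, assuming a hypothetical population of ten numbered items and a fixed seed; `random.choices` draws with replacement and `random.sample` draws without replacement.

```python
import random

population = list(range(1, 11))  # hypothetical population of 10 numbered items
rng = random.Random(0)           # fixed seed so the run is repeatable

# With replacement: every draw sees the full population, so repeats are possible.
with_replacement = rng.choices(population, k=5)

# Without replacement: a drawn item is not put back, so no item appears twice.
without_replacement = rng.sample(population, k=5)

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)
```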
Random samples (drawn in a way that each member of the population has some chance of being selected in the sample; usually representative sample)
Nonrandom samples (some members of the population may not have any chance of being selected)
- Convenience sample (where the most accessible members of the population are selected to get results quickly, e.g. an opinion poll conducted from certain shoppers at a single mall)
- Judgment sample (where the members are selected based on the judgment and prior knowledge of an expert; the chances of being representative are low)
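Sampling errors (the difference between a sample result and the result that would have been obtained if the whole population had been surveyed; they occur by chance because a sample is not the whole population)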
Nonsampling errors (errors occurring in the collection, recording, and tabulation of data, instead of the sampling process)
Identify Random sampling techniques (p. 14)
- Simple random sampling (where each sample of the same size has the same probability of being selected; e.g. select by a lottery or drawing)
- Systematic random sampling (the elements are listed in an order based on a given trait; one member is randomly selected from the first k units of the list, where k is the population size divided by the intended sample size, and then every kth member after the first selected member is included in the sample)
- Stratified random sampling (the population is first divided into subpopulations called strata, and then one sample is selected from each stratum)
- Cluster sampling (the whole population is first divided into groups called clusters, e.g. geographically, with each cluster representative of the population; then a random sample of clusters is selected; finally a random sample of elements is selected from each of the selected clusters)
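The four techniques above can be sketched as small Python functions. This is a rough sketch under stated assumptions, not the textbook's procedure: the function names, the 100-member population, and the two groups `low`/`high` are all hypothetical, and the systematic version assumes the list is already ordered by the chosen trait.

```python
import random

def simple_random_sample(population, n, rng):
    """Every sample of size n has the same chance of being drawn."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    """Pick one of the first k members at random, then every kth member after it."""
    k = len(population) // n      # sampling interval
    start = rng.randrange(k)      # random position among the first k units
    return [population[start + i * k] for i in range(n)]

def stratified_sample(strata, n_per_stratum, rng):
    """Draw one random sample from each stratum (dict: name -> members)."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, n_per_cluster, rng):
    """Randomly choose clusters, then randomly choose elements inside each."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], n_per_cluster) for name in chosen}

rng = random.Random(42)
population = list(range(1, 101))                       # hypothetical members 1..100
groups = {"low": population[:50], "high": population[50:]}

simple = simple_random_sample(population, 10, rng)
systematic = systematic_sample(population, 10, rng)
stratified = stratified_sample(groups, 3, rng)
clustered = cluster_sample(groups, 1, 5, rng)
print(systematic)
```

Note the structural difference: stratified sampling takes elements from every group, while cluster sampling takes elements only from the groups that were randomly selected.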
Define treatment, randomization, designed experiment
- Randomization is the procedure in which elements are assigned to different groups at random. In this case, the groups will not differ much with regard to most factors, so the other factors (e.g. those that affect the weights of people) have been controlled. Thus, this is a designed experiment.
- Design of experiments
- To use statistical methods to make decisions, people need to obtain data from observational studies, controlled experiments, or surveys.
- The experimenter would impose a condition or a set of conditions, known as treatment, on a group of elements.
- if no treatment is imposed, as in an observational study where researchers simply collect information from the persons, the effects of one factor cannot be separated from those of the others and are said to be confounded.
- designed experiment vs. observational study
- When the experimenter controls the random assignment of elements to different treatment groups, the study is said to be a Designed experiment. In contrast, an Observational study is where the assignment of elements to different treatments is voluntary and the experimenter simply observes the result of the study.
- treatment vs. control group
- the group of elements receiving a treatment is the Treatment group, and the group of elements not receiving a treatment is the Control group
- compute the values for expressions presented in summation notation
- Summation notation - $\sum x$, used to denote the sum of all values of x, pronounced "sigma x"
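A short Python check of expressions in summation notation, using three hypothetical values of x. It shows that the sum of squares, Σx², is generally different from the square of the sum, (Σx)².

```python
x = [2, 5, 7]                          # hypothetical values of the variable x

sum_x = sum(x)                         # Σx    = 2 + 5 + 7
sum_x_squared = sum(v ** 2 for v in x) # Σx²   = 4 + 25 + 49
square_of_sum = sum(x) ** 2            # (Σx)² = 14 squared

print(sum_x, sum_x_squared, square_of_sum)  # 14 78 196
```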
Organizing and graphing qualitative data
- p. 36
- Construct a frequency distribution including frequencies, relative frequencies, and percentage frequencies, given raw data for a qualitative (categorical) variable
- Raw data
- Frequency distributions of a qualitative variable
- Relative frequency and percentage distributions
- Construct a bar graph and a pie chart
- Graphical presentation of qualitative data
- bar graphs
- Pareto chart
- pie chart
- Interpret frequencies, relative frequencies, and percentage frequencies, given a frequency distribution or a graph relating to a frequency distribution
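A minimal sketch of a frequency distribution for a qualitative variable, assuming eight made-up observations of student status. Relative frequency is the frequency divided by the number of observations, and percentage is relative frequency times 100.

```python
from collections import Counter

# Hypothetical raw data: status of 8 undergraduate students
raw = ["freshman", "junior", "senior", "freshman",
       "sophomore", "junior", "freshman", "senior"]

n = len(raw)
freq = Counter(raw)  # category -> frequency

# category -> (frequency, relative frequency, percentage)
table = {cat: (f, f / n, 100 * f / n) for cat, f in freq.items()}

for cat, (f, rel, pct) in table.items():
    print(f"{cat:10s} {f:2d} {rel:.3f} {pct:5.1f}%")
```

The relative frequencies always sum to 1 and the percentages to 100, which is a quick check on a hand-built table.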
Organizing and graphing quantitative data
- p. 43
- Construct a frequency distribution table using either a "less than" or "not less than" method for writing the classes, given raw data for a continuous variable
- this type of distribution can include class limits, class boundaries, midpoints, raw data frequencies, relative frequencies, percentage frequencies, cumulative frequencies, cumulative relative frequencies, and cumulative percentage frequencies
- Frequency distributions for quantitative data
- Finding class width
- Calculating class midpoint or mark
- Construct the following graphs: histogram, relative frequency histogram, frequency polygon, relative or percentage frequency polygon, ogive, the relative or percentage ogive
- Graphing grouped data
- histograms
- polygons
- construct a frequency distribution table and then a bar graph, using single-valued classes, given raw data.
- this type of distribution can include frequencies, relative frequencies, and percentage frequencies.
- Constructing frequency distribution tables
- interpret frequencies, relative and percentage frequencies, cumulative frequencies, cumulative relative frequencies, and cumulative percentage frequencies, given a frequency distribution or a related graph.
- interpret symmetric, skewed and uniform distributions for the frequency distribution or graph
- Relative frequency and percentage distributions
- Calculating relative frequency and percentage
- Cumulative frequency distribution
- Calculating cumulative relative frequency and cumulative percentage
- construct stem-and-leaf displays and dotplots, and identify possible outliers, given raw data.
- Stem-and-Leaf displays
- Dotplots
- 2 Alternative methods to construct a frequency distribution
- Less-than method for writing classes
- Single-valued classes
- Shapes of histogram
- truncating axes
- Added - Class boundaries and ogive
- Class Boundaries
- ogive
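The class-width, midpoint, and cumulative-frequency steps can be sketched as follows. The ten data values, the choice of 4 classes, and the rounded width and starting point are all assumptions made for illustration; classes are written with the less-than method (lower limit included, upper limit excluded).

```python
from itertools import accumulate

data = [12, 15, 21, 24, 27, 33, 35, 38, 41, 49]  # hypothetical raw data
num_classes = 4                                   # assumed number of classes

# Approximate class width = (largest value - smallest value) / number of classes
approx_width = (max(data) - min(data)) / num_classes  # 9.25
width = 10   # rounded up to a convenient number (a judgment call)
start = 10   # convenient lower limit at or below the smallest value

# Less-than method: "10 to less than 20", "20 to less than 30", ...
classes = [(start + i * width, start + (i + 1) * width) for i in range(num_classes)]

freqs = [sum(lo <= v < hi for v in data) for lo, hi in classes]
midpoints = [(lo + hi) / 2 for lo, hi in classes]   # class midpoint or mark
cum_freqs = list(accumulate(freqs))                 # cumulative frequencies
cum_pcts = [100 * c / len(data) for c in cum_freqs] # cumulative percentages

for (lo, hi), f, m, c, p in zip(classes, freqs, midpoints, cum_freqs, cum_pcts):
    print(f"{lo} to less than {hi}: f={f}, midpoint={m}, cum f={c}, cum %={p}")
```

The cumulative percentages are the values an ogive would plot against the upper boundary of each class.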
Measures of central tendency for ungrouped data
- p. 77
- Compute the mean, median, and mode, given ungrouped (raw) sample data or ungrouped population data
- compute the weighted mean for a data set
- Weighted mean
- identify the advantages and disadvantages of using the mean, weighted mean, median, and mode as a measure of central tendency for different types of data sets
- determine how the skewness of a data set affects the relationship b/w the mean, median, and mode
- Mean
- Median
- Mode
- Trimmed mean
- Relationships among the mean, median, and mode
- skewness
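A small sketch of these measures using Python's `statistics` module, with hypothetical data that contain one large outlier. The helpers `trimmed_mean` and `weighted_mean` are illustrative names, and the trimmed mean here assumes the convention of dropping the given percentage of values from each end of the sorted data.

```python
import statistics

data = [2, 3, 3, 4, 5, 6, 7, 8, 12, 50]  # hypothetical data; 50 is an outlier

mean = statistics.mean(data)      # sum of values / number of values
median = statistics.median(data)  # middle of the sorted data
mode = statistics.mode(data)      # most frequent value

def trimmed_mean(values, percent):
    """Mean after dropping `percent`% of the values from each end."""
    s = sorted(values)
    drop = len(s) * percent // 100
    return statistics.mean(s[drop:len(s) - drop])

def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

trimmed = trimmed_mean(data, 10)               # drops 2 and 50
weighted = weighted_mean([3.0, 4.0], [10, 30]) # e.g. prices weighted by quantity

print(mean, median, mode, trimmed, weighted)   # 10 5.5 3 6 3.75
```

The outlier pulls the mean (10) well above the median (5.5) and the mode (3), the ordering typically seen in a right-skewed data set, while the trimmed mean (6) is much less affected.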
Measures of dispersion for ungrouped data
- p. 89
- compute the range, variance, standard deviation, and coefficient of variation, given ungrouped (raw) sample data or ungrouped population data
- Range
- Variance and standard deviation
- identify the advantages and disadvantages of using the range, standard deviation, and coefficient of variation as a measure of dispersion for different types of data sets.
- Coefficient of variation
- distinguish b/w a parameter and a statistic
- Population parameters and sample statistics
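A sketch of the dispersion measures with the `statistics` module, using eight hypothetical values. It treats the same numbers first as a whole population (parameters, dividing by N) and then as a sample (statistics, dividing by n − 1), which is the only difference between the two variance formulas.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations, mean = 5

data_range = max(data) - min(data)   # range = largest value - smallest value

# Treated as a population: parameters
pop_var = statistics.pvariance(data)  # Σ(x - μ)² / N
pop_sd = statistics.pstdev(data)      # square root of the population variance

# Treated as a sample: statistics
samp_var = statistics.variance(data)  # Σ(x - x̄)² / (n - 1)
samp_sd = statistics.stdev(data)      # square root of the sample variance

# Coefficient of variation: standard deviation as a percentage of the mean
mean = statistics.mean(data)
cv_population = pop_sd / mean * 100
cv_sample = samp_sd / mean * 100

print(data_range, pop_var, pop_sd, round(samp_var, 3), round(cv_sample, 2))
```

Because the coefficient of variation is unit-free, it can compare the spread of data sets measured in different units.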
Mean, variance, standard deviation for grouped data
- p. 97
- compute the mean, variance, and standard deviation, given grouped sample or grouped population data
- Mean for grouped data
- Variance and standard deviation for grouped data
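For grouped data each class is represented by its midpoint m and frequency f. A minimal sketch, assuming four hypothetical classes, using the short-cut formulas mean = Σmf / n and variance = (Σm²f − (Σmf)² / n) divided by n − 1 for a sample or by N for a population:

```python
import math

midpoints = [15, 25, 35, 45]  # hypothetical class midpoints (m)
freqs = [2, 3, 3, 2]          # class frequencies (f)

n = sum(freqs)                                            # number of observations
sum_mf = sum(m * f for m, f in zip(midpoints, freqs))     # Σmf
sum_m2f = sum(m * m * f for m, f in zip(midpoints, freqs))# Σm²f

grouped_mean = sum_mf / n
numerator = sum_m2f - sum_mf ** 2 / n

sample_var = numerator / (n - 1)   # grouped sample data
population_var = numerator / n     # grouped population data
sample_sd = math.sqrt(sample_var)

print(grouped_mean, population_var, round(sample_var, 2), round(sample_sd, 2))
```

These are approximations of the ungrouped values, since every observation in a class is treated as if it equalled the class midpoint.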
Using standard deviation
- p. 103
- use Chebyshev's theorem with any distribution to find the proportion or percentage of the total observations falling within a given interval about the mean
- Chebyshev's theorem
- use the empirical rule with any bell-shaped distribution to find the proportion or percentage of the total observations falling within a given interval about the mean
- Empirical rule
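The two rules can be compared with a short sketch. Chebyshev's theorem gives a lower bound of 1 − 1/k² on the proportion of observations within k standard deviations of the mean (for k > 1), while the empirical rule gives approximate proportions for bell-shaped distributions only. The function and constant names, and the mean of 50 and standard deviation of 5, are hypothetical.

```python
def chebyshev_min_proportion(k):
    """Minimum proportion within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem applies only for k > 1")
    return 1 - 1 / k ** 2

# Empirical rule: approximate proportions for a bell-shaped distribution
EMPIRICAL_RULE = {1: 0.68, 2: 0.95, 3: 0.997}

def interval(mean, sd, k):
    """Interval mean - k*sd to mean + k*sd."""
    return mean - k * sd, mean + k * sd

mean, sd = 50, 5  # hypothetical mean and standard deviation
for k in (2, 3):
    lo, hi = interval(mean, sd, k)
    print(f"{lo} to {hi}: at least {chebyshev_min_proportion(k):.1%} (Chebyshev), "
          f"about {EMPIRICAL_RULE[k]:.1%} (empirical rule)")
```

For k = 2 Chebyshev guarantees at least 75% for any distribution, whereas the empirical rule gives about 95% when the distribution is bell-shaped.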
Measures of position and Box-and-Whisker plots
- p. 107
- compute the 3 quartiles (Q1, Q2, Q3), the interquartile range, percentiles, and percentile ranks, given ungrouped (raw) sample data or ungrouped population data.
- Quartiles and interquartile range
- Percentiles and percentile rank
- interpret the 3 quartiles (Q1, Q2, Q3), the interquartile range, percentiles, and percentile ranks in the context of a given problem.
- construct a box-and-whisker plot, given ungrouped (raw) sample data or ungrouped population data.
- Box-and-Whisker plot
- determine the 3 quartiles, the lower and upper inner fences, the skewness, and the outliers (if any), given a box-and-whisker plot.
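A sketch of the quartile, fence, and outlier calculations on ten hypothetical values. It assumes the median-of-halves convention (Q1 is the median of the values below the overall median, Q3 the median of those above it) and the percentile rank defined as the percentage of values less than x; statistical software often uses interpolation rules that give slightly different quartiles.

```python
import statistics

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention."""
    s = sorted(values)
    half = len(s) // 2
    lower = s[:half]
    # With an odd count, the middle value itself is left out of both halves.
    upper = s[half + 1:] if len(s) % 2 else s[half:]
    return statistics.median(lower), statistics.median(s), statistics.median(upper)

def percentile_rank(values, x):
    """Percentage of values in the data set that are less than x."""
    return 100 * sum(v < x for v in values) / len(values)

data = [7, 8, 10, 12, 13, 15, 18, 20, 22, 45]  # hypothetical data

q1, q2, q3 = quartiles(data)
iqr = q3 - q1                      # interquartile range
lower_fence = q1 - 1.5 * iqr       # lower inner fence
upper_fence = q3 + 1.5 * iqr       # upper inner fence
outliers = [v for v in data if v < lower_fence or v > upper_fence]

print(q1, q2, q3, iqr, lower_fence, upper_fence, outliers)
print(percentile_rank(data, 20))
```

In a box-and-whisker plot the box spans Q1 to Q3, the whiskers extend to the smallest and largest values inside the inner fences, and values beyond the fences (here 45) are marked as outliers.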
Appendix 3.1