Collections of observations, such as measurements or survey responses
The science of planning studies and experiments; obtaining data; and organizing, summarizing, presenting, analyzing and interpreting those data and then drawing conclusions based on them
The complete collection of all measurements or data that are being considered
The collection of data from every member of a population
A subcollection of members from a population
Common types of voluntary response samples
Internet polls, mail-in polls, telephone call-in polls
When a study has results that are very unlikely to occur by chance
Association between two variables
Occurs when someone either refuses to respond to a survey question or is unavailable
How to find a percentage of an amount
Replace the % symbol with division by 100, then interpret “of” as multiplication
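The rule above can be sketched in a few lines of Python. The function name percent_of and the numbers are made up for illustration.

```python
def percent_of(percent, amount):
    """Return percent% of amount: swap % for /100 and 'of' for multiplication."""
    return percent / 100 * amount

# 25% of 80 becomes 25/100 * 80
result = percent_of(25, 80)
print(result)
```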
A numerical measurement describing some characteristic of a population
A numerical measurement describing some characteristic of a sample
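These two definitions (usually called a parameter and a statistic) differ only in where the number comes from. A minimal sketch, assuming a made-up population of ten ages and an arbitrary sample size of 4:

```python
import random
import statistics

# Hypothetical population: ages of all 10 members of a small club (made-up numbers).
population = [23, 31, 35, 38, 42, 47, 51, 56, 60, 67]

# Measurement computed from every member of the population.
population_mean = statistics.mean(population)

# Measurement computed from a sample drawn from that population.
rng = random.Random(0)  # fixed seed only so the run is repeatable
sample = rng.sample(population, 4)
sample_mean = statistics.mean(sample)

print("population mean:", population_mean)
print("sample mean:", sample_mean)
```

The sample mean will usually differ somewhat from the population mean, and it changes from sample to sample.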
Quantitative (numerical) data
Consists of numbers representing counts or measurements
Categorical (qualitative/attribute) data
Consists of names or labels
Result when the data values are quantitative and the number of values is finite or “countable”
Continuous (numerical) data
Result from infinitely many possible quantitative values, where the collection of values is not countable
Characterized by data that consists of names, labels, or categories only. It is not possible to arrange the data in some order, such as low to high
Data that can be arranged in some order, but differences between data values either cannot be determined or are meaningless
Data that can be arranged in order, and differences between data values can be found and are meaningful; but data at this level do not have a natural zero starting point at which none of the quantity is present
Data that can be arranged in order, differences can be found and are meaningful, and there is a natural zero starting point
Refers to data sets so large and so complex that their analysis is beyond the capabilities of traditional software tools
Involves applications of statistics, computer science, and software engineering, along with some other relevant fields, such as biology and epidemiology
Missing completely at random
If the likelihood of a data value being missing is independent of its value or any of the other data values in the data set
If the missing value is related to the reason that it is missing
Where we apply some treatment and proceed to observe its effect on the individuals
Observe and measure specific characteristics, but we don’t attempt to modify the individuals being studied
One that affects the variables included in the study, but is not included in the study
Design of an experiment includes
Replication, blinding, randomization
A sample of n subjects is selected in such a way that every possible sample of the same size n has the same chance of being chosen
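One possible way to draw such a sample in Python is random.sample, which selects without replacement so that every subset of size n is equally likely. The subject IDs and the helper name simple_random_sample are assumptions for illustration.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n subjects without replacement; every size-n subset is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

subjects = list(range(1, 21))  # hypothetical subject IDs 1..20
chosen = simple_random_sample(subjects, 5, seed=42)
print(chosen)
```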
Select some starting point and then select every kth element in the population
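A minimal sketch of this procedure, assuming the starting point is picked at random among the first k elements (the definition only says "some starting point"). The helper name and the data are hypothetical.

```python
import random

def systematic_sample(population, k, seed=None):
    """Pick a start among the first k elements, then take every kth element."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # index 0..k-1
    return population[start::k]

subjects = list(range(1, 101))  # hypothetical subject IDs 1..100
chosen = systematic_sample(subjects, 10, seed=7)
print(chosen)
```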
Simply use data that are very easy to get
Subdivide the population into at least 2 different subgroups so that subjects within the same subgroup share the same characteristics. Then we draw a sample from each subgroup
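As a rough sketch, the subgroups can be built with a grouping function and then sampled one by one. The names stratified_sample and stratum_of, the student data, and the choice of an equal number per subgroup are all assumptions for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(subjects, stratum_of, per_stratum, seed=None):
    """Group subjects into subgroups, then draw a random sample from each one."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for subject in subjects:
        strata[stratum_of(subject)].append(subject)
    sample = []
    for members in strata.values():
        # Take per_stratum members, or the whole subgroup if it is smaller.
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample

# Hypothetical subjects as (id, class_year) pairs.
students = [(i, "freshman" if i % 2 else "senior") for i in range(1, 21)]
chosen = stratified_sample(students, stratum_of=lambda s: s[1], per_stratum=3, seed=1)
print(chosen)
```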
Divide the population area into sections, randomly select some of those sections, and choose all the members from those selected sections
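A minimal sketch of this method, assuming the sections are already given as a dictionary from section name to members. The precinct data and the helper name cluster_sample are made up.

```python
import random

def cluster_sample(clusters, num_clusters, seed=None):
    """Randomly pick whole sections, then keep every member of the picked sections."""
    rng = random.Random(seed)
    picked = rng.sample(sorted(clusters), num_clusters)
    members = [m for name in picked for m in clusters[name]]
    return picked, members

# Hypothetical sections of a population area.
precincts = {
    "north": ["Ana", "Ben", "Cy"],
    "south": ["Dee", "Eli"],
    "east": ["Fay", "Gus", "Hal"],
    "west": ["Ivy"],
}
picked_sections, chosen = cluster_sample(precincts, 2, seed=3)
print(picked_sections, chosen)
```

Unlike the subgroup method, the random step here picks sections rather than individuals, and nobody inside a picked section is left out.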
Pollsters select a sample in different stages, and each stage might use different methods of sampling
Types of observational studies
Cross-sectional study, retrospective study, prospective study
Data are observed, measured, and collected at one point in time, not over a period of time
Data are collected from a past period by going back in time through records, interviews, and so on
Data are collected in the future from groups that share common factors
Occurs when we can see some effect, but we can’t identify the specific factor that caused it
A group of subjects that are similar, but differ in ways that might affect the outcome of the experiment
Occurs when the sample has been selected with a random method but there is a discrepancy between a sample result and a true population result
The result of human error, including such factors as wrong data entries, computing errors, questions with biased wording, false data provided by respondents, forming biased conclusions, or applying statistical methods that are not appropriate for the circumstances
The result of using a sampling method that is not random, such as using a convenience sample or voluntary response sample
A representative value that shows us where the middle of the data set is located
A measure of the amount that the data values vary
The nature or shape of the spread of the data over the range of values
Sample values that lie very far away from the vast majority of the other sample values
Frequency distribution (frequency table)
Shows how data are partitioned among several categories (classes) by listing the categories along with the number (frequency) of data values in each of them
The smallest numbers that can belong to each of the different classes
The largest numbers that can belong to each of the different classes
The numbers used to separate the classes, but without the gaps created by class limits
The values in the middle of the classes
The difference between two consecutive lower class limits in a frequency distribution
(maximum data value - minimum data value) / number of classes
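The class width formula and the class definitions above can be tied together in a short sketch. It assumes integer-valued data, starts the first class at the minimum value, and rounds the width up to a whole number. If the division comes out exact, a slightly larger width is needed so the maximum still falls in the last class. The data values and the name frequency_table are made up.

```python
import math

def frequency_table(values, num_classes):
    """Build classes of equal width for integer data and count values in each."""
    # Class width: (max - min) / number of classes, rounded up.
    width = math.ceil((max(values) - min(values)) / num_classes)
    start = min(values)
    table = []
    for i in range(num_classes):
        lower = start + i * width   # lower class limit
        upper = lower + width - 1   # upper class limit (integer data)
        table.append({
            "lower": lower,
            "upper": upper,
            "midpoint": (lower + upper) / 2,
            "frequency": sum(lower <= v <= upper for v in values),
        })
    # Class boundaries sit halfway across the gaps between adjacent classes.
    boundaries = [row["lower"] - 0.5 for row in table] + [table[-1]["upper"] + 0.5]
    return width, table, boundaries

data = [12, 15, 21, 24, 28, 30, 33, 37, 41, 49]  # hypothetical measurements
width, table, boundaries = frequency_table(data, 4)
for row in table:
    print(row)
```

Here (49 - 12) / 4 = 9.25 rounds up to a width of 10, giving the classes 12-21, 22-31, 32-41, and 42-51.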
A graph consisting of bars of equal width drawn adjacent to each other
When the distribution of data is not symmetric and extends more to one side than to the other
If the pattern of the points in the normal quantile plot is reasonably close to a straight line, and the points do not show some systematic pattern that is not a straight-line pattern
Not a normal distribution
The points do not lie reasonably close to a straight-line pattern, or the points show some systematic pattern that is not a straight-line pattern
Consists of a graph of quantitative data in which each data value is plotted as a point above a horizontal scale of values
Represents quantitative data by separating each value into 2 parts: the stem (such as the leftmost digit) and the leaf (such as the rightmost digit)
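A minimal sketch of the stem and leaf split, assuming non-negative integers where the leaf is the units digit and the stem is everything to its left. The helper name stemplot and the data are made up.

```python
from collections import defaultdict

def stemplot(values):
    """Map each stem (value // 10) to its sorted list of leaves (value % 10)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return dict(sorted(stems.items()))

plot = stemplot([28, 12, 30, 21, 15, 24])
for stem, leaves in plot.items():
    print(stem, "|", "".join(str(leaf) for leaf in leaves))
```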
A graph of time-series data, which are quantitative data that have been collected at different points in time, such as monthly or yearly
Uses bars of equal width to show frequencies of categories of qualitative data
A bar graph for qualitative data, with the added stipulation that the bars are arranged in descending order according to frequencies, so the bars decrease in height from left to right
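The ordering step behind this kind of graph (commonly called a Pareto chart) can be sketched with collections.Counter, whose most_common method returns categories from highest to lowest count. The complaint categories below are made up, and drawing the bars is left out.

```python
from collections import Counter

def pareto_order(observations):
    """Return (category, frequency) pairs sorted by descending frequency."""
    return Counter(observations).most_common()

complaints = ["billing", "delay", "billing", "damage", "delay", "billing"]
bars = pareto_order(complaints)
print(bars)
```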
Depicts qualitative data as slices of a circle, in which the size of each slice is proportional to the frequency count for the category