Monday, March 7, 2016

notes/key ideas on box plots and outliers / notes/key ideas on z-score and z-table

Box Plots 
Box plots (also called box-and-whisker plots) are a way to represent the distribution of a data set.
[Figure: a box plot]

To determine outliers
1. Find the 5-number summary (minimum, Q1, median, Q3, maximum)
2. Determine the IQR (Q3 - Q1)
3. Multiply the IQR by 1.5
4. Set up fences at Q1 - (1.5 x IQR) and Q3 + (1.5 x IQR)
5. Observations "outside" the fences are outliers.
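
The five steps above can be sketched in Python. This is only a rough example, not the one official method: the function names are made up for illustration, and it assumes quartiles are found as the medians of the lower and upper halves of the sorted data (leaving out the middle value when the count is odd). Other quartile rules can give slightly different fences.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max). Assumes at least two values."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]
    # Skip the middle value when n is odd (assumed quartile rule).
    upper = values[half + 1:] if n % 2 else values[half:]
    return values[0], median(lower), median(values), median(upper), values[-1]

def find_outliers(data):
    """Return the observations that fall outside the 1.5 x IQR fences."""
    _, q1, _, q3, _ = five_number_summary(data)  # step 1
    iqr = q3 - q1                                # step 2
    step = 1.5 * iqr                             # step 3
    low_fence = q1 - step                        # step 4
    high_fence = q3 + step
    return [x for x in data if x < low_fence or x > high_fence]  # step 5

scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
print(find_outliers(scores))  # [50]
```

For this made-up data set, Q1 = 3 and Q3 = 8, so the IQR is 5 and the fences sit at -4.5 and 15.5. Only 50 lands outside them.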

Describing Location in a Distribution
-Standardized Value
*One way to describe relative position in a data set is to tell how many standard deviations above or below the mean an observation is.

Standardized Value: "z-score" 
If the mean and standard deviation of a distribution are known, the "z-score" of a particular observation, x, is:

$$z = \frac{x - \mu}{\sigma}$$

z = z-score
x = value
μ (mu) = mean
σ (sigma) = standard deviation
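
As a quick check on the definition, here is a small Python sketch of the z-score calculation. The function name and the test-score numbers are made up for illustration.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / std_dev

# Hypothetical example: a score of 85 on a test with mean 70 and standard deviation 10.
print(z_score(85, 70, 10))  # 1.5
print(z_score(60, 70, 10))  # -1.0
```

A positive z-score means the observation is above the mean, and a negative one means it is below.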




Wednesday, February 3, 2016

Unit 1: Interpreting Categorical and Quantitative Data

Mean, Median, IQR
Mean- the average of all the numbers in your data set
*Take the sum of the data set and divide it by the number of values in the data set.
Median- the middle value or midpoint of a data set.
Interquartile Range- a measure of variability, based on dividing a data set into quartiles; it is the difference between the third and first quartiles (Q3 - Q1).
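
The three definitions above can be tried out with a short Python sketch. The function name and data are made up, and the quartiles here come from `statistics.quantiles` with its default settings, which is only one of several common quartile rules, so a calculator or textbook may give a slightly different IQR.

```python
from statistics import median, quantiles

def summarize(data):
    """Return the mean, median, and IQR of a list of numbers."""
    mean = sum(data) / len(data)        # sum of the values / number of values
    mid = median(data)                  # middle value of the sorted data
    q1, _, q3 = quantiles(data, n=4)    # cut points that split the data into quartiles
    return {"mean": mean, "median": mid, "iqr": q3 - q1}

print(summarize([2, 4, 4, 5, 7, 9, 11]))
# {'mean': 6.0, 'median': 5, 'iqr': 5.0}
```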

Standard Deviation 
Variance- measures how the data is spread out around the mean.
Standard Deviation- a measure used to quantify the amount of variation in a data set.
*Take the square root of the variance to find the standard deviation.
Formula used (variance):

$$\sigma^2 = \frac{\sum (x - \mu)^2}{n}$$

σ² = variance
Σ = sigma (sum)
x = each number in the data set
μ = mu (mean of the data set)
n = amount of numbers in the set
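
Here is a small Python sketch of the formula, dividing by n as written in these notes (the population version). The function names and the data are made up for illustration.

```python
import math

def variance(data):
    """Average of the squared distances from the mean (divides by n)."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / n

def standard_deviation(data):
    """Square root of the variance."""
    return math.sqrt(variance(data))

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(data))            # 4.0
print(standard_deviation(data))  # 2.0
```

In this example the mean is 5, the squared distances add up to 32, and 32 / 8 = 4, so the standard deviation is 2.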

Bar Charts vs. Histograms
Bar Charts- columns are positioned over a label that represents a categorical variable.
-height indicates the size of the group
Histograms- columns are positioned over a label that represents a quantitative variable
-height indicates the size of the group
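
To see the difference, here is a rough Python sketch that works out column heights both ways. The function names, the bin width, and the data are all made up for illustration. A bar chart counts each category as it is, while a histogram first groups the numbers into equal-width ranges (bins).

```python
from collections import Counter

def bar_chart_heights(categories):
    """Height of each bar = how many times each category appears."""
    return dict(Counter(categories))

def histogram_heights(values, bin_width, start=0):
    """Height of each column = how many values fall in each numeric bin.

    Each key is the lower edge of a bin, so key 10 with bin_width 10
    covers values from 10 up to (but not including) 20.
    """
    counts = Counter()
    for v in values:
        bin_start = start + ((v - start) // bin_width) * bin_width
        counts[bin_start] += 1
    return dict(sorted(counts.items()))

print(bar_chart_heights(["red", "blue", "red"]))   # {'red': 2, 'blue': 1}
print(histogram_heights([1, 3, 12, 15, 27], 10))   # {0: 2, 10: 2, 20: 1}
```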

Data Shape Vocab
center- the point where about half of the data is on either side.
spread- refers to the variability of the data
shape- the shape of the distribution is described by the following:
symmetry- a symmetric distribution can be divided at the center so each half is a mirror image of the other.
Number of peaks
unimodal- one clear peak (when in the center- called bell-shaped)
bimodal-two clear peaks
skewness- when one side of the distribution has more observations than the other
skewed right- fewer observations on the right (the tail stretches toward the higher values)
skewed left- fewer observations on the left (the tail stretches toward the lower values)
uniform- when observations are equally spread across the range of the distribution
gaps - areas where there are no observations
outliers- extreme values that differ greatly from the other observations