-
Types of data
-
Quantitative variables – how much
-
Continuous variables – business profits, sales, etc
-
Discrete – counting things
-
Categorical variables – what type
-
Unordered – also called nominal
-
Ordered – also called ordinal
-
Grades – A, B, C, D, and F
-
Class level – 1st, 2nd, 3rd, and 4th
-
Responses on a survey
-
Possible to convert one type of variable into another, e.g. grouping a quantitative variable into ordered categories
-
Stem and Leaf Plots
-
Data should be plotted to get an idea of what it looks like
-
This method is old
-
Example: Company’s assets in $ billions
-
Data – 3.5, 6.9, 4.4, 4.4, 2.2, 5.3, 4.3, 4.0, 5.1, 7.1, 0.6, 5.3, 6.7
-
Scan data and find the smallest and largest numbers
-
The data is unordered
0 | 6
1 |
2 | 2
3 | 5
4 | 4 4 3 0
5 | 3 1 3
6 | 9 7
7 | 1
-
The data is ordered
0 | 6   (possibly an outlier)
1 |
2 | 2
3 | 5
4 | 0 3 4 4
5 | 1 3 3
6 | 7 9
7 | 1
-
Outlier – an extreme value
-
Benefit? – The only plot where we still have the original data
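The ordered plot above can be rebuilt with a few lines of Python. This is only a sketch: the function name stem_and_leaf is made up for illustration, and it assumes every value has exactly one decimal place, so the stem is the whole-number part and the leaf is the tenths digit.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return the rows of an ordered stem-and-leaf plot (one decimal place assumed)."""
    leaves = defaultdict(list)
    for v in sorted(values):
        tenths = round(v * 10)            # 3.5 -> 35; round() guards against float error
        stem, leaf = divmod(tenths, 10)   # 35 -> stem 3, leaf 5
        leaves[stem].append(leaf)
    rows = []
    # Walk every stem from smallest to largest so empty stems (such as 1) still appear.
    for stem in range(min(leaves), max(leaves) + 1):
        digits = " ".join(str(d) for d in leaves.get(stem, []))
        rows.append(f"{stem} | {digits}".rstrip())
    return rows

assets = [3.5, 6.9, 4.4, 4.4, 2.2, 5.3, 4.3, 4.0, 5.1, 7.1, 0.6, 5.3, 6.7]
rows = stem_and_leaf(assets)
print("\n".join(rows))
```

Because every leaf is kept, the original data can be read back from the rows, which is the benefit noted above.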
-
Median – a mid point of a data set
-
Take data and order it from smallest to largest
-
Example
-
Unordered: 4.5 6.3 6.1 5.5 7
-
Ordered: 4.5 5.5 6.1 6.3 7
-
The median is the value in the center, which is 6.1 in our case
-
The median is not sensitive to outliers
-
If the data has an even number of points, then take the average of the two points in the center
-
Example
-
Unordered: 3 10 8 7
-
Ordered: 3 7 8 10
-
The median is the average of 7 and 8: (7 + 8)/2 = 7.5
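The two cases (odd and even number of points) can be written as one short function. This is a sketch for illustration; the name median_of is made up, and Python's built-in statistics.median does the same job.

```python
import statistics

def median_of(values):
    """Median: order the data, then take the center value (or average the two center values)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                                   # odd count: one value in the center
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2     # even count: average the two center values

odd_example = median_of([4.5, 6.3, 6.1, 5.5, 7])
even_example = median_of([3, 10, 8, 7])
print(odd_example, even_example)

# The standard library gives the same answers.
assert odd_example == statistics.median([4.5, 6.3, 6.1, 5.5, 7])
assert even_example == statistics.median([3, 10, 8, 7])
```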
-
Measures of variability (spread)
-
Range – the difference between the largest value in the sample (the maximum) and the smallest value (the minimum): range = maximum – minimum
-
Very sensitive to outliers
-
Example
-
Unordered: 5 4 6 7 100
-
Ordered: 4 5 6 7 100
-
The range is 100 – 4 = 96
-
Did you notice the 100? It appears to be an outlier, because it is very large relative to the other numbers
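A quick check of how much a single outlier moves the range compared with the median. This is a sketch using the example data; the name sample_range is made up for illustration.

```python
import statistics

def sample_range(values):
    """Range = maximum - minimum."""
    return max(values) - min(values)

data = [5, 4, 6, 7, 100]
without_outlier = [v for v in data if v != 100]

range_all = sample_range(data)                   # 100 - 4
range_trimmed = sample_range(without_outlier)    # 7 - 4
median_all = statistics.median(data)
median_trimmed = statistics.median(without_outlier)

print("range: ", range_trimmed, "->", range_all)
print("median:", median_trimmed, "->", median_all)
```

Dropping the 100 changes the range from 96 to 3, while the median only moves from 6 to 5.5.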
-
Quartiles – divide the data into four groups
0 to 25%    | Bottom 25% of values
25 to 50%   |
            | Median is 50%
50 to 75%   |
75 to 100%  | Top 25% of values
-
Usually works well for large data sets
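A sketch of finding the three cut points that split a data set into four groups, using the company assets data from the stem and leaf example. It relies on Python's statistics.quantiles (Python 3.8 or later). Textbooks and programs use slightly different rules for interpolating between points, so the first and third quartiles may differ a little from a hand calculation, especially for a small data set like this one.

```python
import statistics

assets = [3.5, 6.9, 4.4, 4.4, 2.2, 5.3, 4.3, 4.0, 5.1, 7.1, 0.6, 5.3, 6.7]

# n=4 asks for the three cut points that divide the data into four groups.
q1, q2, q3 = statistics.quantiles(assets, n=4)

print("25% cut point (first quartile):", q1)
print("50% cut point (median):        ", q2)
print("75% cut point (third quartile):", q3)
```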
-
Box-Whisker Plots – a nice way to plot quartiles
-
Older versions of Excel cannot do this directly; Excel 2016 and later include a Box and Whisker chart type
-
We can have several Box-Whisker Plots side by side
-
Some statistical programs can calculate these
-
Histograms – for continuous variables
-
Excel can do this with some difficulty
-
Steps
-
Sort the data into groups (bins); the groups are ordered from lowest to highest
-
Count how many data points fall in each group; this count is the frequency
-
A histogram displays the distribution of data
-
Excel
-
Find the maximum data point by using the =MAX( ) function
-
Find the minimum data point by using the =MIN( ) function
-
Specify the number of categories, k, which are also called bins
Width of each category: width = (max. – min.)/k
First category: min. to min. + (width)(1)
Second category: min. + (width)(1) to min. + (width)(2)
Last category: min. + (width)(k – 1) to min. + (width)(k), which equals max.
-
Then use the =COUNTIF( ) function to count how many data points fall within a category
-
This part is hard
-
Excel has a Histogram tool in Data Analysis
-
If you choose too many categories, each one holds only a few points and the histogram looks noisy
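The same steps (minimum, maximum, k equal-width categories, then counting) can be sketched in Python instead of Excel. The function name histogram_counts is made up for illustration, and k = 4 is an arbitrary choice. The sketch assumes a value that falls exactly on an edge goes into the category above it, except the maximum, which stays in the last category. It also assumes the data are not all equal, otherwise the width would be zero.

```python
def histogram_counts(values, k):
    """Split [min, max] into k equal-width categories and count the points in each."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k                      # assumes hi > lo
    counts = [0] * k
    for v in values:
        # Category index 0..k-1; the maximum would land in index k, so cap it at k-1.
        index = min(int((v - lo) / width), k - 1)
        counts[index] += 1
    edges = [lo + width * i for i in range(k + 1)]
    return edges, counts

assets = [3.5, 6.9, 4.4, 4.4, 2.2, 5.3, 4.3, 4.0, 5.1, 7.1, 0.6, 5.3, 6.7]
edges, counts = histogram_counts(assets, k=4)

# Print a simple text histogram: one star per data point.
for i, count in enumerate(counts):
    print(f"{edges[i]:.3f} to {edges[i + 1]:.3f} | {'*' * count}")
```

Rerunning with a large k (say 12) on these 13 points leaves most categories with zero or one point.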
-
Bar charts – categorical data
-
Example – Medeo collects information on visitors for 2008
-
Almaty has 10,031 visitors
-
Astana has 542
-
Foreigners who visited are 5,321
-
Could convert each frequency into a percentage of the total
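A sketch of that conversion for the Medeo example. It assumes the three groups do not overlap and together make up all visitors, which the notes do not state.

```python
# Visitor counts from the example (assumed to be non-overlapping groups).
visitors = {"Almaty": 10031, "Astana": 542, "Foreigners": 5321}

total = sum(visitors.values())
percentages = {group: 100 * count / total for group, count in visitors.items()}

for group, pct in percentages.items():
    print(f"{group}: {pct:.1f}%")
```

Plotted as a bar chart, the percentages give the same picture as the raw frequencies, only on a scale from 0 to 100.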