Recall the car data set you identified in Week 2. We know that this data set is normally distributed using the mean and SD you calculated. (Be sure you use the numbers without the supercar outlier)

For the next 4 cars that are sampled, what is the probability that the price will be *less than* $500 below the mean? Make sure you interpret your results.

Please note: because we are given a new sample size, we will need to calculate a new SD. Then, to find the value that is $500 below the mean, take the mean and subtract $500 from it. For example, if the mean is $15,000, then $500 below this would be $14,500. Thus the probability you would want to find is P(x < 14,500).

For the next 4 cars that are sampled, what is the probability that the price will be *higher than* $1,000 above the mean? Make sure you interpret your results. Use the same logic as above. If your mean is $15,000, then $1,000 above is 15,000 + 1,000 = $16,000. Thus the probability you would want to find is P(x > 16,000).

For the next 4 cars that are sampled, what is the probability that the price will be *equal* to the mean? Make sure you interpret your results. Use the same logic as above.

For the next 4 cars that are sampled, what is the probability that the price will be *within* $1,500 of the mean? Make sure you interpret your results. Use the same logic as above.
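For readers who want to check their Excel answers another way, here is a minimal Python sketch of the same logic. The mean of $15,000 comes from the example above; the SD of $2,000 is purely hypothetical, and the variable names are illustrative. Replace both numbers with the values from your own data set.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical inputs: swap in the mean and SD from your own car data.
mean = 15_000.0   # example mean used in the discussion prompt
sd = 2_000.0      # assumed SD, for illustration only
n = 4             # the next 4 cars that are sampled

# New SD (standard error) for a sample of n cars.
new_sd = sd / sqrt(n)
sample_dist = NormalDist(mu=mean, sigma=new_sd)

# P(x < mean - 500): already in the less than form.
p_below_500 = sample_dist.cdf(mean - 500)

# P(x > mean + 1000): rewritten as 1 - P(x < mean + 1000).
p_above_1000 = 1 - sample_dist.cdf(mean + 1000)

# P(mean - 1500 < x < mean + 1500): difference of two less than probabilities.
p_within_1500 = sample_dist.cdf(mean + 1500) - sample_dist.cdf(mean - 1500)

print(f"new SD: {new_sd:.2f}")
print(f"P(x < mean - 500):   {p_below_500:.4f}")
print(f"P(x > mean + 1000):  {p_above_1000:.4f}")
print(f"P(within 1500):      {p_within_1500:.4f}")
```

With your own mean and SD the printed probabilities will differ; the structure of the three calculations stays the same.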

I encourage you to review the *Week 4 normal probabilities PDF* at the bottom of the discussion. This will give you a step-by-step example to follow and show you how to find probabilities using Excel. I also encourage you to review the **Week 4 Empirical Rule PDF**. *This will give you a better understanding of how to use the empirical rule. You can also use this PDF in the Quizzes section.*

There are additional PDFs that were created to help you with the Homework, Lessons and Tests in the Quizzes section. While they won't be used to answer the questions in the discussion, they are just as useful and beneficial. I encourage you to review these ASAP!

This week we will discuss the Empirical Rule.

The empirical rule allows you to determine the proximity of the data to the mean.

This only works for bell shape or symmetric distributions.

• The interval that is one standard deviation away from the mean, (x̄ ± 1·SD), contains approximately 68% of the data.

• The interval that is two standard deviations away from the mean, (x̄ ± 2·SD), contains approximately 95% of the data.

• The interval that is three standard deviations away from the mean, (x̄ ± 3·SD), contains approximately 99.7% of the data.

Let’s continue to look at the Data from Week 2.

Car Price:

Observation 1: $20,000
Observation 2: $25,000
Observation 3: $30,000
Observation 4: $31,000
Observation 5: $22,500
Observation 6: $25,000
Observation 7: $29,500
Observation 8: $24,000
Observation 9: $24,500
Observation 10: $25,000

Mean: $25,650
Median: $25,000
SD: $3,488.47
Sample Size: 10

Using the Empirical Rule, calculate how many data points fall within 1, 2, and 3 SDs of the mean.

1) First, we will need to calculate each interval.

25,650 – 3,488.47 = $22,162 (rounded to the nearest dollar)

25,650 + 3,488.47 = $29,138 (rounded to the nearest dollar)

The interval for approximately 68% of the data is ($22,162, $29,138). But how many data points fall within this interval? Observations 2, 5, 6, 8, 9, and 10 do. Observation 1 ($20,000) falls below the lower bound, and Observations 3, 4, and 7 fall above the upper bound. So 6 of the 10 observations fall within this interval. That is 6/10 = 60% of the data within 1 SD, which is reasonably close to 68% for such a small data set.

2) We will calculate the next interval.

25,650 – (2)(3,488.47) = $18,673 (rounded to the nearest dollar)

25,650 + (2)(3,488.47) = $32,627 (rounded to the nearest dollar)

The interval for approximately 95% of the data is ($18,673, $32,627). But how many data points fall within this interval? All 10 observations (1 through 10) fall within it. That is 10/10 = 100% of the data within 2 SDs, which is close to 95%. Because this is a small data set, seeing all the data points fall within the first 2 standard deviations is not uncommon. We would expect results like this.

3) But it is still a good idea to calculate the last interval.

25,650 – (3)(3,488.47) = $15,185 (rounded to the nearest dollar)

25,650 + (3)(3,488.47) = $36,115 (rounded to the nearest dollar)

The interval for approximately 99.7% of the data is ($15,185, $36,115). But how many data points fall within this interval? Just like with the last interval, all the data points fall within this interval, and because this is a small data set the results are as expected.

There are no data points that fall outside this range, so there don't appear to be any outliers in this data set. We also see that the mean and median are close together; there isn't a big difference between the two values. For these reasons, the data appear to be approximately normally distributed. The data set does not seem to be skewed in either direction.
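The interval counts can also be recomputed directly from the ten prices. The following is a minimal Python sketch, not part of the Excel workflow; the helper name `count_within` is illustrative, and it assumes the sample standard deviation (dividing by n – 1), which reproduces the SD of $3,488.47.

```python
from statistics import mean, stdev

# The ten car prices from the data set.
prices = [20000, 25000, 30000, 31000, 22500,
          25000, 29500, 24000, 24500, 25000]

m = mean(prices)
s = stdev(prices)  # sample SD (n - 1 in the denominator)

def count_within(data, center, spread, k):
    """Count the values inside center +/- k * spread."""
    lo, hi = center - k * spread, center + k * spread
    return sum(lo <= x <= hi for x in data)

counts = {k: count_within(prices, m, s, k) for k in (1, 2, 3)}

print(f"mean = {m:,.0f}, SD = {s:,.2f}")
for k, c in counts.items():
    share = c / len(prices)
    print(f"within {k} SD: {c} of {len(prices)} ({share:.0%})")
```

Counting this way avoids misreading an observation that sits just outside a bound.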

We can see how the SD’s line up along the x-axis and it creates the bell-shaped

curve.


Uniform Probabilities

The uniform distribution is a continuous probability distribution and is concerned

with events that are equally likely to occur. When working out problems that have

a uniform distribution, be careful to note if the data is inclusive or exclusive.

Formula Review

x is a real number between a and b. In some instances, x can take on the values a and b themselves; in that case, a is the smallest value of X and b is the largest value of X.

X ~ U(a, b), where a ≤ x ≤ b

The mean is

\[ \mu = \frac{a + b}{2} \]

The standard deviation is

\[ \sigma = \sqrt{\frac{(b - a)^2}{12}} \]

Probability density function:

\[ f(x) = \frac{1}{b - a}, \quad a \le x \le b \]

Cumulative distribution function:

\[ P(X \le x) = \frac{x - a}{b - a} \]

Area to the left of x:

\[ P(X < x) = (x - a)\left(\frac{1}{b - a}\right) = \frac{x - a}{b - a} \]

Area to the right of x:

\[ P(X > x) = (b - x)\left(\frac{1}{b - a}\right) = \frac{b - x}{b - a} \]

Area between c and d:

\[ P(c < X < d) = (\text{base})(\text{height}) = (d - c)\left(\frac{1}{b - a}\right) = \frac{d - c}{b - a} \]

Note: (d – c) is the base and 1/(b – a) is the height.
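The formulas in this review translate almost line for line into code. The sketch below is one possible Python version; the function names are my own, the U(0, 8) distribution is a made-up example, and the functions assume that every x, c, and d passed in lies between a and b.

```python
from math import sqrt

def uniform_height(a, b):
    """Height of the density: f(x) = 1 / (b - a)."""
    return 1 / (b - a)

def uniform_mean(a, b):
    return (a + b) / 2

def uniform_sd(a, b):
    return sqrt((b - a) ** 2 / 12)

def area_left(x, a, b):
    """P(X < x) = (x - a) / (b - a), assuming a <= x <= b."""
    return (x - a) / (b - a)

def area_right(x, a, b):
    """P(X > x) = (b - x) / (b - a), assuming a <= x <= b."""
    return (b - x) / (b - a)

def area_between(c, d, a, b):
    """P(c < X < d) = base * height = (d - c) / (b - a)."""
    return (d - c) * uniform_height(a, b)

# Hypothetical example: X ~ U(0, 8)
a, b = 0, 8
print(uniform_height(a, b))      # height of the rectangle
print(uniform_mean(a, b))        # midpoint of a and b
print(round(uniform_sd(a, b), 4))
print(area_left(2, a, b), area_right(6, a, b), area_between(2, 6, a, b))
```

Because the density is a rectangle, every probability is just a base times the same height, which is why the three area functions look so similar.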

Example:

Researchers have developed a safe method for rapidly detecting anthrax spores in powders and on surfaces. The method has been found to work well even when there are very few anthrax spores in a powdered specimen. Consider a powder specimen that has exactly 30 anthrax spores. Suppose that the number of anthrax spores in the sample detected by the new method follows a uniform distribution from 10 to 30. Find the following probabilities.

First, we see that a = 10 and b = 30.

1) Find f(x).

\[ f(x) = \frac{1}{b - a} = \frac{1}{30 - 10} = \frac{1}{20} = .05 \]

2) Find the mean and standard deviation.

\[ \mu = \frac{a + b}{2} = \frac{10 + 30}{2} = 20 \]

\[ \sigma = \sqrt{\frac{(b - a)^2}{12}} = \sqrt{\frac{(30 - 10)^2}{12}} = \sqrt{\frac{400}{12}} = 5.7735 \]

3) Find the probability that 22 or fewer anthrax spores are detected in the powdered specimen.

\[ P(X \le x) = \frac{x - a}{b - a} \]

\[ P(X \le 22) = \frac{22 - 10}{30 - 10} = \frac{12}{20} = .6 \]

4) Find the probability that between 10 and 25 anthrax spores are detected in the powdered specimen.

\[ P(c < X < d) = (\text{base})(\text{height}) = (d - c)\left(\frac{1}{b - a}\right) \]

\[ P(10 < X < 25) = (25 - 10)\left(\frac{1}{30 - 10}\right) = 15\left(\frac{1}{20}\right) = .75 \]

5) Find the probability that fewer than 13 anthrax spores are detected in the powdered specimen.

\[ P(X < x) = (x - a)\left(\frac{1}{b - a}\right) \]

\[ P(X < 13) = (13 - 10)\left(\frac{1}{30 - 10}\right) = 3\left(\frac{1}{20}\right) = .15 \]

6) Find the probability that more than 26 anthrax spores are detected in the powdered specimen.

\[ P(X > x) = (b - x)\left(\frac{1}{b - a}\right) \]

\[ P(X > 26) = (30 - 26)\left(\frac{1}{30 - 10}\right) = 4\left(\frac{1}{20}\right) = .20 \]
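Parts 1 through 6 can be double-checked with a few lines of Python. This is only a sketch that plugs a = 10 and b = 30 into the formulas from the Formula Review; the variable names are illustrative.

```python
from math import sqrt

a, b = 10, 30

# Parts 1 and 2: height of the density, mean, and standard deviation.
height = 1 / (b - a)
mu = (a + b) / 2
sigma = sqrt((b - a) ** 2 / 12)

# Parts 3 to 6: each probability is a base times the same height.
p_at_most_22 = (22 - a) * height      # P(X <= 22)
p_10_to_25 = (25 - 10) * height       # P(10 < X < 25)
p_below_13 = (13 - a) * height        # P(X < 13)
p_above_26 = (b - 26) * height        # P(X > 26)

print(f"f(x) = {height:.2f}, mean = {mu:.0f}, SD = {sigma:.4f}")
print(f"P(X <= 22)      = {p_at_most_22:.2f}")
print(f"P(10 < X < 25)  = {p_10_to_25:.2f}")
print(f"P(X < 13)       = {p_below_13:.2f}")
print(f"P(X > 26)       = {p_above_26:.2f}")
```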

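7) Find the probability that more than 19 anthrax spores are detected given that 12 anthrax spores have already been detected in the powdered specimen.

Because every value greater than 19 is also greater than 12, the conditional probability reduces to a ratio of two right-tail areas:

\[ P(X > 19 \mid X > 12) = \frac{P(X > 19)}{P(X > 12)} \]

From here, we will use the area-to-the-right formula twice and then divide the two answers.

\[ P(X > x) = (b - x)\left(\frac{1}{b - a}\right) \]

\[ P(X > 19) = (30 - 19)\left(\frac{1}{30 - 10}\right) = 11\left(\frac{1}{20}\right) = .55 \]

\[ P(X > 12) = (30 - 12)\left(\frac{1}{30 - 10}\right) = 18\left(\frac{1}{20}\right) = .9 \]

\[ P(X > 19 \mid X > 12) = \frac{P(X > 19)}{P(X > 12)} = \frac{.55}{.9} = .6111 \]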
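As a check on the conditional probability, here is a short Python sketch. The function name `area_right` is illustrative. The cross-check relies on a standard property of the uniform distribution: once we know X > 12, X is uniform on the interval from 12 to 30.

```python
def area_right(x, a, b):
    """P(X > x) for X ~ U(a, b), assuming a <= x <= b."""
    return (b - x) / (b - a)

a, b = 10, 30

# Ratio of the two right-tail areas, as in the worked solution.
p_gt_19 = area_right(19, a, b)
p_gt_12 = area_right(12, a, b)
p_conditional = p_gt_19 / p_gt_12

# Cross-check: given X > 12, treat X as uniform on (12, 30).
p_shortcut = area_right(19, 12, b)

print(f"P(X > 19 | X > 12) = {p_conditional:.4f}")
print(f"shortcut on U(12, 30) = {p_shortcut:.4f}")
```

Both routes give the same answer because the common height 1/(b – a) cancels in the ratio.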

8) 78% of all anthrax spore counts detected in the powdered specimen fall below the 78th percentile. Find x.

P(X < x) = .78

Using the area-to-the-left equation and solving for x:

\[ P(X < x) = \frac{x - a}{b - a} \]

\[ .78 = \frac{x - 10}{30 - 10} = \frac{x - 10}{20} \]

\[ (.78)(20) = x - 10 \]

\[ 15.6 = x - 10 \]

\[ x = 15.6 + 10 = 25.6 \]

The 78th percentile of the number of anthrax spores detected in the powdered specimen is 25.6.

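9) 33% of all anthrax spore counts detected in the powdered specimen fall above the upper 33rd percentile. Find x.

P(X > x) = .33

Using the area-to-the-right equation and solving for x:

\[ P(X > x) = \frac{b - x}{b - a} \]

\[ .33 = \frac{30 - x}{30 - 10} = \frac{30 - x}{20} \]

\[ (.33)(20) = 30 - x \]

\[ 6.6 = 30 - x \]

\[ 6.6 - 30 = -x \]

\[ -23.4 = -x \]

\[ x = 23.4 \]

The upper 33rd percentile of the number of anthrax spores detected in the powdered specimen is 23.4; that is, 33% of the distribution lies above 23.4.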
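Both percentile problems amount to solving the area formulas for x. The following is a small Python sketch of that rearrangement; the function names are my own.

```python
def value_with_area_left(p, a, b):
    """Solve p = (x - a) / (b - a) for x."""
    return a + p * (b - a)

def value_with_area_right(p, a, b):
    """Solve p = (b - x) / (b - a) for x."""
    return b - p * (b - a)

a, b = 10, 30

x_78 = value_with_area_left(0.78, a, b)        # 78% of the area to the left
x_upper_33 = value_with_area_right(0.33, a, b)  # 33% of the area to the right

print(f"78% below:  x = {x_78:.1f}")
print(f"33% above:  x = {x_upper_33:.1f}")
```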


Exponential Probabilities

The exponential distribution is often concerned with the amount of time until

some specific event occurs.

For this reason, the exponential distribution is sometimes called the waiting-time

distribution.

We will use Excel to find exponential probabilities. To use the Excel function, you need to rewrite each probability in the less than form. This is very important.

The difference between "less than" and "less than or equal to" is not as important, because exponential probabilities are continuous, not discrete.

• P(x = r)

• P(x ≤ r) is the same as P(x < r)

• P(x ≥ r) = 1 – P(x < r)

• P(x > r) = 1 – P(x < r)

• P(r < x < k) = P(x < k) – P(x < r)

• The expected value (mean) is µ, which this handout also writes as 𝜆

• For exponential distributions, the Excel function takes 1/𝜆 (one divided by the mean), so calculate 1/𝜆 first and use that value in the Excel function

• r and k are the x values of interest

To find Exponential Probabilities we will use the =EXPON.DIST( ) function.

To find the x value that corresponds to a certain percentage of an exponential distribution, you will use the following equation:

\[ x = \frac{-\ln(1 - \text{percentage})}{1/\lambda} \]

You will use the natural log function (ln), the percentage you are looking for (written as a decimal), and then 1/𝜆 in the denominator.

Note: this is the same 1/𝜆 that you use in the Excel function.
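The rules above can be sketched in Python to see how the pieces fit together. This follows the handout's convention that 𝜆 is the mean, so the rate is 1/𝜆. The function names and the mean of 10 are hypothetical, chosen only for illustration.

```python
from math import exp, log

def expon_less_than(x, mean):
    """P(X < x) for an exponential distribution with the given mean."""
    rate = 1 / mean            # the 1/lambda value in this handout
    return 1 - exp(-rate * x)

def expon_greater_than(x, mean):
    """P(X > x) = 1 - P(X < x)."""
    return 1 - expon_less_than(x, mean)

def expon_between(r, k, mean):
    """P(r < X < k) = P(X < k) - P(X < r)."""
    return expon_less_than(k, mean) - expon_less_than(r, mean)

def expon_value_for_percentage(percentage, mean):
    """x such that P(X < x) = percentage: -ln(1 - percentage) / (1/lambda)."""
    return -log(1 - percentage) / (1 / mean)

# Hypothetical example: waiting times with a mean of 10 minutes.
mean = 10
print(round(expon_less_than(10, mean), 4))
print(round(expon_greater_than(10, mean), 4))
print(round(expon_between(5, 15, mean), 4))
print(round(expon_value_for_percentage(0.50, mean), 4))
```

Feeding the result of `expon_value_for_percentage` back into `expon_less_than` returns the original percentage, which is a quick way to confirm the formula.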

Example:

A specific species of trees, the western hemlock, was found to have a breast

height diameter distribution that resembled an exponential distribution with

𝜆 = 30.

1) Find the probability that a western hemlock tree growing in the forest has a

diameter that is exactly 23 centimeters in length?

Because of the word “exactly” we want to find P(x = 23). We will use the EXPON.DIST() function to find this value.

In Excel you can take =1/30 = .03333. We will use this value in the Excel function.

P(x = 23) = EXPON.DIST(23, .03333, FALSE)

In Excel make sure you hit the “=“ sign first, then start typing EXPON.DIST(. Make sure you include the left parenthesis, then type in the x value, then 1/lambda, then either TRUE or FALSE. Then close the parenthesis and hit Enter.

Type in TRUE when you have a less than or equal to probability and FALSE when you have an equals probability. This example has an “=“ sign so we will use FALSE.

Excel returns .0155, or 1.55%, for a western hemlock tree growing in the forest with a diameter of exactly 23 centimeters. Strictly speaking, because the exponential distribution is continuous, this value is the height of the density curve at x = 23 rather than the probability of one exact diameter, which is 0 for any continuous distribution.

Note: When you hit “Enter” the answer will return as a decimal, .0155. You will then need to convert it to a percent.

2) Find the probability that a western hemlock tree growing in the forest has a

diameter that is less than 27 centimeters in length?

Because of the words “less than” we will use the less than sign.

This is the probability we want to find, P(x < 27) or P(x ≤ 27)

P(x < 27) = EXPON.DIST(27,.0333,TRUE)

In Excel make sure you hit the “=“ sign first then start typing in EXPON.DIST(.

From here make sure you include the left parenthesis then type in the x value,

then 1/lambda, then either TRUE or FALSE. Then close the parenthesis ) and hit

Enter.

Type in a TRUE when you have a less than probability and type in a FALSE when

you have an equals probability. This example has an “<“ sign so we will use a

TRUE.

There is a 59.34% probability that a western hemlock tree growing in the forest has a diameter that is less than 27 centimeters in length.

Note: When you hit “Enter” the answer will return as a decimal, .5934. You will

then need to convert it to a percent.

3) Find the probability that a western hemlock tree growing in the forest has a diameter that exceeds 25 centimeters in length.

Because of the word “exceeds” we will use the greater than sign.

This is the probability we want to find, P(x > 25).

This probability is in the greater than form, NOT the less than form, so we need to rewrite it in the less than or equal to form.

Remember: P( x > r) = 1 – P( x ≤ r )

P( x > 25) = 1 – P(x ≤ 25). Now that the probability is in the less than form we can

use Excel.

1 – P(x ≤ 25) = 1- EXPON.DIST(25,.0333, TRUE)

In Excel make sure you hit the “=“ sign first, then the 1 – and then, EXPON.DIST(.

From here make sure you include the left parenthesis then type in the x value,

then 1/lambda, then either TRUE or FALSE. Then close the parenthesis ) and hit

Enter.

Type in a TRUE when you have a less than probability and type in a FALSE when

you have an equals probability. This example has an “<“ sign so we will use a

TRUE.

There is a 43.46% probability that a western hemlock tree growing in the forest has a diameter that exceeds 25 centimeters in length.

Note: When you hit “Enter” the answer will return as a decimal, .4346. You will

then need to convert it to a percent.

4) Find the probability that a western hemlock tree growing in the forest has a

diameter that is between 28 and 34 centimeters in length?

Because of the word “between” we will use the rule P(r < x < k) = P(x < k) – P(x < r). This is the probability we want to find, P(28 < x < 34).

P(28 < x < 34) = P(x < 34) – P(x < 28)

Now that the probability is in the less than form we can use Excel.

P(x < 34) – P(x < 28) = EXPON.DIST(34,.0333,TRUE) – EXPON.DIST(28,.0333,TRUE)

In Excel make sure you hit the “=“ sign first, then EXPON.DIST(. From here make

sure you include the left parenthesis then type in the x value, then 1/lambda,

then TRUE, then close the parenthesis ) and hit the minus sign “ –“ and then

repeat the steps and hit Enter.

Type in a TRUE when you have a less than probability and type in a FALSE when

you have an equals probability. This example has an “<“ sign so we will use a

TRUE.

There is a 7.13% probability that a western hemlock tree growing in the forest has

a diameter that is between 28 and 34 centimeters in length.

Note: When you hit “Enter” the answer will return as a decimal, .0713. You will

then need to convert it to a percent.
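The four western hemlock answers can be reproduced outside of Excel. The sketch below defines a small Python function that mirrors how EXPON.DIST(x, lambda, cumulative) behaves (the density when the last argument is FALSE, the less than probability when it is TRUE). The function and variable names are my own.

```python
from math import exp

def expon_dist(x, rate, cumulative):
    """Sketch of Excel's EXPON.DIST(x, lambda, cumulative)."""
    if cumulative:
        return 1 - exp(-rate * x)      # TRUE: P(X < x)
    return rate * exp(-rate * x)       # FALSE: height of the density at x

rate = 1 / 30  # the 1/lambda value typed into Excel (about .0333)

# Example 1: EXPON.DIST(23, 1/30, FALSE)
exactly_23 = expon_dist(23, rate, False)
# Example 2: EXPON.DIST(27, 1/30, TRUE)
less_than_27 = expon_dist(27, rate, True)
# Example 3: 1 - EXPON.DIST(25, 1/30, TRUE)
exceeds_25 = 1 - expon_dist(25, rate, True)
# Example 4: EXPON.DIST(34, 1/30, TRUE) - EXPON.DIST(28, 1/30, TRUE)
between_28_34 = expon_dist(34, rate, True) - expon_dist(28, rate, True)

for label, value in [("exactly 23", exactly_23),
                     ("less than 27", less_than_27),
                     ("exceeds 25", exceeds_25),
                     ("between 28 and 34", between_28_34)]:
    print(f"{label}: {value:.4f} ({value:.2%})")
```

The sketch uses the exact value 1/30 rather than the rounded .0333, which is why it matches the four-decimal answers in the examples.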

5) 63% of all trees will have a diameter of less than how many centimeters?

We will plug this equation into Excel:

\[ x = \frac{-\ln(1 - \text{percentage})}{1/\lambda} \]

There is an LN function in Excel, and the percentage we will use is .63.

63% of all trees will have a diameter of less than 29.8276 centimeters.

You can also plug the numbers in and calculate this by hand, but I find Excel a lot easier:

\[ x = \frac{-\ln(1 - .63)}{1/30} = 29.8276 \]
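For anyone who prefers to verify the 63% calculation outside of Excel, here is a minimal Python sketch of the same equation. The variable names are illustrative. The last line plugs the answer back into the less than probability as a sanity check.

```python
from math import exp, log

mean_diameter = 30      # the handout's lambda
percentage = 0.63

# x = -ln(1 - percentage) / (1 / lambda)
x = -log(1 - percentage) / (1 / mean_diameter)

# Sanity check: P(X < x) should come back as the original percentage.
check = 1 - exp(-x / mean_diameter)

print(f"x = {x:.4f} centimeters")
print(f"P(X < x) = {check:.2f}")
```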


Normal distributions are the most common distributions in statistics. If a random variable X is normally distributed with a mean μ and a standard deviation σ, we write X ~ N(μ, σ); the standard normal variable is Z ~ N(0, 1). Normal distributions are known for their “bell-shaped curve.”

To find the probabilities of normal distributions using a Normal Distribution Table, we would start by converting the x values to the standard normal z-curve. The equation of the z-score is

\[ z = \frac{x - \mu}{\sigma} \]

Nowadays, we do not need to do this conversion to the standard normal distribution ourselves, since Excel does it automatically for us. Excel can only find less than probabilities, so it is important to make sure that your problem uses only the less than inequality (<). Whether the inequality is “less than” or “less than or equal to” is not as important, because normal probabilities are continuous, not discrete. Here are some common normal probabilities and how they are rewritten in the less than form for use in Excel.

• P(X ≤ j) is the same as P(X < j)

• P(X ≥ j) is the same as 1 – P(X < j)

• P(j < X < k) = P(X < k) – P(X < j)

• Expected value = µ (mean)

• Standard deviation = σ (SD)

To find normal probabilities we will use the =NORM.DIST( ) function.

The Central Limit Theorem states that, given any distribution with a mean μ and a standard deviation σ, the distribution of the sample mean will approach a normal distribution as the sample size, n, increases. The mean of the sample mean equals the original mean (new μ = old μ), and the standard deviation of the sample mean (also called the standard error) is written as

\[ \text{new SD (standard error)} = \frac{\sigma}{\sqrt{n}} \]
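To see how the new SD shrinks as the sample size grows, here is a brief Python sketch. The function name `standard_error` is illustrative, and the SD of 3,488.47 is the car price SD used in this course; the sample sizes in the loop are arbitrary examples.

```python
from math import sqrt

def standard_error(sd, n):
    """New SD of the sample mean: sd / sqrt(n)."""
    return sd / sqrt(n)

sd = 3488.47  # SD of the car price data

# The standard error gets smaller as the sample size increases.
for n in (1, 5, 25, 100):
    print(f"n = {n:>3}: new SD = {standard_error(sd, n):,.2f}")
```

Quadrupling the sample size halves the standard error, because n sits under a square root.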

Let’s use our Car Price Data from Week 2 and calculate 3 different probabilities.

Car Price:

Observation 1: $20,000
Observation 2: $25,000
Observation 3: $30,000
Observation 4: $31,000
Observation 5: $22,500
Observation 6: $25,000
Observation 7: $29,500
Observation 8: $24,000
Observation 9: $24,500
Observation 10: $25,000

1. Using our data, we believe that the price of this type of car is normally distributed with a mean of $25,650 and an SD of $3,488.47. Assume that 5 additional cars are randomly sampled, and their prices are recorded. What is the probability that the sample mean price of the 5 new cars will be less than $24,000?

The probability is already in the less than form, P(x̄ < 24,000), so we do not need to do additional work in Excel to find the probability. We also notice that the new sample size is n = 5. The mean will stay the same, but we will need to calculate a new SD. We will apply the Central Limit Theorem to do this. In Excel, type the “=” sign, click on the cell that contains the old SD, type the “/” sign, and then use the SQRT( ) function with 5 inside the parentheses, because the new sample size is 5.

\[ \text{new SD} = \frac{\sigma}{\sqrt{n}} = \frac{3488.47}{\sqrt{5}} = 1560.09 \]

Next, we want to find the probability P(x̄ < 24,000), and we will use the NORM.DIST() function in Excel to do this.

P(x̄ < 24000) = NORM.DIST(24000, 25650, 1560.09, TRUE)

In Excel make sure you hit the “=“ sign first, then start typing NORM.DIST(. Make sure you include the left parenthesis, then type in the x value, the mean, the standard deviation, and then TRUE. Then close the parenthesis and hit Enter. ALWAYS type in TRUE for continuous probability functions (the normal distribution is continuous). This example has an “<“ sign so we will use TRUE.

The probability that the sample mean for the new sample of 5 cars is below $24,000 is 14.51%. Remember: Once you hit “Enter” the answer returns a decimal. You need to convert it to a percentage if you want to read a percentage.
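The same probability can be checked in Python with the standard library's `NormalDist` class, whose `cdf` method plays the role of NORM.DIST with TRUE. This is a sketch for verification only; the variable names are my own.

```python
from math import sqrt
from statistics import NormalDist

mean = 25_650
sd = 3_488.47
n = 5

# New SD from the Central Limit Theorem.
new_sd = sd / sqrt(n)

# P(x-bar < 24,000), the counterpart of NORM.DIST(24000, 25650, 1560.09, TRUE).
p_below_24000 = NormalDist(mu=mean, sigma=new_sd).cdf(24_000)

print(f"new SD = {new_sd:.2f}")
print(f"P(x-bar < 24,000) = {p_below_24000:.4f} ({p_below_24000:.2%})")
```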

2. Assume that 5 additional cars are randomly sampled, and their prices are recorded. What is the probability that the sample mean price of the 5 new cars will be higher than $25,000?

Because of the words “higher than”, we want to find the probability P(x̄ > 25,000). Since we are using the same data, the mean and the new SD will be the same. Remember that the functions in Excel are in the less than form. This means we will need to do an extra step in Excel to get the probability we want.

P(x̄ > 25,000) = 1 – NORM.DIST(25000, 25650, 1560.09, TRUE)

In Excel make sure you hit the “=“ sign first, then type 1 –, and then start typing NORM.DIST(. Make sure you include the left parenthesis, then type in the x value, the mean, the standard deviation, and then TRUE. Then close the parenthesis and hit Enter.

The probability that the sample mean for the new sample of 5 cars is above $25,000 is 66.15%. Remember: Once you hit “Enter” the answer returns a decimal. You need to convert it to a percentage if you want to read a percentage.

3. Assume that 5 additional cars are randomly sampled, and their prices are recorded. What is the probability that the sample mean price of the 5 new cars will be between $24,000 and $25,000?

Because of the word “between”, we want to find the probability P(24000 < x̄ < 25000). Since we are using the same data, the mean and the new SD will be the same. Remember that the functions in Excel are in the less than form. This means we will need to do an extra step in Excel to get the probability we want.

P(24000 < x̄ < 25000) = P(x̄ < 25000) – P(x̄ < 24000) = NORM.DIST(25000, 25650, 1560.09, TRUE) – NORM.DIST(24000, 25650, 1560.09, TRUE)

In Excel make sure you hit the “=“ sign first, then start typing NORM.DIST(. Make sure you include the left parenthesis, then type in the x value, the mean, the standard deviation, and then TRUE. Then close the parenthesis, hit the minus sign, repeat the steps for the second NORM.DIST, and then hit Enter.

The probability that the sample mean for the new sample of 5 cars is between $24,000 and $25,000 is 19.34%. Remember: Once you hit “Enter” the answer returns a decimal. You need to convert it to a percentage if you want to read a percentage.
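Problems 2 and 3 can be verified the same way. The sketch below again uses Python's `statistics.NormalDist`, with `cdf` standing in for NORM.DIST(..., TRUE). The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Sampling distribution of the mean price for 5 cars.
mean = 25_650
new_sd = 3_488.47 / sqrt(5)
sample_mean = NormalDist(mu=mean, sigma=new_sd)

# Problem 2: P(x-bar > 25,000) = 1 - P(x-bar < 25,000)
p_above_25000 = 1 - sample_mean.cdf(25_000)

# Problem 3: P(24,000 < x-bar < 25,000) = P(x-bar < 25,000) - P(x-bar < 24,000)
p_between = sample_mean.cdf(25_000) - sample_mean.cdf(24_000)

print(f"P(x-bar > 25,000) = {p_above_25000:.2%}")
print(f"P(24,000 < x-bar < 25,000) = {p_between:.2%}")
```

Note that the "below $24,000", "between", and "above $25,000" probabilities cover every possible outcome, so together they add up to 100%.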