# Sections 6.7, 6.8, 7.7 (Note: The approach used here to present the material in these sections is substantially different from the approach used in the

• Published on
24-Dec-2015

Transcript

• Slide 1
• Sections 6.7, 6.8, 7.7 (Note: The approach used here to present the material in these sections is substantially different from the approach used in the textbook.)

Recall: If $X$ and $Y$ are random variables with $E(X) = \mu_X$, $E(Y) = \mu_Y$, $\mathrm{Var}(X) = \sigma_X^2$, $\mathrm{Var}(Y) = \sigma_Y^2$, and $\mathrm{Cov}(X,Y) = \rho\,\sigma_X\sigma_Y$, then the least squares line for predicting $Y$ from $X$ is

$$y = \mu_Y + \rho\,\frac{\sigma_Y}{\sigma_X}\,(x - \mu_X) \quad\text{or}\quad y = \underbrace{\left(\mu_Y - \rho\,\frac{\sigma_Y}{\sigma_X}\,\mu_X\right)}_{a} + \underbrace{\rho\,\frac{\sigma_Y}{\sigma_X}}_{b}\,x .$$

The least squares line is derived in Section 4.2 by minimizing $E\{[Y - (a + bX)]^2\}$.

Consider a set of observed data $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$. Imagine that we treat this data as describing a joint p.m.f. for two random variables $X$ and $Y$ where each point is assigned a probability of $1/n$. Then, we see that
• Slide 2
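• $\bar{x} = \dfrac{1}{n}\sum_{i=1}^{n} x_i$ plays the role of $E(X) = \mu_X$,

$\bar{y} = \dfrac{1}{n}\sum_{i=1}^{n} y_i$ plays the role of $E(Y) = \mu_Y$,

$\dfrac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2 = \dfrac{n-1}{n}\,s_x^2$ plays the role of $\mathrm{Var}(X) = \sigma_X^2$,

$\dfrac{1}{n}\sum_{i=1}^{n} (y_i - \bar{y})^2 = \dfrac{n-1}{n}\,s_y^2$ plays the role of $\mathrm{Var}(Y) = \sigma_Y^2$, and

$\dfrac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$ plays the role of $\mathrm{Cov}(X,Y) = \rho\,\sigma_X\sigma_Y$ (we shall complete this equation shortly).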
• Slide 3
• We define the sample covariance to be

$$c = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1},$$

and we define the sample correlation to be

$$r = \frac{c}{s_x s_y}.$$

The sample correlation $r$ is a measure of the strength and direction of a linear relationship for the sample in the same way that the correlation $\rho$ is a measure of the strength and direction of a linear relationship for the two random variables $X$ and $Y$.
• Slide 4
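• $\bar{x} = \dfrac{1}{n}\sum_{i=1}^{n} x_i$ plays the role of $E(X) = \mu_X$,

$\bar{y} = \dfrac{1}{n}\sum_{i=1}^{n} y_i$ plays the role of $E(Y) = \mu_Y$,

$\dfrac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2 = \dfrac{n-1}{n}\,s_x^2$ plays the role of $\mathrm{Var}(X) = \sigma_X^2$,

$\dfrac{1}{n}\sum_{i=1}^{n} (y_i - \bar{y})^2 = \dfrac{n-1}{n}\,s_y^2$ plays the role of $\mathrm{Var}(Y) = \sigma_Y^2$, and

$\dfrac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \dfrac{n-1}{n}\,c$ plays the role of $\mathrm{Cov}(X,Y) = \rho\,\sigma_X\sigma_Y$.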
• Slide 5
• We define the sample covariance to be

$$c = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1},$$

and we define the sample correlation to be

$$r = \frac{c}{s_x s_y}.$$

Consequently, the least squares line for predicting $Y$ from $X$ is

$$y = \bar{y} + r\,\frac{s_y}{s_x}\,(x - \bar{x}) \quad\text{or}\quad y = \underbrace{\left(\bar{y} - r\,\frac{s_y}{s_x}\,\bar{x}\right)}_{a} + \underbrace{r\,\frac{s_y}{s_x}}_{b}\,x .$$

This least squares line minimizes

$$\sum_{i=1}^{n} [y_i - (a + b x_i)]^2 .$$

The sample correlation $r$ is a measure of the strength and direction of a linear relationship for the sample in the same way that the correlation $\rho$ is a measure of the strength and direction of a linear relationship for the two random variables $X$ and $Y$.
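The definitions on this slide can be checked numerically. The following is a minimal Python sketch, not part of the original slides: the function names `sample_stats` and `least_squares_line` and the four data points are made up purely for illustration.

```python
from math import sqrt


def sample_stats(xs, ys):
    """Return (x_bar, y_bar, s_x, s_y, c, r), using n - 1 divisors as on the slide."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    # Sample covariance c and sample correlation r = c / (s_x * s_y).
    c = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    r = c / (s_x * s_y)
    return x_bar, y_bar, s_x, s_y, c, r


def least_squares_line(xs, ys):
    """Return (a, b) for the line y = a + b*x with b = r*s_y/s_x, a = y_bar - b*x_bar."""
    x_bar, y_bar, s_x, s_y, _, r = sample_stats(xs, ys)
    b = r * s_y / s_x
    a = y_bar - b * x_bar
    return a, b


# Hypothetical data, chosen only to keep the arithmetic simple.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 9.0]

r = sample_stats(xs, ys)[5]
a, b = least_squares_line(xs, ys)
print(f"r = {r:.4f}, line: y = {a:.4f} + {b:.4f} x")
```

For these made-up points the slope works out to 11/5 and the intercept to -1/2, which can be confirmed by hand from the sums of squares and cross-products.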
• Slide 6
• [Figure: scatterplot panels illustrating sample correlations labeled "r = +1", "r close to +1", "r is positive", "r = −1", "r close to −1", "r is negative", and "r close to 0".]
• Slide 7
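• Suppose $Y_1, Y_2, \dots, Y_n$ are independent with respective $N(\mu_1, \sigma^2), N(\mu_2, \sigma^2), \dots, N(\mu_n, \sigma^2)$ distributions. Let $x_1, x_2, \dots, x_n$ be fixed values not all equal, and suppose that for $i = 1, 2, \dots, n$, $\mu_i = \beta_0 + \beta_1 x_i$. Then the joint p.d.f. of $Y_1, Y_2, \dots, Y_n$ is

$$\prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{-\frac{[y_i - (\beta_0 + \beta_1 x_i)]^2}{2\sigma^2}\right\} = \frac{1}{(2\pi)^{n/2}\,\sigma^n} \exp\left\{-\frac{\sum_{i=1}^{n} [y_i - (\beta_0 + \beta_1 x_i)]^2}{2\sigma^2}\right\}$$

for $-\infty < y_1 < \infty$, $-\infty < y_2 < \infty$, $\dots$, $-\infty < y_n < \infty$.

If we treat this joint p.d.f. as a function $L(\beta_0, \beta_1)$, that is, a function of the unknown parameters $\beta_0$ and $\beta_1$, then we can find the maximum likelihood estimates for $\beta_0$ and $\beta_1$ by maximizing the function $L(\beta_0, \beta_1)$. It is clear the function $L(\beta_0, \beta_1)$ will be maximized when

$$\sum_{i=1}^{n} [y_i - (\beta_0 + \beta_1 x_i)]^2$$

is minimized.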
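One way to see the last claim is to take logarithms. This is a standard step that the slide leaves implicit, sketched here with $\sigma^2$ treated as a fixed constant:

$$\ln L(\beta_0, \beta_1) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n} [y_i - (\beta_0 + \beta_1 x_i)]^2 .$$

The first term does not involve $\beta_0$ or $\beta_1$, and the second term is a negative multiple of the sum of squares, so $\ln L$ (and hence $L$) is largest exactly when the sum of squares is smallest.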
• Slide 8
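• The previous result concerning the least squares line for predicting $Y$ from $X$ with a sample of data points tells us that the mle of $\beta_1$ is

$$\hat{\beta}_1 = R\,\frac{S_y}{s_x} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})\,Y_i}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \sum_{i=1}^{n} \left[\frac{x_i - \bar{x}}{\sum_{j=1}^{n} (x_j - \bar{x})^2}\right] Y_i$$

and the mle of $\beta_0$ is

$$\hat{\beta}_0 = \bar{Y} - R\,\frac{S_y}{s_x}\,\bar{x} = \sum_{i=1}^{n} \frac{Y_i}{n} - \bar{x} \sum_{i=1}^{n} \left[\frac{x_i - \bar{x}}{\sum_{j=1}^{n} (x_j - \bar{x})^2}\right] Y_i = \sum_{i=1}^{n} \left[\frac{1}{n} - \frac{\bar{x}\,(x_i - \bar{x})}{\sum_{j=1}^{n} (x_j - \bar{x})^2}\right] Y_i .$$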
• Slide 9
• 1. (a) Suppose we are interested in predicting a person's height from the person's length of stride (distance between footprints). The following data is recorded for a random sample of 5 people:

| Length of Stride (inches) | Height (inches) |
|---|---|
| 14 | 61 |
| 13 | 54 |
| 21 | 63 |
| 25 | 72 |
| 17 | 59 |

Find the equation of the least squares line for predicting a person's height from the person's length of stride.

The slope of the least squares line is $\dfrac{120}{100} = 1.2$.

The intercept of the least squares line is $61.8 - (1.2)(18) = 40.2$.

The least squares line can be written $y = 40.2 + 1.2x$.
• Slide 10
• (b) Suppose we assume that the height of humans has a normal distribution with mean $\beta_0 + \beta_1 x$ and variance $\sigma^2$, where $x$ is the length of stride. Find the maximum likelihood estimators for $\beta_0$ and $\beta_1$.

The mle of $\beta_1$ is $\dfrac{120}{100} = 1.2$.

The mle of $\beta_0$ is $61.8 - (1.2)(18) = 40.2$.
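The arithmetic in Exercise 1 can be reproduced with a few lines of Python. This is only an illustrative sketch; the variable names are not from the slides. It uses the form slope = $\sum (x_i - \bar{x})(y_i - \bar{y}) \big/ \sum (x_i - \bar{x})^2$, which is where the 120 and 100 in the solution come from.

```python
# Data from Exercise 1: stride length and height, both in inches.
stride = [14, 13, 21, 25, 17]
height = [61, 54, 63, 72, 59]

n = len(stride)
x_bar = sum(stride) / n          # mean stride length (18)
y_bar = sum(height) / n          # mean height (61.8)

# Sum of cross-products and sum of squares about the means.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(stride, height))
s_xx = sum((x - x_bar) ** 2 for x in stride)

slope = s_xy / s_xx                  # estimate of beta_1
intercept = y_bar - slope * x_bar    # estimate of beta_0

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

Up to floating-point rounding, the result agrees with the slide's values of 1.2 and 40.2, which serve both as the least squares coefficients in part (a) and as the maximum likelihood estimates in part (b).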
• Slide 11
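• 2. Suppose $Y_1, Y_2, \dots, Y_n$ are independent with respective $N(\mu_1, \sigma^2), N(\mu_2, \sigma^2), \dots, N(\mu_n, \sigma^2)$ distributions. Let $x_1, x_2, \dots, x_n$ be fixed values not all equal, and suppose that for $i = 1, 2, \dots, n$, $\mu_i = \beta_0 + \beta_1 x_i$.

(a) Use Theorem 5.5-1 (Class Exercise 5.5-1) to find the distribution of the maximum likelihood estimator of $\beta_1$.

$$\hat{\beta}_1 = \sum_{i=1}^{n} \left[\frac{x_i - \bar{x}}{\sum_{j=1}^{n} (x_j - \bar{x})^2}\right] Y_i$$

has a normal distribution with mean

$$\begin{aligned}
\sum_{i=1}^{n} \left[\frac{x_i - \bar{x}}{\sum_{j=1}^{n} (x_j - \bar{x})^2}\right] (\beta_0 + \beta_1 x_i)
&= \sum_{i=1}^{n} \left[\frac{x_i - \bar{x}}{\sum_{j=1}^{n} (x_j - \bar{x})^2}\right] [\beta_0 + \beta_1 \bar{x} + \beta_1 (x_i - \bar{x})] \\
&= \sum_{i=1}^{n} \left[\frac{x_i - \bar{x}}{\sum_{j=1}^{n} (x_j - \bar{x})^2}\right] [\beta_0 + \beta_1 \bar{x}] + \sum_{i=1}^{n} \left[\frac{x_i - \bar{x}}{\sum_{j=1}^{n} (x_j - \bar{x})^2}\right] \beta_1 (x_i - \bar{x}) \\
&= \beta_1\,\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{\sum_{j=1}^{n} (x_j - \bar{x})^2} = \beta_1 ,
\end{aligned}$$

where the first sum in the second line vanishes because $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$,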
• Slide 12
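• and variance

$$\sum_{i=1}^{n} \left[\frac{x_i - \bar{x}}{\sum_{j=1}^{n} (x_j - \bar{x})^2}\right]^2 \sigma^2 = \sigma^2\,\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{\left[\sum_{j=1}^{n} (x_j - \bar{x})^2\right]^2} = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} .$$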
• Slide 13
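• 2. - continued

(b) Use Theorem 5.5-1 (Class Exercise 5.5-1) to find the distribution of the maximum likelihood estimator of $\beta_0$.

$$\hat{\beta}_0 = \sum_{i=1}^{n} \left[\frac{1}{n} - \frac{\bar{x}\,(x_i - \bar{x})}{\sum_{j=1}^{n} (x_j - \bar{x})^2}\right] Y_i$$

has a normal distribution with mean

$$\sum_{i=1}^{n} \left[\frac{1}{n} - \frac{\bar{x}\,(x_i - \bar{x})}{\sum_{j=1}^{n} (x_j - \bar{x})^2}\right] (\beta_0 + \beta_1 x_i) = \underbrace{\frac{\sum_{i=1}^{n} (\beta_0 + \beta_1 x_i)}{n}}_{\beta_0 + \beta_1 \bar{x}} - \bar{x}\,\underbrace{\frac{\sum_{i=1}^{n} (x_i - \bar{x})(\beta_0 + \beta_1 x_i)}{\sum_{j=1}^{n} (x_j - \bar{x})^2}}_{\text{We already found this in part (a).}} =$$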
• Slide 14
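• $$\beta_0 + \beta_1 \bar{x} - \beta_1 \bar{x} = \beta_0$$

and variance

$$\begin{aligned}
\sum_{i=1}^{n} \left[\frac{1}{n} - \frac{\bar{x}\,(x_i - \bar{x})}{\sum_{j=1}^{n} (x_j - \bar{x})^2}\right]^2 \sigma^2
&= \sigma^2 \sum_{i=1}^{n} \left[\frac{1}{n^2} - \frac{2\,\bar{x}\,(x_i - \bar{x})}{n \sum_{j=1}^{n} (x_j - \bar{x})^2} + \frac{\bar{x}^2\,(x_i - \bar{x})^2}{\left[\sum_{j=1}^{n} (x_j - \bar{x})^2\right]^2}\right] \\
&= \sigma^2 \left[\frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}\right] .
\end{aligned}$$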
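The results of Exercise 2 can be checked exactly for a particular set of $x$ values. The sketch below is an illustration under stated assumptions: it reuses the stride lengths from Exercise 1 as the fixed $x_i$, picks hypothetical true parameters $\beta_0 = 40$ and $\beta_1 = 6/5$, and applies the rule that a linear combination $\sum a_i Y_i$ of independent normal variables has mean $\sum a_i \mu_i$ and variance $\sigma^2 \sum a_i^2$. The function names are invented for this example, and `fractions.Fraction` is used so the comparisons are exact rather than approximate.

```python
from fractions import Fraction


def beta_hat_weights(xs):
    """Return (w, v) with beta1_hat = sum(w[i]*Y[i]) and beta0_hat = sum(v[i]*Y[i])."""
    n = len(xs)
    x_bar = Fraction(sum(xs), n)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    w = [(x - x_bar) / s_xx for x in xs]            # weights for beta1_hat
    v = [Fraction(1, n) - x_bar * wi for wi in w]   # weights for beta0_hat
    return w, v


def mean_and_variance_factor(weights, means):
    """Mean of sum(a_i * Y_i), and the factor k such that its variance is k * sigma^2."""
    mean = sum(a * m for a, m in zip(weights, means))
    var_factor = sum(a * a for a in weights)
    return mean, var_factor


xs = [14, 13, 21, 25, 17]                      # fixed x values (stride data)
beta0, beta1 = Fraction(40), Fraction(6, 5)    # hypothetical true parameters
mus = [beta0 + beta1 * x for x in xs]          # mu_i = beta0 + beta1 * x_i

w, v = beta_hat_weights(xs)
mean_b1, factor_b1 = mean_and_variance_factor(w, mus)
mean_b0, factor_b0 = mean_and_variance_factor(v, mus)

print("E(beta1_hat) =", mean_b1, " Var(beta1_hat) / sigma^2 =", factor_b1)
print("E(beta0_hat) =", mean_b0, " Var(beta0_hat) / sigma^2 =", factor_b0)
```

Here $\sum (x_i - \bar{x})^2 = 100$ and $\bar{x} = 18$, so the variance factors should equal $1/100$ and $1/5 + 18^2/100 = 86/25$, matching the formulas derived in parts (a) and (b).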
• Slide 15
• Suppose we treat the joint p.d.f. as a function $L(\beta_0, \beta_1, \sigma^2)$, that is, a function of the three unknown parameters (instead of just two). Then, analogous to Text Example 6.1-3, we find that the maximum likelihood estimates for $\beta_0$ and $\beta_1$ are the same as previously derived, and that the maximum likelihood estimator for $\sigma^2$ is

$$\hat{\sigma}^2 = \frac{\sum_{i=1}^{n} [Y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)]^2}{n} .$$

Recall: If $Y_1, Y_2, \dots, Y_n$ are independent with each having a $N(\mu, \sigma^2)$ distribution (i.e., a random sample from a $N(\mu, \sigma^2)$ distribution), then

$\bar{Y} = \dfrac{\sum_{i=1}^{n} Y_i}{n}$ has a $N\!\left(\mu, \dfrac{\sigma^2}{n}\right)$ distribution,

$\dfrac{(n-1)S^2}{\sigma^2}$ has a $\chi^2(n-1)$ distribution, and
• Slide 16
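• the random variables $\bar{Y}$ and $\dfrac{(n-1)S^2}{\sigma^2}$ are independent.

Analogous results for the more general situation previously considered can be proven using matrix algebra.

Suppose $Y_1, Y_2, \dots, Y_n$ are independent with respective $N(\mu_1, \sigma^2), N(\mu_2, \sigma^2), \dots, N(\mu_n, \sigma^2)$ distributions. Let $x_1, x_2, \dots, x_n$ be fixed values not all equal, and suppose that for $i = 1, 2, \dots, n$, $\mu_i = \beta_0 + \beta_1 x_i$. Then

$\hat{\beta}_1$ has a $N\!\left(\beta_1,\ \dfrac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}\right)$ distribution,

$\hat{\beta}_0$ has a $N\!\left(\beta_0,\ \sigma^2 \left[\dfrac{1}{n} + \dfrac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}\right]\right)$ distribution,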
• Slide 17
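• $\dfrac{\sum_{i=1}^{n} [Y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)]^2}{\sigma^2}$ has a $\chi^2(n-2)$ distribution,

the random variables $\hat{\beta}_1$ and $\dfrac{\sum_{i=1}^{n} [Y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)]^2}{\sigma^2}$ are independent, and

the random variables $\hat{\beta}_0$ and $\dfrac{\sum_{i=1}^{n} [Y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)]^2}{\sigma^2}$ are independent.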
• Slide 18
• For each $i = 1, 2, \dots, n$, we define the random variable $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$, that is, $\hat{Y}_i$ is the predicted value corresponding to $x_i$. With appropriate algebra, it can be shown that

$$\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 .$$

The left-hand side, $\sum_{i=1}^{n} (Y_i - \bar{Y})^2$, is called the total sum of squares and is denoted SST.

The first term on the right, $\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$, is called the regression sum of squares and is denoted SSR.

The second term on the right, $\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$, is called the error (residual) sum of squares and is denoted SSE.

Since, as we have noted, $\mathrm{SSE}/\sigma^2$ has a $\chi^2(n-2)$ distribution, we say that the df (degrees of freedom) associated with SSE is $n - 2$.

If $Y_1, Y_2, \dots, Y_n$ all have the same mean, that is, if $\beta_1 = 0$, then $\mathrm{SST}/\sigma^2$ has a $\chi^2(n-1)$ distribution; consequently, the df associated with SST is $n - 1$.

If $Y_1, Y_2, \dots, Y_n$ all have the same mean, that is, if $\beta_1 = 0$, then it can be shown that SSR and SSE are independent, and that $\mathrm{SSR}/\sigma^2$ has a $\chi^2(1)$ distribution; consequently, the df associated with SSR is 1.
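As a numerical illustration of the decomposition, the sketch below computes SST, SSR, and SSE for the stride and height data of Exercise 1 and confirms that they add up. It also evaluates $\hat{\sigma}^2 = \mathrm{SSE}/n$, the maximum likelihood estimate of $\sigma^2$ given earlier. The function name `sums_of_squares` is made up for this example.

```python
def sums_of_squares(xs, ys):
    """Fit the least squares line and return (SST, SSR, SSE)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * x for x in xs]                       # predicted values
    sst = sum((y - y_bar) ** 2 for y in ys)                  # total
    ssr = sum((f - y_bar) ** 2 for f in fitted)              # regression
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))      # error (residual)
    return sst, ssr, sse


# Data from Exercise 1 (inches).
stride = [14, 13, 21, 25, 17]
height = [61, 54, 63, 72, 59]

sst, ssr, sse = sums_of_squares(stride, height)
sigma2_hat = sse / len(stride)     # mle of sigma^2, i.e. SSE / n

print(f"SST = {sst:.2f}, SSR = {ssr:.2f}, SSE = {sse:.2f}")
print(f"SSR + SSE = {ssr + sse:.2f}, sigma2_hat = {sigma2_hat:.2f}")
```

For this data set the three sums come out to roughly 174.8, 144, and 30.8, so SST equals SSR + SSE up to floating-point rounding, and the estimate of $\sigma^2$ is about 6.16.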
• Slide 19
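• 3. Suppose $Y_1, Y_2, \dots, Y_n$ are independent with respective $N(\mu_1, \sigma^2), N(\mu_2, \sigma^2), \dots, N(\mu_n, \sigma^2)$ distributions. Let $x_1, x_2, \dots, x_n$ be fixed values not all equal, and suppose that for $i = 1, 2, \dots, n$, $\mu_i = \beta_0 + \beta_1 x_i$. Prove that SST = SSR + SSE.

First, we observe that for $i = 1, 2, \dots, n$,

$$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i = \bar{Y} + \hat{\beta}_1 (x_i - \bar{x}) .$$

$$\begin{aligned}
\mathrm{SST} = \sum_{i=1}^{n} (Y_i - \bar{Y})^2
&= \sum_{i=1}^{n} \left[\hat{\beta}_1 (x_i - \bar{x}) + (Y_i - \bar{Y}) - \hat{\beta}_1 (x_i - \bar{x})\right]^2 \\
&= \sum_{i=1}^{n} \left\{ [\hat{\beta}_1 (x_i - \bar{x})]^2 + [(Y_i - \bar{Y}) - \hat{\beta}_1 (x_i - \bar{x})]^2 + 2 \hat{\beta}_1 (x_i - \bar{x}) [(Y_i - \bar{Y}) - \hat{\beta}_1 (x_i - \bar{x})] \right\} \\
&= \sum_{i=1}^{n} [\hat{\beta}_1 (x_i - \bar{x})]^2 + \sum_{i=1}^{n} [Y_i - \bar{Y} - \hat{\beta}_1 (x_i - \bar{x})]^2 + 2 \sum_{i=1}^{n} \left[ \hat{\beta}_1 (x_i - \bar{x})(Y_i - \bar{Y}) - \hat{\beta}_1^2 (x_i - \bar{x})^2 \right] =
\end{aligned}$$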
• Slide 20
• $$\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 + 2 \hat{\beta}_1 \sum_{i=1}^{n} (x_i - \bar{x}) \cdots$$