The third exam score, x, is the independent variable, and the final exam score, y, is the dependent variable. We will plot a regression line that best fits the data. If each of you were to fit a line by eye, you would draw different lines. We can obtain a line of best fit using either the median-median line approach or by calculating the least-squares regression line.

Let's first find the line of best fit for the relationship between the third exam score and the final exam score using the median-median line approach. Remember that this is the data from Example 12.5 after the ordered pairs have been listed by ordering x values. If multiple data points have the same x value, then they are listed in order from least to greatest y (see data values where x = 71).

We first divide our scores into three groups of approximately equal numbers of x values per group. The first and third groups have the same number of x values. We must remember first to put the x values in ascending order. The corresponding y values are then recorded. However, to find the median, we first must rearrange the y values in each group from the least value to the greatest value. Table 12.3 shows the correct ordering of the x values but does not show a reordering of the y values.

When this is completed, we can write the ordered pairs for the median values. This allows us to find the slope and y-intercept of the median-median line.

The slope can be calculated using the formula $m = \frac{y_2 - y_1}{x_2 - x_1}$. Substituting the median x and y values from the first and third groups gives $m = \frac{174 - 143}{71 - 66.5}$, which simplifies to $m \approx 6.9$.

The y-intercept may be found using the formula $b = \frac{\sum y - m \sum x}{3}$, that is, the sum of the median y values minus the slope times the sum of the median x values, all divided by three. The sum of the median x values is 206.5, and the sum of the median y values is 476. Substituting these sums and the slope into the formula gives $b = \frac{476 - 6.9(206.5)}{3}$, which simplifies to $b \approx -316.3$.

The line of best fit is represented as $y = mx + b$. Thus, the equation can be written as $y = 6.9x - 316.3$.

The median-median line may also be found using your graphing calculator. You can enter the x and y values into two separate lists, choose Stat, Calc, Med-Med, and press Enter. The slope, a, and y-intercept, b, will be provided. The calculator shows a slight deviation from the previous manual calculation because the manual calculation rounded the slope to 6.9 before computing the intercept. Rounding to the nearest tenth, the calculator gives the median-median line of $y = 6.9x - 315.5$.

Each point of data is of the form $(x, y)$, and each point of the line of best fit using least-squares linear regression has the form $(x, \hat{y})$. The $\hat{y}$ is read "y hat" and is the estimated value of y. It is the value of y obtained using the regression line. It is not generally equal to y from data, but it is still important because it can help make predictions for other values.

The term $y_0 - \hat{y}_0 = \varepsilon_0$ is called the error or residual. It is not an error in the sense of a mistake. The absolute value of a residual measures the vertical distance between the actual value of y and the estimated value of y. In other words, it measures the vertical distance between the actual data point and the predicted point on the line, or it measures how far the estimate is from the actual data value.

If the observed data point lies above the line, the residual is positive and the line underestimates the actual data value for y. If the observed data point lies below the line, the residual is negative and the line overestimates the actual data value for y.

In Figure 12.6, $y_0 - \hat{y}_0 = \varepsilon_0$ is the residual for the point shown. Here the point lies above the line and the residual is positive.

For each data point, you can calculate the residuals or errors, $y_i - \hat{y}_i = \varepsilon_i$ for $i = 1, 2, 3, \ldots, 11$.

For the example about the third exam scores and the final exam scores for the 11 statistics students, there are 11 data points. If you square each $\varepsilon$ and add them, you get the sum of $\varepsilon$ squared from i = 1 to i = 11: $\sum_{i=1}^{11} \varepsilon_i^2 = \varepsilon_1^2 + \varepsilon_2^2 + \cdots + \varepsilon_{11}^2$. This is called the sum of squared errors (SSE).

Using calculus, you can determine the values of a and b that make the SSE a minimum. When you make the SSE a minimum, you have determined the points that are on the line of best fit. It turns out that the line of best fit has the equation $\hat{y} = a + bx$, where $a = \bar{y} - b\bar{x}$ and $b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$. The sample means of the x values and the y values are $\bar{x}$ and $\bar{y}$, respectively. The best-fit line always passes through the point $(\bar{x}, \bar{y})$.

The slope b can also be written as $b = r\left(\frac{s_y}{s_x}\right)$, where r is the correlation coefficient, $s_y$ is the standard deviation of the y values, and $s_x$ is the standard deviation of the x values.
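The median-median arithmetic described in this section can be checked with a short Python sketch. It uses only the figures quoted in the text (the first and third group medians and the two sums of medians). The function name `median_median_line` and the variable names are illustrative assumptions, not part of the original material.

```python
# Figures quoted in the text; no other data is assumed.
first_median = (66.5, 143.0)   # (median x, median y) of the first group
third_median = (71.0, 174.0)   # (median x, median y) of the third group
sum_median_x = 206.5           # sum of the three median x values
sum_median_y = 476.0           # sum of the three median y values


def median_median_line(first, third, sum_x, sum_y):
    """Return (slope, intercept) from the group medians (hypothetical helper)."""
    x1, y1 = first
    x2, y2 = third
    slope = (y2 - y1) / (x2 - x1)            # m = (y2 - y1) / (x2 - x1)
    intercept = (sum_y - slope * sum_x) / 3  # b = (sum y - m * sum x) / 3
    return slope, intercept


# Carrying the unrounded slope through, as a calculator would.
m_exact, b_exact = median_median_line(
    first_median, third_median, sum_median_x, sum_median_y
)

# Manual route from the text: round the slope to 6.9 first, then find b.
m_rounded = round(m_exact, 1)
b_manual = (sum_median_y - m_rounded * sum_median_x) / 3

print(f"manual:    y = {m_rounded:.1f}x + ({b_manual:.1f})")
print(f"unrounded: y = {m_exact:.1f}x + ({b_exact:.1f})")
```

Comparing the two intercepts shows how rounding the slope early shifts the result, which is consistent with the small gap between the manual equation and the calculator's equation.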
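The least-squares formulas in this section (residuals, SSE, the slope and intercept, and the identity b = r(s_y/s_x)) can be illustrated with a minimal Python sketch. The five data points below are made up purely for illustration; they are not the exam scores from Example 12.5. The helper names `least_squares` and `sse` are likewise assumptions.

```python
import math
import statistics


def least_squares(xs, ys):
    """Return (a, b) for y_hat = a + b*x using the formulas in the text."""
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # intercept, so the line passes through (x_bar, y_bar)
    return a, b


def sse(xs, ys, a, b):
    """Sum of squared residuals (y - y_hat) for the line y_hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, chosen only to keep the arithmetic small.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]

a, b = least_squares(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
best_sse = sse(xs, ys, a, b)

# Correlation coefficient r, computed directly from its definition.
x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
r = sxy / math.sqrt(sxx * syy)

# Same slope via b = r * (s_y / s_x), using sample standard deviations.
b_from_r = r * statistics.stdev(ys) / statistics.stdev(xs)

print(f"a = {a:.4f}, b = {b:.4f}, SSE = {best_sse:.4f}")
print(f"b from r, s_y, s_x = {b_from_r:.4f}")
```

A positive entry in `residuals` marks a point above the fitted line, and a negative entry marks a point below it. Nudging `a` or `b` away from the fitted values and calling `sse` again should give a larger sum, which is what minimizing the SSE means.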