PART II Fundamentals of Regression Analysis

Chapter 4 Linear Regression with One Regressor

4.1 The Linear Regression Model

dependent variable

independent variable or regressor

population regression line or population regression function

intercept β0

slope β1

error term
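
Putting these terms together, the linear regression model with a single regressor is

    Y_i = \beta_0 + \beta_1 X_i + u_i, \quad i = 1, \ldots, n,

where Yi is the dependent variable, Xi is the independent variable or regressor, β0 + β1 Xi is the population regression line with intercept β0 and slope β1, and ui is the error term.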

4.2 Estimating the Coefficients of the Linear Regression Model

We must use data to estimate the unknown slope and intercept of the population regression line.

The Ordinary Least Squares Estimator

The OLS estimator chooses the regression coefficients so that the estimated regression line is as close as possible to the observed data, where closeness is measured by the sum of the squared mistakes made in predicting Y given X.
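
In symbols, the OLS estimators β0-hat and β1-hat solve

    \min_{b_0, b_1} \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2,

and the solution has the closed form

    \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}, \qquad \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}.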

OLS regression line, sample regression line, sample regression function

predicted value

residual
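
In symbols: the OLS regression line is \hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X; the predicted value for observation i is \hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i; and the residual is \hat{u}_i = Y_i - \hat{Y}_i.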

OLS Estimates of the Relationship Between Test Scores and the Student-Teacher Ratio

Why Use the OLS Estimator?

practical: OLS is the dominant method used in practice. Presenting results using OLS means that you are “speaking the same language” as other economists and statisticians.

theoretical: unbiased and consistent, efficient

4.3 Measures of Fit

How well does the regression line describe the data?

The R^2

The regression R^2 is the fraction of the sample variance of Yi explained by (or predicted by) Xi.

explained sum of squares (ESS)

total sum of squares (TSS)

sum of squared residuals (SSR)
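
These three sums of squares are

    ESS = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2, \quad TSS = \sum_{i=1}^{n} (Y_i - \bar{Y})^2, \quad SSR = \sum_{i=1}^{n} \hat{u}_i^2,

and they satisfy TSS = ESS + SSR, so

    R^2 = \frac{ESS}{TSS} = 1 - \frac{SSR}{TSS}.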

The Standard Error of the Regression

The Standard Error of the Regression (SER) is an estimator of the standard deviation of the regression error ui.

The units of ui and Yi are the same, so the SER is a measure of the spread of the observations around the regression line, measured in the units of the dependent variable.

SER = sqrt(SSR/(n-2))
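
To make these fit measures concrete, here is a minimal Python sketch that computes the OLS coefficients, R^2, and SER from scratch. The data are simulated stand-ins, not the actual test score data:

    import numpy as np

    rng = np.random.default_rng(0)

    # Simulated stand-ins: X = student-teacher ratio, Y = test score.
    n = 100
    X = rng.uniform(14, 26, size=n)
    Y = 700 - 2.3 * X + rng.normal(0, 18, size=n)

    # OLS estimates via the closed-form solutions.
    beta1_hat = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
    beta0_hat = Y.mean() - beta1_hat * X.mean()

    # Predicted values and residuals.
    Y_hat = beta0_hat + beta1_hat * X
    u_hat = Y - Y_hat

    # Measures of fit.
    TSS = np.sum((Y - Y.mean()) ** 2)
    ESS = np.sum((Y_hat - Y.mean()) ** 2)
    SSR = np.sum(u_hat ** 2)
    R2 = ESS / TSS  # equivalently, 1 - SSR / TSS
    SER = np.sqrt(SSR / (n - 2))

    print(f"beta0_hat={beta0_hat:.1f}, beta1_hat={beta1_hat:.2f}, R^2={R2:.3f}, SER={SER:.1f}")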

Application to the Test Score Data

The fact that R^2 of this regression is low (and the SER is large) does not, by itself, imply that this regression is either “good” or “bad”. What the low R^2 does tell us is that other important factors influence test scores.

4.4 The Least Squares Assumptions

Assumption #1: The Conditional Distribution of ui Given Xi Has a Mean of Zero

This assumption is a formal mathematical statement about the “other factors” contained in ui and asserts that these other factors are unrelated to Xi in the sense that, given a value of Xi, the mean of the distribution of these other factors is zero.

The conditional mean of u in a randomized controlled experiment. In a randomized controlled experiment, subjects are randomly assigned to the treatment group (X=1) or to the control group (X=0). The random assignment typically is done using a computer program that uses no information about the subject, ensuring that X is distributed independently of all personal characteristics of the subject. Random assignment makes X and u independent, which in turn implies that the conditional mean of u given X is zero.

In observational data, X is not randomly assigned in an experiment. Instead, the best that can be hoped for is that X is as if randomly assigned, in the precise sense that E(ui|Xi) = 0. Whether this assumption holds in a given empirical application with observational data requires careful thought and judgement.

Correlation and conditional mean. If the conditional mean of one random variable given another is zero, then the two random variables have zero covariance and thus are uncorrelated. Conversely, if Xi and ui are correlated, then it must be the case that E(ui|Xi) is nonzero.
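
The first claim follows from the law of iterated expectations: if E(ui|Xi) = 0, then E(ui) = E[E(ui|Xi)] = 0 and

    \mathrm{cov}(X_i, u_i) = E(X_i u_i) - E(X_i) E(u_i) = E[X_i \, E(u_i \mid X_i)] = 0.

The second claim is the contrapositive of the first.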

Assumption #2: (Xi, Yi), i=1,…,n, Are Independently and Identically Distributed

This assumption is a statement about how the sample is drawn. If the observations are drawn by simple random sampling from a single large population, then (Xi, Yi), i=1,…,n are i.i.d.

One example of non-i.i.d. sampling arises when the values of X are not drawn from a random sample of the population but rather are set by a researcher as part of an experiment. In that case Xi is nonrandom, so the sampling scheme is not i.i.d. The results presented in this chapter, developed for i.i.d. regressors, are also true if the regressors are nonrandom. When a modern experimental protocol is used, the level of X is randomly assigned and (Xi, Yi) are i.i.d.

Another example of non-i.i.d. sampling is when observations refer to the same unit of observation over time. This is an example of time series data, and a key feature of time series data is that observations falling close to each other in time are not independent but rather tend to be correlated with each other. This pattern of correlation violates the “independence” part of the i.i.d. assumption.

Assumption #3: Large Outliers Are Unlikely

Large outliers can make OLS regression results misleading.

The assumption that large outliers are unlikely is made mathematically precise by assuming that X and Y have nonzero finite fourth moments: 0<E(Xi^4)<∞ and 0<E(Yi^4)<∞. Another way to state this assumption is that X and Y have finite kurtosis.

One source of large outliers is data entry errors.
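
A short simulation (illustrative only; the numbers are invented) shows how a single mis-entered observation can distort the OLS slope:

    import numpy as np

    rng = np.random.default_rng(1)

    def ols_slope(x, y):
        # Closed-form OLS slope estimator.
        return np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

    n = 50
    x = rng.normal(0, 1, size=n)
    y = 2.0 + 0.5 * x + rng.normal(0, 1, size=n)
    print(f"slope, clean data:    {ols_slope(x, y):.3f}")

    # Corrupt one observation, as if a decimal point were misplaced at data entry.
    y_bad = y.copy()
    y_bad[0] = 100 * y[0]
    print(f"slope, one bad entry: {ols_slope(x, y_bad):.3f}")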

Use of the Least Squares Assumptions

The least squares assumptions play twin roles:

mathematical role: If these assumptions hold, then, as is shown in the next section, in large samples the OLS estimators have sampling distributions that are normal. In turn, this large-sample normal distribution lets us develop methods for hypothesis testing and constructing confidence intervals using the OLS estimators.

organizational role: Their second role is to organize the circumstances that pose difficulties for OLS regression. As we will see, the first least squares assumption is the most important to consider in practice. It is also important to consider whether the second assumption holds in an application. Although it plausibly holds in many cross-sectional data sets, the independence assumption is inappropriate for panel and time series data. Therefore, the regression methods developed under assumption 2 require modification for some applications with time series data. The third assumption serves as a reminder that OLS, just like the sample mean, can be sensitive to large outliers. If your data set contains large outliers, you should examine those outliers carefully to make sure those observations are correctly recorded and belong in the data set.

4.5 Sampling Distribution of the OLS Estimators

Because the OLS estimators β0-hat and β1-hat are computed from a randomly drawn sample, the estimators themselves are random variables with a probability distribution - the sampling distribution - that describes the values they could take over different possible random samples.

In small samples, these distributions are complicated, but in large samples, they are approximately normal because of the central limit theorem.

The Sampling Distribution of the OLS Estimators

Review of the sampling distribution of Y-bar: E(Y-bar) = μY and var(Y-bar) = σY^2/n; by the central limit theorem, Y-bar is approximately normally distributed in large samples.

The sampling distribution of β0-hat and β1-hat. E(β0-hat) = β0 and E(β1-hat) = β1; that is, the OLS estimators are unbiased. They are also consistent.
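
In large samples, β1-hat is approximately normally distributed,

    \hat{\beta}_1 \approx N\!\left(\beta_1, \sigma^2_{\hat{\beta}_1}\right), \qquad \sigma^2_{\hat{\beta}_1} = \frac{1}{n} \frac{\mathrm{var}\left[(X_i - \mu_X) u_i\right]}{\left[\mathrm{var}(X_i)\right]^2},

and this variance formula is what drives the two comparative statements below.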

In general, the larger the variance of Xi, the smaller the variance σ_β1-hat^2 of β1-hat.

Similarly, the smaller the variance of the error ui, the smaller the variance of β1-hat.

4.6 Conclusion

The results in this chapter describe the sampling distribution of the OLS estimator. By themselves, however, these results are not sufficient to test a hypothesis about the value of β1 or to construct a confidence interval for β1. Doing so requires an estimator of the standard deviation of the sampling distribution - that is, the standard error of the OLS estimator.

Chapter 5 Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals

In this chapter, we show how knowledge of this sampling distribution can be used to make statements about β1 that accurately summarize the sampling uncertainty. The starting point is the standard error of the OLS estimator, which measures the spread of the sampling distribution of β1-hat.

5.1 Testing Hypotheses About One of the Regression Coefficients

Two-Sided Hypotheses Concerning β1

The general approach to testing hypotheses about the coefficient β1 is the same as for testing hypotheses about the population mean.

Testing hypotheses about the population mean:

H0: E(Y) = μY,0 vs. H1: E(Y) ≠ μY,0

The test of the null hypothesis H0 against the two-sided alternative proceeds in three steps. The first is to compute the standard error of Y-bar, SE(Y-bar), which is an estimator of the standard deviation of the sampling distribution of Y-bar. The second step is to compute the t-statistic. The third step is to compute the p-value, which is the smallest significance level at which the null hypothesis could be rejected, based on the test statistic actually observed.
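
In symbols:

    SE(\bar{Y}) = \frac{s_Y}{\sqrt{n}}, \qquad t = \frac{\bar{Y} - \mu_{Y,0}}{SE(\bar{Y})}, \qquad p\text{-value} = 2\,\Phi\!\left(-\left|t^{act}\right|\right),

where s_Y is the sample standard deviation of Y, t^act is the value of the t-statistic actually computed, and Φ is the standard normal cumulative distribution function (the large-sample normal approximation justifies the last step).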

Testing hypotheses about the slope β1

The null hypothesis and the two-sided alternative hypothesis are H0: β1 = β1,0 vs. H1: β1 ≠ β1,0

The first step is to compute the standard error of β1-hat, SE(β1-hat).

The second step is to compute the t-statistic.

The third step is to compute the p-value.
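
The formulas mirror the population-mean case:

    t = \frac{\hat{\beta}_1 - \beta_{1,0}}{SE(\hat{\beta}_1)}, \qquad p\text{-value} = 2\,\Phi\!\left(-\left|t^{act}\right|\right).

In practice, regression software reports all three quantities. A minimal sketch using statsmodels, reusing the simulated data from the earlier snippet (the null tested by the software's default output is β1,0 = 0):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    X = rng.uniform(14, 26, size=100)
    Y = 700 - 2.3 * X + rng.normal(0, 18, size=100)

    # Regress Y on a constant and X; cov_type="HC1" requests
    # heteroskedasticity-robust standard errors.
    results = sm.OLS(Y, sm.add_constant(X)).fit(cov_type="HC1")

    print(results.bse[1])      # step 1: SE(beta1-hat)
    print(results.tvalues[1])  # step 2: t-statistic for H0: beta1 = 0
    print(results.pvalues[1])  # step 3: two-sided p-value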

Reporting regression equations and application to test scores.

One-Sided Hypotheses Concerning β1

Sometimes, however, it is appropriate to use a one-sided hypothesis test.

For a one-sided test, the null hypothesis and the one-sided alternative hypothesis are

H0: β1 = β1,0 vs. H1: β1 < β1,0 (one-sided alternative)
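
For this one-sided alternative, only the left tail of the distribution is used:

    p\text{-value} = \Pr(Z < t^{act}) = \Phi(t^{act}),

so at the 5% significance level the null hypothesis is rejected when t^act < -1.645.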