EF5470 Econometrics Part I (Fred Y. K. Kwan)

Textbook: Stock, J. and Watson, M. (2012) Introduction to Econometrics. 3rd edition. Pearson.

Part I Introduction to Econometrics

Chapter 1 Economic Questions and Data

At a broad level, econometrics is the science and art of using economic theory and statistical techniques to analyze economic data.

1.1 Economic Questions We Examine

Four questions as examples:

  • Question #1: Does Reducing Class Size Improve Elementary School Education?
  • Question #2: Is There Racial Discrimination in the Market for Home Loans?
  • Question #3: How Much Do Cigarette Taxes Reduce Smoking?
  • Question #4: By How Much Will U.S. GDP Grow Next Year?

Each of these questions requires a numerical answer. Economic theory provides clues about that answer but the actual value of the number must be learned empirically, that is, by analyzing data. Because we use data to answer quantitative questions, our answers always have some uncertainty: A different set of data would produce a different numerical answer. Therefore, the conceptual framework for the analysis needs to provide both a numerical answer to the question and a measure of how precise the answer is.

The conceptual framework used in this book is the multiple regression model, the mainstay of econometrics.

1.2 Causal Effects and Idealized Experiments

Questions in econometrics concern causal relationships among variables. Causality means that a specific action leads to a specific, measurable consequence.

Estimation of Causal Effects

One way to measure a causal effect is to conduct an experiment, ideally a randomized controlled experiment.

Forecasting and Causality

Another kind of question econometrics is concerned with is forecasting. Even though forecasting need not involve causal relationships, economic theory suggests patterns and relationships that might be useful for forecasting.

1.3 Data: Sources and Types

In econometrics, data come from one of two sources: experiments or nonexperimental observations of the world.

Three main types of data: cross-sectional data, time series data, and panel data.

Experimental Versus Observational Data

Experiments: expensive, difficult to administer, and sometimes unethical.

Observational data: difficult to sort out the effect of the “treatment” from other relevant factors.

Cross-Sectional Data

Data on different entities for a single time period are called cross-sectional data.

Time Series Data

Time series data are for a single entity collected at multiple time periods.

Panel Data

Panel data, also called longitudinal data, are data for multiple entities in which each entity is observed at two or more time periods.

Chapter 2 Review of Probability

Most aspects of the world around us have an element of randomness. The theory of probability provides mathematical tools for quantifying and describing this randomness.

2.1 Random Variables and Probability Distributions

The mutually exclusive potential results of a random process are called the outcomes.

The probability of an outcome is the proportion of the time that the outcome occurs in the long run.

The set of all possible outcomes is called the sample space.

An event is a subset of the sample space, that is, an event is a set of one or more outcomes.

A random variable is a numerical summary of a random outcome.

Probability Distribution of a Discrete Random Variable

The probability distribution of a discrete random variable is the list of all possible values of the variable and the probability that each value will occur.

The cumulative probability distribution is also referred to as a cumulative distribution function (c.d.f.).

A binary random variable is one whose outcomes are 0 or 1. A binary random variable is also called a Bernoulli random variable, and its probability distribution is called the Bernoulli distribution.

Probability Distribution of a Continuous Random Variable

For continuous random variables, the probability distribution is summarized by the probability density function (also called a p.d.f.); the area under the p.d.f. between any two points is the probability that the random variable falls between those two points.

2.2 Expected Values, Mean, and Variance

The Expected Value of a Random Variable

The expected value of a random variable Y, denoted E(Y), is the long-run average value of the random variable over many repeated trials or occurrences.

The expected value of Y is also called the expectation of Y or the mean of Y.

The Standard Deviation and Variance

The variance and standard deviation measure the dispersion or the “spread” of a probability distribution.

var(Y) = σY^2 = E[(Y - μY)^2]
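As a minimal sketch (in Python, with a made-up two-point distribution rather than one from the text), the mean and variance can be computed directly from the list of values and probabilities:

```python
# Mean and variance of a discrete random variable, computed from its distribution.
# Hypothetical distribution: Y = 0 with probability 0.78 and Y = 1 with probability 0.22.
values = [0, 1]
probs = [0.78, 0.22]

mean_Y = sum(y * p for y, p in zip(values, probs))                 # E(Y)
var_Y = sum((y - mean_Y) ** 2 * p for y, p in zip(values, probs))  # E[(Y - mu_Y)^2]
sd_Y = var_Y ** 0.5                                                # standard deviation

print(mean_Y, var_Y, sd_Y)  # 0.22, 0.1716, about 0.41
```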

Mean and Variance of a Linear Function of a Random Variable

Other Measures of the Shape of a Distribution

The skewness measures the lack of symmetry of a distribution.

The kurtosis measures how thick, or “heavy,” the tails of a distribution are. An extreme value of Y is called an outlier. The greater the kurtosis of a distribution, the more likely are outliers. A distribution with kurtosis exceeding 3 is called leptokurtic or, more simply, heavy-tailed.

The mean, variance, skewness, and kurtosis are all based on what are called the moments of a distribution.

Moments. The mean of Y, E(Y), is also called the first moment of Y, and the expected value of the square of Y, E(Y^2), is called the second moment of Y. In general, the expected value of Y^r is called the rth moment of the random variable Y.

2.3 Two Random Variables

The joint probability distribution of two discrete random variables, say X and Y, is the probability that the random variables simultaneously take on certain values, say x and y.

The marginal probability distribution of a random variable Y is just another name for its probability distribution.

The distribution of a random variable Y conditional on another random variable X taking on a specific value is called the conditional distribution of Y given X.

The conditional expectation of Y given X, also called the conditional mean of Y given X, is the mean of the conditional distribution of Y given X.

The law of iterated expectations: E(Y)=E[E(Y|X)]

Conditional variance. The variance of Y conditional on X is the variance of the conditional distribution of Y given X.

Two random variables X and Y are independently distributed, or independent, if knowing the value of one of the variables provides no information about the other; that is, P(Y=y|X=x) = P(Y=y) for all values x and y.

One measure of the extent to which two random variables move together is their covariance. The covariance between X and Y is the expected value cov(X, Y) = σXY = E[(X - μX)(Y - μY)].

The correlation is an alternative measure of dependence between X and Y that solves the “units” problem of the covariance. corr(X, Y) = cov(X, Y) / (var(X)var(Y))^(1/2)
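A small numeric check (the joint distribution below is made up for illustration) ties these definitions together: it computes the marginal and conditional means, verifies the law of iterated expectations, and then computes the covariance and correlation.

```python
import numpy as np

# Hypothetical joint distribution of X (rows) and Y (columns); entries are P(X=x, Y=y).
x_vals = np.array([0.0, 1.0])
y_vals = np.array([0.0, 1.0])
joint = np.array([[0.30, 0.20],
                  [0.10, 0.40]])          # probabilities sum to 1

p_x = joint.sum(axis=1)                   # marginal distribution of X
p_y = joint.sum(axis=0)                   # marginal distribution of Y

mu_x = (x_vals * p_x).sum()
mu_y = (y_vals * p_y).sum()

# Conditional mean E(Y | X = x) and the law of iterated expectations E(Y) = E[E(Y|X)].
cond_mean_y = (joint * y_vals).sum(axis=1) / p_x
assert np.isclose((cond_mean_y * p_x).sum(), mu_y)

# Covariance and correlation.
cov_xy = sum(joint[i, j] * (x_vals[i] - mu_x) * (y_vals[j] - mu_y)
             for i in range(2) for j in range(2))
var_x = ((x_vals - mu_x) ** 2 * p_x).sum()
var_y = ((y_vals - mu_y) ** 2 * p_y).sum()
corr_xy = cov_xy / np.sqrt(var_x * var_y)
print(cov_xy, corr_xy)
```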

2.4 The Normal, Chi-Squared, Student t, and F Distributions

The Normal Distribution

The normal density with mean μ and variance σ^2 is symmetric around its mean and has 95% of its probability between μ-1.96σ and μ+1.96σ.

The standard normal distribution is the normal distribution with mean μ=0 and variance σ^2=1 and is denoted N(0,1). Random variables that have a N(0,1) distribution are often denoted Z, and the standard normal cumulative distribution function is denoted by the Greek letter Φ; accordingly, P(Z <= c) = Φ(c), where c is a constant.

To look up probabilities for a normal variable with a general mean and variance, we must standardize the variable by first subtracting the mean and then dividing the result by the standard deviation; that is, by computing Z = (Y - μ)/σ.
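For example (a sketch using scipy's normal c.d.f., with illustrative numbers): to find P(Y <= 2) when Y is N(1, 4), standardize to Z = (2 - 1)/2 = 0.5 and look up Φ(0.5).

```python
from scipy.stats import norm

mu, sigma = 1.0, 2.0        # illustrative mean and standard deviation (variance = 4)
c = 2.0

z = (c - mu) / sigma        # standardize: Z = (Y - mu) / sigma
p = norm.cdf(z)             # P(Y <= c) = Phi(z)
print(z, p)                 # 0.5, about 0.691

# Equivalent call without standardizing by hand:
print(norm.cdf(c, loc=mu, scale=sigma))
```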

The normal distribution is symmetric, so its skewness is zero. The kurtosis of the normal distribution is 3.

The multivariate normal distribution

The normal distribution can be generalized to describe the joint distribution of a set of random variables. In this case, the distribution is called the multivariate normal distribution or, if only two variables are being considered, the bivariate normal distribution.

The multivariate normal distribution has four important properties.

  1. If X and Y have a bivariate normal distribution with covariance σ_XY and if a and b are two constants, then aX + bY has the normal distribution: aX + bY ~ N(aμ_X+bμ_Y, a^2σ_X^2+b^2σ_Y^2+2abσ_XY) (a simulation sketch follows this list). More generally, if n random variables have a multivariate normal distribution, then any linear combination of these variables (such as their sum) is normally distributed.

  2. If a set of variables has a multivariate normal distribution, then the marginal distribution of each of the variables is normal.

  3. If variables with a multivariate normal distribution have covariances that equal zero, then the variables are independent. Thus, if X and Y have a bivariate normal distribution and σ_XY=0, then X and Y are independent. That zero covariance implies independence is a special property of the multivariate normal distribution that is not true in general.

  4. If X and Y have a bivariate normal distribution, then the conditional expectation of Y given X is linear in X; that is, E(Y|X=x)= a + bx, where a and b are constants. Joint normality implies linearity of conditional expectations, but linearity of conditional expectations does not imply joint normality.
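The simulation sketch below (with made-up means, variances, and covariance) illustrates property 1: a linear combination aX + bY of bivariate normal variables has the stated mean and variance.

```python
import numpy as np

rng = np.random.default_rng(4)
# Illustrative bivariate normal parameters (hypothetical numbers).
mu_X, mu_Y = 1.0, 2.0
sigma2_X, sigma2_Y, sigma_XY = 4.0, 1.0, 0.5
a, b = 3.0, -2.0

draws = rng.multivariate_normal([mu_X, mu_Y],
                                [[sigma2_X, sigma_XY],
                                 [sigma_XY, sigma2_Y]],
                                size=200_000)
lin_comb = a * draws[:, 0] + b * draws[:, 1]     # aX + bY for each simulated pair

print(lin_comb.mean())   # close to a*mu_X + b*mu_Y = -1.0
print(lin_comb.var())    # close to a^2*sigma2_X + b^2*sigma2_Y + 2*a*b*sigma_XY = 34.0
```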

The Chi-Squared Distribution

The chi-squared distribution is used when testing certain types of hypotheses in statistics and econometrics.

The chi-squared distribution is the distribution of the sum of m squared independent standard normal random variables.

This distribution depends on m, which is called the degrees of freedom of the chi-squared distribution. For example, let Z1, Z2, and Z3 be independent standard normal variables. Then Z1^2 + Z2^2 + Z3^2 has a chi-squared distribution with 3 degrees of freedom.

A chi-squared distribution with m degrees of freedom is denoted χ_m^2.

The Student t Distribution

The Student t distribution with m degrees of freedom is defined to be the distribution of the ratio of a standard normal random variable, divided by the square root of an independently distributed chi-squared random variable with m degrees of freedom divided by m. That is, let Z be a standard normal random variable, let W be a random variable with a chi-squared distribution with m degrees of freedom, and let Z and W be independently distributed. Then the random variable Z/(W/m)^(1/2) has a Student t distribution (also called the t distribution) with m degrees of freedom. This distribution is denoted t_m.

The Student t distribution depends on the degrees of freedom m. Thus the 95th percentile of the t_m distribution depends on the degrees of freedom m. The Student t distribution has a bell shape similar to that of the normal distribution, but when m is small (20 or less), it has more mass in the tails - that is, it is a “fatter” bell shape than the normal. When m is 30 or more, the Student t distribution is well approximated by the standard normal distribution and the t_∞ distribution equals the standard normal distribution.

The F Distribution

The F distribution with m and n degrees of freedom, denoted F_m,n, is defined to be the distribution of the ratio of a chi-squared random variable with degrees of freedom m, divided by m, to an independently distributed chi-squared random variable with degrees of freedom n, divided by n.

To state this mathematically, let W be a chi-squared random variable with m degrees of freedom and let V be a chi-squared random variable with n degrees of freedom, where W and V are independently distributed. Then (W/m)/(V/n) has an F_m,n distribution.

In statistics and econometrics, an important special case of the F distribution arises when the denominator degrees of freedom is large enough that the F_m,n distribution can be approximated by the F_m,∞ distribution. In this limiting case, the denominator random variable V/n is the mean of infinitely many squared standard normal random variables, and that mean is 1 because the mean of a squared standard normal random variable is 1. Thus the F_m,∞ distribution is the distribution of a chi-squared random variable with m degrees of freedom, divided by m; W/m is distributed F_m,∞.
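Because all three distributions are built from standard normal random variables, a short Monte Carlo sketch (illustrative, not from the text) can construct them directly:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, reps = 3, 10, 100_000

Z = rng.standard_normal(reps)                            # standard normal draws
W = (rng.standard_normal((reps, m)) ** 2).sum(axis=1)    # chi-squared with m degrees of freedom
V = (rng.standard_normal((reps, n)) ** 2).sum(axis=1)    # chi-squared with n df, independent of W

t_draws = Z / np.sqrt(W / m)       # Student t with m degrees of freedom
F_draws = (W / m) / (V / n)        # F with (m, n) degrees of freedom

print(W.mean())        # close to m = 3 (the mean of a chi-squared_m variable is m)
print(F_draws.mean())  # close to n / (n - 2) = 1.25
```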

2.5 Random Sampling and the Distribution of the Sample Average

Characterizing the distributions of sample averages therefore is an essential step toward understanding the performance of econometric procedures.

The act of random sampling - that is, randomly drawing a sample from a larger population - has the effect of making the sample average itself a random variable.

Because the sample average is a random variable, it has a probability distribution, which is called its sampling distribution.

Random Sampling

Simple random sampling: n objects are selected at random from a population and each member of the population is equally likely to be included in the sample.

The n observations in the sample are denoted Y1, …, Yn.

The act of random sampling means that Y1, …, Yn can be treated as random variables. Before they are sampled, Y1, …, Yn can take on many possible values; after they are sampled, a specific value is recorded for each observation.

When Y1, …, Yn are drawn from the same distribution and are independently distributed, they are said to be independently and identically distributed (or i.i.d.).

The Sampling Distribution of the Sample Average

The sample average or sample mean Y-bar of the n observations Y1, …, Yn is Y-bar = (1/n)(Y1 + Y2 + … + Yn).

An essential concept is that the act of drawing a random sample has the effect of making the sample average Y-bar a random variable. Because the sample was drawn at random, the value of each Yi is random. Because Y1, …, Yn are random, their average is random. Had a different sample been drawn, then the observations and their sample average would have been different; the value of Y-bar differs from one randomly drawn sample to the next.

Because Y-bar is random, it has a probability distribution. The distribution of Y-bar is called the sampling distribution of Y-bar because it is the probability distribution associated with possible values of Y-bar that could be computed for different possible samples Y1, …, Yn.

The sampling distribution of averages and weighted averages plays a central role in statistics and econometrics.

Mean and variance of Y-bar: Suppose that the observations Y1, …, Yn are i.i.d., and let μY and σY^2 denote the mean and variance of Yi. Then E(Y-bar) = μY and var(Y-bar) = σY^2/n.
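A quick simulation sketch (illustrative numbers, using an exponential population so nothing relies on normality) checks these two formulas by drawing many samples of size n and comparing the simulated mean and variance of Y-bar with μY and σY^2/n.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 25, 50_000
mu_Y, var_Y = 2.0, 4.0                       # an exponential with scale 2 has mean 2, variance 4

samples = rng.exponential(scale=2.0, size=(reps, n))
y_bar = samples.mean(axis=1)                 # one sample average per simulated sample

print(y_bar.mean())                          # close to mu_Y = 2.0
print(y_bar.var())                           # close to var_Y / n = 4/25 = 0.16
```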

These results hold whatever the distribution of Yi is; that is, the distribution of Yi does not need to take on a specific form, such as the normal distribution.

The notation σ_Y-bar^2 denotes the variance of the sampling distribution of the sample average Y-bar. In contrast, σ_Y^2 is the variance of each individual Yi, that is, the variance of the population distribution from which the observation is drawn.

Sampling distribution of Y-bar when Y is normally distributed: when Y1, …, Yn are i.i.d. draws from the N(μY, σY^2) distribution, Y-bar is exactly distributed N(μY, σY^2/n).

2.6 Large-Sample Approximations to Sampling Distributions

There are two approaches to characterizing sampling distributions: an “exact” approach and an “approximate” approach.

The “exact” approach entails deriving a formula for the sampling distribution that holds exactly for any value of n.

The “approximate” approach uses approximations to the sampling distribution that rely on the sample size being large. The large-sample approximation to the sampling distribution is often called the asymptotic distribution - “asymptotic” because the approximations become exact in the limit that n -> ∞. As we see in this section, these approximations can be very accurate even if the sample size is only n=30 observations. Because sample sizes used in practice in econometrics typically number in the hundreds or thousands, these asymptotic distributions can be counted on to provide very good approximations to the exact sampling distribution.

This section presents the two key tools used to approximate sampling distributions when the sample size is large: the law of large numbers and the central limit theorem. The law of large numbers says that, when the sample size is large, Y-bar will be close to μY with very high probability. The central limit theorem says that, when the sample size is large, the sampling distribution of the standardized sample average, (Y-bar - μY)/σ_Y-bar, is approximately normal.

The Law of Large Numbers and Consistency

The law of large numbers states that, under general conditions, Y-bar will be near μY with very high probability when n is large. This is sometimes called the “law of averages.” When a large number of random variables with the same mean are averaged together, the large values balance the small values and their sample average is close to their common mean.

The property that Y-bar is near μY with increasing probability as n increases is called convergence in probability or, more concisely, consistency.

If the data are collected by simple random sampling, then the i.i.d. assumption holds. The assumption that the variance is finite says that extremely large values of Yi - that is, outliers - are unlikely and observed infrequently.

The Central Limit Theorem

The central limit theorem says that, under general conditions, the distribution of Y-bar is well approximated by a normal distribution when n is large.

One might ask, how large is “large enough”? That is, how large must n be for the distribution of Y-bar to be approximately normal? The answer is, “It depends.” The quality of the normal approximation depends on the distribution of the underlying Yi that make up the average. At one extreme, if the Yi are themselves normally distributed, then Y-bar is exactly normally distributed for all n. In contrast, when the underlying Yi themselves have a distribution that is far from normal, then this approximation can require n=30 or even more.

Because the distribution of Y-bar approaches the normal as n grows large, Y-bar is said to have an asymptotic normal distribution.
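To see the central limit theorem at work, the sketch below uses a Bernoulli population (far from normal; the numbers are made up) and compares tail probabilities of the standardized sample average with the standard normal.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
p, n, reps = 0.3, 100, 100_000                 # illustrative Bernoulli(p) population
mu_Y, sigma_Y = p, np.sqrt(p * (1 - p))

samples = rng.binomial(1, p, size=(reps, n))
z = (samples.mean(axis=1) - mu_Y) / (sigma_Y / np.sqrt(n))   # standardized sample average

print((np.abs(z) > 1.96).mean())   # close to 0.05, as the standard normal approximation predicts
print((z <= 1.0).mean(), norm.cdf(1.0))
```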

Chapter 3 Review of Statistics

Statistics is the science of using data to learn about the world around us. Statistical tools help us answer questions about unknown characteristics of distributions in populations of interest.

The key insight of statistics is that one can learn about a population distribution by selecting a random sample from that population.

Three types of statistical methods are used throughout econometrics: estimation, hypothesis testing, and confidence intervals.

Estimation entails computing a “best guess” numerical value for an unknown characteristic of a population distribution, such as its mean, from a sample of data.

Hypothesis testing entails formulating a specific hypothesis about the population, then using sample evidence to decide whether it is true.

Confidence intervals use a set of data to estimate an interval or range for an unknown population characteristic.

3.1 Estimation of the Population Mean

Estimators and Their Properties

Estimators. The sample average Y-bar is a natural way to estimate μY, but it is not the only way. For example, another way to estimate μY is simply to use the first observation, Y1. Both Y-bar and Y1 are functions of the data that are designed to estimate μY, so both are estimators of μY. The estimators Y-bar and Y1 both have sampling distributions.

There are many possible estimators, so what makes one estimator “better” than another? Because estimators are random variables, this question can be phrased more precisely: What are desirable characteristics of the sampling distribution of an estimator? In general, we would like an estimator that gets as close as possible to the unknown true value, at least in some average sense; in other words, we would like the sampling distribution of an estimator to be as tightly centered on the unknown value as possible. This observation leads to three specific desirable characteristics of an estimator: unbiasedness (a lack of bias), consistency, and efficiency.

Unbiasedness. Suppose you evaluate an estimator many times over repeated randomly drawn samples. It is reasonable to hope that, on average, you would get the right answer. Thus a desirable property of an estimator is that the mean of its sampling distribution equals μY; if so, the estimator is said to be unbiased. To state this concept mathematically, let μY-hat denote some estimator of μY, such as Y-bar or Y1. The estimator μY-hat is unbiased if E(μY-hat) = μY, where E(μY-hat) is the mean of the sampling distribution of μY-hat; otherwise, μY-hat is biased.

Consistency. When the sample size is large, the uncertainty about the value of μY arising from random variations in the sample is very small. Stated more precisely, a desirable property of μY-hat is that the probability that it is within a small interval of the true value μY approaches 1 as the sample size increases; that is, μY-hat is consistent for μY.

Variance and efficiency. Suppose you have two candidate estimators, μY-hat1 and μY-hat2, both of which are unbiased. How might you choose between them? One way to do so is to choose the estimator with the tightest sampling distribution. This suggests choosing between μY-hat1 and μY-hat2 by picking the estimator with the smallest variance. If μY-hat1 has a smaller variance than μY-hat2, then μY-hat1 is said to be more efficient than μY-hat2. The terminology “efficiency” stems from the notion that if μY-hat1 has a smaller variance than μY-hat2, then it uses the information in the data more efficiently than does μY-hat2.

Properties of Y-bar

Unbiasedness: E(Y-bar) = μY.

Consistency: by the law of large numbers, Y-bar converges in probability to μY.

Efficiency: Y-bar is the Best Linear Unbiased Estimator (BLUE); that is, Y-bar is the most efficient estimator of μY among all unbiased estimators that are weighted averages of Y1, …, Yn.

Y-bar is the least squares estimator of μY. The sample average Y-bar provides the best fit to the data in the sense that the average squared differences between the observations and Y-bar are the smallest of all possible estimators.

The estimator m that minimizes the sum of squared gaps Σ(Yi - m)^2 is called the least squares estimator.
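A short sketch with made-up data illustrates this least squares property: over a grid of candidate values m, the sum of squared gaps Σ(Yi - m)^2 is smallest at m = Y-bar.

```python
import numpy as np

Y = np.array([1.0, 4.0, 2.0, 7.0, 6.0])          # illustrative data

def sum_sq_gaps(m):
    return ((Y - m) ** 2).sum()                  # sum of squared gaps for candidate value m

grid = np.linspace(Y.min(), Y.max(), 10_001)      # fine grid of candidate values
best_m = grid[np.argmin([sum_sq_gaps(m) for m in grid])]

print(best_m, Y.mean())                           # both equal 4.0 (up to grid resolution)
```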

The Importance of Random Sampling

We have assumed that Y1, …, Yn are i.i.d. draws, such as those that would be obtained from simple random sampling. This assumption is important because nonrandom sampling can result in Y-bar being biased. This bias arises because such a sampling scheme overrepresents, or oversamples, some part of the population.

It is important to design sample selection schemes in a way that minimizes bias.

3.2 Hypothesis Tests Concerning the Population Mean

Many hypotheses about the world around us can be phrased as yes/no questions. The statistical challenge is to answer these questions based on a sample of evidence. This section describes hypothesis tests concerning the population mean; hypothesis tests involving two population means are taken up in Section 3.4.

Null and Alternative Hypotheses

The starting point of statistical hypothesis testing is specifying the hypothesis to be tested, called the null hypothesis. Hypothesis testing entails using data to compare the null hypothesis to a second hypothesis, called the alternative hypothesis, that holds if the null does not.

The null hypothesis is that the population mean, E(Y), takes on a specific value, denoted μ_Y,0. H0: E(Y) = μ_Y,0. The two-sided alternative hypothesis: H1: E(Y) != μ_Y,0.

One-sided alternatives are also possible, and these are discussed later in this section.

The problem facing the statistician is to use the evidence in a randomly selected sample of data to decide whether to accept the null hypothesis H0 or to reject it in favor of the alternative hypothesis H1. If the null hypothesis is “accepted,” this does not mean that the statistician declares it to be true; rather, it is accepted tentatively with the recognition that it might be rejected later based on additional evidence. For this reason, statistical hypothesis testing can be posed as either rejecting the null hypothesis or failing to do so.

The p-Value

In any given sample, the sample average Y-bar will rarely be exactly equal to the hypothesized value μ_Y,0. Differences between Y-bar and μ_Y,0 can arise because the true mean in fact does not equal μ_Y,0 (the null hypothesis is false) or because the true mean equals μ_Y,0 (the null hypothesis is true) but Y-bar differs from μ_Y,0 because of random sampling. It is impossible to distinguish between these two possibilities with certainty. Although a sample of data cannot provide conclusive evidence about the null hypothesis, it is possible to do a probabilistic calculation that permits testing the null hypothesis in a way that accounts for sampling uncertainty. This calculation involves using the data to compute the p-value of the null hypothesis.

The p-value, also called the significance probability, is the probability of drawing a statistic at least as adverse to the null hypothesis as the one you actually computed in your sample, assuming the null hypothesis is correct. In the case at hand, the p-value is the probability of drawing Y-bar at least as far in the tails of its distribution under the null hypothesis as the sample average you actually computed.

If this p-value is small, say 0.5%, then it is very unlikely that this sample would have been drawn if the null hypothesis is true; thus it is reasonable to conclude that the null hypothesis is not true.

To state the definition of the p-value mathematically, let Y-bar_act denote the value of the sample average actually computed in the data set at hand and let Pr_H0 denote the probability computed under the null hypothesis (that is, computed assuming that E(Yi) = μ_Y,0).

p-value = Pr_H0[|Y-bar - μ_Y,0| > |Y-bar_act - μ_Y,0|]

To compute the p-value, it is necessary to know the sampling distribution of Y-bar under the null hypothesis. According to the central limit theorem, when the sample size is large, the sampling distribution of Y-bar is well approximated by a normal distribution.

This large-sample normal approximation makes it possible to compute the p-value without needing to know the population distribution of Y, as long as the sample size is large. The details of the calculation, however, depend on whether σY^2 is known.

Calculating the p-Value When σY Is Known

The p-value is the probability of obtaining a value of Y-bar farther from μY,0 than Y-bar_act under the null hypothesis or, equivalently, is the probability of obtaining (Y-bar - μY,0)/σY-bar greater than (Y-bar_act - μY,0)/σY-bar in absolute value.

The formula for the p-value depends on the variance of the population distribution, σY^2. In practice, this variance is typically unknown. Because in general σY^2 must be estimated before the p-value can be computed, we now turn to the problem of estimating σY^2.

The Sample Variance, Sample Standard Deviation, and Standard Error

The sample variance S_Y^2 is an estimator of the population variance σY^2, the sample standard deviation S_Y is an estimator of the population standard deviation σY, and the standard error of the sample average Y-bar is an estimator of the standard deviation of the sampling distribution of Y-bar.

The sample variance and standard deviation. The sample variance is S_Y^2 = [1/(n-1)] Σ(Yi - Y-bar)^2, and the sample standard deviation S_Y is its square root.

The formula for the sample variance is much like the formula for the population variance. The population variance, E[(Y-μY)^2], is the average value of (Y-μY)^2 in the population distribution. Similarly, the sample variance is the sample average of (Yi-μY)^2, i = 1, …, n, with two modifications: First, μY is replaced by Y-bar, and second, the average uses the divisor n-1 instead of n.

The reason for the first modification - replacing μY by Y-bar - is that μY is unknown and thus must be estimated; the natural estimator of μY is Y-bar. The reason for the second modification - dividing by n-1 instead of by n - is that estimating μY by Y-bar introduces a small downward bias in the average of (Yi - Y-bar)^2; dividing by n-1 instead of n corrects this bias.

Dividing by n - 1 instead of n is called a degrees of freedom correction: Estimating the mean uses up some of the information - that is, uses up 1 “degree of freedom” - in the data, so that only n - 1 degrees of freedom remain.
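A small simulation sketch (illustrative numbers) shows the effect of the degrees of freedom correction: dividing by n systematically underestimates σY^2, while dividing by n - 1 does not.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 5, 200_000
sigma2 = 4.0                                     # illustrative population variance

samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
var_div_n = samples.var(axis=1, ddof=0)          # divides by n
var_div_n_minus_1 = samples.var(axis=1, ddof=1)  # divides by n - 1 (the sample variance)

print(var_div_n.mean())           # about (n-1)/n * sigma2 = 3.2 (biased downward)
print(var_div_n_minus_1.mean())   # about sigma2 = 4.0 (unbiased)
```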

Consistency of the sample variance. The sample variance is a consistent estimator of the population variance: S_Y^2 -> σY^2 under the assumptions that Y1, …, Yn are i.i.d. and Yi has a finite fourth moment.

The standard error of Y-bar: SE(Y-bar) = σ-hat_Y-bar = S_Y/n^(1/2).

Calculating the p-Value When σY Is Unknown

Because σY is typically unknown, the p-value is computed by replacing σY-bar in the formula above with its estimator, the standard error SE(Y-bar).

The t-Statistic

The standardized sample average (Y-bar - μY,0) / SE(Y-bar) plays a central role in testing statistical hypotheses and has a special name, the t-statistic or t-ratio:

t = (Y-bar - μY,0) / SE(Y-bar)

Large-sample distribution of the t-statistic. When n is large, S_Y^2 is close to σY^2 with high probability. Thus the distribution of the t-statistic is approximately the same as the distribution of (Y-bar - μY,0) / σY-bar, which in turn is well approximated by the standard normal distribution when n is large because of the central limit theorem.

Accordingly, under the null hypothesis, t is approximately distributed N(0,1) for large n.

Hypothesis Testing with a Prespecified Significance Level

When you undertake a statistical hypothesis test, you can make two types of mistakes: You can incorrectly reject the null hypothesis when it is true (type I error), or you can fail to reject the null hypothesis when it is false (type II error).

Hypothesis tests can be performed without computing the p-value if you are willing to specify in advance the probability you are willing to tolerate of making the first kind of mistake - that is, of incorrectly rejecting the null hypothesis when it is true.

Hypothesis tests using a fixed significance level. Suppose it has been decided that the hypothesis will be rejected if the p-value is less than 5%. Because the area under the tails of the standard normal distribution outside ±1.96 is 5%, this gives a simple rule: Reject H0 if |t_act| > 1.96.

The probability that the test actually incorrectly rejects the null hypothesis when it is true is the size of the test, and the probability that the test correctly rejects the null hypothesis when the alternative is true is the power of the test.

What significance level should you use in practice? A 5% level is conventional, but the appropriate choice depends on the costs of type I and type II errors in the case at hand.

Testing the Hypothesis E(Y) = μY,0 Against the Alternative E(Y) != μY,0

  1. Compute the standard error of Y-bar, SE(Y-bar)
  2. Compute the t-statistic
  3. Compute the p-value. Reject the hypothesis at the 5% significance level if the p-value is less than 0.05 (a sketch of all three steps follows this list)
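A minimal sketch of these three steps (the data and hypothesized mean are made up for illustration; it uses the large-sample normal approximation for the p-value):

```python
import numpy as np
from scipy.stats import norm

Y = np.array([5.2, 4.8, 6.1, 5.5, 4.9, 5.8, 6.3, 5.1, 4.7, 5.6, 5.9, 5.4])  # illustrative data
mu_0 = 5.0                                    # hypothesized mean under H0

n = len(Y)
se_ybar = Y.std(ddof=1) / np.sqrt(n)          # step 1: SE(Y-bar) = S_Y / sqrt(n)
t_stat = (Y.mean() - mu_0) / se_ybar          # step 2: t-statistic
p_value = 2 * (1 - norm.cdf(abs(t_stat)))     # step 3: two-sided p-value, large-n approximation

print(t_stat, p_value, p_value < 0.05)        # reject H0 at the 5% level if the last entry is True
```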

One-Sided Alternatives

In some circumstances, the alternative hypothesis might be that the mean exceeds μY,0. This is called a one-sided alternative hypothesis and can be written H1: E(Y) > μY,0.

p-value = PrH0(Z > t_act) = 1 - Φ(t_act)

3.3 Confidence Intervals for the Population Mean

It is possible to use data from a random sample to construct a set of values that contains the true population mean μY with certain prespecified probability. Such a set is called a confidence set, and the prespecified probability that μY is contained in this set is called the confidence level. The confidence set for μY turns out to be all the possible values of the mean between a lower and an upper limit, so that the confidence set is an interval, called a confidence interval.

A bit of clever reasoning shows that this set of values has a remarkable property: The probability that it contains the true value of the population mean is 95%.

Coverage probabilities. The coverage probability of a confidence interval for the population mean is the probability, computed over all possible random samples, that it contains the true population mean.
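A sketch of the construction (illustrative data; under the large-sample normal approximation the 95% confidence interval is Y-bar ± 1.96 SE(Y-bar)):

```python
import numpy as np

Y = np.array([12.1, 9.8, 11.4, 10.6, 13.0, 9.5, 12.7, 10.9, 11.8, 10.2])   # illustrative data

y_bar = Y.mean()
se = Y.std(ddof=1) / np.sqrt(len(Y))          # standard error of the sample average

ci_lower, ci_upper = y_bar - 1.96 * se, y_bar + 1.96 * se
print(ci_lower, ci_upper)                     # 95% confidence interval for mu_Y
```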

3.4 Comparing Means from Different Populations

Hypothesis Tests for the Difference Between Two Means

To illustrate a test for the difference between two means, let μm and μw denote the means of the two populations (groups m and w).

H0: μm - μw = d0 vs. H1: μm - μw != d0

t = [(Y-bar_m - Y-bar_w) - d0] / SE(Y-bar_m - Y-bar_w)
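A sketch of this two-sample test (the data are made up; the standard error is the unpooled formula SE(Y-bar_m - Y-bar_w) = (s_m^2/n_m + s_w^2/n_w)^(1/2)):

```python
import numpy as np
from scipy.stats import norm

Y_m = np.array([25.1, 27.4, 22.8, 26.0, 24.5, 28.2, 23.9, 25.7])    # illustrative group m data
Y_w = np.array([23.0, 24.6, 21.9, 25.2, 22.4, 23.8, 24.1])          # illustrative group w data
d_0 = 0.0                                                            # hypothesized difference under H0

se_diff = np.sqrt(Y_m.var(ddof=1) / len(Y_m) + Y_w.var(ddof=1) / len(Y_w))
t_stat = ((Y_m.mean() - Y_w.mean()) - d_0) / se_diff
p_value = 2 * (1 - norm.cdf(abs(t_stat)))     # two-sided p-value, large-sample approximation

print(t_stat, p_value)
```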

Confidence Intervals for the Difference Between Two Population Means

3.5 Differences-of-Means Estimation of Causal Effects Using Experimental Data

The difference between the sample means of the treatment and control groups is an estimator of the causal effect of the treatment.

The Causal Effect as a Difference of Conditional Expectations

The causal effect on Y of treatment level x is the difference in the conditional expectations, E(Y | X = x) - E(Y | X = 0), where E(Y | X = x) is the expected value of Y for the treatment group (which receives treatment level X = x) in an ideal randomized controlled experiment and E(Y | X = 0) is the expected value of Y for the control group (which receives treatment level X = 0). In the context of experiments, the causal effect is also called the treatment effect.

Estimation of the Causal Effect Using Differences of Means

The hypothesis that the treatment is ineffective is equivalent to the hypothesis that the two means are the same, which can be tested using the t-statistic for comparing two means.

Econometricians sometimes study “natural experiments”, also called quasi-experiments, in which some event unrelated to the treatment or subject characteristics has the effect of assigning different treatments to different subjects as if they had been part of a randomized controlled experiment.

3.6 Using the t-Statistic When the Sample Size Is Small

The use of the standard normal distribution is justified by the central limit theorem, which applies when the sample size is large. When the sample size is small, the standard normal distribution can provide a poor approximation to the distribution of the t-statistic. If, however, the population distribution is itself normally distributed, then the exact distribution of the t-statistic testing the mean of a single population is the Student t distribution with n - 1 degrees of freedom, and critical values can be taken from the Student t distribution.

The t-Statistic and the Student t Distribution

Under general conditions, the t-statistic has a standard normal distribution if the sample size is large and the null hypothesis is true. Although the standard normal approximation to the t-statistic is reliable for a wide range of distributions of Y if n is large, it can be unreliable if n is small. The exact distribution of the t-statistic depends on the distribution of Y, and it can be very complicated. There is, however, one special case in which the exact distribution of the t-statistic is relatively simple: If Y is normally distributed, then the t-statistic has a Student t distribution with n-1 degrees of freedom.

The t-statistic testing differences of means. The t-statistic testing the difference of two means does not have a Student t distribution, even if the population distribution of Y is normal. (The Student t distribution does not apply here because the variance estimator used to compute the standard error does not produce a denominator in the t-statistic with a chi-squared distribution.)

A modified version of the differences-of-means t-statistic, based on a different standard error formula - the “pooled” standard error formula - has an exact Student t distribution when Y is normally distributed. However, the pooled standard error formula applies only in the special case that the two groups have the same variance or that each group has the same number of observations.

If the population distribution of Y in group m is N(μm, σm^2), if the population distribution of Y in group w is N(μw, σw^2), and if the two group variances are the same (that is, σm^2 = σw^2), then under the null hypothesis the t-statistic computed using the pooled standard error has a Student t distribution with nm + nw - 2 degrees of freedom.
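For reference (the formula itself is not written out above), the standard pooled variance estimator pools the two groups' squared deviations with a single degrees-of-freedom correction: S_pooled^2 = [Σ_(group m) (Yi - Y-bar_m)^2 + Σ_(group w) (Yi - Y-bar_w)^2] / (nm + nw - 2), and the pooled standard error is SE_pooled(Y-bar_m - Y-bar_w) = S_pooled × (1/nm + 1/nw)^(1/2).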

The drawback of using the pooled variance estimator S_pooled^2 is that it applies only if the two population variances are the same. If the population variances are different, the pooled variance estimator is biased and inconsistent. Therefore, the pooled standard error and the pooled t-statistic should not be used unless you have a good reason to believe that the population variances are the same.

Use of the Student t Distribution in Practice

When comparing two means, any economic reason for two groups having different means typically implies that the two groups also could have different variances. Accordingly, the pooled standard error formula is inappropriate, and inference should instead rely on the large-sample normal approximation.

In most applications, the sample sizes are in the hundreds or thousands, large enough for the difference between the Student t distribution and the standard normal distribution to be negligible.

3.7 Scatterplots, the Sample Covariance, and the Sample Correlation

What is the relationship between two variables?

Scatterplots

A scatterplot is a plot of n observations on Xi and Yi, in which each observation is represented by the point (Xi, Yi).

Sample Covariance and Correlation

Because the population distribution is unknown, in practice we do not know the population covariance or correlation. They can be estimated by the sample covariance and the sample correlation.

Consistency of the sample covariance and correlation. Like the sample variance, the sample covariance is consistent. S_XY -> σ_XY; r_XY -> corr(Xi, Yi)
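A sketch using numpy (with made-up data) computes both:

```python
import numpy as np

X = np.array([2.0, 4.5, 3.1, 5.2, 4.0, 6.1])     # illustrative data
Y = np.array([1.1, 2.9, 2.0, 3.8, 2.5, 4.2])

s_XY = np.cov(X, Y)[0, 1]                        # sample covariance (n - 1 divisor by default)
r_XY = np.corrcoef(X, Y)[0, 1]                   # sample correlation, always between -1 and 1
print(s_XY, r_XY)
```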

The correlation coefficient is a measure of linear association.