Statistical methods are tools that allow researchers to answer questions about potential patterns

in empirical data. The history of regression analysis is traced back to Francis

Galton, whose study of the inheritance of height led to his insights on

regression analysis (Lohninger, H). To understand how height was transmitted

inter-generationally, Galton collected data on the heights of individuals and

their parents (Springerlink, 1997). He

constructed frequency tables that classified these people by their own height

and the average height of their parents.

He found that short people typically had short parents, while tall

individuals typically had tall parents. But, after scrutinizing the data

further, he also noticed that individuals who had tall parents were taller than

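average, but typically not as tall as their parents; likewise, individuals who

had short parents were shorter than average, but typically not as short as

their parents. This tendency for children’s heights to lie closer to the

population average than their parents’ heights is known as “regression towards the mean.” Thus,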

Galton developed a method of predicting the value of one quantitative variable,

using the values of another quantitative variable; that is, the heights of the

children were in some way related to those of their parents. Secondarily, Galton developed a method

for assessing the regularity of the relationship.

Econometricians, researchers, and

scientists utilize regression analysis to make quantitative estimates of

economic relationships. In order to

predict the direction of change, one needs a working knowledge of

economic theory and the general characteristics of the variables. However, to predict the amount of change, one

needs a sample of data and a means of estimating the relationship. Therefore, regression analysis is used to estimate

relationships (Studenmund, 2010). Regressions can be applied in many fields. Within economics, one may test for the impact

of income, number of children, health status, and a number of other factors on

family consumption. In politics, one

could measure the impact of public opinion and institutional variables on state

welfare spending (Berkeley, n.d.).

In its simplest bivariate form, a regression shows the relationship between one

independent variable X and one dependent variable Y, written as: Y = B0 + B1X (Studenmund, 2010). The Betas

(B0 and B1) are the coefficients that determine the position of the straight line at

any point. B0 is a constant

that gives the value of Y when X is equal to zero. B1 is the slope

coefficient, or the change in Y associated with a one-unit increase in X. It’s important to mention that a regression

result does not prove causality, but rather suggests that a certain

quantitative relationship exists and tests the strength and direction of that

relationship. It’s also important to remember

that correlation does not imply causation (Gallo et al., 2018). For example, if one tests the economic relationship

between health outcomes and health insurance, and there is a positive beta, one

should not immediately assume that those who have health insurance are all

healthier because of it. Factors such as

wealth are also positively correlated with health insurance and positive health

care outcomes. Therefore, theory and

common sense should motivate the creation of a regression with the right

variables. These variables can also take many forms such as categorical

variables, including nominal, ordinal, and dummy variables (Studenmund, 2010).

Econometricians also understand

that there is unexplained variation, or error, that arises from a number of sources,

including omitted variables, measurement error, incorrect functional form,

and random, unpredictable outcomes (Princeton, n.d.). Therefore, they include a

stochastic error term in the regression to represent all variation in Y that

can’t be explained by X.
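The health insurance example can be turned into a small simulation to show what happens when an omitted variable ends up in the error term. The sketch below is hypothetical throughout: the variable names, the coefficients (0.2 for insurance, 1.0 for wealth), the treatment of insurance as a continuous coverage index, and the helper functions are all assumptions chosen for illustration. It compares the slope on insurance when wealth is left out with the slope when wealth is included.

```python
import random
from statistics import mean


def demean(v):
    m = mean(v)
    return [vi - m for vi in v]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def slope_one_regressor(x, y):
    """Slope from a least-squares regression of y on an intercept and x."""
    xd, yd = demean(x), demean(y)
    return dot(xd, yd) / dot(xd, xd)


def slopes_two_regressors(x1, x2, y):
    """Slopes from a least-squares regression of y on an intercept, x1 and x2."""
    a, b, yd = demean(x1), demean(x2), demean(y)
    s11, s22, s12 = dot(a, a), dot(b, b), dot(a, b)
    s1y, s2y = dot(a, yd), dot(b, yd)
    det = s11 * s22 - s12 ** 2  # zero only if x1 and x2 are exactly collinear
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


random.seed(1)
n = 5000
# Hypothetical data: wealth raises both insurance coverage and health
wealth = [random.gauss(0, 1) for _ in range(n)]
insurance = [0.8 * w + random.gauss(0, 1) for w in wealth]
health = [0.2 * i + 1.0 * w + random.gauss(0, 1)
          for i, w in zip(insurance, wealth)]

short = slope_one_regressor(insurance, health)  # wealth omitted
long_ins, long_wealth = slopes_two_regressors(insurance, wealth, health)

print(f"slope on insurance, wealth omitted:  {short:.2f}")
print(f"slope on insurance, wealth included: {long_ins:.2f}")
```

In this setup the regression that omits wealth attributes part of the effect of wealth to insurance, so its slope is noticeably larger than the 0.2 used to generate the data, while the regression that includes wealth recovers a value close to 0.2.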

Ordinary least

squares (OLS) is a method for estimating the unknown parameters in a linear

regression by minimizing the sum of squared residuals (Studenmund, 2010). This

method is commonly attributed to Carl Friedrich Gauss, who applied it to astronomical

observations in order to determine the orbits of bodies around the sun. In order

for the estimates of the Betas to have desirable properties, the method relies on a series of

assumptions:

1. Y is a linear function of X

2. All X’s are fixed in repeated samples

3. The expected value of the error term is zero

4. The variance of the error term is constant

5. There is no exact linear relationship between one independent variable and the others
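The last assumption in the list can be illustrated with a short sketch. When one independent variable is an exact linear function of another, the system of equations that least squares has to solve has a determinant of zero, so the separate coefficients cannot be computed. The function name and the small data sets below are hypothetical, and exact fractions are used so that the zero is not blurred by floating-point rounding.

```python
from fractions import Fraction


def normal_equations_determinant(x1, x2):
    """Determinant of the 2x2 matrix of demeaned sums of squares and
    cross-products for two regressors (an intercept is assumed)."""
    n = len(x1)
    m1, m2 = Fraction(sum(x1), n), Fraction(sum(x2), n)
    a = [Fraction(v) - m1 for v in x1]
    b = [Fraction(v) - m2 for v in x2]
    s11 = sum(u * u for u in a)
    s22 = sum(v * v for v in b)
    s12 = sum(u * v for u, v in zip(a, b))
    return s11 * s22 - s12 ** 2


x1 = [1, 2, 3, 4, 5]
x2_collinear = [2 * v + 3 for v in x1]  # exact linear function of x1
x2_independent = [2, 1, 4, 3, 6]        # related to x1, but not exactly

det_collinear = normal_equations_determinant(x1, x2_collinear)
det_independent = normal_equations_determinant(x1, x2_independent)

print("determinant with exact linear relationship:", det_collinear)
print("determinant without exact linear relationship:", det_independent)
```

A determinant of zero means there is no unique solution for the two slopes, which is why an exact linear relationship among the independent variables has to be ruled out.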

If the five above conditions are

met, the Gauss-Markov Theorem indicates that the OLS estimator bi

is the Best Linear Unbiased Estimator (BLUE) of Bi. If the error term is also assumed to be normally

distributed, the estimates themselves follow a normal distribution. The statistical significance of individual estimates can be tested

through the use of t-tests, while the overall significance of a model can be tested through

F-tests.
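As a rough sketch of how these test statistics are formed in the bivariate case, the snippet below computes the t-statistic for the slope and the F-statistic for the regression as a whole. The simulated data, the seed, and the function name slope_t_and_f are assumptions for illustration only. The sketch stops at the statistics themselves; turning them into p-values would require the t and F distributions, which are not included here.

```python
import math
import random
from statistics import mean


def slope_t_and_f(x, y):
    """Return (t, F) for a bivariate least-squares regression of y on x."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    # Variation left unexplained by the fitted line
    resid_ss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    # Variation explained by the fitted line
    explained_ss = sum((fi - y_bar) ** 2 for fi in fitted)
    s2 = resid_ss / (n - 2)               # estimated error variance
    t_stat = b1 / math.sqrt(s2 / sxx)     # slope divided by its standard error
    f_stat = (explained_ss / 1) / s2      # one slope, so one numerator d.f.
    return t_stat, f_stat


random.seed(2)
# Hypothetical data: intercept 2.0, slope 0.5, plus random noise
x = [random.uniform(0, 10) for _ in range(200)]
y = [2.0 + 0.5 * xi + random.gauss(0, 1) for xi in x]

t_stat, f_stat = slope_t_and_f(x, y)
print(f"t-statistic for the slope: {t_stat:.2f}")
print(f"F-statistic for the model: {f_stat:.2f}")
```

With a single independent variable the two tests carry the same information, since the F-statistic equals the square of the slope's t-statistic; they differ once a model has more than one independent variable.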