Hypothesis Testing: Two Samples
Sampling Distribution of the difference between sample means:
By randomly drawing individual samples from a population and calculating their means, we were able to construct a frequency distribution, which we call the sampling distribution of the means.
There are several important properties of the sampling distribution of the means:
If the samples are randomly chosen, the mean of all the sample means will equal the mean of the population.
Even if the shape of the distribution of scores in the population is unknown, as sample size increases the sampling distribution of means will tend toward normality and have a mean that equals the mean of the population.
The larger the size of each sample selected from the population, the smaller the standard deviation of the sampling distribution of means.
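Since we usually can't determine the standard deviation of the population, we estimate the standard error of the mean, which is symbolized by $s_{\bar{X}}$ and is determined by the formula
$$s_{\bar{X}} = \frac{s}{\sqrt{N}}$$
where $s$ is the sample standard deviation (computed with $N - 1$ in the denominator) and $N$ is the sample size.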
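The properties listed above can be checked with a small simulation. The sketch below is illustrative only: the exponential population, the sample sizes of 5 and 50, and the helper names are assumptions, not part of the original notes.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical, deliberately non-normal population (exponential, mean near 10).
population = [random.expovariate(1 / 10) for _ in range(20_000)]
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)


def sample_means(n, draws=2_000):
    """Means of `draws` random samples (without replacement), each of size n."""
    return [statistics.fmean(random.sample(population, n)) for _ in range(draws)]


def estimated_standard_error(sample):
    """Estimated standard error of the mean: s / sqrt(N)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


means_small = sample_means(5)
means_large = sample_means(50)

# Standard deviation of each simulated sampling distribution of means.
sd_small = statistics.stdev(means_small)
sd_large = statistics.stdev(means_large)
grand_mean_large = statistics.fmean(means_large)

print(f"population mean      : {pop_mean:.2f}")
print(f"mean of sample means : {grand_mean_large:.2f}")
print(f"SD of means, N = 5   : {sd_small:.2f}")
print(f"SD of means, N = 50  : {sd_large:.2f}")
print(f"sigma / sqrt(50)     : {pop_sd / math.sqrt(50):.2f}")

# A single sample gives only an estimate of that standard error.
one_sample = random.sample(population, 50)
print(f"s / sqrt(N), one sample of 50: {estimated_standard_error(one_sample):.2f}")
```

The mean of the sample means should land close to the population mean, and the spread of the means should shrink as the sample size grows.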
When we draw pairs of independent samples from the population, we construct a frequency distribution that we call the sampling distribution of the differences.
Independent Samples:
~Samples drawn according to random selection procedures so that the choice of one observation for a sample does not affect the probability of another observation being chosen for a different sample.
~Samples in which the behavior of the members of one sample is not related to the behavior of the members of another sample.
Independence is assumed if samples are randomly drawn from the population, and
Subjects are randomly assigned to a treatment condition.
Several properties characterize the sampling distribution of the differences:
The mean of the distribution of the differences is zero when both samples are drawn from the same population.
The larger the sample size, the closer the distribution approximates the normal curve.
The larger the sample sizes, the smaller the estimated standard error of the difference (between the means).
The estimated standard error of the difference is symbolized by the expression $s_{\bar{X}_1 - \bar{X}_2}$.
We know we can compute t-scores (derived from the logic of z-scores) to evaluate these two means. This is the basis of the mean difference hypothesis test, and provides a statistical test of a hypothesis about the difference between two means.
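We can calculate the t-scores using the formula:
$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_{\bar{X}_1 - \bar{X}_2}}$$
where $\mu_1 - \mu_2 = 0$ under the null hypothesis.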
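The definitional formula for independent t-tests:
$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\left(\frac{SS_1 + SS_2}{N_1 + N_2 - 2}\right)\left(\frac{1}{N_1} + \frac{1}{N_2}\right)}}$$
where $SS_1$ and $SS_2$ are the sums of squared deviations from the mean within each group, and $N_1$ and $N_2$ are the group sizes.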
Assumptions behind the use of the t-test:
In order for the t-test to be valid in the determination of significance of the difference between two means, the following assumptions must be met:
The scores for each population represented in the study should be normally distributed.
The scores for the dependent variable should be interval or ratio level.
The population variances should be equal (homogeneity of variance).
*This assumption requires that the variances within the two groups not be significantly different, which implies that the variances of the populations they represent are not different.
*This assumption is important because the denominator of the t-test pools the variances of the two groups to arrive at a single estimate of the sampling error.
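As a rough sketch of how the pooled denominator described above works, the following computes an independent-samples t statistic by hand. The function name `independent_t` and the two sets of scores are hypothetical, chosen only for illustration.

```python
import math
import statistics


def independent_t(sample1, sample2):
    """Independent-samples t using a pooled variance estimate.

    Returns (t, df). Assumes roughly equal population variances.
    """
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = statistics.fmean(sample1), statistics.fmean(sample2)

    # Sums of squared deviations within each group.
    ss1 = sum((x - mean1) ** 2 for x in sample1)
    ss2 = sum((x - mean2) ** 2 for x in sample2)

    df = n1 + n2 - 2
    pooled_variance = (ss1 + ss2) / df  # single estimate from both groups
    se_difference = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se_difference, df


# Hypothetical scores for two randomly assigned groups.
treatment = [12, 15, 11, 14, 13]
control = [9, 10, 8, 11, 12]

t, df = independent_t(treatment, control)
print(f"t = {t:.2f}, df = {df}")  # t = 3.00, df = 8
```

With df = 8, the two-tailed critical value at alpha = .05 is 2.306, so a t of this size would lead to rejecting the null hypothesis for these made-up scores.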
Type I and Type II Errors:
As we undertake research, we are confronted with the fact that, based on the data, there is always the possibility of making the wrong decision about the significance of the data.
There are two types of error that we can make:
1. Type I or alpha errors.
These are errors that occur when we reject the null hypothesis when the null hypothesis is true. In this case chance (sampling error) alone produced the results we found, even though the probability of that happening is low. The probability of a Type I error is alpha.
2. Type II or beta errors.
These are the errors that occur when we fail to reject the null hypothesis when the null hypothesis is false. In other words, there is a real difference between the population means and our test did not detect it. The probability of a Type II error is beta.
Decision
| Null Hypothesis | Reject | Fail to Reject |
| True | Type I or Alpha error | Correct |
| False | Correct | Type II or Beta error |
Based on the example of the test of the effectiveness of assertiveness training, we can consider:
| | Reject Null | Fail to Reject Null |
| True Null | Type I Error | Assertiveness training made no difference. |
| False Null | Assertiveness training made a difference. | Type II Error |
The larger the value we select for alpha, the more likely we are to reject the null. If we reject the null, we can't make a type II error.
The smaller the value we select for alpha, the more likely we are to fail to reject the null. In failing to reject the null, we risk making a type II error.
There are costs associated with making either type of error!
If we make a type I error (in the example), we would spend scarce resources for an ineffective intervention.
If we make a type II error (in the example), clients would forgo the benefit of a needed and effective intervention.
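The claim that the probability of a Type I error equals alpha can be illustrated by simulation. In the sketch below both groups are drawn from the same population, so the null hypothesis is true and every rejection is a Type I error. The group size of 5, the number of trials, and the helper name `pooled_t` are assumptions made for illustration; 2.306 is the two-tailed critical value for alpha = .05 with df = 8.

```python
import math
import random

random.seed(1)  # fixed seed so the sketch is repeatable

GROUP_SIZE = 5      # hypothetical: 5 subjects per group, so df = 8
CRITICAL_T = 2.306  # two-tailed critical value, alpha = .05, df = 8
TRIALS = 10_000


def pooled_t(a, b):
    """Independent-samples t with a pooled variance estimate."""
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    ss1 = sum((x - m1) ** 2 for x in a)
    ss2 = sum((x - m2) ** 2 for x in b)
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))


rejections = 0
for _ in range(TRIALS):
    # Both groups come from the same normal population: the null is true.
    group1 = [random.gauss(50, 10) for _ in range(GROUP_SIZE)]
    group2 = [random.gauss(50, 10) for _ in range(GROUP_SIZE)]
    if abs(pooled_t(group1, group2)) > CRITICAL_T:
        rejections += 1  # a Type I error

type_i_rate = rejections / TRIALS
print(f"Proportion of Type I errors: {type_i_rate:.3f}")
```

The printed proportion should be close to .05, the alpha level built into the critical value.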
Power:
Statistical power analysis deals with the probability of avoiding type II errors. In other words, it assesses the probability of correctly rejecting a null hypothesis that is false.
Power is the probability that a statistical test will detect a false null hypothesis, or detect a true difference when one is present.
Power is defined as 1 - beta.
There are various techniques that can be used to increase Power:
Increase the sample size. Look at the table for the critical values for t-scores. The larger the value of df (which grows with sample size), the smaller the critical value required to reject the null hypothesis. Larger samples also shrink the standard error, which makes the obtained t larger for the same mean difference.
Increase the alpha level. A test's power is increased by increasing the alpha level (once again, a glance at the t-test table will confirm this). By convention the alpha level is set at .05; however, there is nothing "sacred" about this level. You will need to distinguish between statistical significance and substantive significance in relationships between social variables.
Use all the information the data provides. If you have interval or ratio level data, use the statistical tests designed to measure them. Interval and ratio levels of data provide more information than ordinal or nominal levels of data, so more information results in a more sensitive and exact measurement capability.
Use a one-tailed rather than a two-tailed test (where appropriate). When you know the direction of the difference in a relationship between variables, you can use a one-tailed test. One-tailed tests are generally more powerful than two-tailed tests when the difference lies in the predicted direction.
Reliability of the instrument. The more reliable the measures being analyzed, the greater the likelihood that a true difference will be determined to be significant. Reliable scores are dependable, consistent, and not overly prone to measurement error.
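Two of the techniques above (a larger sample and a one-tailed test) can be illustrated with a small Monte Carlo estimate of power. This is a sketch under stated assumptions: a true difference of one standard deviation between two normal populations, group sizes of 5 and 16, and hypothetical helper names. The critical values come from a standard t table (alpha = .05): 2.306 two-tailed and 1.860 one-tailed for df = 8, and 2.042 two-tailed for df = 30.

```python
import math
import random


def pooled_t(a, b):
    """Independent-samples t with a pooled variance estimate."""
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    ss1 = sum((x - m1) ** 2 for x in a)
    ss2 = sum((x - m2) ** 2 for x in b)
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))


def estimate_power(n, critical_t, one_tailed=False,
                   true_diff=1.0, trials=4_000, seed=2):
    """Proportion of simulated studies that reject a false null hypothesis."""
    rng = random.Random(seed)  # same seed gives the same simulated studies
    hits = 0
    for _ in range(trials):
        treated = [rng.gauss(true_diff, 1) for _ in range(n)]
        control = [rng.gauss(0, 1) for _ in range(n)]
        t = pooled_t(treated, control)
        rejected = t > critical_t if one_tailed else abs(t) > critical_t
        hits += rejected
    return hits / trials


power_small = estimate_power(n=5, critical_t=2.306)    # df = 8, two-tailed
power_large = estimate_power(n=16, critical_t=2.042)   # df = 30, two-tailed
power_one_tailed = estimate_power(n=5, critical_t=1.860, one_tailed=True)

print(f"n = 5 per group, two-tailed : {power_small:.2f}")
print(f"n = 16 per group, two-tailed: {power_large:.2f}")
print(f"n = 5 per group, one-tailed : {power_one_tailed:.2f}")
```

Both the larger sample and the one-tailed test should show a higher estimated power than the small two-tailed design.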
Dependent Samples:
A dependent sample occurs when the data from two samples drawn from a normal population are paired or matched. That is to say, the scores in one sample are affected by the scores in the other.
Examples of dependent samples could include:
Within-subject comparisons - the same individual receives both treatment conditions (more common in medical science)
Repeated measures - where a subject's pre-test scores are matched with post-test scores, or a subject's score at time 1 is matched with scores at time 2, time 3, etc.
Due to the effects of measurement, each prior testing will affect the scores on the subsequent measurement.
This results in the data being correlated.
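Since the data are correlated, the standard deviation of the sampling distribution of the mean differences, $\sigma_{\bar{D}}$, is smaller than with independent samples, and is estimated by the standard error of the mean differences, $s_{\bar{D}}$.
The estimated standard error of the mean differences is determined by the formula
$$s_{\bar{D}} = \sqrt{s_{\bar{X}_1}^2 + s_{\bar{X}_2}^2 - 2 r s_{\bar{X}_1} s_{\bar{X}_2}}$$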
This formula can be used if you know the correlation coefficient (r) between measurement periods.
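Formula for t-test with related samples:
$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_{\bar{X}_1}^2 + s_{\bar{X}_2}^2 - 2 r s_{\bar{X}_1} s_{\bar{X}_2}}}$$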
This can be a relatively difficult formula to use, so a more useful method has been developed, termed the direct difference method.
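To make the correlation-based standard error concrete, the sketch below computes it for a small set of paired scores and compares it with the value obtained by ignoring the correlation, and with the standard error computed directly from the pairwise differences. The scores and function names are hypothetical and serve only as an illustration.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient between paired scores."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def se_mean(x):
    """Estimated standard error of the mean: s / sqrt(N)."""
    return statistics.stdev(x) / math.sqrt(len(x))


def se_mean_difference(x, y):
    """Standard error of the mean difference for correlated samples."""
    r = pearson_r(x, y)
    se_x, se_y = se_mean(x), se_mean(y)
    return math.sqrt(se_x ** 2 + se_y ** 2 - 2 * r * se_x * se_y)


# Hypothetical scores for six subjects measured at two times.
time1 = [10, 12, 9, 14, 11, 13]
time2 = [12, 13, 11, 17, 12, 16]

se_related = se_mean_difference(time1, time2)
se_if_independent = math.sqrt(se_mean(time1) ** 2 + se_mean(time2) ** 2)

# Same quantity computed from the pairwise differences themselves.
diffs = [b - a for a, b in zip(time1, time2)]
se_direct = se_mean(diffs)

print(f"r between time 1 and time 2   : {pearson_r(time1, time2):.2f}")
print(f"SE using the correlation term : {se_related:.3f}")
print(f"SE ignoring the correlation   : {se_if_independent:.3f}")
print(f"SE from the differences       : {se_direct:.3f}")
```

Because the two sets of scores are positively correlated, the standard error with the correlation term is smaller than the one that ignores it, and it matches the value computed from the differences.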
Direct Difference Method:
This method involves comparing the mean difference between scores measured at two different times using the same (paired) subjects.
This produces two sets of scores.
The first score is subtracted from the second score in each pair, and the result is called the difference (D).
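All further calculations are based on the differences between each pair of scores, rather than the scores themselves, using the following formula:
$$t = \frac{\bar{D}}{s_{\bar{D}}}$$
where $s_{\bar{D}}$ is the estimated standard error of the mean difference.
If we substitute
$$\sqrt{\frac{\sum d^2}{N(N - 1)}}$$
for $s_{\bar{D}}$, we obtain the computational formula for the t-test for dependent samples:
$$t = \frac{\bar{D}}{\sqrt{\dfrac{\sum d^2}{N(N - 1)}}}$$
where the mean difference is computed by the formula
$$\bar{D} = \frac{\sum D}{N}$$
and
$$\sum d^2 = \sum D^2 - \frac{(\sum D)^2}{N}$$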
Example:
Data are collected from nine randomly selected graduate students in an attitude survey about taking a statistics class, both before they take the class and at the end of the semester in which they take it. Since each student's post-test score could, at least in part, be determined by his/her pretest score, regardless of the class, the dependent t-test formula is chosen to test the null hypothesis.
Since it's possible the class could be detrimental to their attitude about statistics, as well as enhancing their attitude, a two-tailed test is chosen.
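The hypotheses we are interested in are:
$$H_1: \mu_2 > \mu_1 \quad \text{or} \quad H_1: \mu_2 < \mu_1$$
where $\mu_1$ is the mean pretest score and $\mu_2$ is the mean post-test score.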
For the first to be true, the class would have to have had a strong positive influence on attitudes about statistics. For the second to be true, the class would have to have had a negative influence on the subjects. Here, higher scores on the survey correspond with better attitudes about statistics.
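Step 1. The null hypothesis is stated and tested as
$$H_0: \mu_D = 0$$
that is, the mean difference between pretest and post-test scores in the population is zero.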
Step 2. Extract the necessary data.
Pre and Post Class Attitudes
| Pretest | Post-test | D | D squared |
| 24 | 18 | -6 | 36 |
| 10 | 10 | 0 | 0 |
| 12 | 8 | -4 | 16 |
| 18 | 16 | -2 | 4 |
| 11 | 7 | -4 | 16 |
| 15 | 11 | -4 | 16 |
| 8 | 4 | -4 | 16 |
| 22 | 20 | -2 | 4 |
| 18 | 16 | -2 | 4 |
| Mean X1 = 15.33 | Mean X2 = 12.22 | Sum of D = -28 | Sum of D squared = 112 |
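Step 3. Write out the formulae you will use.
$$\bar{D} = \frac{\sum D}{N}$$
$$\sum d^2 = \sum D^2 - \frac{(\sum D)^2}{N}$$
$$t = \frac{\bar{D}}{\sqrt{\dfrac{\sum d^2}{N(N - 1)}}}$$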
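Step 4. Follow the formulae exactly as written, placing the appropriate values into each formula and solve.
$$\bar{D} = \frac{\sum D}{N} = \frac{-28}{9} = -3.11$$
$$\sum d^2 = \sum D^2 - \frac{(\sum D)^2}{N} = 112 - \frac{(-28)^2}{9} = 112 - 87.11 = 24.89$$
$$\sqrt{\frac{\sum d^2}{N(N - 1)}} = \sqrt{\frac{24.89}{9(8)}} = \sqrt{0.3457} = 0.588$$
$$t = \frac{-3.11}{0.588} = -5.29$$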
Step 5. Determine the appropriate degrees of freedom. In this case, there are nine pairs of data; df = N - 1, so there are 9 - 1 df, or 8 df.
Step 6. Look at the table in the text to determine the critical value with alpha = .05 and df = 8.
The critical value (two-tailed) is 2.3060.
Step 7. Make a decision.
Based on the fact that
$$|t| = 5.29 > 2.306$$
we reject the null hypothesis: the observed mean difference is unlikely to be due to sampling error (chance), so we conclude there is a difference between populations. Since the value is negative (ONLY in our HYPOTHETICAL EXAMPLE, of course), we would conclude that the class had a negative impact on the subjects' attitudes.
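The worked example can be reproduced with a short script using the direct difference method. This is a minimal sketch: the pretest and post-test scores are taken from the table above, D is computed as post-test minus pretest (matching the table's D column), and the variable names are illustrative.

```python
import math

# Scores from the "Pre and Post Class Attitudes" table.
pretest = [24, 10, 12, 18, 11, 15, 8, 22, 18]
posttest = [18, 10, 8, 16, 7, 11, 4, 20, 16]

# D = post-test minus pretest for each student.
diffs = [post - pre for pre, post in zip(pretest, posttest)]
n = len(diffs)

sum_d = sum(diffs)                      # sum of D
sum_d_sq = sum(d * d for d in diffs)    # sum of D squared
mean_d = sum_d / n                      # mean difference

# Sum of squared deviations of D around its mean.
ss_d = sum_d_sq - sum_d ** 2 / n

# Estimated standard error of the mean difference.
se_mean_d = math.sqrt(ss_d / (n * (n - 1)))

t = mean_d / se_mean_d
df = n - 1

CRITICAL_T = 2.306  # two-tailed critical value, alpha = .05, df = 8
reject_null = abs(t) > CRITICAL_T

print(f"sum D = {sum_d}, sum D^2 = {sum_d_sq}")  # sum D = -28, sum D^2 = 112
print(f"mean D = {mean_d:.2f}")                  # mean D = -3.11
print(f"t = {t:.2f}, df = {df}")                 # t = -5.29, df = 8
print(f"reject null: {reject_null}")             # reject null: True
```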
The assumptions for the use of dependent sample t-test are the same as those used for the independent sample t-test.