Guide: Hypothesis Testing

Author: Daniel Croft

Daniel Croft is an experienced continuous improvement manager with a Lean Six Sigma Black Belt and a Bachelor's degree in Business Management. With more than ten years of experience applying his skills across various industries, Daniel specializes in optimizing processes and improving efficiency. His approach combines practical experience with a deep understanding of business fundamentals to drive meaningful change.

In the world of data-driven decision-making, Hypothesis Testing stands as a cornerstone methodology. It serves as the statistical backbone for a multitude of sectors, from manufacturing and logistics to healthcare and finance. But what exactly is Hypothesis Testing, and why is it so indispensable? Simply put, it’s a technique that allows you to validate or invalidate claims about a population based on sample data. Whether you’re looking to streamline a manufacturing process, optimize logistics, or improve customer satisfaction, Hypothesis Testing offers a structured approach to reach conclusive, data-supported decisions.

The graph above gives a simplified picture of a hypothesis test. In this graph:

The curve represents a standard normal distribution, often encountered in hypothesis tests.
The green-shaded area signifies the “Acceptance Region,” where you would fail to reject the null hypothesis (H0).
The red-shaded areas are the “Rejection Regions,” where you would reject H0 in favor of the alternative hypothesis (Ha).
The blue dashed lines indicate the “Critical Values” (±1.96), which are the thresholds for rejecting H0.

This graphical representation serves as a conceptual foundation for understanding the mechanics of hypothesis testing. It visually illustrates what it means to accept or reject a hypothesis based on a predefined level of significance.
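
As a rough illustration of where the ±1.96 critical values come from, here is a minimal Python sketch using the standard library's `statistics.NormalDist`. It assumes a two-sided test on a standard normal test statistic; the function names and the two sample z-values are made up for this example.

```python
from statistics import NormalDist


def two_sided_critical_value(alpha: float) -> float:
    """Return z* such that P(|Z| > z*) = alpha for a standard normal Z."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def region(z: float, alpha: float = 0.05) -> str:
    """Say which region of the graph a z-statistic falls in."""
    z_crit = two_sided_critical_value(alpha)
    return "rejection" if abs(z) > z_crit else "acceptance"


z_crit = two_sided_critical_value(0.05)
print(f"critical values: ±{z_crit:.2f}")
for z in (0.8, -2.4):  # hypothetical test statistics
    print(z, "->", region(z))
```

Choosing a smaller α pushes the critical values further from zero, which shrinks the rejection regions.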

What is Hypothesis Testing?

Hypothesis testing is a structured procedure in statistics used for drawing conclusions about a larger population based on a subset of that population, known as a sample. The method is widely used across different industries and sectors for a variety of purposes. Below, we’ll dissect the key components of hypothesis testing to provide a more in-depth understanding.

The Hypotheses: H0 and Ha

In every hypothesis test, there are two competing statements:

Null Hypothesis (H0): This is the “status quo” hypothesis that you are trying to test against. It is a statement that asserts that there is no effect or difference. For example, in a manufacturing setting, the null hypothesis might state that a new production process does not improve the average output quality.
Alternative Hypothesis (Ha or H1): This is what you aim to prove by conducting the hypothesis test. It is the statement that there is an effect or difference. Using the same manufacturing example, the alternative hypothesis might state that the new process does improve the average output quality.

Significance Level (α)

Before conducting the test, you decide on a “Significance Level” (α), typically set at 0.05 or 5%. This level represents the probability of rejecting the null hypothesis when it is actually true. Lower α values make the test more stringent, reducing the chances of a ‘false positive’.

Data Collection

You then proceed to gather data, which is usually a sample from a larger population. The quality of your test heavily relies on how well this sample represents the population. The data can be collected through various means such as surveys, observations, or experiments.

Statistical Test

Depending on the nature of the data and what you’re trying to prove, different statistical tests can be applied (e.g., t-test, chi-square test, ANOVA, etc.). These tests will compute a test statistic (e.g., t, χ², F, etc.) based on your sample data.

Here are graphical examples of the distributions commonly used in three different types of statistical tests: t-test, Chi-square test, and ANOVA (Analysis of Variance), displayed side by side for comparison.

T-Test

Graph 1 (Leftmost): This graph represents a t-distribution, often used in t-tests. The t-distribution is similar to the normal distribution but tends to have heavier tails. It is commonly used when the sample size is small or the population variance is unknown.

Chi-square Test

Graph 2 (Middle): The Chi-square distribution is used in Chi-square tests, often for testing independence or goodness-of-fit. Unlike the t-distribution, the Chi-square distribution is not symmetrical and only takes on positive values.

ANOVA (F-distribution)

Graph 3 (Rightmost): The F-distribution is used in Analysis of Variance (ANOVA), a statistical test used to analyze the differences between group means. Like the Chi-square distribution, the F-distribution is also not symmetrical and takes only positive values.

These visual representations provide an intuitive understanding of the different statistical tests and their underlying distributions. Knowing which test to use and when is crucial for conducting accurate and meaningful hypothesis tests.

Decision Making

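The test statistic is then compared to a critical value, which is determined by the significance level (α) and the sampling distribution of the test (for a t-test, this depends on the sample size through the degrees of freedom). Equivalently, the test statistic can be converted into a p-value: the probability, assuming the null hypothesis is true, of obtaining a result at least as extreme as the one observed. If the p-value is less than α, you reject the null hypothesis in favor of the alternative hypothesis. Otherwise, you fail to reject the null hypothesis.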

Interpretation

Finally, you interpret the results in the context of what you were investigating. Rejecting the null hypothesis might mean implementing a new process or strategy, while failing to reject it might lead to a continuation of current practices.

To sum it up, hypothesis testing is not just a set of formulas but a methodical approach to problem-solving and decision-making based on data. It’s a crucial tool for anyone interested in deriving meaningful insights from data to make informed decisions.

Why is Hypothesis Testing Important?

Hypothesis testing is a cornerstone of statistical and empirical research, serving multiple functions in various fields. Let’s delve into each of the key areas where hypothesis testing holds significant importance:

Data-Driven Decisions

In today’s complex business environment, making decisions based on gut feeling or intuition is not enough; you need data to back up your choices. Hypothesis testing serves as a rigorous methodology for making decisions based on data. By setting up a null hypothesis and an alternative hypothesis, you can use statistical methods to determine which is more likely to be true given a data sample. This structured approach eliminates guesswork and adds empirical weight to your decisions, thereby increasing their credibility and effectiveness.

Risk Management

Hypothesis testing allows you to assign a ‘p-value’ to your findings, which is the probability of observing data at least as extreme as your sample if the null hypothesis is true. Together with the significance level, this helps you quantify risk. For instance, rejecting the null hypothesis only when the p-value falls below α = 0.05 caps the risk of rejecting a true null hypothesis at 5%. This is invaluable in scenarios like product launches or changes in operational processes, where understanding the risk involved can be as crucial as the decision itself.

Here’s an example to help you understand the concept better.

The graph above helps explain the concept of a ‘p-value’ and its role in quantifying risk in hypothesis testing. Here’s how to interpret it:

Elements of the Graph

The curve represents a Standard Normal Distribution, which is often used to represent z-scores in hypothesis testing.
The red-shaded area on the right represents the Rejection Region. It corresponds to a 5% risk (α=0.05) of rejecting the null hypothesis when it is actually true. This is the area where, if your test statistic falls, you would reject the null hypothesis.

The green-shaded area represents the Acceptance Region, with a 95% level of confidence. If your test statistic falls in this region, you would fail to reject the null hypothesis.
The blue dashed line is the Critical Value (approximately 1.645 in this example). If your standardized test statistic (z-value) exceeds this point, you enter the rejection region, and your p-value becomes less than 0.05, leading you to reject the null hypothesis.

Relating to Risk Management

The p-value can be related to risk management. For example, if you’re considering implementing a new manufacturing process, a low p-value (less than α) means that data like yours would be unlikely if the null hypothesis were true (i.e., if the new process made no difference). Going ahead with the change on that basis keeps the risk of acting on a false positive at or below α.

Quality Control

In sectors like manufacturing, automotive, and logistics, maintaining a high level of quality is not just an option but a necessity. Hypothesis testing is often employed in quality assurance and control processes to test whether a certain process or product conforms to standards. For example, if a car manufacturing line claims its error rate is below 5%, hypothesis testing can confirm or disprove this claim based on a sample of products. This ensures that quality is not compromised and that stakeholders can trust the end product.
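
To make the error-rate example concrete, here is a minimal sketch of a one-sided test for a proportion, written in Python with only the standard library. The inspection numbers (15 defective units out of 500) are invented for illustration, and the sketch assumes the normal approximation to the binomial is adequate for this sample size. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test_less(defects: int, n: int, p0: float) -> tuple:
    """One-sided z-test of H0: p >= p0 against Ha: p < p0.

    Uses the normal approximation, which is reasonable when
    n * p0 and n * (1 - p0) are both comfortably above 5.
    """
    p_hat = defects / n
    standard_error = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / standard_error
    p_value = NormalDist().cdf(z)  # lower-tail probability
    return z, p_value


# Hypothetical inspection result: 15 defective units out of 500.
z, p_value = proportion_z_test_less(defects=15, n=500, p0=0.05)
alpha = 0.05
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

Here the claim you want to support (an error rate below 5%) is placed in the alternative hypothesis, so rejecting H0 counts as evidence for the claim.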

Resource Optimization

Resource allocation is a significant challenge for any organization. Hypothesis testing can be a valuable tool in determining where resources will be most effectively utilized. For instance, in a manufacturing setting, you might want to test whether a new piece of machinery significantly increases production speed. A hypothesis test could provide the statistical evidence needed to decide whether investing in more of such machinery would be a wise use of resources.

Innovation

In the realm of research and development, hypothesis testing can be a game-changer. When developing a new product or process, you’ll likely have various theories or hypotheses. Hypothesis testing allows you to systematically test these, filtering out the less likely options and focusing on the most promising ones. This not only speeds up the innovation process but also makes it more cost-effective by reducing the likelihood of investing in ideas that are statistically unlikely to be successful.

In summary, hypothesis testing is a versatile tool that adds rigor, reduces risk, and enhances the decision-making and innovation processes across various sectors and functions.

This graphical representation makes it easier to grasp how the p-value is used to quantify the risk involved in making a decision based on a hypothesis test.

Step-by-Step Guide to Hypothesis Testing

To make this guide practical for readers who are new to the concept, we will explain each step of the process and follow it with an example applied to a manufacturing line, where you want to test whether a new process reduces the average time it takes to assemble a product.

Step 1: State the Hypotheses

The first and foremost step in hypothesis testing is to clearly define your hypotheses. This sets the stage for your entire test and guides the subsequent steps, from data collection to decision-making. At this stage, you formulate two competing hypotheses:

Null Hypothesis (H0)

The null hypothesis is a statement that there is no effect or no difference, and it serves as the hypothesis that you are trying to test against. It’s the default assumption that any kind of effect or difference you suspect is not real, and is due to chance. Formulating a clear null hypothesis is crucial, as your statistical tests will be aimed at challenging this hypothesis.

Example:

In a manufacturing context, if you’re testing whether a new assembly line process has reduced the time it takes to produce an item, your null hypothesis (H0) could be:

H0: “The new process does not reduce the average assembly time.”

Alternative Hypothesis (Ha or H1)

The alternative hypothesis is what you want to prove. It is a statement that there is an effect or difference. This hypothesis is considered only after you find enough evidence against the null hypothesis.

Example:

Continuing with the manufacturing example, the alternative hypothesis (Ha) could be:

Ha: “The new process reduces the average assembly time.”

Types of Alternative Hypothesis

Depending on what exactly you are trying to prove, the alternative hypothesis can be:

Two-Sided: You’re interested in deviations in either direction (greater or smaller).
One-Sided: You’re interested in deviations only in one direction (either greater or smaller).
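
The choice between a one-sided and a two-sided alternative changes the p-value you get from the same test statistic. The sketch below shows this for a z-statistic, using Python's standard library. The function name, the labels "less" and "greater", and the value z = -1.8 are assumptions made for the example.

```python
from statistics import NormalDist


def z_p_value(z: float, alternative: str = "two-sided") -> float:
    """p-value for a standard normal test statistic.

    alternative: "two-sided", "less" (Ha: smaller) or "greater" (Ha: larger).
    """
    lower_tail = NormalDist().cdf(z)
    if alternative == "less":
        return lower_tail
    if alternative == "greater":
        return 1 - lower_tail
    if alternative == "two-sided":
        return 2 * min(lower_tail, 1 - lower_tail)
    raise ValueError(f"unknown alternative: {alternative!r}")


z = -1.8  # hypothetical test statistic
for alt in ("two-sided", "less", "greater"):
    print(f"{alt:>9}: p = {z_p_value(z, alt):.4f}")
```

With α = 0.05, this statistic would be significant under the one-sided "less" alternative but not under the two-sided one, which is why the type of alternative should be fixed before you look at the data.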

Scenario: Reducing Assembly Time in a Car Manufacturing Plant

You are a continuous improvement manager at a car manufacturing plant. One of the assembly lines has been struggling with longer assembly times, affecting the overall production schedule. A new assembly process has been proposed, promising to reduce the assembly time per car. Before rolling it out on the entire line, you decide to conduct a hypothesis test to see if the new process actually makes a difference.

Null Hypothesis (H0)
In this context, the null hypothesis would be the status quo, asserting that the new assembly process doesn’t reduce the assembly time per car. Mathematically, you could state it as:
H0: The average assembly time per car with the new process ≥ The average assembly time per car with the old process.
Or simply:
H0: “The new process does not reduce the average assembly time per car.”
Alternative Hypothesis (Ha or H1)
The alternative hypothesis is what you aim to prove: that the new process is more efficient. Mathematically, it could be stated as:
Ha: The average assembly time per car with the new process < The average assembly time per car with the old process.
Or simply:
Ha: “The new process reduces the average assembly time per car.”
Types of Alternative Hypothesis
In this example, you’re only interested in knowing if the new process reduces the time, making it a One-Sided Alternative Hypothesis.

Step 2: Determine the Significance Level (α)

Once you’ve clearly stated your null and alternative hypotheses, the next step is to decide on the significance level, often denoted by α. The significance level is the threshold against which the p-value is compared: if the p-value falls below α, the null hypothesis is rejected. It quantifies the level of risk you’re willing to accept when making a decision based on the hypothesis test.

What is a Significance Level?

The significance level, usually expressed as a percentage, represents the probability of rejecting the null hypothesis when it is actually true. Common choices for α are 0.05, 0.01, and 0.10, representing 5%, 1%, and 10% levels of significance, respectively.

5% Significance Level (α=0.05): This is the most commonly used level and implies that you are willing to accept a 5% chance of rejecting the null hypothesis when it is true.
1% Significance Level (α=0.01): This is a more stringent level, used when you want to be more sure of your decision. The risk of falsely rejecting the null hypothesis is reduced to 1%.

10% Significance Level (α=0.10): This is a more lenient level, used when you are willing to take a higher risk. Here, the chance of falsely rejecting the null hypothesis is 10%.
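
One way to see what these levels mean in practice is to simulate many tests in which the null hypothesis really is true and count how often it gets rejected anyway. The sketch below does this in Python with the standard library. It assumes a simple two-sided z-test on normally distributed data with known standard deviation; the trial count, sample size, and seed are arbitrary choices for the illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def false_positive_rate(alpha: float, trials: int = 2000, n: int = 30,
                        seed: int = 0) -> float:
    """Fraction of two-sided z-tests that reject H0 when H0 is true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        # Draw a sample for which H0 (mean = 0, sigma = 1) really holds.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = fmean(sample) * sqrt(n)  # (mean - 0) / (1 / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials


for alpha in (0.01, 0.05, 0.10):
    print(f"alpha = {alpha:.2f}: rejected a true H0 in "
          f"{false_positive_rate(alpha):.1%} of simulated tests")
```

The simulated rejection rates should land close to the chosen α, with some run-to-run noise.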

Example:

Continuing with the manufacturing example, let’s say you decide to set α=0.05, meaning you’re willing to take a 5% risk of concluding that the new process is effective when it might not be.

How to Choose the Right Significance Level?

Choosing the right significance level depends on the context and the consequences of making a wrong decision. Here are some factors to consider:

Criticality of Decision: For highly critical decisions with severe consequences if wrong, a lower α like 0.01 may be appropriate.
Resource Constraints: If the cost of collecting more data is high, you may choose a higher α to make a decision based on a smaller sample size.

Industry Standards: Sometimes, the choice of α may be dictated by industry norms or regulatory guidelines.

By the end of Step 2, you should have a well-defined significance level that will guide the rest of your hypothesis testing process. This level serves as the cut-off for determining whether the observed effect or difference in your sample is statistically significant or not.

Continuing the Scenario: Reducing Assembly Time in a Car Manufacturing Plant

After formulating the hypotheses, the next step is to set the significance level (α) that will be used to interpret the results of the hypothesis test. This is a critical decision as it quantifies the level of risk you’re willing to accept when making a conclusion based on the test.
Setting the Significance Level
Given that assembly time is a critical factor affecting the production schedule, and ultimately, the company’s bottom line, you decide to be fairly stringent in your test. You opt for a commonly used significance level:
$α = 0.05$
This means you are willing to accept a 5% chance of rejecting the null hypothesis when it is actually true. In practical terms, if you find that the p-value of the test is less than 0.05, you will conclude that the new process significantly reduces assembly time and consider implementing it across the entire line.
Why $α = 0.05$?
Industry Standard: A 5% significance level is widely accepted in many industries, including manufacturing, for hypothesis testing.
Risk Management: By setting $α = 0.05$, you’re limiting the risk of concluding that the new process is effective when it may not be to just 5%.
Balanced Approach: This level offers a balance between being too lenient (e.g., α=0.10) and too stringent (e.g., α=0.01), making it a reasonable choice for this scenario.

Step 3: Collect and Prepare the Data

After stating your hypotheses and setting the significance level, the next vital step is data collection. The data you collect serves as the basis for your hypothesis test, so it’s essential to gather accurate and relevant data.

Types of Data

Depending on your hypothesis, you’ll need to collect either:

Quantitative Data: Numerical data that can be measured. Examples include height, weight, and temperature.

Qualitative Data: Categorical data that represent characteristics. Examples include colors, gender, and material types.

Data Collection Methods

Various methods can be used to collect data, such as:

Surveys and Questionnaires: Useful for collecting qualitative data and opinions.

Observation: Collecting data through direct or participant observation.
Experiments: Especially useful in scientific research where control over variables is possible.
Existing Data: Utilizing databases, records, or any other data previously collected.

Sample Size

The sample size (n) is another crucial factor. A larger sample size generally gives more accurate results, but it’s often constrained by resources like time and money. The choice of sample size might also depend on the statistical test you plan to use.
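
The link between sample size and accuracy can be seen in the standard error of a sample mean, which is the sample standard deviation divided by the square root of n. The short Python sketch below illustrates this; the standard deviation of 2.0 minutes and the sample sizes are hypothetical.

```python
from math import sqrt


def standard_error(sample_sd: float, n: int) -> float:
    """Standard error of a sample mean: s / sqrt(n)."""
    return sample_sd / sqrt(n)


sample_sd = 2.0  # hypothetical standard deviation, in minutes
for n in (10, 40, 160):
    print(f"n = {n:>3}: standard error = {standard_error(sample_sd, n):.3f}")
```

Each fourfold increase in sample size only halves the standard error, which is one reason larger samples become expensive relative to the precision they add.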

Example:

Continuing with the manufacturing example, suppose you decide to collect data on the assembly time of 30 randomly chosen products, 15 made using the old process and 15 made using the new process. Here, your sample size n=30.

Data Preparation

Once data is collected, it often needs to be cleaned and prepared for analysis. This could involve:

Removing Outliers: Outliers can skew the results and provide an inaccurate picture.
Data Transformation: Converting data into a format suitable for statistical analysis.

Data Coding: Categorizing or labeling data, necessary for qualitative data.
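
As one possible way to screen for outliers, the sketch below applies the common 1.5 × IQR rule using Python's standard library. The rule, the function name, and the data are illustrative choices rather than something this guide prescribes; the 95.0 reading is a made-up value standing in for an unusually slow assembly.

```python
from statistics import quantiles


def iqr_outliers(values: list[float], k: float = 1.5) -> list[float]:
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Illustrative assembly times in minutes; 95.0 is a made-up extreme reading.
times = [38.53, 35.80, 36.96, 39.48, 38.74, 33.05, 36.90, 34.70, 34.79, 95.0]
flagged = iqr_outliers(times)
print("flagged for review:", flagged)
```

Flagged points are candidates for review, not automatic deletion: it is worth checking whether each one reflects a recording error or a real event before deciding what to do with it.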

By the end of Step 3, you should have a dataset that is ready for statistical analysis. This dataset should be representative of the population you’re interested in and prepared in a way that makes it suitable for hypothesis testing.

Continuing the Scenario: Reducing Assembly Time in a Car Manufacturing Plant

With the hypotheses stated and the significance level set, you’re now ready to collect the data that will serve as the foundation for your hypothesis test. Given that you’re testing a change in a manufacturing process, the data will most likely be quantitative, representing the assembly time of cars produced on the line.
Data Collection Plan
You decide to use a Random Sampling Method for your data collection. For two weeks, assembly times for randomly selected cars will be recorded: one week using the old process and another week using the new process. Your aim is to collect data for 40 cars from each process, giving you a sample size (n) of 80 cars in total.
Types of Data
Quantitative Data: In this case, you’re collecting numerical data representing the assembly time in minutes for each car.
Data Preparation
Data Cleaning: Once the data is collected, you’ll need to inspect it for any anomalies or outliers that could skew your results. For example, if a significant machine breakdown happened during one of the weeks, you may need to adjust your data or collect more.
Data Transformation: Given that you’re dealing with time, you may not need to transform your data, but it’s something to consider, depending on the statistical test you plan to use.
Data Coding: Since you’re dealing with quantitative data in this scenario, coding is likely unnecessary unless you’re planning to categorize assembly times into bins (e.g., ‘fast’, ‘medium’, ‘slow’) for some reason.
Example Data Points:
Car_ID Process_Type Assembly_Time_Minutes
1 Old 38.53
2 Old 35.80
3 Old 36.96
4 Old 39.48
5 Old 38.74
6 Old 33.05
7 Old 36.90
8 Old 34.70
9 Old 34.79
… … …
The complete dataset would contain 80 rows: 40 for the old process and 40 for the new process.
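
Before running any test, it can help to summarize what has been collected. The sketch below computes basic descriptive statistics in Python for the nine old-process rows listed above. It covers only those rows, since the rest of the dataset is not shown here.

```python
from statistics import mean, stdev

# The nine old-process assembly times listed above (minutes).
old_times = [38.53, 35.80, 36.96, 39.48, 38.74, 33.05, 36.90, 34.70, 34.79]

print(f"n = {len(old_times)}")
print(f"mean = {mean(old_times):.2f} min")
print(f"standard deviation = {stdev(old_times):.2f} min")
print(f"range = {min(old_times):.2f} to {max(old_times):.2f} min")
```

Running the same summary on the new-process rows would give the two means and standard deviations that a two-sample test compares.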

Step 4: Conduct the Statistical Test

After you have your hypotheses, significance level, and collected data, the next step is to actually perform the statistical test. This step involves calculations that will lead to a test statistic, which you’ll then use to make your decision regarding the null hypothesis.

Choose the Right Test

The first task is to decide which statistical test to use. The choice depends on several factors:

Type of Data: Quantitative or Qualitative

Sample Size: Large or Small
Number of Groups or Categories: One-sample, Two-sample, or Multiple groups

For instance, you might choose a t-test for comparing means of two groups when you have a small sample size. Chi-square tests are often used for categorical data, and ANOVA is used for comparing means across more than two groups.
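
The rules of thumb above can be written down as a small lookup. The Python sketch below is a deliberate simplification with a hypothetical function name: real test selection also depends on assumptions such as independence, distribution shape, and sample size, none of which this helper checks.

```python
def suggest_test(data_type: str, n_groups: int) -> str:
    """Map the factors listed above to a commonly used test.

    data_type: "quantitative" or "qualitative"
    n_groups:  number of groups or categories being compared
    """
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    if data_type == "qualitative":
        return "chi-square test"
    if data_type != "quantitative":
        raise ValueError(f"unknown data type: {data_type!r}")
    if n_groups == 1:
        return "one-sample t-test"
    if n_groups == 2:
        return "two-sample t-test"
    return "ANOVA"


print(suggest_test("quantitative", 2))
print(suggest_test("quantitative", 4))
print(suggest_test("qualitative", 3))
```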

Calculation of Test Statistic

Once you’ve chosen the appropriate statistical test, the next step is to calculate the test statistic. This involves using the sample data in a specific formula for the chosen test.

Example:

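In our manufacturing example, let’s say you’ve chosen to use a t-test to compare the average assembly time between the old and new processes. You would calculate the t-statistic as the difference between the two sample means divided by the standard error of that difference: $t = (\bar{x}_1 - \bar{x}_2) / SE(\bar{x}_1 - \bar{x}_2)$.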
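
The guide does not say which variant of the two-sample t-statistic it has in mind, so the sketch below assumes Welch's form, which does not require the two groups to have equal variances. It is written in Python with the standard library only, and the assembly times are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_1: list[float], sample_2: list[float]) -> tuple[float, float]:
    """Return (t, degrees of freedom) for Welch's two-sample t-test."""
    n1, n2 = len(sample_1), len(sample_2)
    v1, v2 = variance(sample_1) / n1, variance(sample_2) / n2
    t = (mean(sample_1) - mean(sample_2)) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Made-up assembly times in minutes, for illustration only.
new_process = [33.1, 34.6, 32.8, 35.0, 33.9, 34.2]
old_process = [36.4, 35.8, 37.1, 36.9, 35.2, 38.0]

t, df = welch_t(new_process, old_process)
print(f"t = {t:.2f}, df = {df:.1f}")
```

Because the new process is passed as the first sample, a negative t means the new process has the smaller mean. Swapping the order of the samples flips the sign.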

Obtain the p-value

After calculating the test statistic, the next step is to find the p-value associated with it. The p-value represents the probability of observing a test statistic at least as extreme as the one you calculated if the null hypothesis is true.

A small p-value (<α) indicates strong evidence against the null hypothesis, so you reject the null hypothesis.
A large p-value (≥α) indicates weak evidence against the null hypothesis, so you fail to reject the null hypothesis.
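
Statistical software normally reports the p-value for you. To show roughly what is being computed for a t-test, the sketch below integrates the density of Student's t-distribution numerically, using only Python's standard library. The function names, the Simpson's rule approach, and the example values t = -2.0 with 28 degrees of freedom are all choices made for this illustration.

```python
from math import exp, lgamma, pi, sqrt


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2)
    return exp(log_c) / sqrt(df * pi) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(t: float, df: float, steps: int = 2000) -> float:
    """P(T <= t), by Simpson's rule on [0, |t|] plus symmetry about 0."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |t|)
    return 0.5 + area if t >= 0 else 0.5 - area


def t_p_value(t: float, df: float, alternative: str = "two-sided") -> float:
    """p-value for alternative "two-sided", "less" or "greater"."""
    lower_tail = t_cdf(t, df)
    if alternative == "less":
        return lower_tail
    if alternative == "greater":
        return 1 - lower_tail
    return 2 * min(lower_tail, 1 - lower_tail)


alpha = 0.05
p = t_p_value(-2.0, df=28, alternative="less")  # hypothetical t and df
print(f"p-value = {p:.4f}")
print("reject H0" if p < alpha else "fail to reject H0")
```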

Make the Decision

You now compare the p-value to the predetermined significance level (α):

If p<α, you reject the null hypothesis in favor of the alternative hypothesis.
If p≥α, you fail to reject the null hypothesis.

Example:

In the manufacturing case, if your calculated p-value is 0.03 and your α is 0.05, you would reject the null hypothesis, concluding that the new process effectively reduces the average assembly time.

By the end of Step 4, you will have either rejected or failed to reject the null hypothesis, providing a statistical basis for your decision-making process.

Continuing the Scenario: Reducing Assembly Time in a Car Manufacturing Plant

Now that you have collected and prepared your data, the next step is to conduct the actual statistical test to evaluate the null and alternative hypotheses. In this case, you’ll be comparing the mean assembly times between cars produced using the old and new processes to determine if the new process is statistically significantly faster.
Choosing the Right Test
Given that you have two sets of independent samples (old process and new process), a Two-sample t-test for Equality of Means seems appropriate for comparing the average assembly times.
Preparing Data for Minitab
Firstly, you would prepare your data in an Excel sheet or CSV file with one column for the assembly times using the old process and another column for the assembly times using the new process. Import this file into Minitab.
Steps to Perform the Two-sample t-test in Minitab
Open Minitab: Launch the Minitab software on your computer.
Import Data: Navigate to File > Open and import your data file.
Navigate to the t-test Menu: Go to Stat > Basic Statistics > 2-Sample t....
Select Columns: In the dialog box, specify the columns corresponding to the old and new process assembly times under “Sample 1” and “Sample 2.”
Options: Click on Options and make sure that you set the confidence level to 95% (which corresponds to $α = 0.05$).
Run the Test: Click OK to run the test.
In this example output, the p-value is 0.0012, which is less than the significance level $α = 0.05$. Hence, you would reject the null hypothesis.
The t-statistic is -3.45, indicating that the mean of the new process is statistically significantly less than the mean of the old process, which aligns with your alternative hypothesis.
The box plot below displays the same data: the assembly times for the new process sit visibly lower than those for the old process, which is consistent with the test result.

Why do a Hypothesis test?

After all this, you might ask why you should run a hypothesis test rather than just compare the averages, which is a good question. While looking at average times might give you a general idea of which process is faster, hypothesis testing provides several advantages that a simple comparison of averages doesn’t offer: