# Guide: Histogram

A histogram is similar to a bar chart, but it is a precise tool for showing the frequency of data across intervals. Picture a set of data points, such as the times it takes to complete a manufacturing process. These points are segmented into “bins,” each an equal width, and tallied to reflect how often values land within these slices of the data spectrum. This graphical representation is important in Lean Six Sigma, where understanding the spread and shape of data distribution is key. Whether data is tightly clustered or broadly spread, histograms transform numbers into visual stories, revealing process variations and guiding continuous improvement.

## What is a Histogram?

At first glance a histogram looks like a bar chart, and it is in fact a type of bar chart. A histogram represents the frequency distribution of numerical data. To create one, the range of the data is divided into intervals, and the number of data points within each interval is tallied. These intervals are known as “bins”, and they are usually of equal width; the height of each bar depends on the frequency of the data in its bin.

### Key Components of Histograms

**Bins:** These are the defined intervals that cover a range of data. For example, 1-2 is one bin and 2-3 is another, so a data point with a value between 1 and 2 is counted in the frequency of the first bin.

**Frequency:** This is the count of data points within each bin. The frequency can be either absolute (a count) or relative (a percentage of the total number of data points).
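The two components above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function name `bin_frequencies`, the sample cycle times, and the convention that a value on a shared edge belongs to the higher bin are all assumptions made for the example.

```python
def bin_frequencies(values, start, width, n_bins):
    """Count values per equal-width bin.

    Assumed convention: each bin is [lower, upper), except the last bin,
    which also includes its upper edge. Values outside the range are ignored.
    """
    counts = [0] * n_bins
    upper_limit = start + width * n_bins
    for v in values:
        idx = int((v - start) // width)
        if idx == n_bins and v == upper_limit:
            idx = n_bins - 1  # put the maximum edge in the last bin
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts


# Hypothetical cycle times in minutes (illustrative data only)
cycle_times = [1.2, 1.8, 2.1, 2.4, 2.5, 2.9, 3.3, 3.7, 4.0, 1.5]

# Absolute frequency for the bins 1-2, 2-3 and 3-4
counts = bin_frequencies(cycle_times, start=1.0, width=1.0, n_bins=3)

# Relative frequency: each count as a share of all counted points
total = sum(counts)
relative = [c / total for c in counts]

print(counts)
print([f"{r:.0%}" for r in relative])
```

Whichever edge convention you choose, apply it consistently so that no value is counted twice or dropped.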

## Importance in Lean Six Sigma

In Lean Six Sigma, histograms are a useful data analysis tool for understanding the frequency and distribution of data. They are used in Lean Six Sigma projects for:

**Process understanding:** Histograms visualize data to show how much variation there is in a process, which you can judge by examining the spread and shape of the distribution. With a histogram you can see whether the output of your process varies too much, and to what degree and in which direction it needs to be shifted or reduced.

**Data Analysis:** It can be difficult to draw conclusions from raw data, which can be large and hard to read. A histogram can reveal patterns that are not otherwise evident. The data could be normally distributed, positively or negatively skewed, or bimodal, which might suggest that two different processes or groups have been merged.

*Figures: example histograms showing a normal distribution, a right-skewed distribution, a left-skewed distribution, and a bimodal distribution.*

**Continuous Improvement:** By identifying and understanding the distribution of data, histograms can show areas of a process that need to be improved. For example, if the histogram shows that a significant number of outputs are outside customer specifications, the process may need to be centred or variation reduced.
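To put a number on "a significant number of outputs are outside customer specifications", one option is to compute the share of measurements beyond the specification limits. The sketch below assumes hypothetical fill weights and hypothetical lower and upper limits; the function name `fraction_out_of_spec` is also just an illustration.

```python
def fraction_out_of_spec(values, lower, upper):
    """Return the share of values below `lower` or above `upper`."""
    if not values:
        raise ValueError("values must not be empty")
    outside = [v for v in values if v < lower or v > upper]
    return len(outside) / len(values)


# Hypothetical fill weights in grams and hypothetical spec limits
weights = [498, 501, 503, 497, 506, 502, 499, 508, 500, 495]
share = fraction_out_of_spec(weights, lower=496, upper=505)

print(f"{share:.0%} of outputs are outside specification")
```

Values exactly on a limit are treated as within specification here; adjust the comparison if your customer defines the limits differently.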

## Creating a Histogram

Creating histograms these days is relatively simple with software such as Excel. We also have a free histogram download template that you can paste data into for results.

We have also developed a web-based tool that will allow you to upload or paste a data file, which will create a Histogram and analyze the data in it for you. Feel free to try out the Histogram Analyzer tool.

Aside from those two options, follow our guide below on creating your histogram:

### Step 1: Data Collection

Before you can visualize your data, you need to collect it if you have not already.

You will do this by **identifying the data source**, which could be process measurements, time studies, or customer satisfaction surveys. The data you need is numerical and can be on a continuous scale, so it does not need to be whole numbers.

Once you know what data to collect, you need to collect it and ensure it is **structured** in the right format.

After data collection, it is important to **clean the data**. By this, we mean checking the data for errors and outliers and removing or correcting erroneous values, as they could skew the results.
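As a rough illustration of the cleaning step, the sketch below separates usable numbers from entries that cannot be used, such as blanks or text. The function name `clean_measurements` and the sample entries are hypothetical, and this only covers unusable entries, not judgement calls about genuine outliers.

```python
import math


def clean_measurements(raw):
    """Split raw entries into usable numbers and rejected entries."""
    cleaned, rejected = [], []
    for item in raw:
        try:
            value = float(item)
        except (TypeError, ValueError):
            # Blank cells, text such as "n/a", or missing values
            rejected.append(item)
            continue
        if math.isfinite(value):
            cleaned.append(value)
        else:
            # "nan" and "inf" parse as floats but are not usable data
            rejected.append(item)
    return cleaned, rejected


# Hypothetical raw entries as they might arrive from a spreadsheet export
raw = ["12.5", "13.1", "", "n/a", "12.9", None, "nan", "14.0"]
cleaned, rejected = clean_measurements(raw)

print(cleaned)
print(rejected)
```

Keeping the rejected entries rather than silently discarding them makes it easier to check why each one was excluded.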

### Step 2: Data Arrangement

Following the collection and cleaning of data, the next step is to **sort** the data, which means arranging it from the smallest to the largest value.

Next, in Excel, select all of your data, then click **Insert** > **Recommended Charts** > **All Charts** > **Histogram** > **OK**.

You will then have your Histogram.
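If you work outside Excel, the same idea can be sketched in plain Python. The example below is an assumption-laden illustration rather than a replacement for a charting tool: it derives equal-width bins from the minimum and maximum of hypothetical data and prints one row of `#` characters per bin. The function name `text_histogram` is made up for this example.

```python
def text_histogram(values, n_bins):
    """Return bin counts and one text line per equal-width bin."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; cannot form bins")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value is placed in the last bin
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    lines = []
    for i, c in enumerate(counts):
        left = lo + i * width
        lines.append(f"{left:6.1f} to {left + width:6.1f} | {'#' * c}")
    return counts, lines


# Hypothetical process times in minutes
times = [10, 12, 13, 13, 14, 15, 15, 15, 16, 17, 18, 20]
counts, lines = text_histogram(times, n_bins=5)

for line in lines:
    print(line)
```

Each `#` stands for one data point, so the longest row marks the bin where values occur most often.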

### Step 3: Interpreting the Histogram

When reviewing a histogram, you’re analyzing the data it represents. Here’s how to analyze a histogram and what to look for:

#### Symmetry

**Balanced Process**: A symmetric histogram, where data is evenly spread around a central value, suggests that the process is consistent and predictable.

**Example**: If you’re measuring the weight of packaged products and the histogram is symmetric around the target weight, this indicates that your packaging process is accurate on average.

#### Skewness

**Process Bias**: Skewness in a histogram indicates that the data is not evenly distributed around the central value.

**Right-Skewed**: More data is concentrated on the left side, suggesting frequent low-value occurrences and some high-value outliers. In terms of process, this could mean that while most operations are fast, a few take much longer.

**Left-Skewed**: More data is concentrated on the right, indicating that high values are more common and low values are outliers. For product quality, this might imply that most products measure at or above the target, with a small number falling well below it.
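Skewness can also be checked numerically. The sketch below uses one common measure, the moment-based coefficient g1 = m3 / m2^1.5, where a positive result points to a right skew and a negative result to a left skew. The choice of this particular coefficient, the function name `sample_skewness`, and the sample durations are assumptions for illustration.

```python
import statistics


def sample_skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (uses n, not n - 1)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        raise ValueError("all values are identical; skewness is undefined")
    return m3 / m2 ** 1.5


# Hypothetical task durations in minutes: mostly fast, a few much slower
durations = [4, 5, 5, 5, 6, 6, 7, 8, 15, 22]
g1 = sample_skewness(durations)

# In right-skewed data the long tail typically pulls the mean above the median
mean_above_median = statistics.mean(durations) > statistics.median(durations)

print(f"skewness = {g1:.2f}, mean above median: {mean_above_median}")
```

A value close to zero suggests a roughly symmetric distribution, although with small samples the coefficient can be noisy, so it is best read alongside the histogram itself.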

#### Outliers

**Isolated Bars**: Outliers appear as bars that are separate from the main body of the histogram. They can indicate special causes that may not be part of the normal process variation.

**Investigation**: Outliers should be investigated to determine their cause. They might result from measurement errors, unusual events, or changes in the process.
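Isolated values can also be flagged numerically before you investigate them. The sketch below uses the widely used 1.5 × IQR rule, which is an assumption here rather than a method prescribed by this guide; the function name `flag_outliers` and the measurements are hypothetical.

```python
import statistics


def flag_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low_fence = q1 - k * iqr
    high_fence = q3 + k * iqr
    return [v for v in values if v < low_fence or v > high_fence]


# Hypothetical part lengths in millimetres with one suspicious reading
lengths = [10.1, 10.3, 9.8, 10.0, 10.2, 9.9, 10.4, 10.0, 13.5, 10.1]
outliers = flag_outliers(lengths)

print(outliers)
```

Flagging a value only marks it for investigation; whether it is removed should depend on the cause you find.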

## Conclusion

In Lean Six Sigma, histograms are more than just graphs; they visualize data to show how a process is performing. By arranging numerical data into visually compelling stories, histograms help in distinguishing the predictable from the abnormal.

They highlight whether a process is symmetrical or skewed, whether it’s meeting targets or veering off course. With tools like Excel and our Histogram Analyzer, crafting these insightful charts is within anyone’s grasp. Remember, each bar holds a clue, and outliers require deeper exploration.


##### Q: What is a histogram?

A: A histogram is a graphical representation of the distribution of numerical data. It consists of a series of bars, where each bar represents a range of values called a bin. The height of each bar represents the frequency or count of data points falling within that bin.

##### Q: What is the purpose of a histogram?

A: The purpose of a histogram is to visualize and understand the distribution of data. It allows you to identify patterns, trends, and outliers in the data. Histograms are particularly useful for analyzing continuous or interval data and are commonly used in fields such as statistics, data analysis, and research.

##### Q: How do I determine the number of bins for a histogram?

A: The number of bins in a histogram can be determined using various methods. A common rule of thumb is to use the square root of the total number of data points. However, you can also consider the nature of your data and the level of detail you want to display. Experimenting with different bin numbers and assessing the resulting visualization can help you find the most suitable number of bins.
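As a quick sketch of the square-root rule of thumb, the snippet below rounds the result up to a whole number of bins. Rounding up (rather than to the nearest integer) is an assumption, and the function name `sqrt_rule_bins` is hypothetical.

```python
import math


def sqrt_rule_bins(n_points):
    """Suggest a bin count as the square root of the sample size, rounded up."""
    if n_points < 1:
        raise ValueError("need at least one data point")
    return math.ceil(math.sqrt(n_points))


for n in (30, 50, 100):
    print(n, "data points ->", sqrt_rule_bins(n), "bins")
```

Treat the result as a starting point and adjust it after looking at the resulting chart.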

##### Q: What is bin width?

A: Bin width refers to the size or interval of each bin in a histogram. It is calculated by dividing the range of the data by the number of bins. A smaller bin width provides more detailed information but may result in a cluttered histogram, while a larger bin width provides a more general overview but may obscure important details.
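The calculation can be sketched as follows, assuming the range is taken as the maximum minus the minimum of the data. The function name `bin_edges` and the sample values are illustrative only.

```python
def bin_edges(values, n_bins):
    """Return the bin width and the list of n_bins + 1 bin edges."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins  # range divided by number of bins
    edges = [lo + i * width for i in range(n_bins + 1)]
    return width, edges


# Hypothetical data with a range of 12 - 2 = 10
data = [2, 4, 7, 9, 12]
width, edges = bin_edges(data, n_bins=5)

print("bin width:", width)
print("bin edges:", edges)
```

In practice the width is often rounded to a convenient number so that the axis labels are easier to read.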

##### Q: Can I customize the appearance of a histogram?

A: Yes, you can customize the appearance of a histogram to make it more visually appealing and informative. You can choose different colors for the bars, add labels and titles, adjust the axis scales, and include additional graphical elements such as shading or overlays. Customizing the histogram can enhance its clarity and help convey the intended message effectively.

##### Q: How do I interpret a histogram?

A: To interpret a histogram, analyze the shape, peaks, and gaps in the distribution. Look for patterns, such as symmetry or skewness, that can provide insights into the underlying data. Compare the histogram to expected or theoretical distributions to draw meaningful conclusions. Consider outliers or unusual data points and their implications. The interpretation of a histogram is subjective and depends on the context of the data and the research question being addressed.

## Author

#### Daniel Croft

Daniel Croft is a seasoned continuous improvement manager with a Black Belt in Lean Six Sigma. With over 10 years of real-world application experience across diverse sectors, Daniel has a passion for optimizing processes and fostering a culture of efficiency. He's not just a practitioner but also an avid learner, constantly seeking to expand his knowledge. Outside of his professional life, Daniel has a keen interest in investing, statistics and knowledge-sharing, which led him to create the website learnleansigma.com, a platform dedicated to Lean Six Sigma and process improvement insights.