Within Lean Six Sigma projects we have a range of data analysis tools we can use to analyse and interpret data, so that we understand what is happening in the services, products or processes we have observed. These tools include Pareto analysis, Histograms and Box plots, among others. In this article, we are going to explore the basics of Scatter Plots, also known as Scatter Diagrams.
What is a Scatter Diagram?
Scatter Diagrams, also known as Scatter Plots, Scatter Charts or Scatter Graphs, are a type of graphical analysis that displays the values of two variables for a set of data and can be used to make predictions based on that data. As can be seen in the example below, a scatter diagram has an X-axis (horizontal axis) and a Y-axis (vertical axis). The graph is then completed with dots, with each dot representing one measurement plotted at its X-axis and Y-axis values. The diagram can then show the relationships between the data points and reveal trends or correlations, which we will go into in more detail below.
When to use a Scatter Plot
There are three reasons you might use a Scatter plot when analysing and interpreting data for your Lean Six Sigma project.
Identifying Patterns and correlations in data
A good use for Scatter plots is to identify patterns in data. Data points can be grouped together based on how close their values are, which makes it easy to identify outlier points or gaps in the data.
Seeing data graphed as a scatter plot can aid in the identification of correlations between variables, and the kind of correlation can be estimated from the direction in which the data points slope.
- Negative correlations – represent a fall: as one variable increases, the other decreases. This can be seen on the chart as data points sloping downwards from the top left corner to the bottom right corner.
- Positive correlations – represent a rise: as one variable increases, so does the other. This can be seen on the chart as data points sloping upwards from the bottom left corner to the top right corner.
- Data that is neither positively nor negatively correlated is considered uncorrelated. This is the null hypothesis when testing for a correlation.
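The three cases above can also be checked numerically. The following is a minimal sketch, not a definitive method: it computes the Pearson correlation coefficient by hand using only the Python standard library. The data values, the `describe` helper and its 0.3 threshold are illustrative assumptions and do not come from the article or its template.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of the same length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of the deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


def describe(r, threshold=0.3):
    """Label a coefficient; the threshold is an arbitrary illustrative cut-off."""
    if r >= threshold:
        return "positive"
    if r <= -threshold:
        return "negative"
    return "little or no linear correlation"


# Hypothetical measurements, made up for this example.
x_values = [1, 2, 3, 4, 5]
rising = [2, 4, 5, 4, 5]     # tends to go up as x goes up
falling = [10, 8, 6, 4, 2]   # goes down as x goes up

print(describe(pearson_r(x_values, rising)))
print(describe(pearson_r(x_values, falling)))
```

A coefficient near +1 or -1 indicates a strong linear relationship, while a value near 0 suggests the two variables are not linearly related.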
Demonstration of the relationship between two variables
The most common reason to use a scatter plot is to display the relationship between two variables and observe what kind of relationship it is. The relationship can be positive or negative, linear or non-linear, and strong or weak.
Each data point on the scatter plot represents one individual observation, pairing a value of one variable with the corresponding value of the other, which allows patterns to be identified when looking at the data holistically.
Identification of correlational relationships
Another common use for scatter plots is to identify correlational relationships between variables. Scatter plots tend to have the independent variable on the horizontal axis and the dependent variable on the vertical axis. This allows the observer to get an idea of what value one variable is likely to take when only the value on the other axis is known. This means that, where previous data shows a clear correlation, a set of data points can give an estimate of future results before those results have been measured.
In the example below there are no data points between 14 and 24 on the horizontal axis, but with the other data points and the line of best fit we can easily estimate the values that would fall in that range.
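Estimating a value inside a gap like this can be sketched in code. The example below is a minimal illustration using an ordinary least-squares fit written with the Python standard library only. The data points are hypothetical, chosen so that there is a gap between 14 and 24 on the horizontal axis and so that they follow y = 2x + 1 exactly; they are not the values from the article's chart.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line of best fit."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of the same length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("cannot fit a line when every x value is the same")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data with no observations between x = 14 and x = 24.
x_values = [2, 6, 10, 14, 24, 28, 32]
y_values = [5, 13, 21, 29, 49, 57, 65]  # follows y = 2x + 1 exactly

slope, intercept = fit_line(x_values, y_values)

# Estimate the y value for an x inside the gap.
x_in_gap = 19
estimate = slope * x_in_gap + intercept
print(f"Estimated y at x = {x_in_gap}: {estimate:.1f}")
```

Because the made-up data is perfectly linear, the estimate comes out at about 39. With real, noisier data the estimate is only as reliable as the correlation is strong, and it is safest for values inside the range that was actually observed.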
Line of best fit
As can be seen in the examples above, the line of best fit passes exactly through the data points because the variables have a perfect correlation. This is not likely to always be the case, as the example below shows. Data points will often sit above and below the line, with the line of best fit running between them.
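To illustrate points sitting above and below the line, here is a short sketch using only the Python standard library. It fits a least-squares line to hypothetical, slightly noisy data and counts how many points fall on each side. The data values and function names are assumptions made for this example. For a least-squares line with an intercept, the vertical distances above and below the line (the residuals) sum to zero, which is one way of saying that the line runs through the middle of the data.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line of best fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("cannot fit a line when every x value is the same")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def residuals(xs, ys, slope, intercept):
    """Vertical distance of each point from the line (positive means above)."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


# Hypothetical, slightly noisy measurements.
x_values = [1, 2, 3, 4, 5, 6]
y_values = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

slope, intercept = fit_line(x_values, y_values)
res = residuals(x_values, y_values, slope, intercept)

above = sum(1 for r in res if r > 0)
below = sum(1 for r in res if r < 0)
print(f"Points above the line: {above}, below the line: {below}")
```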
Scatter Plot Template
Download the free Scatter Plot Excel template to use for the data analysis in your Lean Six Sigma projects. All you need to do is input your data, and the formulas will automatically display it in the Scatter plot with the line of best fit for you to analyse and interpret. It will also calculate the correlation value of your data.
Conclusion
In conclusion, scatter plots are very useful for understanding the relationship between two variables and for identifying positive and negative correlations in our data when doing experiments in our Lean Six Sigma projects.
What’s next
Now that you have an understanding of some of the basic types of data analysis that can be done in Lean Six Sigma projects, it is time to explore Root Cause analysis, which helps identify the root causes of problems so that our projects don’t just address the symptoms. Find out more about Root Cause analysis.