There are different types of data in statistics, and each is collected, analysed, interpreted, and presented in different ways. Data are individual pieces of information that are recorded and used for analysis. By analysing data, we can interpret what the data is telling us.
Basic data types – Overview
Within continuous improvement projects, it is common to base decisions on data so that actions are taken on facts rather than feelings and emotions. To do this effectively in Lean Six Sigma projects, it is important to understand what types of data we have and which methods are most suitable for analysing, interpreting, and presenting them.
Being able to identify data types and to analyse, interpret, and present data is an important skill for any Lean Six Sigma practitioner. It supports decision-making for improvement projects and helps communicate what is happening in the process to project stakeholders, which builds support and buy-in for tackling the problem. Data can also be used to verify solutions when running trials and experiments.
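Data can be classified into two main categories, each of which splits into two types, giving four data types in total:
- Categorical or Qualitative data (nominal and ordinal)
- Numerical or Quantitative data (discrete and continuous)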
Qualitative data, commonly called categorical data, is data that fits into categories. It describes qualities rather than quantities, so it is not numerical; it usually involves information that can be categorized, such as colors or genders. Categories for this type of data are specified in words, not numbers.
In some cases, categorical data appears in numeric form, such as age, date or shoe size, but in that case the values are fixed into categories and do not have a mathematical use for analysis.
In statistics, nominal data (also known as the nominal scale) is a data type used to label variables without providing any quantitative value. Nominal data cannot be ordered or measured and is usually grouped. It can be expressed in words or in numbers, but what makes it qualitative rather than quantitative is that the labels cannot be ordered in any meaningful way. For example, Male and Female, or the colours Red, Green and Blue, have no defined order or sequence.
Examples of nominal data:
|Employment Status||– Employed|
Variables that can be coded in only two ways, such as Yes/No, On/Off or Employed/Unemployed, are classified as binary and are often coded as 1s and 0s. As the order of the labels does not matter, they are a type of nominal variable.
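As a minimal sketch of binary coding, the snippet below maps Yes/No answers to 1s and 0s. The responses and the `CODES` mapping are hypothetical and chosen only for illustration; which label receives 1 is arbitrary, because the labels have no order.

```python
# Hypothetical Yes/No survey responses (illustration only)
responses = ["Yes", "No", "Yes", "Yes"]

# Arbitrary coding: the labels have no order, so swapping 0 and 1
# would carry exactly the same information.
CODES = {"No": 0, "Yes": 1}

coded = [CODES[answer] for answer in responses]

# With a 0/1 coding, the sum counts the "Yes" answers,
# so sum / length is the proportion of "Yes" responses.
proportion_yes = sum(coded) / len(coded)

print(coded)
print(proportion_yes)
```

The numbers here are labels, not measurements: the proportion is meaningful, but the codes themselves do not rank one answer above the other.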
How to collect nominal data
Nominal data can be collected in a survey using both open and closed-ended questions.
If the question you want answered has a small number of possible answers, you can use a closed-ended question.
Example of closed-ended question:
What is your employment status? Employed / Unemployed
If your question has many possible answers, or you cannot generate a complete list of answers, use open-ended questions.
Example of open-ended questions:
- What is your native language?
- How tall are you?
- What is your zip or postal code?
How to graphically display Nominal data
A useful way to graphically display nominal data is with a pie chart or a bar chart, because the data can be placed into clear categories. The order of the categories does not matter as there is no defined order to the data.
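Before drawing either chart, the responses need to be tallied per category. The sketch below shows one way to do that with Python's standard library; the response list is hypothetical, and `category_counts` is a helper name introduced here for illustration. The counts correspond to bar heights and the shares to pie slices.

```python
from collections import Counter

# Hypothetical answers to "What is your employment status?"
responses = ["Employed", "Unemployed", "Employed", "Employed", "Unemployed"]


def category_counts(values):
    """Return how many times each category label occurs."""
    return dict(Counter(values))


counts = category_counts(responses)
total = sum(counts.values())

# Share of each category, as a pie chart would show it
shares = {label: count / total for label, count in counts.items()}

for label, count in counts.items():
    # Simple text bar chart: one '#' per response
    print(f"{label:<12} {'#' * count} ({shares[label]:.0%})")
```

The categories are printed in the order they first appear, which is acceptable for nominal data because no order is implied.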
Ordinal data are classified into categories with a variable that has a natural rankable order. However, the distances between the categories are unknown or uneven.
What lean six sigma qualification do you have?
- White Belt
- Yellow Belt
- Green Belt
- Black Belt
- Master Black Belt
There is a clear order of these categories from the lowest-ranked qualification to the highest, but we cannot specify the distance between two neighbouring categories, such as White Belt and Yellow Belt, or say whether it is the same as the distance between Yellow Belt and Green Belt.
|Level of Agreement||– Strongly Agree|
– Strongly Disagree
This type of data collection can often be useful for collecting qualitative data around the voice of the customer.
How to collect ordinal data
Ordinal data is usually collected using closed-ended survey questions that give participants a limited number of possible answers to choose from, so that the answers can be categorized later in the analysis stage.
Open-ended questions could open responses up to an unlimited number of variations which cannot easily be categorized or put into order.
Example question for ordinal data:
|How many times a day on average do you drink coffee?||– 0|
How to graphically display Ordinal data
A useful way to graphically display ordinal data is with a bar chart, similar to nominal data. However, the sequence of the categories matters, as ordinal data has a defined order.
With an ordinal data set, we can identify the central tendency, that is, where most of the values lie. Mean, median and mode are the three most common measures of central tendency.
The mode is the most frequently occurring value. In the example above, 1-3 is the mode as it is the most frequent response. The mode can be found for almost all ordinal data sets.
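To keep the bars in their ranked order, the tally can be driven by an explicit category list instead of the order in which answers arrive. This is a minimal sketch: `BELT_ORDER` follows the belt question earlier in this section, while the responses and the `ordered_counts` helper are hypothetical.

```python
from collections import Counter

# Rank order of the categories, lowest to highest
BELT_ORDER = [
    "White Belt",
    "Yellow Belt",
    "Green Belt",
    "Black Belt",
    "Master Black Belt",
]

# Hypothetical survey responses (illustration only)
responses = [
    "Green Belt", "White Belt", "Black Belt",
    "Green Belt", "Yellow Belt", "Green Belt",
]


def ordered_counts(values, order):
    """Count responses per category, returned in the defined rank order."""
    counts = Counter(values)
    # Categories nobody selected are kept with a count of 0 so the
    # chart still shows the full scale.
    return [(category, counts.get(category, 0)) for category in order]


for category, count in ordered_counts(responses, BELT_ORDER):
    print(f"{category:<18} {'#' * count}")
```

Keeping empty categories in the output is a design choice: it preserves the full scale on the chart, which makes gaps in the responses visible.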
The median can be found in some cases. To find the median you need to order all the data values and locate the middle of the data set.
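If the data set was 4 6 3 6 7 2 1, you would put the values in order from lowest to highest: 1 2 3 4 6 6 7. Then find the middle number, which in this case is 4. If we add an extra number, making the sequence 1 2 3 4 6 6 7 8, there are two numbers in the middle, 4 and 6. For numerical data the median is the average of these two values, which is 5. For ordinal categories, which cannot be averaged, the median lies between the two middle categories, which is why it can only be found in some cases.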
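The median steps can be checked with Python's built-in `statistics` module, using the same two data sets as above. For numerical data, `statistics.median` averages the two middle values of an even-sized set, while `median_low` and `median_high` return the two middle values themselves, which is closer to what is possible with ordinal categories. The variable names are chosen here for illustration.

```python
import statistics

odd_set = [4, 6, 3, 6, 7, 2, 1]   # 7 values: a single middle value
even_set = odd_set + [8]          # 8 values: two middle values

# The functions sort the data internally, so the input order does not matter.
median_odd = statistics.median(odd_set)
median_even = statistics.median(even_set)        # average of the two middle values

# The two middle values themselves, without averaging
middle_low = statistics.median_low(even_set)
middle_high = statistics.median_high(even_set)

print(sorted(odd_set), "median:", median_odd)
print(sorted(even_set), "median:", median_even,
      "middle values:", middle_low, "and", middle_high)
```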
The mean cannot be calculated with ordinal data. To find the mean you need to add up all the responses and divide by the total number of responses. In the example above, we cannot add together the responses because they are categories rather than numbers.
If the data set was numerical, for example 4 6 3 6 7 2 1, you would calculate 4+6+3+6+7+2+1 = 29 and divide by 7, as there are 7 numbers in the sequence. The mean would therefore be 4.14 (to two decimal places).
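The same arithmetic, written out as a short sketch. It only applies to numerical data; the variable names are illustrative.

```python
import statistics

data = [4, 6, 3, 6, 7, 2, 1]

total = sum(data)      # add up all the responses
count = len(data)      # number of responses
mean = total / count   # divide the total by the count

print(total, count, round(mean, 2))

# The standard library gives the same result
assert abs(statistics.mean(data) - mean) < 1e-9
```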
Quantitative data, also known as numerical data, represents a numerical value and answers questions such as how many times… or how many… Numerical data gives information about the quantity of the thing being measured and can be counted and compared, for example height and weight. Numerical data is further classified as either discrete (count) data or continuous data.
Note: an easy way to remember the difference is that continuous data is measured on a continuous scale. Height, for example 1.734 meters, can in principle be measured to any number of decimal places. Discrete data takes fixed values, such as shoe size, which are usually whole numbers, such as size 4 or size 5.
Discrete data (Count data)
Discrete data, also known as count data, is data that can only take certain values.
Example: the number of people in a group. You can’t have half a person, so this must be a whole number.
For discrete data, questions can be open-ended, allowing for whole-number responses.
How to collect discrete data
Discrete data is usually collected using open-ended survey questions that do not restrict the number of possible answers. However, the answers should be whole numbers.
Example questions for Discrete data:
|What shoe size are you?||Answer input box |
|How many defects did the process produce?||Answer input box |
|How many spaces are in the car park?||Answer input box |
How to graphically display Discrete data
A useful way to graphically display discrete data is with a pie chart, bar chart or histogram. Histograms are useful when you have too many discrete values to fit on a standard chart. A histogram is similar to a bar chart, except that each column represents a range of values, also known as a class interval. You may want to chart a variable with a large range of values, such as the volume of production for every week of the year, which could mean up to 52 different values. Using a histogram, you could group these values into the intervals 0 to 500, 501 to 1000, 1001 to 1500 and 1501 to 2000, and then identify the most frequent group.
In the example below, between 4 and 5 is the most common response, which also makes it the mode.
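The grouping step can be sketched as follows. The class intervals are the ones named in the text, while the weekly volumes and the `interval_for` helper are hypothetical and shortened to ten weeks for illustration.

```python
from collections import Counter

# Class intervals from the text, as (lower, upper) with both ends inclusive
CLASS_INTERVALS = [(0, 500), (501, 1000), (1001, 1500), (1501, 2000)]

# Hypothetical weekly production volumes in whole units
weekly_volumes = [320, 780, 1100, 950, 1420, 610, 1750, 880, 1230, 990]


def interval_for(value, intervals):
    """Return the class interval that contains a whole-number value."""
    for lower, upper in intervals:
        if lower <= value <= upper:
            return (lower, upper)
    raise ValueError(f"{value} falls outside all class intervals")


frequencies = Counter(interval_for(v, CLASS_INTERVALS) for v in weekly_volumes)

# The most frequent group is the modal class interval
modal_interval, modal_count = frequencies.most_common(1)[0]

for lower, upper in CLASS_INTERVALS:
    print(f"{lower:>4} to {upper:<4} {'#' * frequencies[(lower, upper)]}")
print("Most frequent group:", modal_interval)
```

Inclusive intervals with gaps between them (500 then 501) only work because the values are whole numbers; measured values need touching intervals.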
For example with people’s heights
Continuous data can take any value within its range as it is measured on a continuous scale.
Example: The height of a person can take any value such as 1.734 meters or 1.735 meters. With this scale we can be specific rather than rounding to the nearest whole number.
For continuous data, questions should be open-ended, as there could be a nearly infinite number of answers. It is good practice to state the unit of measurement, e.g. kilograms or pounds, to ensure responses use consistent units.
Example of questions for continuous data:
|How tall are you (in meters to 3 decimal places)?||Answer input box [1.734]|
|How much profit does XYZ product make ($)?||Answer input box [$25.56]|
|How long does the process take to complete (Seconds)?||Answer input box [11.5]|
How to graphically display Continuous data
Like discrete data, continuous data can be displayed with a histogram, but you could also use a line graph or scatter graph.
The example below shows continuous data on the average temperature each day in August.
The line graph is useful for identifying trends or patterns in the data, to see if the temperature is rising, falling or consistent. However, it is not easy to identify the most frequent temperature range from a line graph. This is where a histogram becomes useful, as in the example below with the same data set.
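As a rough sketch of how a histogram finds the most frequent temperature range, the snippet below groups continuous values into equal-width bins. The temperatures and the 2-degree bin width are assumptions for illustration and are not taken from the charts in this article; `bin_start` is a hypothetical helper.

```python
import math
from collections import Counter

# Hypothetical average daily temperatures in degrees Celsius
temperatures = [18.4, 19.1, 21.7, 22.3, 20.5, 23.8, 21.2, 19.9, 22.6, 21.0]

BIN_WIDTH = 2.0  # assumed bin width in degrees


def bin_start(value, width):
    """Return the lower edge of the bin [start, start + width) containing value."""
    return math.floor(value / width) * width


histogram = Counter(bin_start(t, BIN_WIDTH) for t in temperatures)

# The bin with the highest count is the most frequent temperature range
modal_start, modal_count = histogram.most_common(1)[0]

for start in sorted(histogram):
    print(f"{start:.1f} to {start + BIN_WIDTH:.1f}: {'#' * histogram[start]}")
print(f"Most frequent range starts at {modal_start:.1f}")
```

Because continuous values can fall anywhere on the scale, each bin includes its lower edge and excludes its upper edge, so adjacent bins touch without overlapping.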
In conclusion, when collecting data for your project, it is important to consider what type of data you will be collecting, so that you can choose the best method of collecting it and decide whether open or closed-ended questions are better for gathering responses. You then need to consider the data type to identify the best method for displaying and analysing it.
The next step is to move on to some data analysis tools, to understand how we can use charts and data to see what is going on in our processes and where we should focus our process improvements. Pareto charts are one such graphical tool.