Data Collection and Data Types

Data in the Measure Phase

The Role of Data Collection

In the Measure phase of DMAIC Lean Six Sigma projects, data collection plays an important role in establishing the current performance of the process being studied. The information gathered during this phase is used to describe and quantify the problem, provide a baseline for future comparison, and identify important process features that will be used to track progress and assess the success of any adjustments implemented.

Typically, data collection during this period includes:

  • Identifying the main process parameters that must be measured
  • Defining the data collection strategy, including the type of data to be gathered, the measurement methods to be used, and the data collection frequency
  • Collecting and recording data in a systematic and precise manner
  • Checking the data for completeness and accuracy

Data obtained during the Measure phase is used to calculate critical process metrics such as process capability, process performance, and process stability, which are used to analyse the current performance of the process and identify opportunities for improvement.
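
As a rough illustration of one of these metrics, the sketch below computes the process capability indices Cp and Cpk from a sample of measurements. The cycle times and the specification limits (`lower_spec`, `upper_spec`) are hypothetical values chosen for the example. The overall sample standard deviation is used as a simple estimate of process spread, and the process is assumed to be stable and approximately normally distributed.

```python
from statistics import mean, stdev


def process_capability(measurements, lower_spec, upper_spec):
    """Return (Cp, Cpk) from the sample mean and sample standard deviation."""
    mu = mean(measurements)
    sigma = stdev(measurements)  # sample standard deviation (n - 1 divisor)
    # Cp compares the width of the specification to the process spread
    cp = (upper_spec - lower_spec) / (6 * sigma)
    # Cpk also accounts for how well the process is centred
    cpk = min(upper_spec - mu, mu - lower_spec) / (3 * sigma)
    return cp, cpk


# Hypothetical cycle-time measurements in seconds
cycle_times = [11.2, 11.5, 11.4, 11.6, 11.3, 11.5, 11.7, 11.4]
cp, cpk = process_capability(cycle_times, lower_spec=10.5, upper_spec=12.5)
print(f"Cp = {cp:.2f}, Cpk = {cpk:.2f}")
```

Cpk can never exceed Cp. A Cpk noticeably lower than Cp suggests the process is not centred between the specification limits.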

Data Collection in Formula One

This article by Racecar Engineering explains how Formula One cars carry around 300 onboard sensors, producing 1.5 terabytes of data and 11.8 billion data points over a race weekend. When organisations aim to be the best at what they do, data is key to gaining an edge over competitors.

Using data in Formula One

Here’s an example of how data collection and analysis are used to improve performance in Formula One:

Data collection: During a Formula One race, a team will collect data from a variety of sensors on the car such as engine sensors, tyre sensors, and aerodynamic sensors. This information is gathered in real time and delivered to the team’s data centre.

Data analysis: Following the race, the crew will examine the data gathered by the car’s sensors. They will process and visualise the data using specialist software, and they will look for patterns and trends in the data to guide decisions on how to improve the car’s performance.

Improved performance: Based on the data analysis, the team may suggest areas for improvement such as increasing engine power, lowering aerodynamic drag, or improving tyre performance. They will utilise this data to make design and setup changes to the car before the next race.

Simulation: The team will also use the data collected from the car to simulate various scenarios using computer-aided engineering (CAE) software, allowing them to evaluate potential design and set-up options before implementing them on the real car.

Testing: After making improvements to the car, the crew will put it through its paces on the track, in the wind tunnel, and on the dyno. This helps the team to validate that the modifications they made were successful.

By gathering and analysing data, Formula One teams can continually improve their cars, which can give them an advantage over their opponents. Furthermore, simulation and testing allow teams to validate their ideas before applying them to the actual car, saving time and cost.

Data Collection Methods

For your projects, you will very likely need to collect data to understand the current problem, verify the problem, validate a solution, and verify that improvements are sustained. This makes data collection integral to Lean Six Sigma and to successful projects. Depending on the type of data you are collecting, there are different methods of collecting it.

Common methods for gathering data include the following.


Surveys

Surveys are one of the most widely used data gathering tools, and they are used to acquire information about a population’s views, beliefs, habits, or other characteristics. Surveys can be administered in person, via phone, or online to a sample of the population or to the complete population.

Several kinds of survey can be used, including:

  • Self-administered surveys: These are completed by the participant alone, without the intervention of a researcher. They are frequently used for online or mail surveys.
  • Interviewer-administered surveys: These are completed by the participant with the assistance of a researcher. They are frequently used for phone or in-person surveys.
  • Paper-and-pencil surveys: These are completed by the participant filling out a printed questionnaire. They are frequently used for in-person or mail surveys.
  • Computer-assisted surveys: These are completed by the participant on a computer or other electronic device. They can be self-administered or delivered by an interviewer.

Surveys can be built with a range of question formats, including multiple choice, open-ended, and ranking questions. Demographic questions can also be included in surveys to obtain information on the participant’s age, gender, education level, and so on.

One advantage of surveys is that they can collect a large amount of data from a large number of people in a short period of time. Surveys, on the other hand, can be vulnerable to bias, such as social desirability bias, and may not provide in-depth information about a particular topic.


Interviews

Interviews are a data collection method in which a researcher asks a participant a series of questions in order to gain information. Depending on the research design and practicality, interviews can be conducted in person, over the phone, or online.

Structured interviews are ones in which the researcher asks each participant a set of preset questions. These types of interviews are frequently utilised in quantitative research since they are more uniform and allow for more comparison between participants. They are especially effective when the researcher wishes to gather precise information, as the questions are closed-ended with pre-determined answer alternatives.

Unstructured interviews, on the other hand, use open-ended inquiries and the researcher does not have a list of questions prepared in advance. These interviews are more adaptable and allow the researcher to follow up on participants’ intriguing or unexpected comments. They are frequently employed in qualitative research because they enable in-depth investigation of a topic and provide rich and detailed data.

Semi-structured interviews are a cross between the two; they include a series of preset questions as well as open-ended follow-up inquiries. This style of interview helps the researcher to gain an understanding of the participant’s point of view while also gathering particular information.

Regardless of the form of interview, the researcher should have a firm grasp of the study’s objectives and aims, and the questions should be well-crafted, relevant, and non-leading. The researcher should also be prepared to handle any difficult situations that may emerge during the interview, as well as be considerate of the participants’ time and willingness to participate.


Gemba (Observations)

Gemba is a Japanese term meaning “the actual place”, and a Gemba observation means going to the place where the work is done. It is an observation method commonly used in Lean Six Sigma and Continuous Improvement methodologies, in which a researcher observes and records the behaviour of personnel, procedures, and equipment in the natural setting of the workplace.

During a Gemba observation, the researcher observes the process or activity being examined, noting the steps taken, the flow of materials, interactions between people and equipment, and any difficulties or concerns that arise. The researcher will also collect process information such as cycle time, process time, takt time, and other key performance indicators.

Gemba observations can be structured or unstructured. In a structured observation, the researcher watches a predefined set of behaviours or activities. These observations are frequently used to collect specific data and may use checklists or forms for recording information.

In contrast, unstructured observations entail the researcher monitoring whatever behaviour occurs organically in the workplace. These kinds of observations are frequently employed in qualitative research since they allow for in-depth investigation of a topic and give rich and detailed data.

Gemba observations aim to understand the present process, identify opportunities for improvement, and collect data that can be used to decide how to enhance it. They are a valuable tool for understanding the underlying process, identifying the root cause of problems, and validating team assumptions. They also help to involve workers in the improvement process and to understand their point of view.


Machine Sensors

Data collection from equipment sensors is a common method used in industrial settings and manufacturing plants to gather information about the performance and condition of equipment. Sensors are devices that detect and measure physical parameters such as temperature, pressure, vibration, or flow rate, and they can collect data in real-time or at predetermined intervals.

Here are some examples of equipment sensors used for data collection:

  • Temperature sensors are used to keep track of the temperature of a piece of equipment or a process.
  • Pressure sensors are used in systems to measure the pressure of fluids or gases.
  • Flow sensors are used to determine the flow rate of fluids or gases in a system.
  • Vibration sensors are used to measure the vibration of a piece of equipment or a process.
  • Proximity sensors are used to detect the presence of an object or person.

The information gathered by equipment sensors can be used to:

  • Monitor equipment performance in real time
  • Detect and diagnose equipment malfunctions or failures
  • Improve the performance and efficiency of equipment
  • Increase equipment reliability and decrease downtime
  • Recognise patterns and trends that can be used to inform predictive maintenance

Data from equipment sensors is typically sent to a centralised location, such as a control room or data centre, where it can be processed and used to make choices about equipment operation and maintenance.

Equipment sensors have the advantage of providing real-time data and monitoring equipment even when it is not in use. However, they must be maintained and calibrated on a regular basis, and the data collected may require further processing before it can be used.
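
To illustrate how sensor readings might be checked once they reach a central location, here is a minimal sketch that flags readings falling outside an acceptable range. The sensor names, limits, and readings are all hypothetical, and a real monitoring system would be considerably more involved.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    sensor: str      # hypothetical sensor name
    timestamp: str   # ISO 8601 time of the reading
    value: float


# Hypothetical acceptable range per sensor: (low, high)
LIMITS = {
    "temperature_c": (20.0, 80.0),
    "pressure_bar": (1.0, 6.0),
}


def out_of_range(readings, limits):
    """Return the readings whose value falls outside the sensor's limits."""
    flagged = []
    for r in readings:
        low, high = limits[r.sensor]
        if not low <= r.value <= high:
            flagged.append(r)
    return flagged


readings = [
    Reading("temperature_c", "2024-01-01T08:00:00", 72.5),
    Reading("temperature_c", "2024-01-01T08:01:00", 84.1),
    Reading("pressure_bar", "2024-01-01T08:00:00", 0.8),
    Reading("pressure_bar", "2024-01-01T08:01:00", 3.2),
]

for r in out_of_range(readings, LIMITS):
    print(f"{r.timestamp} {r.sensor} out of range: {r.value}")
```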


Data Types

Data can be classified into two main categories: Categorical or Qualitative data and Numerical or Quantitative data.

Types of data

Categorical or qualitative data refers to information that can be classified or categorised, such as colour, gender, or product type. These data are typically non-numerical and can be divided into three subcategories:

  • Nominal: Nominal categorical data refers to categories that have no inherent order, such as a product that can be either a chair or a table.
  • Ordinal: Ordinal categorical data refers to information that can be ranked or sorted, such as a product that can be rated as great, good, fair, or poor.
  • Binary: Binary categorical data refers to data that can only have one of two possible values, such as whether a customer is new or returning.

Numerical or quantitative data refers to data that can be expressed as a number, such as height, weight, or temperature. These data can be divided into two subcategories:

  • Discrete: Discrete numerical data can only take on definite, distinct values, such as the number of customers in a business.
  • Continuous: Continuous numerical data can take any value within a given range, such as temperature or weight.

Collecting Nominal Data

When conducting a survey to collect nominal data, both open-ended and closed-ended questions can be utilised.

Closed-ended questions are appropriate when the question has a limited number of possible answers. An example is “What is your employment status? (Employed / Unemployed)”.

Open-ended questions, on the other hand, are useful when the question has a large number of possible answers or when it is not possible to construct an exhaustive list of responses. Examples are “What is your native language?” or “What is your zip or postal code?”



Displaying Nominal Data

Nominal data can be graphically represented using pie charts or bar charts. These charts are handy for displaying how the data is distributed across multiple categories. Because nominal data has no inherent order, the order in which the categories appear has no bearing on the interpretation of the data.

A pie chart, for example, can be used to show the percentage of each type of product a company sells. A bar chart can be used to compare the number of customers in each category.
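
The percentages shown in such charts come from a simple tally of the responses. A minimal sketch, assuming a hypothetical list of car brand responses:

```python
from collections import Counter

# Hypothetical survey responses (nominal data: categories with no order)
responses = ["Ford", "BMW", "Toyota", "Ford", "Toyota", "Ford", "Audi", "BMW"]

counts = Counter(responses)
total = sum(counts.values())

# Percentage share of each category, ready for a pie or bar chart
shares = {brand: 100 * n / total for brand, n in counts.items()}

for brand, n in counts.most_common():
    print(f"{brand:<8}{n:>3}  {shares[brand]:5.1f}%")
```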


Pie chart car brands percentage
Bar chart percent of brands

Collecting Ordinal Data

Ordinal data is defined as information that can be ranked or sorted in some way. Ordinal data is generally collected via closed-ended survey questions that give participants a restricted number of answers to pick from. This makes it simple to categorise and order the data during the analysis step.

A question like “How many times a day on average do you drink coffee?” with answer options like “0, 1-3, 4-6, 7-9” is an example of ordinal data collection, because the answer options are ordered by the amount of coffee consumed.

Open-ended questions, on the other hand, can produce an unlimited variety of responses, making the data difficult to categorise or order during the analysis stage.

It’s vital to note that ordinal data only provides information on the order of the categories, not the distance or interval between them. It is also important to consider ethics and to obtain informed consent from participants.


Displaying Ordinal Data

As with nominal data, a useful way to graphically display ordinal data is with a bar chart. However, the sequence of the bars matters because the categories of ordinal data have a defined order. The bar chart of the number of coffees drunk per day is an example of how ordinal data can be displayed.

Bar chart of number of coffees drunk per day

With ordinal data, we can describe the central tendency, that is, where most of the values lie. The mean, median and mode are the three most common measures of central tendency, although not all of them apply to ordinal data.

The mode is the most frequently occurring value. In the example above, 1-3 is the mode as it is the most frequent response. The mode can be found for almost all ordinal data sets.

The median can be found in some cases. To find the median, order all the data values and locate the middle of the data set. For example, if the data set was 4 6 3 6 7 2 1, you would put the values in order from lowest to highest: 1 2 3 4 6 6 7. The middle value, and therefore the median, is 4. If we add an extra number, making the sequence 1 2 3 4 6 6 7 8, there are two middle values, 4 and 6. For numerical data the median is then their average, 5. For ordinal categories, which cannot be averaged, both middle categories are reported unless they are the same.

The mean, on the other hand, cannot be calculated with ordinal data. To find the mean, you need to add up all the responses and divide by the total number of responses. With ordinal data, this is not possible because the categories have no precise numerical value and so cannot be added together.

It’s important to note that ordinal data, and its central tendency measures, are limited in the information they can provide. For example, ordinal data does not provide information about the distance or interval between the categories, and the central tendency measures do not provide a complete picture of the data distribution.
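
The mode and median of ordinal responses can be found in a few lines of code. The sketch below uses the coffee answer options from earlier with a hypothetical set of responses. It relies on `median_low` and `median_high` from Python's `statistics` module, which return actual data points rather than an average, so no arithmetic is performed on the categories themselves.

```python
from collections import Counter
from statistics import median_high, median_low

# Ordered answer options from the coffee question; position gives the rank
CATEGORIES = ["0", "1-3", "4-6", "7-9"]

# Hypothetical responses
responses = ["1-3", "0", "4-6", "1-3", "1-3", "7-9", "4-6", "0", "1-3", "4-6"]

# Mode: the most frequent category
mode = Counter(responses).most_common(1)[0][0]

# Median: convert categories to ranks, find the middle rank(s),
# then map back to the category labels
ranks = [CATEGORIES.index(r) for r in responses]
low = CATEGORIES[median_low(ranks)]
high = CATEGORIES[median_high(ranks)]

print("mode:", mode)
print("median:", low if low == high else f"between {low} and {high}")
```

With an even number of responses, `low` and `high` differ only when the two middle responses fall in different categories.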


Collecting Discrete Data

Discrete data refers to data that can only take specific, separate values, such as the number of customers in a store or the number of defects in a production process. Discrete data is usually collected using open-ended survey questions that do not restrict participants to a list of answers, but the answers should be whole numbers.

Examples of questions for discrete data, each with an input box that accepts a whole number, are:

  • What shoe size are you? [7]
  • How many defects did the process produce? [25]
  • How many spaces are in the car park? [44]

It’s important to note that discrete data is different from continuous data, which can take any value within a certain range. It is also different from ordinal data, which consists of categories that can be ranked or ordered in some way.
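
Because the answers must be whole numbers, it can help to check responses as they are recorded. The sketch below is one simple way to do this, assuming answers arrive as text from a hypothetical input box. The function name `parse_count` is made up for the example.

```python
def parse_count(answer):
    """Return the answer as a non-negative whole number, or None if invalid."""
    text = answer.strip()
    # Accept only plain ASCII digits, so "7.5", "-3" and "many" are rejected
    if text.isascii() and text.isdigit():
        return int(text)
    return None


# Hypothetical raw answers typed into the input box
raw_answers = ["25", " 44 ", "7.5", "many", "-3"]
parsed = [parse_count(a) for a in raw_answers]
print(parsed)
```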


Displaying Discrete Data

A useful way to graphically display discrete data is with different types of charts such as pie charts, bar charts or histograms. Histograms are particularly useful when there are too many discrete values to fit on a standard chart. A histogram is similar to a bar chart, except each column represents a range of values, also known as a class interval.

For example, suppose you want to chart the volume of production for every week of the year, which could mean up to 52 different values. Using a histogram, you could group these values into ranges, such as 0 to 500, 501 to 1000, 1001 to 1500, and 1501 to 2000, and then identify the most frequent group.
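
The grouping step can be sketched as follows, using the class intervals from the example above. The weekly volumes are hypothetical, and the helper name `class_interval` is made up for the illustration.

```python
from collections import Counter

# Hypothetical weekly production volumes (units), one value per week
weekly_volumes = [420, 980, 1510, 760, 1230, 1890, 640, 1105, 1420, 875, 1340, 990]

BIN_WIDTH = 500


def class_interval(value, width=BIN_WIDTH):
    """Return the (low, high) class interval for a non-negative whole number.

    Intervals are 0-500, 501-1000, 1001-1500, and so on.
    """
    index = max(value - 1, 0) // width
    low = 0 if index == 0 else index * width + 1
    return (low, (index + 1) * width)


# Count how many weeks fall into each class interval
histogram = Counter(class_interval(v) for v in weekly_volumes)

# Print a simple text histogram, one row per interval
for (low, high), count in sorted(histogram.items()):
    print(f"{low:>4}-{high:<4} {'#' * count}")

most_frequent = max(histogram, key=histogram.get)
print("most frequent interval:", most_frequent)
```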

In the example below, between 4 and 5 is the most common response, also making this the mode.


It is important to choose the appropriate type of chart based on the research question and the type of data being displayed, and also to label the chart properly to make it easy to interpret.


Collecting Continuous Data

For continuous data, questions should be open-ended, as there could be an almost infinite number of possible answers. It is good practice to state the unit of measurement, for example, meters, kilograms, or seconds, to ensure that responses use consistent units.

Examples of questions for continuous data are:

  • How tall are you (in meters to 3 decimal places)?  [1.734]
  • How much profit does XYZ product make ($)?  [$25.56]
  • How long does the process take to complete (seconds)? [11.5]

It’s important to note that continuous data is different from discrete data, which can only take specific, separate values. It is also different from ordinal data, which consists of categories that can be ranked or ordered in some way.


Displaying Continuous Data

Similar to discrete data, histograms can be used to display continuous data, but other types of charts such as line or scatter graphs can also be used. For example, a line graph can be used to display the average temperature each day in August.

Line graph of temperature in August - Continuous data

Line graphs are useful for identifying trends or patterns in the data, such as whether the temperature is rising, falling, or remaining consistent over time. However, it is difficult to identify the most frequent temperature range using a line graph. This is where a histogram can be useful, as it groups the data into ranges and allows for easy identification of the most frequent range. The example below shows the same data set, but presented in a histogram format.

Histogram of temperatures in August - Continuous data

It is important to choose the appropriate type of chart based on the research question and the type of data being displayed, and also to label the chart properly to make it easy to interpret.

Data Collection Planning

How to plan data collection

Planning data collection typically involves several steps, including:

Define the research question: Make a clear definition of the research issue or problem that you are attempting to answer. This will assist you in determining what types of data you need to collect and the procedures to employ.

Choose the appropriate data collection method: Based on the research question, select a suitable method such as surveys, interviews, observations, experiments, or secondary data sources.

Determine the sample size: Decide on the number of participants or units of observation that you need to collect data from. This will depend on the research question and the data collection method used.

Develop the data collection instruments: Create the instruments you will use, such as survey questionnaires, interview guides, or observation checklists. Make certain that the questions or prompts are clear, neutral, and pertinent to the research question.

Pilot test the data collection instruments: With a small sample of participants, pilot test the data collection instruments to ensure that they are clear, easy to understand, and yield the intended data.

Plan for data storage and management: Think about how you’ll store and manage your data, including data entry, cleaning, and analysis.

Address ethical considerations: Ensure that the data gathering method is ethical, and obtain informed consent from participants if necessary.

Finalize the data collection plan: Finalize the data collection plan and make sure that everyone involved in the data collection process understands their role and responsibilities.
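
The steps above can be captured in a simple record so that gaps in the plan are easy to spot. The following is only an illustrative sketch. The `DataCollectionPlan` class, its field names, and the example values are hypothetical and are not taken from any standard template.

```python
from dataclasses import dataclass, field


@dataclass
class DataCollectionPlan:
    research_question: str
    method: str              # e.g. survey, interview, observation, sensor
    sample_size: int
    instrument: str          # questionnaire, interview guide, checklist
    storage: str             # where and how the data will be kept
    consent_required: bool
    responsibilities: dict = field(default_factory=dict)  # role -> person
    piloted: bool = False

    def open_items(self):
        """List the planning steps that still look incomplete."""
        items = []
        if self.sample_size <= 0:
            items.append("determine the sample size")
        if not self.piloted:
            items.append("pilot test the instrument")
        if not self.responsibilities:
            items.append("assign roles and responsibilities")
        return items


plan = DataCollectionPlan(
    research_question="How long does the packing process take per order?",
    method="observation",
    sample_size=50,
    instrument="observation checklist with stopwatch timings (seconds)",
    storage="shared spreadsheet, checked weekly for completeness",
    consent_required=True,
)
print(plan.open_items())
```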

It is important to note that data collection planning can be a difficult process that requires close attention to detail and may need several revisions before the plan is finalised.


In conclusion, data collection is critical in the Measure phase of DMAIC Lean Six Sigma projects. It helps to determine the current performance of the process under investigation and provides a baseline for future comparison. The data gathering process includes identifying the critical process parameters to measure, developing a data collection strategy, and collecting and recording data in a systematic and precise manner.

Depending on the type of data being collected, several data gathering methods, such as surveys, interviews, observations, and equipment sensors, can be used in Lean Six Sigma projects. The data gathered during the Measure phase is used to compute essential process metrics, which are used to identify areas for improvement and measure progress. Overall, data collection is an important part of Lean Six Sigma projects, and a well-planned data collection approach is critical to project success.


What's Next?

Now that we have an understanding of data and data collection, the next step is to understand how we can analyse this data to find the root causes of problems so that suitable solutions can be implemented.