Guide: Data Collection

Daniel Croft

Daniel Croft is an experienced continuous improvement manager with a Lean Six Sigma Black Belt and a Bachelor's degree in Business Management. With more than ten years of experience applying his skills across various industries, Daniel specializes in optimizing processes and improving efficiency. His approach combines practical experience with a deep understanding of business fundamentals to drive meaningful change.

If you are looking to make improvements in the workplace, you will likely need to use data in some way, particularly if you are applying the Six Sigma element of Lean Six Sigma. Whether you want to understand what is happening in a process, verify root causes, or confirm the success of improvements, you need to understand the types of data and how to collect the right data effectively, so that you can make the right decisions for your projects and business.

What is Data Collection?

Data collection is the process of gathering the data needed for analysis and informed decision-making. Continuous improvement and Lean Six Sigma rely on data to ensure that data-backed decisions are made, which supports the success of the improvement project. Initially, data collection is useful for understanding what is happening in a business. It can be used to find improvement opportunities, for example through KPIs or a rising count of a particular type of defect.

Types of Data

Before going into data collection methods and how to collect data, it is important to understand the different types of data you can collect. Data can be split into the types shown in the graphic below.

Figure: Types of data


Categorical or Qualitative Data

Categorical or qualitative data refers to information that can be classified or categorised, such as colour, gender, or product type. These data are typically non-numerical and can be divided into three subcategories:

  • Nominal: Nominal categorical data refers to data that can only take one of a few possible values with no natural order, such as a product that can be either a chair or a table.
  • Ordinal: Ordinal categorical data refers to information that may be ranked or sorted, such as a product that can be rated as great, good, fair, or poor.
  • Binary: Binary categorical data refers to data that can only have one of two possible values, such as whether a consumer is new or returning.

Numerical or Quantitative Data

Numerical or quantitative data refers to data that can be expressed as a number, such as height, weight, or temperature. These data can be divided into two subcategories:

  • Discrete: Data that can only take on definite, distinct values, such as the number of clients in a business, is referred to as discrete numerical data.
  • Continuous: Continuous numerical data is data that can take any value within a given range, such as temperature or weight.
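To make the distinctions above concrete, here is a minimal Python sketch of how each data type might be represented. The variable names and sample values (including the extra "desk" product) are hypothetical and purely illustrative.

```python
from enum import IntEnum

# Nominal: labels with no natural order (hypothetical product types).
product_types = ["chair", "table", "chair", "desk"]


# Ordinal: categories with a meaningful order, modelled as an IntEnum
# so that ratings can be compared and sorted.
class Rating(IntEnum):
    POOR = 1
    FAIR = 2
    GOOD = 3
    GREAT = 4


ratings = [Rating.GOOD, Rating.POOR, Rating.GREAT]

# Binary: exactly two possible values (new vs. returning customer).
is_returning_customer = [True, False, True]

# Discrete: whole-number counts.
customers_per_day = [12, 9, 15]

# Continuous: any value within a range (temperatures in degrees Celsius).
temperatures_c = [21.4, 22.0, 20.75]

# Ordinal data can be ranked; nominal data can only be counted or grouped.
worst_to_best = sorted(ratings)
distinct_products = set(product_types)

print([r.name for r in worst_to_best])
print(sorted(distinct_products))
```

Knowing which type you are dealing with matters because it determines which summaries make sense: an average is meaningful for continuous data, but not for nominal labels.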

Data Collection Methods

When collecting data, there are many methods that can be used. It is important to select the right method to ensure that good, useful data is collected, as the quality of your data will directly affect the reliability of the analysis and the results of your improvement project. Data collection methods commonly used in Lean Six Sigma projects include surveys and questionnaires, observations, interviews, and machine-generated data.


Surveys and Questionnaires

Surveys and questionnaires can collect both quantitative and qualitative data, depending on how you ask the question and the response options you give. A 1–5 rating, for example, produces quantitative data, whereas an open text box produces qualitative data.

Surveys and questionnaires may seem a simple method of collecting data. However, it is important to consider your goals for the data to ensure you ask the right questions and provide the right response options. Get this wrong, and you may realize after collecting the data that you have nothing useful for your project. Another consideration is the number of questions. If you add too many questions to your survey, you are less likely to get a response. Consider surveys you have taken in the past; if they had more than 10 questions, you were less likely to complete them. According to Drive Research, the typical length of a survey is around 15 questions.

If you are going to use a survey for data collection, our advice is to first create the survey and test it yourself. Then test it with one or two people from your target audience, review their responses, and ask whether the survey made sense. You can then make any necessary adjustments before sending it out to your entire target audience. In most cases, you have one shot at collecting the data, and a second survey will likely result in fewer responses.
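As an illustration of how the two kinds of survey answer are handled differently, here is a small Python sketch that summarizes a handful of pilot responses. The response records and field names are hypothetical assumptions, not output from any particular survey tool.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical pilot responses: a 1-5 rating plus an optional free-text comment.
responses = [
    {"rating": 4, "comment": "Clear instructions"},
    {"rating": 2, "comment": "Too many steps"},
    {"rating": 5, "comment": ""},
    {"rating": 4, "comment": "Question 3 was confusing"},
]

ratings = [r["rating"] for r in responses]

# Quantitative summary of the rating question.
rating_counts = Counter(ratings)
average_rating = mean(ratings)
median_rating = median(ratings)

# Qualitative answers are kept as text for manual review, skipping blanks.
comments = [r["comment"] for r in responses if r["comment"].strip()]

print(f"Average rating: {average_rating:.2f}, median: {median_rating}")
print(f"Comments to review: {len(comments)}")
```

Running a summary like this on the pilot responses is a quick way to check that each question yields data you can actually use before the survey goes out to everyone.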


Observations

Observations are a great method of collecting data and, again, can be useful for collecting both quantitative and qualitative data. An observation usually involves going to where the process is carried out, often referred to as “Going Gemba”, and observing the process in real time. This method lets you collect a lot of data about what is actually happening in the process, as opposed to what should happen, which is what you might get from reading documents or reviewing process maps.

You can see how the process actually happens, conduct time-and-motion studies to see how long process steps take, and identify any issues with the process. If you are collecting data by observing a process, we recommend also recording it as it happens so that you can watch it back multiple times, as you could otherwise miss information that is critical to the process. The recording can also serve as a point of reference in the future.

However, observations are not always an ideal method of data collection, as they can be very time-consuming. Another risk is that there could be some observer bias as there is only one person collecting the data.
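For the timing part of an observation, a simple summary per process step is often enough to start with. The sketch below assumes hypothetical stopwatch timings in seconds, one value per observed cycle; the step names are made up for illustration.

```python
from statistics import mean, stdev

# Hypothetical stopwatch timings (seconds), one value per observed cycle.
observed_times = {
    "pick part": [12.0, 14.0, 13.0],
    "assemble": [45.0, 50.0, 55.0],
    "inspect": [20.0, 20.0, 26.0],
}

# Average time, spread, and number of cycles observed for each step.
summary = {
    step: {"mean": mean(times), "stdev": stdev(times), "cycles": len(times)}
    for step, times in observed_times.items()
}

# The step with the highest average time is a natural place to look first.
slowest_step = max(summary, key=lambda step: summary[step]["mean"])

for step, stats in summary.items():
    print(f"{step:<10} mean={stats['mean']:.1f}s stdev={stats['stdev']:.1f}s")
print(f"Slowest step: {slowest_step}")
```

Having a second person time the same cycles and comparing the two summaries is one possible way to check for the observer bias mentioned above.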


Interviews

Another method of collecting data is through interviews. Interviews usually involve one-to-one conversations with stakeholders and are typically used to gather specific information. They are a good method of collecting in-depth qualitative data and, as with surveys and questionnaires, you need to prepare questions. However, as you have the person with you when asking the questions, you can adapt and ask follow-up questions to dive deeper into a specific area if necessary.

However, like observations, interviews can be very time- and resource-intensive. If you need to aggregate multiple responses, you will need to carry out multiple individual interviews. There is the option of group interviews, but these risk groupthink and general agreement within the group rather than individual points of view.

Documents and Records

In most modern businesses, data is stored in existing documentation, spreadsheets, records, and databases. Furthermore, with the roll-out of Industry 4.0, machines and equipment can generate data that can be downloaded or viewed in real time through technology such as PLC and SCADA systems or ERP software such as SAP. This type of data is most likely to be quantitative, allowing you to understand what is going on based on numbers, which can show trends, outliers, and distributions. It is unlikely to give you in-depth knowledge of why the process is performing the way it is or whether inputs were changed, for example.

We recommend that if you are collecting data, particularly for Lean Six Sigma projects, you start with this type of data collection. It is often fast and gives a good starting point that, after the initial analysis, may direct you to more in-depth data collection with observations, interviews, or surveys.
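As a rough sketch of what a first pass over existing records could look like, the Python example below reads a small CSV export and averages a measurement per machine. The column names (`timestamp`, `machine`, `cycle_time_s`) and the values are hypothetical; a real export from your own systems will look different.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical export: the column names and values are illustrative only.
# In practice the text would come from a file, database query, or system export.
export = io.StringIO(
    "timestamp,machine,cycle_time_s\n"
    "2024-01-08T08:00,M1,30.5\n"
    "2024-01-08T08:00,M2,28.0\n"
    "2024-01-08T09:00,M1,31.5\n"
    "2024-01-08T09:00,M2,29.0\n"
)

# Group the cycle times by machine.
cycle_times = defaultdict(list)
for row in csv.DictReader(export):
    cycle_times[row["machine"]].append(float(row["cycle_time_s"]))

# Average cycle time per machine as a first, quick comparison.
mean_cycle_time = {machine: mean(values) for machine, values in cycle_times.items()}

for machine, value in sorted(mean_cycle_time.items()):
    print(f"{machine}: {value:.1f} s")
```

A quick comparison like this can show where to focus the slower methods, such as observing the machine with the longer average cycle time.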


Tools for Data Collection

Following on from the methods of data collection, it is also helpful to know the tools and techniques that support the data collection process.

Check Sheets

Check sheets are one of the simplest ways of collecting and organizing data. They are particularly useful for collecting observational data, where you tally occurrences of specific issues or events as they happen. Check sheets can also be customized to suit the situation in which you are collecting data, making them flexible for most data collection processes.

Another benefit of the check sheets is that they generally require minimal training and can collect real-time data. However, they are not ideal for collecting large amounts of data or complex data types and can be prone to human error if data points are missed or miscounted. 
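A check sheet is essentially a tally per category, which can be sketched in a few lines of Python. The defect names below are hypothetical; each list entry stands for one tally mark recorded by the observer.

```python
from collections import Counter

# Each entry is one tally mark recorded during the observation (hypothetical defects).
observations = ["scratch", "dent", "scratch", "missing label", "scratch", "dent"]

# Count how often each defect type occurred.
check_sheet = Counter(observations)

# Print the sheet with the most frequent defect first, using '|' as tally marks.
for defect, count in check_sheet.most_common():
    print(f"{defect:<15}{'|' * count}  ({count})")
```

Sorting by frequency makes it easy to see which issue dominates, which is usually the first thing you want from a check sheet.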

Data Collection Software

Next, we have data collection software, which automates the process of collecting data, making it quicker and reducing or removing the risk of human error. In most cases, this software can collect very large volumes of data and, once set up, requires minimal resources to maintain.

However, for some organizations, there can be a cost barrier to acquiring or using this type of data collection method, as setting up data collection methods can initially be both cost and time-intensive. They may also require specialized training to set up, navigate, and use for data extraction and analysis.


Spreadsheets

Finally, we have spreadsheets, which are available from both Google (Sheets) and Microsoft (Excel). They are highly versatile tools for storing and organizing data and, importantly, they are great for basic and medium-level data analysis.

Spreadsheets are among the most popular tools for data collection because you can input vast amounts of data, structure it however you need, and analyze it, all within the same software. For basic data collection and analysis, they also do not require much training.

How to Collect Data

Step 1: Define the Goal of Data Collection

Before you collect data, you need to understand the end goal that you are collecting data for. A well-defined goal will provide guidance for the process of data collection. 

You can do this by identifying the problem you are aiming to solve or the process you want to improve, determining the metrics or KPIs that will provide insights into the problem, and then setting objectives by clearly identifying what success would look like in measurable terms.
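As an example of what "success in measurable terms" can look like, the sketch below expresses a goal as a target defect rate and checks a later measurement against it. All figures and the `defect_rate` helper are hypothetical, chosen only to illustrate the idea.

```python
def defect_rate(defects: int, units: int) -> float:
    """Return defects as a fraction of units produced."""
    if units <= 0:
        raise ValueError("units must be a positive number")
    return defects / units


# Hypothetical figures: baseline before the project and the agreed target.
baseline = defect_rate(defects=45, units=1000)  # 4.5% of units defective
target = 0.02                                   # goal: at most 2% defective

# Hypothetical measurement taken after the improvement.
current = defect_rate(defects=18, units=1000)

goal_met = current <= target
relative_improvement = (baseline - current) / baseline

print(f"Baseline {baseline:.1%}, current {current:.1%}, target {target:.1%}")
print(f"Goal met: {goal_met}, improvement vs. baseline: {relative_improvement:.0%}")
```

Writing the goal down this way also tells you exactly which data you need to collect: defect counts and units produced, before and after the change.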

Step 2: Choosing a Data Collection Method

Now that you understand the goal of your data collection process, you should decide which data collection method, such as those outlined above, would be most suitable.

Types of Methods

  1. Surveys and Questionnaires: Useful for collecting opinions and experiences.
  2. Observations: For real-time data collection in a process or system.
  3. Interviews: To gather in-depth qualitative insights.
  4. Machine Data: For automated, precise, and high-volume data.

Criteria for Choice

  • Nature of Data: Qualitative or Quantitative.
  • Volume: Small-scale or large-scale.
  • Resources: Time, manpower, and tools available.

Step 3: Planning Data Collection Procedures

Now that you have your data collection goal and a method for getting there, it is time to plan your data collection. A well-laid-out plan will improve the success of your data collection and make it more efficient and reliable.

This is where a data collection plan can be very useful. A data collection plan like the one below sets out all the key considerations of the data collection process, such as who will collect the data, when it will be collected, and how it will be collected.

Figure: Data Collection Plan Template

You can download our Data Collection Plan from the Template section.
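If you prefer to keep the plan in a structured form, the who, when, and how of each measure can be captured as a simple record. The sketch below is one possible layout, not the layout of the downloadable template; the field names and example entries are assumptions.

```python
from dataclasses import dataclass, fields


@dataclass
class PlanEntry:
    """One row of a data collection plan (field names are assumptions)."""
    measure: str    # what is being measured
    data_type: str  # e.g. discrete, continuous, nominal
    collector: str  # who will collect the data
    when: str       # when and how often it is collected
    method: str     # how it will be collected


def missing_fields(entry: PlanEntry) -> list[str]:
    """Return the names of fields left blank, so gaps are caught before collection starts."""
    return [f.name for f in fields(entry) if not getattr(entry, f.name).strip()]


# Hypothetical plan with one incomplete row (no collector assigned yet).
plan = [
    PlanEntry("Scratches per batch", "discrete", "Line operator",
              "Every batch, day shift", "Check sheet"),
    PlanEntry("Cycle time (s)", "continuous", "", "Hourly", "Machine export"),
]

gaps = {entry.measure: missing_fields(entry) for entry in plan}
print(gaps)
```

Checking the plan for blank fields before collection starts is a cheap way to avoid discovering halfway through that nobody was assigned to a measure.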

Step 4: Collecting Data

After forming the data collection plan, it is time to execute it. To do this, make sure the people collecting the data are trained to know what data to collect, where to collect it from, when and how often to collect it, and how to record it.

We recommend running mini data collection trials with a small sample to confirm that the data is being collected correctly and to see whether the data collection plan needs to be reviewed and updated.

Step 5: Cleaning and Organizing Data

Once the data has been collected, it is important to clean it of any errors, inconsistencies, or anomalies that may have been captured and to ensure that it is in a format suitable for analysis.

Spreadsheets are usually a useful tool for this, as you can remove duplicates, sort data alphabetically or numerically, use conditional formatting to highlight outlier data, etc. A spreadsheet is also a useful method of storing data to be looked at in the future.
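The same cleaning steps can also be scripted. The Python sketch below removes missing values and duplicates, sorts the records, and flags possible outliers. The records are hypothetical, and the 1.5 × IQR rule used for flagging is our own choice for illustration, playing the role that conditional formatting plays in a spreadsheet.

```python
from statistics import quantiles

# Hypothetical raw records with a duplicate, a missing value, and a suspicious reading.
raw = [
    {"id": 1, "weight_g": 100.2},
    {"id": 2, "weight_g": 99.8},
    {"id": 2, "weight_g": 99.8},   # duplicate entry
    {"id": 3, "weight_g": None},   # missing measurement
    {"id": 4, "weight_g": 100.1},
    {"id": 5, "weight_g": 100.0},
    {"id": 6, "weight_g": 99.9},
    {"id": 7, "weight_g": 112.5},  # possible outlier
]

# 1. Drop records with missing values and keep the first record per id.
seen_ids = set()
cleaned = []
for record in raw:
    if record["weight_g"] is None or record["id"] in seen_ids:
        continue
    seen_ids.add(record["id"])
    cleaned.append(record)

# 2. Sort numerically by the measured value.
cleaned.sort(key=lambda r: r["weight_g"])

# 3. Flag values outside 1.5 x IQR of the quartiles (an assumed rule of thumb).
q1, _, q3 = quantiles([r["weight_g"] for r in cleaned], n=4)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [r for r in cleaned if not lower <= r["weight_g"] <= upper]

print(f"{len(cleaned)} clean records, outlier ids: {[r['id'] for r in outliers]}")
```

Flagged values are candidates for review rather than automatic deletion; an unusual reading may be a recording error or a genuine signal from the process.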


Conclusion

Effective data collection is key to the success of any Lean Six Sigma or continuous improvement project. Understanding your goals, selecting your data collection methods, and planning are all important steps. Hopefully this guide has given you a clear understanding of the process, from defining your objectives to the final step of data preparation, so that you collect data that is not only accurate but also actionable.


Q: What is data collection?
A: Data collection refers to the process of gathering information or data from various sources for research, analysis, or decision-making purposes. It involves systematically collecting relevant and accurate data to address specific questions or objectives.

Q: Why is data collection important?
A: Data collection is essential because it provides the foundation for making informed decisions, conducting research, and gaining insights into various phenomena. It helps in understanding patterns, trends, and relationships within a specific context and enables evidence-based decision-making.

Q: What are the different methods of data collection?
A: There are several methods of data collection, including surveys, interviews, observations, experiments, document analysis, focus groups, and secondary data collection (using existing data sources). Each method has its own strengths and weaknesses, and the choice of method depends on the research objectives, available resources, and the nature of the data needed.

Q: How do I choose the right data collection method?
A: The choice of data collection method depends on various factors such as the research objectives, the type of data required, the population or sample being studied, available resources (time, budget, and personnel), and ethical considerations. Researchers need to carefully assess these factors and select the most suitable method or combination of methods to ensure the collection of high-quality data.

Q: What are some best practices for data collection?
A: Some best practices for data collection include clearly defining research objectives and questions, designing data collection instruments or protocols, ensuring the validity and reliability of measurements, using standardized data collection techniques, providing clear instructions to respondents or data collectors, and documenting the data collection process to ensure transparency and reproducibility.


Daniel Croft

Daniel Croft is a seasoned continuous improvement manager with a Black Belt in Lean Six Sigma. With over 10 years of real-world application experience across diverse sectors, Daniel has a passion for optimizing processes and fostering a culture of efficiency. He's not just a practitioner but also an avid learner, constantly seeking to expand his knowledge. Outside of his professional life, Daniel has a keen interest in investing, statistics and knowledge-sharing, which led him to create the website learnleansigma.com, a platform dedicated to Lean Six Sigma and process improvement insights.
