Understanding the reliability of your measurement system is crucial for effective process management, especially in the context of Lean Six Sigma. Attribute Agreement Analysis (AAA) serves as an invaluable tool for this purpose.
This guide provides a detailed look at AAA, covering its definition, importance, and methodology. We’ll explore how to assess the reliability of attribute (categorical) data by measuring how well multiple appraisers’ ratings agree with each other and, where one exists, with a known standard. By the end, you’ll be equipped to conduct your own AAA, ensuring the robustness of your data and enhancing your continuous improvement efforts.
What is Attribute Agreement Analysis?
Definition and Purpose
Attribute Agreement Analysis is a statistical technique used to evaluate how well different appraisers’ judgments on categorical data agree. It asks whether multiple individuals assessing the same item reach a high level of agreement, and it answers that by evaluating the appraisers’ repeatability, reproducibility, and overall accuracy. Repeatability refers to the variation in assessments when the same appraiser repeats them, while reproducibility refers to the variation when different appraisers assess the same item. Accuracy is the agreement of the assessments with a known standard, where one is available.
Why is Attribute Agreement Analysis Important?
Benefits and Applications
AAA is pivotal for ensuring the quality and reliability of data in decision-making processes. It helps to:
- Characterize the quality of data by identifying areas of non-agreement.
- Calibrate appraisers, judges, or assessors for a higher level of agreement.
- Enhance the consistency and accuracy of judgments, thereby contributing to better decision-making and operational efficiency.
This technique finds applications across various sectors, especially in quality control, manufacturing, and any domain where accurate categorization and assessment are crucial for operational success.
Types of Attribute Data: Nominal and Ordinal
In the context of Lean Six Sigma and continuous improvement, understanding different types of data is crucial for quality control and decision-making. Attribute data is a type of qualitative data that is categorized into labels or attributes, rather than numerical values. Attribute data is primarily divided into two types: Nominal and Ordinal.
Nominal Data
Nominal data consists of categories that have no inherent or meaningful order. These categories are mutually exclusive and exhaustive, meaning that every observation can only belong to one category, and all possibilities are accounted for.
Examples in Continuous Improvement:
- Inspection results such as Pass/Fail
- Types of defects like Scratch, Dent, or Crack
- Employee departments like Manufacturing, Logistics, and Sales
Graphical Representation:
In most cases, a bar chart or pie chart is used to represent nominal data. Each category gets its own bar or pie slice, and the height or size of the slice represents the frequency or percentage of observations in that category.
Ordinal Data
Ordinal data is similar to nominal data, but the categories have a meaningful order. However, the intervals between these ordered categories are not uniform or measurable.
Examples in Continuous Improvement:
- Customer satisfaction ratings from 1 to 5
- Severity level of defects: Low, Medium, High
- Skill level of workers: Novice, Intermediate, Expert
Graphical Representation:
Ordinal data can also be represented using bar charts, but it is important to maintain the order of the categories. Sometimes, a line chart may also be useful to highlight trends across ordered categories.
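The difference between the two types can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the records below are invented, and the names `defect_types`, `severities`, and `SEVERITY_ORDER` are hypothetical. It shows that nominal categories only need counting, while ordinal categories also need an explicit order that is kept whenever the data is tabulated or plotted.

```python
from collections import Counter

# Hypothetical inspection records, invented for illustration only.
defect_types = ["Scratch", "Dent", "Scratch", "Crack", "Dent", "Scratch"]  # nominal
severities = ["Low", "High", "Medium", "Low", "Low", "Medium"]             # ordinal

# Nominal data: only the frequency per category is meaningful.
defect_counts = Counter(defect_types)

# Ordinal data: state the category order explicitly and report counts in that order.
SEVERITY_ORDER = ["Low", "Medium", "High"]
severity_counts = Counter(severities)
ordered_severity_counts = [(level, severity_counts[level]) for level in SEVERITY_ORDER]

print(dict(defect_counts))       # {'Scratch': 3, 'Dent': 2, 'Crack': 1}
print(ordered_severity_counts)   # [('Low', 3), ('Medium', 2), ('High', 1)]
```

Sorting the ordinal labels alphabetically would give High, Low, Medium, which is why the order is stored separately rather than derived from the labels.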
Components Involved in Attribute Agreement Analysis (AAA)
Attribute Agreement Analysis (AAA) is a statistical method used to assess the reliability and agreement among different appraisers. This is particularly useful in quality control processes where multiple individuals may be involved in assessing a product or service. Here are the key components:
Appraiser
The appraiser is the individual responsible for making the assessment or measurement. Their role is to apply the criteria or standard to the item being assessed.
Expert (Optional)
An expert is the Subject Matter Expert (SME) who sets the standard or criterion against which appraisers’ assessments are compared. While not always necessary, having an expert can lend more credibility and consistency to the analysis.
Repeatability
Repeatability refers to the variation that occurs when the same item is assessed multiple times by the same appraiser. High repeatability indicates that the appraiser is consistent in their assessments.
Reproducibility
Reproducibility, on the other hand, measures the variation when the same item is assessed by different appraisers. High reproducibility indicates that there is agreement among different appraisers, which is ideal for quality control.
Graphical Representation:
A common way to visually represent AAA results is through the use of scatter plots or control charts, which can show the range of variation and help identify outliers or trends.
Step 1: Define the Objective
Defining the objective is the foundation upon which the entire AAA is built. Without a clear objective, the analysis can become aimless and may not provide meaningful insights.
Detailed Steps:
Identify the Attribute: Determine what specific attribute you will be assessing. Is it the quality of a product, customer service ratings, or perhaps the accuracy of a machine?
Set the Criteria: Clearly outline the criteria that will be used to assess the attribute. For instance, if you’re measuring product quality, the criteria might be Pass/Fail, or it could be various types of defects like scratches, dents, etc.
Justify the Need: Explain why this attribute is critical to measure. Does it relate to customer satisfaction, operational efficiency, or compliance with regulations?
Define the Scope: Make sure to outline what is in and out of scope for the AAA. This helps to keep the analysis focused.
Documentation: Document the objective, attribute, and criteria so that everyone involved understands what is being measured and why.
Example Objective:
“To evaluate the consistency and reliability of quality control inspectors in identifying and classifying defects in the manufactured widgets. The criteria for assessment will be categorizing defects as ‘Scratch’, ‘Dent’, or ‘Crack’.”
Step 2: Select the Appraisers
The appraisers are the individuals who will be conducting the assessments. Their reliability directly impacts the validity of the AAA.
Detailed Steps:
Identify Potential Appraisers: List the individuals who are typically involved in these kinds of assessments in their day-to-day roles.
Consider Experience: Consider the experience level and expertise of potential appraisers. A mix of novices and experts can provide a broader perspective.
Commitment: Ensure that the selected appraisers are committed to participating fully in the AAA process, including possible retraining based on the results.
Briefing: Brief the selected appraisers on the objective, criteria, and importance of the AAA.
Documentation: Document who has been chosen and why they are considered appropriate for this analysis.
Example Appraisers:
Quality control inspectors who regularly check the widgets for defects.
Step 3: Identify an Expert (Optional)
An expert serves as the gold standard against which other appraisers are compared. Though optional, having an expert adds a level of rigor to the AAA.
Detailed Steps:
Expert Identification: Choose a Subject Matter Expert (SME) who has significant experience and expertise in the attribute being measured.
Confirm Availability: Make sure the expert is available for the duration of the AAA.
Training: The expert should ideally also participate in training sessions, both to refresh their own understanding of the criteria and to provide guidance to the appraisers.
Documentation: Document the expert’s credentials and role in the AAA.
Example Expert:
A senior quality control manager with over 10 years of experience in widget manufacturing.
Step 4: Prepare the Samples
Selecting a representative sample is crucial for the validity of the AAA. The samples you choose must adequately capture the variation in the attribute you are studying.
Detailed Steps:
Define Population: Clearly state the population that the samples will be drawn from. For example, are you looking at all widgets produced in a month or just a specific type?
Determine Sample Size: Choose a sample size that is statistically meaningful but also manageable for the appraisers. The larger the sample, the more reliable your results will be, but it should also be realistic in terms of time and resources.
Select Samples: Actually draw the samples from your defined population. Make sure the selection is random and unbiased.
Document: Record details of each sample and the methodology used for selection. This is important for traceability and for any future replication of the AAA.
Example:
For an AAA on widget defects, you might choose 50 widgets randomly from a month’s production to cover the variety of defects you are interested in.
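A random draw like the one in this example could be sketched as follows. The serial-number format, the population size of 1,000, and the helper name `draw_samples` are assumptions made for illustration. A fixed seed is used only so that the selection can be documented and reproduced later.

```python
import random

def draw_samples(population_ids, sample_size, seed=None):
    """Return `sample_size` distinct IDs drawn at random without replacement."""
    rng = random.Random(seed)  # a local generator keeps the draw reproducible
    return rng.sample(population_ids, sample_size)

# Hypothetical population: serial numbers for one month's production.
population = [f"SN{n:04d}" for n in range(1, 1001)]

# Draw 50 widgets; recording the seed documents how the selection was made.
selected = draw_samples(population, 50, seed=2024)
print(len(selected), "widgets selected")
```

A purely random draw may contain few examples of a rare defect type. If coverage of every category matters, sampling separately within each category is a common alternative.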
Step 5: Randomize the Samples
Randomization eliminates order bias, ensuring that the results are not influenced by the sequence in which samples are assessed.
Detailed Steps:
Shuffle Order: Use a random number generator or a similar method to randomize the order of the samples.
Assign Codes: Optionally, you can assign codes to the samples to anonymize them, making it even less likely that appraisers will be biased.
Document: Document the randomized order for auditability and future reference.
Example:
Use a software tool to randomize the order of the 50 widgets. Each widget could be labeled with a code like W1, W2, etc., and then shuffled.
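One possible way to do this with Python’s standard library is sketched below. The helper names `assign_codes` and `randomized_order`, the serial numbers, and the seeds are hypothetical. The sketch keeps the code-to-widget mapping separate from the presentation order, so that the same codes can be reshuffled for a later round.

```python
import random

def assign_codes(sample_ids):
    """Map blind codes W1, W2, ... to the real sample IDs (kept by the organizer)."""
    return {f"W{i}": sample_id for i, sample_id in enumerate(sample_ids, start=1)}

def randomized_order(codes, seed=None):
    """Return the codes in a shuffled presentation order."""
    rng = random.Random(seed)
    order = list(codes)
    rng.shuffle(order)
    return order

# Hypothetical sample IDs for the 50 selected widgets.
sample_ids = [f"SN{n:04d}" for n in range(1, 51)]
code_map = assign_codes(sample_ids)

# A different seed per round gives each round its own documented order.
round1_order = randomized_order(code_map, seed=1)
round2_order = randomized_order(code_map, seed=2)
```

Storing the seed alongside the resulting order is one simple way to satisfy the documentation step.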
Step 6: Train the Appraisers
Training ensures that all appraisers have a uniform understanding of the criteria and process, which is crucial for the reliability of the AAA.
Detailed Steps:
Develop Training Material: Create or gather material that clearly explains the criteria for assessment.
Conduct Training Session: Hold a session where appraisers can learn the criteria and ask questions.
Test Understanding: Give appraisers a few test samples to ensure they have understood the criteria properly.
Document: Record details of the training session and the attendees.
Example:
Hold a 1-hour training session with all quality inspectors, using presentation slides and example widgets to clarify the types of defects.
Step 7: Conduct the First Round of Assessments
The first round provides the initial data that will be used to evaluate repeatability and reproducibility.
Detailed Steps:
Assign Samples: Give each appraiser their set of randomized samples.
Record Assessments: As appraisers complete their assessments, record their judgments in a structured format like a spreadsheet.
Ensure Independence: Make sure appraisers are not discussing their assessments with each other to maintain independence.
Document: Keep detailed records of all assessments.
Example:
Each quality inspector assesses the 50 randomized widgets independently and records whether they find each widget to have a Scratch, Dent, or Crack.
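As a sketch of what a “structured format” might look like, the snippet below defines one record per judgment and writes the records as CSV text. The field names, the `Assessment` class, and the inspector names are assumptions for illustration. Rejecting ratings outside the agreed criteria at entry time is a design choice that keeps typos out of the later analysis.

```python
import csv
import io
from dataclasses import dataclass, asdict

# Categories taken from the example objective in Step 1.
ALLOWED_RATINGS = {"Scratch", "Dent", "Crack"}
FIELDNAMES = ["appraiser", "round_no", "sample_code", "rating"]

@dataclass(frozen=True)
class Assessment:
    appraiser: str
    round_no: int
    sample_code: str
    rating: str

    def __post_init__(self):
        # Refuse anything that is not one of the agreed categories.
        if self.rating not in ALLOWED_RATINGS:
            raise ValueError(f"Unknown rating: {self.rating!r}")

def to_csv(assessments):
    """Serialize assessments to CSV text with one row per judgment."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    for assessment in assessments:
        writer.writerow(asdict(assessment))
    return buffer.getvalue()

records = [
    Assessment("Inspector A", 1, "W7", "Dent"),
    Assessment("Inspector B", 1, "W7", "Crack"),
]
csv_text = to_csv(records)
print(csv_text)
```

Keeping the round number in every row means the second-round data can be appended to the same file without ambiguity.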
Step 8: Conduct the Second Round (For Repeatability)
The second round is essential for assessing how consistent each appraiser is when making repeated assessments.
Detailed Steps:
Time Interval: Wait for a specific time interval to reduce memory bias. This could range from a few hours to a few days, depending on the context.
Repeat Assessment: Have appraisers reassess the same samples in a new randomized order.
Record and Document: As in the first round, record all assessments in a structured format and keep detailed documentation.
Example:
A day after the first round, the same quality inspectors reassess the same 50 widgets, which have been re-randomized, and record their new assessments.
Step 9: Analyze Repeatability
Analyzing repeatability helps to gauge how consistent each appraiser is when making assessments. The aim is to understand if the same appraiser is likely to give the same rating for the same sample across different instances.
Detailed Steps:
Choose Statistical Methods: Decide on the statistical methods you will use, such as percent agreement, Kappa statistics, or control charts.
Compute Metrics: Use the recorded data to compute the chosen metrics for each appraiser.
Visualize Data: A control chart or similar graphical tool can help visualize the repeatability across different appraisers.
Document: Record all calculations, assumptions, and findings for future reference and auditing.
Example:
Calculate the Kappa statistic for each appraiser to quantify the level of agreement between their first and second rounds of assessments.
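The guide does not say which Kappa variant to use. The sketch below assumes Cohen’s Kappa for two rounds by the same appraiser, computed as (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of matching ratings and p_e is the agreement expected by chance from each round’s category frequencies. The ratings are invented, and statistical packages may default to a different variant such as Fleiss’ Kappa.

```python
import math
from collections import Counter

def percent_agreement(ratings_a, ratings_b):
    """Share of items that received the same rating in both lists."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("Both lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's Kappa for two sets of ratings of the same items."""
    n = len(ratings_a)
    p_o = percent_agreement(ratings_a, ratings_b)
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    # Chance agreement: sum over categories of the product of marginal proportions.
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_e == 1.0:
        return math.nan  # only one category was ever used, so Kappa is undefined
    return (p_o - p_e) / (1 - p_e)

# Hypothetical ratings by one inspector on the same 10 widgets, in sample order.
round1 = ["Scratch"] * 4 + ["Dent"] * 3 + ["Crack"] * 3
round2 = ["Scratch"] * 3 + ["Dent"] * 4 + ["Crack"] * 2 + ["Scratch"]

print(f"Percent agreement: {percent_agreement(round1, round2):.2f}")  # 0.80
print(f"Cohen's Kappa:     {cohens_kappa(round1, round2):.3f}")       # 0.697
```

Both lists must be sorted back into the same sample order before comparison, since each round was presented in a different random sequence.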
Step 10: Analyze Reproducibility
Reproducibility analysis helps in understanding how different appraisers agree with each other and possibly with an expert, giving insights into the overall reliability of the measurement system.
Detailed Steps:
Compute Cross-Appraiser Metrics: Use the recorded data to calculate metrics like percent agreement or Kappa statistics between different appraisers.
Expert Comparison: If an expert is involved, compare each appraiser’s assessments with the expert’s assessments.
Visual Representation: A scatter plot or similar graphical method can help visualize the degree of agreement between different appraisers.
Document: As always, document all calculations and findings.
Example:
Calculate the percent agreement between each pair of appraisers and between the appraisers and the expert, if available.
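The pairwise and expert comparisons in this example could be computed along the following lines. This is a sketch under stated assumptions: the inspector names and ratings are invented, each appraiser contributes one rating per sample, and all lists are in the same sample order.

```python
from itertools import combinations

def percent_agreement(ratings_a, ratings_b):
    """Share of items that received the same rating in both lists."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("Both lists must be non-empty and of equal length")
    return sum(a == b for a, b in zip(ratings_a, ratings_b)) / len(ratings_a)

def pairwise_agreement(ratings_by_appraiser):
    """Percent agreement for every pair of appraisers."""
    return {
        (a, b): percent_agreement(ratings_by_appraiser[a], ratings_by_appraiser[b])
        for a, b in combinations(sorted(ratings_by_appraiser), 2)
    }

def agreement_with_expert(ratings_by_appraiser, expert_ratings):
    """Percent agreement of each appraiser with the expert's ratings."""
    return {
        name: percent_agreement(ratings, expert_ratings)
        for name, ratings in ratings_by_appraiser.items()
    }

# Hypothetical first-round ratings for five widgets.
expert = ["Scratch", "Dent", "Crack", "Scratch", "Dent"]
ratings = {
    "Ana": ["Scratch", "Dent", "Crack", "Scratch", "Dent"],
    "Ben": ["Scratch", "Dent", "Crack", "Dent", "Dent"],
    "Cai": ["Scratch", "Crack", "Crack", "Dent", "Dent"],
}

between = pairwise_agreement(ratings)
versus_expert = agreement_with_expert(ratings, expert)
print(between)
print(versus_expert)
```

Percent agreement does not correct for chance, so it is often reported alongside a Kappa statistic rather than instead of one.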
Step 11: Interpret the Results
Interpreting the results helps in understanding what the statistical metrics imply about the reliability and validity of the measurement system.
Detailed Steps:
Evaluate Repeatability: Assess whether individual appraisers are consistent in their assessments.
Assess Reproducibility: Determine if there is a general agreement between different appraisers.
Overall Reliability: Combine these analyses to get an overall picture of the measurement system’s reliability.
Document: Prepare a detailed report or presentation that interprets these findings.
Example:
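As a rule of thumb, if Kappa values fall below 0.6, treat the level of agreement as inadequate and the measurement system as potentially unreliable. Cutoffs vary between conventions, so agree on the threshold before looking at the results.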
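Applying such a cutoff is straightforward once the Kappa values are available. The sketch below takes the 0.6 figure from the example as a default and leaves it adjustable. The function name `flag_low_agreement` and the Kappa values are hypothetical.

```python
def flag_low_agreement(kappa_by_appraiser, threshold=0.6):
    """Return the appraisers whose Kappa is strictly below the threshold."""
    return sorted(
        name for name, kappa in kappa_by_appraiser.items() if kappa < threshold
    )

# Hypothetical within-appraiser Kappa values from the repeatability analysis.
kappas = {"Inspector A": 0.82, "Inspector B": 0.55, "Inspector C": 0.60}

needs_attention = flag_low_agreement(kappas)
print(needs_attention)  # ['Inspector B']
```

A value exactly at the threshold is not flagged here. Whether the boundary counts as acceptable is a choice to settle before the study.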
Step 12: Take Corrective Actions
Based on the interpretation, corrective actions may be needed to improve the measurement system.
Detailed Steps:
Identify Weak Areas: Pinpoint where the issues lie, whether they’re in repeatability, reproducibility, or both.
Plan for Improvement: Develop a plan that could include retraining, clarifying guidelines, or even revisiting the criteria.
Implement Changes: Execute the improvement plan.
Document: Keep a record of the corrective actions taken for audit trails and future reference.
Example:
If the issue lies in repeatability, consider retraining appraisers using more detailed guidelines.
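To pinpoint where the disagreement lies, it can help to count which categories get confused with which. The sketch below is one possible approach rather than a standard AAA output. It compares an appraiser’s ratings with a reference (for example, the expert’s ratings) and tallies each mismatch as a (reference, given) pair. The data and the name `confusion_pairs` are invented.

```python
from collections import Counter

def confusion_pairs(appraiser_ratings, reference_ratings):
    """Count mismatches as (reference category, category the appraiser gave)."""
    if len(appraiser_ratings) != len(reference_ratings):
        raise ValueError("Both lists must have the same length")
    return Counter(
        (ref, given)
        for ref, given in zip(reference_ratings, appraiser_ratings)
        if ref != given
    )

# Hypothetical ratings for six widgets, in the same sample order.
reference = ["Scratch", "Dent", "Crack", "Scratch", "Dent", "Crack"]
appraiser = ["Scratch", "Crack", "Crack", "Dent", "Crack", "Crack"]

mismatches = confusion_pairs(appraiser, reference)
print(mismatches.most_common(1))  # [(('Dent', 'Crack'), 2)]
```

In this invented case, dents are most often rated as cracks, which would suggest clarifying the guideline that separates those two categories.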