Six Sigma Quality Resources for European Companies In association withValeocon Management Consulting
 Main Site > Europe Channel > Methodologies  > Six Sigma Search:
 
 for    
Publications
Marketplace
| iSixSigma
Stuff
| iSixSigma
Blogosphere
| Events
Calendar
| The
Dictionary
| Discussion
Forum
| Find
a Job
| Post
a Job
| Industry
News
| Newsletter
Signup
| Sigma
Calculator
| Online
Surveys
Nominations for iSixSigma Awards! close November 30 – nominate your project/program today!
iSixSigma Magazine Signup
 iSixSigma Live!  
  Live! Home
  2010 Summit & Awards
  2010 Energy Forum
 Free Newsletters!  
  Sign Up Now!
  Manage Subscriptions
  New To Six Sigma?
  Six Sigma Q&A
  Cert. Practice Test
  Problem Solving Wizard
  ISSSP Info
ISSSP Is The Official Six Sigma Society of iSixSigma
 Channels 
  iSixSigma Main
  Financial Services
  Healthcare
  Military
  Software / IT
 Quality Directory 
  Recent Articles
  Certifications/Awards
  Consultants
  Culture Evolution
  Methodologies
   BPR
   DMAIC
   Kaizen
   Metrics
   Six Sigma
   TQM
   Work-Out
  News & Events
  Organizations
  Product/Service Guides
  Statistics & Analysis
  Tools & Templates
  Voice of the Customer
  Free Whitepapers
 Related Topics 
  Innovation
  Outsourcing/Offshoring
  Business Process Mgt
 Quick Access 
  Help
  Search
  Advertise Here
  Article Archives
  Newsletter Archives
 User Feedback 
  Please suggest site
  improvements.
 
  [ larger form ]

Dealing with Non-normal Data: Strategies and Tools

Bookmark This Page Bookmark This Page
Email This Page Email This Page
Format for Printing Format for Printing
Cite This Article Cite This Article
Submit an Article Submit an Article
Six Sigma Article Archive Read More Articles
Related Tools & Articles
  • Discussion Forum
    "Does anyone have a guideline, or at least an opinion, for what distribution is the most appropriate for the variable 'time'?"

    Contribute to this Discussion
    Download Products

    By Arne Buthmann

    Normally distributed data is a commonly misunderstood concept in Six Sigma. Some people believe that all data collected and used for analysis must be distributed normally. But normal distribution does not happen as often as people think, and it is not a main objective. Normal distribution is a means to an end, not the end itself.

    Normally distributed data is needed to use a number of statistical tools, such as individuals control charts, Cp/Cpk analysis, t-tests and the analysis of variance (ANOVA). If a practitioner is not using such a specific tool, however, it is not important whether data is distributed normally. The distribution becomes an issue only when practitioners reach a point in a project where they want to use a statistical tool that requires normally distributed data and they do not have it.

    The probability plot in Figure 1 is an example of this type of scenario. In this case, normality clearly cannot be assumed; the p-value is less than 0.05 and more than 5 percent of the data points are outside the 95 percent confidence interval.

    Figure 1: Probability Plot of Cycle Time

    What can be done? Basically, there are two options:

    1. Identify and, if possible, address reasons for non-normality or
    2. Use tools that do not require normality

    Addressing Reasons for Non-normality

    When data is not normally distributed, the cause for non-normality should be determined and appropriate remedial actions should be taken. There are six reasons that are frequently to blame for non-normality.

    Reason 1: Extreme Values

    Too many extreme values in a data set will result in a skewed distribution. Normality of data can be achieved by cleaning the data. This involves determining measurement errors, data-entry errors and outliers, and removing them from the data for valid reasons.

    It is important that outliers are identified as truly special causes before they are eliminated. Never forget: The nature of normally distributed data is that a small percentage of extreme values can be expected; not every outlier is caused by a special reason. Extreme values should only be explained and removed from the data if there are more of them than expected under normal conditions.

    Reason 2: Overlap of Two or More Processes

    Data may not be normally distributed because it actually comes from more than one process, operator or shift, or from a process that frequently shifts. If two or more data sets that would be normally distributed on their own are overlapped, data may look bimodal or multimodal – it will have two or more most-frequent values.

    The remedial action for these situations is to determine which X’s cause bimodal or multimodal distribution and then stratify the data. The data should be checked again for normality and afterward the stratified processes can be worked with separately.

    An example: The histogram in Figure 2 shows a website’s non-normally distributed load times. After stratifying the load times by weekend versus working day data (Figure 3), both groups are normally distributed.

    Figure 2: Website Load Time Data

    Figure 3: Website Load Time Data After Stratification

    Reason 3: Insufficient Data Discrimination

    Round-off errors or measurement devices with poor resolution can make truly continuous and normally distributed data look discrete and not normal. Insufficient data discrimination – and therefore an insufficient number of different values – can be overcome by using more accurate measurement systems or by collecting more data.

    Reason 4: Sorted Data

    Collected data might not be normally distributed if it represents simply a subset of the total output a process produced. This can happen if data is collected and analyzed after sorting. The data in Figure 4 resulted from a process where the target was to produce bottles with a volume of 100 ml. The lower and upper specifications were 97.5 ml and 102.5 ml. Because all bottles outside of the specifications were already removed from the process, the data is not normally distributed – even if the original data would have been.

    Figure 4: Sorted Bottle Volume Data

    Reason 5: Values Close to Zero or a Natural Limit

    If a process has many values close to zero or a natural limit, the data distribution will skew to the right or left. In this case, a transformation, such as the Box-Cox power transformation, may help make data normal. In this method, all data is raised, or transformed, to a certain exponent, indicated by a Lambda value. When comparing transformed data, everything under comparison must be transformed in the same way.

    The figures below illustrate an example of this concept. Figure 5 shows a set of cycle-time data; Figure 6 shows the same data transformed with the natural logarithm.

    Figure 5: Cycle-Time Data

    Figure 6: Log Cycle-Time Data

    Take note: None of the transformation methods provide a guarantee of a normal distribution. Always check with a probability plot to determine whether normal distribution can be assumed after transformation.

    Reason 6: Data Follows a Different Distribution

    There are many data types that follow a non-normal distribution by nature. Examples include:

    • Weibull distribution, found with life data such as survival times of a product
    • Log-normal distribution, found with length data such as heights
    • Largest-extreme-value distribution, found with data such as the longest down-time each day
    • Exponential distribution, found with growth data such as bacterial growth
    • Poisson distribution, found with rare events such as number of accidents
    • Binomial distribution, found with “proportion” data such as percent defectives

    If data follows one of these different distributions, it must be dealt with using the same tools as with data that cannot be “made” normal.

    No Normality Required

    Some statistical tools do not require normally distributed data. To help practitioners understand when and how these tools can be used, the table below shows a comparison of tools that do not require normal distribution with their normal-distribution equivalents.

    Comparison of Statistical Analysis Tools for Normally and Non-Normally Distributed Data
    Tools for Normally Distributed DataEquivalent Tools for Non-Normally Distributed Data Distribution Required 
    T-testMann-Whitney test; Mood's median test; Kruskal-Wallis test Any 
    ANOVA Mood's median test; Kruskal-Wallis test Any 
    Paired t-test One-sample sign test Any
    F-test; Bartlett's test Levene's test Any 
    Individuals control chart Run Chart Any 
    Cp/Cpk analysis Cp/Cpk analysis Weibull; log-normal; largest extreme value; Poisson; exponential; binomial 

    About the Author: Arne Buthmann is a senior consultant with Valeocon Management Consulting in Europe. He has a wide range of experience in consulting and training multi-national business enterprises such as Novartis, Johnson & Johnson, Merial, Danone, TRW, Siemens and Bosch. Mr. Buthmann helps clients to implement Six Sigma, Lean and Design for Six Sigma, and is co-author of the book Produkt- und Prozessdesign für Six Sigma mit DFSS. He can be reached at arne.buthmann@valeocon.com.

     
    Rate This Article:  Current Rating: 4.59
      Poor    Excellent     
              1    2    3     4    5
    Copyright � 2000-2009 iSixSigma – All Rights Reserved
    Reproduction Without Permission Is Strictly Prohibited – Copyright Requests


    Publish an Article: Do you have a Six Sigma tip, learning or case study?
    Share it with the largest community of Six Sigma professionals, and be recognized by your peers.
    It's a great way to promote your expertise and/or build your resume. Read more about submitting an article.



    BEST SELLING PRODUCTS (iSixSigma Publications)
    1. Six Sigma Black Belt (DMAIC) Training Slides - 2009 Version!
      The 2009 Six Sigma Black Belt course includes over 40 more slides than the 2008 version. Contents include: 1,220 PowerPo...
    2. Certified Lean Six Sigma Black Belt Assessment Exam
      Interested in assessing your knowledge of Lean Six Sigma? Preparing for certifications? Testing your students and traine...
    3. Certified Lean Six Sigma Green Belt Assessment Exam
      This assessment exam is useful for students interested in assessing their knowledge of Lean Six Sigma on the Green Belt ...
    4. Certified Lean Six Sigma Black Belt E-book
      In 670 pages learn everything within the Lean Six Sigma DMAIC body of knowledge to successfully achieve Black Belt certi...
    5. Kaizen Workshop E-book
      This 150+ page ebook teaches key tools and techniques of Kaizen, as well as real application to enhance learning. Kaizen...
    6. Six Sigma Yellow Belt Training Slides - 2009 Version
      The 2009 Six Sigma Yellow Belt course is comprised of: 503 slidesInstructor notesSlide explanations15 data sets19 suppo...
    7. Design For Six Sigma (DFSS) E-Book or Print
      Need an "encyclopedia" consisting of many of the tools you’ll study? Need a helpful refresher to apply the DFSS process?...
     
    Six Sigma AdLinks


    Google AdWords
     
    Home | Discussion Forum | Event Calendar | Job Shop
    Link To iSixSigma | Rate This Page | Report A Problem | Free Content For Your Site | Submit Article For Publishing
     Terms of Service. �2000-2009 iSixSigma. All rights reserved. v3.0lb, 0.1
    About iSixSigmaContact UsPrivacy PolicySite Map