Six Sigma Quality Resources for European Companies, in association with Valeocon Management Consulting
Making Data Normal Using Box-Cox Power Transformation

    By Arne Buthmann

    Normally distributed data is needed to use a number of statistical analysis tools, such as individuals control charts, Cp/Cpk analysis, t-tests and analysis of variance (ANOVA). When data is not normally distributed, the cause of the non-normality should be determined and appropriate remedial actions should be taken. (An introduction to remedial actions for non-normal data can be found in "Dealing with Non-normal Data: Strategies and Tools" by Arne Buthmann.)

    Data transformation, and particularly the Box-Cox power transformation, is one of these remedial actions that may help to make data normal. By understanding both the concept of transformation and the Box-Cox method, practitioners will be better prepared to work with non-normal data.

    What Are Transformations?

    Transforming data means performing the same mathematical operation on each piece of original data. Some transformation examples from daily life are currency conversion (e.g., U.S. dollars into euros) and converting degrees Celsius into degrees Fahrenheit.

    These two transformations are called linear transformations because the original data is simply multiplied or divided by a specific coefficient or a constant is subtracted or added. But these linear transformations do not change the shape of the data distribution and, therefore, do not help to make data look more normal (Figure 1).

    Figure 1: Linear Transformation of Degrees Celsius to Degrees Fahrenheit 
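The point about linear transformations can be checked numerically. The following is a minimal sketch, assuming a small made-up set of right-skewed temperature readings (the values and the `skewness` helper are illustrative, not from the article). It shows that converting Celsius to Fahrenheit leaves the skewness, a simple measure of distribution shape, unchanged.

```python
import statistics

def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

# Made-up, right-skewed readings in degrees Celsius (illustrative only).
celsius = [1.0, 2.0, 2.5, 3.0, 4.0, 6.0, 9.0, 15.0, 30.0]

# Linear transformation: multiply by a coefficient and add a constant.
fahrenheit = [c * 9 / 5 + 32 for c in celsius]

skew_c = skewness(celsius)
skew_f = skewness(fahrenheit)
print(skew_c, skew_f)  # the two values agree up to rounding error
```

Because z-scores are unaffected by a positive scale factor and a shift, any shape statistic built from them is identical before and after the conversion.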

    What is the Box-Cox Power Transformation?

    The statisticians George Box and David Cox developed a procedure to identify an appropriate exponent (Lambda = λ) to use to transform data into a “normal shape.” The Lambda value indicates the power to which all data should be raised. To find it, the Box-Cox power transformation searches from Lambda = -5 to Lambda = +5 until the best value is found. Table 1 shows some common Box-Cox transformations, where Y' is the transformation of the original data Y. Note that for Lambda = 0, the transformation is NOT Y^0 (because this would be 1 for every value) but instead the logarithm of Y.

    Table 1: Common Box-Cox Transformations
    λ       Y'
    -2      Y^-2 = 1/Y^2
    -1      Y^-1 = 1/Y
    -0.5    Y^-0.5 = 1/Sqrt(Y)
    0       log(Y)
    0.5     Y^0.5 = Sqrt(Y)
    1       Y^1 = Y
    2       Y^2
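The rows of Table 1 can be expressed as one small function. This is a minimal sketch of the simple power form shown in the table; the name `power_transform` is hypothetical and the natural logarithm is assumed for the Lambda = 0 case.

```python
import math

def power_transform(y, lam):
    """Apply the Table 1 transformation to a single positive value."""
    if y <= 0:
        raise ValueError("the transformation requires y > 0")
    if lam == 0:
        # Special case from Table 1: use the logarithm instead of y**0.
        return math.log(y)
    return y ** lam

# Reproduce the table rows for one example value.
for lam in (-2, -1, -0.5, 0, 0.5, 1, 2):
    print(lam, power_transform(4.0, lam))
```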

    An example: Figure 2 shows non-normally distributed cycle time data. Using the Box-Cox power transformation in a statistical analysis software program provides an output that indicates the best Lambda values (Figure 3).

    Figure 2: Example of Non-normally Distributed Cycle Time Data

    Figure 3: Example Box-Cox Plot of Data 

    The lower and upper confidence limits (CLs) show that the best results for normality were reached with Lambda values between -2.48 and -0.69. Although the best value is -1.54 (the estimate in Figure 3), it is more practical to round this value to a whole number, which makes it easier to transform the data back and forth. The best whole-number values here are -1 and -2 (the reciprocals of Y and Y^2, respectively). The histogram in Figure 4 shows the data transformed using Lambda = -1, which is now closer to a normal distribution.

    Figure 4: Data Transformed Using Lambda = -1
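The search over Lambda values can be sketched in a few lines. This is a hedged illustration, not the algorithm of any particular statistics package. It assumes the common formulation in which the transformed data is scaled by the geometric mean so that standard deviations are comparable across Lambda values, and the Lambda with the smallest standard deviation is chosen. The function names, the 0.5 grid step and the sample data are all assumptions.

```python
import math
import statistics

def scaled_box_cox(values, lam):
    """Box-Cox transform scaled by the geometric mean of the data."""
    logs = [math.log(v) for v in values]
    geo_mean = math.exp(statistics.fmean(logs))
    if abs(lam) < 1e-12:
        # Lambda = 0 is the logarithm case.
        return [geo_mean * lv for lv in logs]
    scale = lam * geo_mean ** (lam - 1)
    return [(v ** lam - 1) / scale for v in values]

def best_lambda(values, low=-5.0, high=5.0, step=0.5):
    """Grid search for the Lambda with the smallest scaled standard deviation."""
    count = round((high - low) / step)
    candidates = [low + i * step for i in range(count + 1)]
    return min(
        candidates,
        key=lambda lam: statistics.pstdev(scaled_box_cox(values, lam)),
    )

# Made-up positive data that is symmetric on the log scale (illustrative only).
sample = [math.exp(x) for x in (-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0)]
chosen = best_lambda(sample)
print(chosen)
```

For this sample the search settles on Lambda = 0, the log transformation, because the data was constructed to be symmetric on the log scale. A finer grid step gives a more precise estimate at the cost of more evaluations.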

    Does Box-Cox Always Work?

    The Box-Cox power transformation does not guarantee normality, because it does not actually check for normality; the method looks for the Lambda value that gives the smallest standard deviation. The assumption is that, among all transformations with Lambda values between -5 and +5, the transformed data is most likely (but not guaranteed) to be normally distributed when its standard deviation is smallest. Therefore, it is essential to always check the transformed data for normality using a probability plot.
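A probability plot is judged by eye, but the same idea can be summarized as a number. The sketch below pairs the sorted data with theoretical normal quantiles and computes their correlation: the closer to 1, the straighter the plot. The plotting positions (i - 0.5)/n and the function name are assumptions, and the data is made up.

```python
import statistics

def probability_plot_correlation(values):
    """Correlation between sorted data and theoretical normal quantiles."""
    n = len(values)
    ordered = sorted(values)
    normal = statistics.NormalDist()
    # Plotting positions (i - 0.5) / n are one common choice among several.
    quantiles = [normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    mean_q = statistics.fmean(quantiles)
    mean_v = statistics.fmean(ordered)
    cov = sum((q - mean_q) * (v - mean_v) for q, v in zip(quantiles, ordered))
    var_q = sum((q - mean_q) ** 2 for q in quantiles)
    var_v = sum((v - mean_v) ** 2 for v in ordered)
    return cov / (var_q * var_v) ** 0.5

# Made-up example: skewed raw data and its square-root transform.
raw = [float(i * i) for i in range(1, 10)]
transformed = [v ** 0.5 for v in raw]
r_raw = probability_plot_correlation(raw)
r_transformed = probability_plot_correlation(transformed)
print(r_raw, r_transformed)  # the transformed data gives the higher value
```

A single correlation value is only a rough screen. Looking at the plot itself, or running a formal normality test, remains the safer check.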

    Additionally, the Box-Cox power transformation only works if all the data is greater than 0. This, however, can usually be achieved easily by adding a constant (c) to all data so that every value becomes positive before it is transformed. The transformation equation is then:

    Y' = (Y + c)^λ
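The shifted transformation can be sketched as follows. The article does not say how to pick the constant, so the rule used here (shift the smallest value up to a chosen `margin`) is purely an assumption for illustration, as are the function names.

```python
import math

def shift_constant(values, margin=1.0):
    """Return a constant c so that every value plus c is positive.

    Assumption: if the data is already positive, no shift is applied;
    otherwise the smallest value is moved up to `margin`.
    """
    lowest = min(values)
    return 0.0 if lowest > 0 else margin - lowest

def shifted_power_transform(y, lam, c):
    """Y' = (Y + c)^lambda, with the logarithm used when lambda is 0."""
    shifted = y + c
    if shifted <= 0:
        raise ValueError("y + c must be positive")
    return math.log(shifted) if lam == 0 else shifted ** lam

data = [-3.0, 0.0, 2.0, 5.0]  # made-up values including zero and a negative
c = shift_constant(data)
transformed = [shifted_power_transform(y, 0.5, c) for y in data]
print(c, transformed)
```

The choice of c influences which Lambda fits best, so the same constant has to be kept when transforming values back.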

    Application Example

    A project team collected cycle time data from a purchase order-generation process. One team member created a control chart of this data (Figure 5) and was about to ask what special cause had happened for data point 40 when the Green Belt remembered that using an individuals control chart requires normally distributed data. A look at the probability plot of the data (Figure 6) revealed a non-normal distribution. Therefore, the control limits of the control chart were useless.

    Figure 5: Control Chart of Original Cycle Time Data

    Figure 6: Probability Plot of Original Cycle Time Data

    The Green Belt used the Box-Cox power transformation to determine whether the data could be transformed (Figure 7). Box-Cox suggested a best Lambda value of 0.5 for transformation (i.e., the square root of the original data). And the transformation really worked: The new probability plot confirms normality (Figure 8).

    Figure 7: Box-Cox Plot of Cycle Time Data

    Figure 8: Probability Plot of Transformed Cycle Time Data

    After the transformation, the Green Belt created a control chart of the transformed data and showed that the purchase order-generation process was actually quite stable, i.e., all variation was due to common causes (Figure 9).

    Figure 9: Control Chart of Transformed Cycle Time Data

    Because the individual values of the transformed data have no practical meaning, the Green Belt had to re-create a control chart for the original data, but this time with the correct control limits (Figure 10). To do this, the Belt took the upper and lower control limits (UCL' and LCL') from the control chart of the transformed data and transformed them back into the original units. Because the transformation was taking the square root, the back-transformation involved squaring the transformed limits:

    UCL = (UCL')^2 = 3.442^2 = 11.847
    LCL = (LCL')^2 = (-0.054)^2 = 0.003

    For the mean, the Belt used the mean of the original data.
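The back-transformation above can be sketched end to end. The limits are computed with the usual individuals-chart formula (mean plus or minus 2.66 times the average moving range), which is an assumption about how the chart was built; the cycle times are made up, and only the two numbers in the last lines come from the article.

```python
import math
import statistics

def individuals_limits(values):
    """Return (LCL, center, UCL) for an individuals control chart."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = statistics.fmean(values)
    half_width = 2.66 * statistics.fmean(moving_ranges)
    return center - half_width, center, center + half_width

def back_transform_sqrt(limit):
    """Undo a square-root transformation by squaring, as in the article."""
    return limit ** 2

# Made-up cycle times (illustrative only).
cycle_times = [1.2, 0.4, 3.1, 0.9, 5.6, 2.2, 0.7, 4.0, 1.5, 2.8]
transformed = [math.sqrt(t) for t in cycle_times]
lcl_t, _, ucl_t = individuals_limits(transformed)

# Limits in original units; the center line is the mean of the original data.
lcl, ucl = back_transform_sqrt(lcl_t), back_transform_sqrt(ucl_t)
center = statistics.fmean(cycle_times)
print(lcl, center, ucl)

# Reproduce the arithmetic from the article.
ucl_article = round(back_transform_sqrt(3.442), 3)   # 11.847
lcl_article = round(back_transform_sqrt(-0.054), 3)  # 0.003
```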

    Figure 10: Control Chart of the Original Data with Correct Control Limits

    This control chart could then be used for the ongoing monitoring of the purchase order-generation process.

    About the Author: Arne Buthmann is a senior consultant with Valeocon Management Consulting in Europe. He has a wide range of experience in consulting and training multi-national business enterprises, such as Novartis, Johnson & Johnson, Merial, Danone, TRW, Siemens and Bosch. Mr. Buthmann helps clients to implement Six Sigma, Lean and Design for Six Sigma, and is co-author of the book Produkt- und Prozessdesign für Six Sigma mit DFSS. He can be reached at arne.buthmann@valeocon.com.

    Copyright © 2000-2008 iSixSigma – All Rights Reserved
    Reproduction Without Permission Is Strictly Prohibited – Copyright Requests