
Alternative Approaches to the Traditional Collaborative Study

laboratory reproducibility of a method as measured by the relative standard deviation for reproducibility [RSD(R)]; 2. provides or confirms the accuracy (trueness, when a certified reference material is used) and repeatability (precision) characteristics of a method; 3. determines whether the instructions for a method are clear and can be followed by analysts who are not affiliated with the method developer; and 4. determines that the method has been designed so that the operating parameters that might affect the performance of the method are truly known and under control (robustness).

Most of a method evaluation can be completed in a single laboratory. For example, accuracy, repeatability, and ruggedness can be determined in just one laboratory. AOAC has a well-described procedure, the Youden ruggedness procedure (6), to determine the ruggedness of a candidate method. (Ruggedness can be determined in a single laboratory; robustness is demonstrated in a collaborative study.) Method instruction clarity could be determined using an established review procedure. Interlaboratory reproducibility is the only parameter that requires collaborators.

The obvious question to ask when assessing the traditional collaborative study design is: Are eight valid data sets really required? Clearly, 10 valid data sets are better than eight, and 12 better than 10, but how many valid data sets are really needed to satisfy the purposes of a collaborative study to quantify “reproducibility”? It is mainly a question of the confidence associated with the calculated RSD(R).
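To make the question concrete, the following is a minimal sketch, not an AOAC procedure: it assumes Python with SciPy and treats RSD(R) as a single standard deviation estimated with p − 1 degrees of freedom, ignoring the separate within- and between-laboratory variance components, and shows how wide the chi-square-based confidence interval around a calculated RSD(R) is for different numbers of valid data sets.

from scipy.stats import chi2

def rsd_r_interval(rsd_r, p_labs, confidence=0.95):
    # Confidence interval for an observed RSD(R) supported by p_labs valid
    # data sets, treating the estimate as having (p_labs - 1) degrees of freedom.
    dof = p_labs - 1
    alpha = 1.0 - confidence
    lower = rsd_r * (dof / chi2.ppf(1.0 - alpha / 2.0, dof)) ** 0.5
    upper = rsd_r * (dof / chi2.ppf(alpha / 2.0, dof)) ** 0.5
    return lower, upper

# An observed RSD(R) of 6% means something different with 8, 12, or 20 laboratories.
for p in (8, 12, 20):
    lo, hi = rsd_r_interval(6.0, p)
    print(f"{p:2d} labs: RSD(R) = 6.0%, 95% CI {lo:.1f}% to {hi:.1f}%")

Under these simplifying assumptions, eight data sets leave the upper 95% bound at roughly twice the reported estimate; the interval narrows, but only slowly, as laboratories are added.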

It may not be immediately obvious, but organizations such as AOAC indirectly establish a confidence interval around the calculated RSD(R) by the simple act of requiring a minimum number of data sets. This has been the paradigm of method validation for more than 50 years. (AOAC has been operating for over 125 years, but for much of its history there was no agreed-upon minimum number of valid data sets; that requirement did not arrive until the 1980s.)

There is another paradigm, generally called “fitness-for-purpose.” Instead of forcing method developers and users to accept a confidence level derived as a consequence of the minimum number of collaborators, it is also possible to allow method developers to determine the appropriate confidence level and then find the necessary number of collaborators. The key to a fitness-for-purpose validation model is that a method developer would be required to report the target confidence interval. A target interval is not normally calculated or reported because there is an implied target interval with the current eight-laboratory minimum collaborative study model.

A fitness-for-purpose model has two advantages: 1. potential method users can decide if the reported reproducibility and confidence level are good enough for their purposes, much as a potential user can now assess the recovery, accuracy, LOQ, and range of applicability; and 2. in some cases, notably government-sponsored validation projects, the number of data sets far exceeds the eight-laboratory minimum. In these admittedly rare cases, the estimate of the reproducibility is known with much greater confidence, and this could be reported to potential users.
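Reading the same arithmetic in the fitness-for-purpose direction, a sketch like the one below (same SciPy and degrees-of-freedom simplifications as above; labs_needed is an illustrative name, not an established function) asks how many collaborators are required before the upper confidence bound on RSD(R) is acceptably close to the estimate itself.

from scipy.stats import chi2

def labs_needed(max_upper_factor, confidence=0.95, cap=200):
    # Smallest number of valid data sets for which the upper confidence bound
    # on the reproducibility estimate is at most max_upper_factor times the
    # observed value (e.g., 1.5 means "no worse than 150% of the estimate").
    alpha = 1.0 - confidence
    for p in range(3, cap + 1):
        dof = p - 1
        upper_factor = (dof / chi2.ppf(alpha / 2.0, dof)) ** 0.5
        if upper_factor <= max_upper_factor:
            return p
    raise ValueError("target confidence not reachable within the cap")

for factor in (2.0, 1.5, 1.25):
    print(f"upper bound <= {factor} x estimate: {labs_needed(factor)} labs")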

There is a new benefit to the fitness-for-purpose model in that the acceptance criteria for the method validation can be clearly and quantitatively stated using target measurement uncertainty. A paper by Weitzel and Johnson (7) describes a process that uses decision rules and probability to determine a target measurement uncertainty, which is then used to set the acceptance criteria for a method validation. Target measurement uncertainty is defined as “measurement uncertainty specified as an upper limit and decided on the basis of the intended use of measurement results” (8). The target measurement uncertainty can be used to decide appropriate values for validation criteria, such as bias, precision, LOD, and LOQ, thus directly linking the SMPR to fitness-for-purpose.
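As a rough illustration of how a decision rule can lead to a target measurement uncertainty and then to a validation acceptance criterion, consider the hypothetical sketch below. It is not the Weitzel and Johnson (7) procedure; the coverage factor, limits, and function names are invented for the example.

K_95 = 1.65  # approximate one-sided coverage factor for 95% confidence

def guard_band(u_target, k=K_95):
    # Guard band implied by the target standard uncertainty.
    return k * u_target

def method_is_fit(observed_s_r, u_target):
    # Validation acceptance criterion in this sketch: the reproducibility
    # standard deviation from the study must not exceed the target uncertainty.
    return observed_s_r <= u_target

# Hypothetical example: specification limit of 10.0 mg/kg and a target
# standard uncertainty of 0.4 mg/kg chosen from the acceptable decision risk.
spec_limit = 10.0
u_target = 0.4
acceptance_limit = spec_limit - guard_band(u_target)
print(f"accept lots measuring <= {acceptance_limit:.2f} mg/kg")
print("method fit for purpose:", method_is_fit(observed_s_r=0.35, u_target=u_target))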

Proficiency Testing

Proficiency testing (PT) is a widely recognized practice for monitoring analytical performance, and in some ways the PT process is very similar to the process of a collaborative study. Test materials are prepared and distributed by a program/project coordinator. Each participating laboratory analyzes a common set of blind test samples and reports its results back to the coordinator. The coordinator then analyzes the data. Of course, there are several differences between PT programs and collaborative studies: 1. the aim of PT is to assess the performance of the laboratory, not the method; 2. laboratories may use any appropriate method they choose for PT; and 3. the data are analyzed to determine how the individual laboratory performs in relation to the whole group of laboratories. For many years, it has been strictly forbidden to even suggest that PT data might be used for the purpose of evaluating a method. However, in 2010, Ellison et al. published a paper proposing that there might be a role for proficiency testing data in method validation under certain conditions. They concluded that a properly implemented PT program provides very similar information to a traditional collaborative study, and should be given equal weight in appraising methods for suitability (9).
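A small sketch with invented results and deliberately simplified scoring (the median as the assigned value and the classical standard deviation in place of the robust estimators a real scheme would normally use, e.g., per ISO 13528) shows the two readings of the same PT round: each laboratory scored against the group, and the group's spread read as a reproducibility-like estimate for the method.

import statistics

# Invented PT round: one analyte, eight participating laboratories.
results = {"lab_A": 4.8, "lab_B": 5.1, "lab_C": 5.0, "lab_D": 5.6,
           "lab_E": 4.9, "lab_F": 5.2, "lab_G": 5.0, "lab_H": 4.7}

# Assigned value and standard deviation for proficiency assessment.
assigned = statistics.median(results.values())
sigma_pt = statistics.stdev(results.values())

# 1. The PT view: score each laboratory against the group.
for lab, x in results.items():
    z = (x - assigned) / sigma_pt
    print(f"{lab}: z = {z:+.2f}")

# 2. The method view: the same spread, expressed relative to the mean,
#    resembles a between-laboratory reproducibility estimate for the method.
rsd_between_labs = 100.0 * sigma_pt / statistics.mean(results.values())
print(f"between-laboratory RSD ~ {rsd_between_labs:.1f}%")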

