Does Vertebral Endplate Morphology Influence Outcomes in Lumbar Disc Arthroplasty? Part I: An Initial Assessment of a Novel Classification System of Lumbar Endplate Morphology
===============================================================================

* James J. Yue
* Matthew E. Oetgen
* Jorge J. Jaramillo-de la Torre
* Rudolf Bertagnoli

## Abstract

**Background** The influence of lumbar endplate morphology on the clinical and radiographic outcomes of lumbar disc arthroplasty has not been evaluated to the best of our knowledge.

**Study Design and Objective** In this observational study of 80 patients, the objective was to formulate a reproducible and valid lumbar endplate classification system to be used in evaluating lumbar total disc replacement patients.

**Methods** A novel vertebral endplate morphology classification system was formulated after review of data related to 80 patients enrolled in a prospective, randomized clinical trial in conjunction with an application for a US Food and Drug Administration investigational device exemption. Intraobserver and interobserver analyses of the classification system were performed on the same 80 patients.

**Results** The initial review of the radiographs revealed 5 types of endplates: Type I (n = 82) flat endplates; Type II (n = 26) posterior lip; Type III (n = 5) central concavity; Type IV (n = 4) anterior sloping endplate; and Type V (n = 2) combination of Types I-IV. The intraobserver kappa was 0.66 and the interobserver kappa was 0.51. These kappa values indicate “substantial” and “moderate” reproducibility, respectively.

**Conclusions** In this study, we propose a lumbar endplate classification system to be used in the preoperative assessment of patients undergoing lumbar disc arthroplasty.
The classification can function as a basis for comparison and discussion among arthroplasty clinicians, and serve as a possible exclusionary screening tool for disc arthroplasty. Special consideration should be given to Type II endplates to optimize proper positioning and functioning of a total disc replacement (TDR) implant. Further outcome studies are warranted to assess the clinical significance of this classification system.

The key points of our study are: (1) We present a novel lumbar vertebral endplate classification system; (2) Five types of endplates were identified and classified; (3) Intraobserver and interobserver reliability were classified as substantial and moderate, respectively; and (4) The classification system used may assist in the preoperative evaluation of patients for total disc replacement.

**Level of Evidence** A systematic review of cohort studies (level 2a).

* Disc replacement
* lumbar
* endplate
* morphology

## INTRODUCTION

Numerous studies have investigated total disc replacement (TDR) as a treatment alternative for severe lumbar discogenic low-back pain secondary to various stages of lumbar spondylosis.1–14 These short- and medium-term outcome studies have been encouraging as to the efficacy of this procedure; however, poor outcomes due to continued pain and disability do occur. Analyses of the results of some studies of TDR have elucidated important factors predicting outcome in patients undergoing this procedure. Predictive factors that have been examined include patient disease, intraoperative variables of implant positioning, intervertebral angle, disc height, and the implanted components.15, 16

The effect of vertebral endplate morphology has not been evaluated as a possible factor in the outcomes of TDR procedures. An initial step in evaluating whether or not endplate morphology has an effect on TDR outcomes is the identification of a classification system that would permit such an evaluation.
To the best of our knowledge, a lumbar vertebral endplate classification system does not exist, either in general terms or in terms of lumbar TDR. We therefore undertook the first part of this two-part study to identify a simple and reproducible endplate classification system that would serve as a basis for comparison and discussion among arthroplasty clinicians and enhance the preoperative evaluation of patients being assessed for lumbar TDR. We present a novel radiographic classification of vertebral endplate morphology for evaluating patients preoperatively for TDR, and we investigate the reliability of this classification.

## MATERIALS AND METHODS

After receiving institutional review board approval, we conducted a retrospective radiographic review of 80 consecutive patients (119 disc levels) who underwent total disc arthroplasty between November 2002 and February 2005. These patients were part of a prospective clinical and radiographic outcome analysis for an FDA investigational device exemption for the ProDisc-L (Synthes, West Chester, Pennsylvania) total disc prosthesis. All patients underwent single- or bi-segmental total disc replacement utilizing the ProDisc-L prosthesis by a single, experienced orthopaedic arthroplasty surgeon. All patients underwent extensive preoperative evaluation with standing lumbar anteroposterior, lateral, side bending, and flexion-extension radiographs. In addition, all patients underwent magnetic resonance imaging and computed tomography scanning.

## CLASSIFICATION FORMULATION

Prior to the initiation of this study, a novel vertebral endplate morphology classification system was developed by the two senior author spine surgeons (J.Y. and R.B.) based on preoperative, standing, neutral lateral lumbar radiographs (Figure 1).

![Figure 1](http://ijssurgery.com//https://www.ijssurgery.com/content/ijss/2/1/16/F1.medium.gif)

[Figure 1](http://ijssurgery.com//content/2/1/16/F1)

Figure 1 Five types of lumbar endplates: Type I - Flat endplate; Type II - Posterior hooked endplate; Type III - Concave endplate; Type IV - Convex endplate; Type V - Combined endplates.

Type I is defined as flat, with parallel superior and inferior vertebral endplates. Type II is defined as hooked, with a posterior endplate concavity adjacent to a posterior endplate extension below the neutral level of the anterior aspect of the endplate. Type III is defined as concave, with either superior or inferior endplate concavity greater than 10% of the vertebral body height. Type IV is defined as convex, with anterior endplate convexity greater than 20% of the total endplate length. Type V is defined as combined, with any combination of findings of Type II-IV. By definition, any of these findings may be seen on the superior and/or inferior endplate, and as such, each endplate was examined at each level. Each disc level undergoing replacement was evaluated independently.

## EVALUATION OF CLASSIFICATION

The interobserver and intraobserver reliability of this classification system was evaluated in 80 patients (119 levels). Three observers evaluated the radiographs and classified the disc levels according to the investigational classification. One observer was an experienced orthopaedic spine surgeon with extensive TDR operative experience (Observer 1). He had performed the TDRs on the patients in this study, and he had helped develop the classification system being evaluated. The second observer was a spine surgery fellow (Observer 2), and the third observer was a senior orthopaedic surgery resident (Observer 3). Each observer reviewed all radiographs independently at 3 different sessions. The sessions were a week apart, and the order in which the radiographs were reviewed was varied at each session.
All patient information on each radiograph was blinded except for an indication as to the level(s) to be evaluated. Prior to the evaluation portion of this study, each observer was given a thorough explanation of the classification system. At each review session the observers were provided a diagram of the classification system and a goniometer and encouraged to use these tools in their reviews.

## STATISTICAL ANALYSIS

Statistical analysis was performed using the SPSS 15.0 (Chicago, Illinois) computer software program. We first calculated the percent agreement between each observation to examine the overall prevalence and observer variability. Interobserver and intraobserver reliability were tested in a stepwise fashion for each pair of observations between observers (interobserver) and within observers (intraobserver), and the kappa value was calculated to determine the agreement between observations. Statistically significant agreement beyond chance alone was defined as a *P* value < .05. Kappa (κ) values were averaged within each group to determine the overall levels of interobserver and intraobserver agreement. The strength of agreement was evaluated in terms of the criteria established by Landis and Koch.17 Accordingly, κ values of 0.00 to 0.20 indicated slight reliability; 0.21 to 0.40, fair reliability; 0.41 to 0.60, moderate reliability; 0.61 to 0.80, substantial agreement; and 0.81 to 1.00, excellent or almost perfect agreement. As the clinical impact of this novel classification system has yet to be determined, we used an unweighted kappa statistic and assumed the classes to be nominal categories (unranked categories).18

## RESULTS

Eighty patients underwent TDR during the study period and were evaluated. Forty-one patients had single-level procedures and 39 patients had multilevel procedures. Table 1 lists the frequency of operated levels.
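
The unweighted kappa statistic described under Statistical Analysis can be illustrated with a short sketch. The Python below is not the SPSS procedure used in the study; it is a minimal illustration of Cohen's kappa for two raters assigning nominal categories, together with a helper that maps a kappa value to the Landis and Koch bands quoted in that section. The function names (`cohen_kappa`, `landis_koch`) and the ten sample ratings are hypothetical and are not study data.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two equal-length lists of nominal labels."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: fraction of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the raters' marginal proportions, summed over labels.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


def landis_koch(kappa):
    """Map a kappa value to a Landis and Koch strength-of-agreement band."""
    if kappa < 0.0:
        return "poor"  # band from Landis and Koch that the text does not list
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical ratings of ten disc levels by two observers (illustration only).
observer_a = ["I", "I", "II", "II", "III", "I", "IV", "II", "I", "V"]
observer_b = ["I", "I", "III", "II", "III", "I", "IV", "III", "I", "V"]

kappa = cohen_kappa(observer_a, observer_b)
print(round(kappa, 2), landis_koch(kappa))   # 0.74 substantial

# Kappa values reported in the abstract (intraobserver 0.66, interobserver 0.51).
print(landis_koch(0.66), landis_koch(0.51))  # substantial moderate
```

Treating the five types as unranked labels matches the nominal-category assumption stated in the methods; a weighted kappa would only be a reasonable alternative if the types were considered ordered.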

[Table 1](http://ijssurgery.com//content/2/1/16/T1)

Table 1 Distribution of Operated Levels

## INTEROBSERVER AGREEMENT

Overall there was a moderate amount of agreement in the classification of levels in the most extreme conditions (Type I parallel endplates and Type V combination endplates) among the 3 observers. There was a noticeable difference in the frequency with which levels were classified as Type II and Type III between Observer 1 and the other 2 observers; Observer 1 more frequently graded levels as Type II, while the other 2 observers more frequently graded levels as Type III (Table 2). While these data do provide information on the overall frequency of grade assignment, they do not provide any information on the agreement between the observers. There is no indication from these data whether the levels assigned Type II by Observer 1 were the same levels assigned Type III by the other observers.

[Table 2](http://ijssurgery.com//content/2/1/16/T2)

Table 2 Summated Ratings for Each Observer

In order to investigate the agreement between observers, the kappa values were determined between the observers. The kappa values for each pair of observers (κ1-2 for Observers 1 and 2, κ1-3 for Observers 1 and 3, and κ2-3 for Observers 2 and 3) are shown in Tables 3 to 5. Overall, the agreement between the observers was found to be moderate, with an average kappa value of 0.45 (κ1-2 = 0.42, κ1-3 = 0.41, κ2-3 = 0.51). The kappa value was found to be significant for all calculations, indicating a less than 5% probability that the amount of agreement between observers was due to chance alone.

[Table 3](http://ijssurgery.com//content/2/1/16/T3)

Table 3 Interobserver Agreement Observer 1 vs. Observer 2

[Table 4](http://ijssurgery.com//content/2/1/16/T4)

Table 4 Interobserver Agreement Observer 1 vs. Observer 3

[Table 5](http://ijssurgery.com//content/2/1/16/T5)

Table 5 Interobserver Agreement Observer 2 vs. Observer 3

The agreement between the 2 less experienced observers (Observers 2 and 3) was found to be higher (κ2-3 = 0.51) than the agreement between either of these observers and the more experienced observer (Observer 1). According to the strength of agreement criteria of Landis and Koch, the inexperienced observers showed moderate agreement, while both of the inexperienced observers only showed fair agreement with the experienced observer.

## INTRAOBSERVER AGREEMENT

The intraobserver agreement was calculated by comparing each observer's results for each of their three evaluations of the data. The results are shown in Table 6. The intraobserver agreement was quite good for each of the observers, ranging from 76% to 90%. This amount of agreement was associated with kappa values in the substantial agreement range (κ1 = 0.69, κ2 = 0.66, κ3 = 0.65). All of these values were statistically significant (*P* < .05).

[Table 6](http://ijssurgery.com//content/2/1/16/T6)

Table 6 Intraobserver Agreement

## DISCUSSION

Many orthopaedic classifications have been developed that are based on the radiographic morphology of the structure of interest.19–21 In disc arthroplasty this structure is the intervertebral disc space, and more specifically, the vertebral endplates. The morphology of vertebral endplates has been evaluated in the past to determine the correlation of the appearance of the endplates with disc herniation and the prevalence of low-back pain.22, 23 Results have been mixed, but a common finding in these studies has been the extent of the variation in vertebral endplate morphology that exists in the population. It is reasonable to consider the morphology of the vertebral endplate prior to TDR, as this is the structure on which the device is implanted.
Morphologic differences in vertebral endplates do exist and have the potential to influence the positioning of TDR components; therefore, it may be important to classify these morphologies to assist in patient selection and surgical planning.

The classification of disease processes is quite useful. A reliable classification system allows physicians to properly characterize the problem, which eventually can guide treatment in a systematic manner. A classification system can also establish expected outcomes, which offers the ability to compare cases and treatments across groups of patients and physicians. The relatively new field of total disc arthroplasty would benefit greatly from a classification system that could assist in treatment guidance and offer some standard by which patients could be compared, despite the variety of implants currently used.

We have developed a novel classification system for patients undergoing lumbar total disc arthroplasty based on the preoperative lateral radiograph. This classification system was designed based on the observations of the two senior authors who have evaluated numerous patients for total disc arthroplasty. The criteria for this system have been defined in order to be as sensitive as possible, so as to identify all vertebral levels with morphologic anomalies. All patients classified as Type I have parallel endplates. Patients in Type II have endplates with a small posterior concavity followed by an endplate extension below the level of the anterior endplate. This posterior extension can inhibit the proper posterior positioning of the implant if not recognized and contoured appropriately (Figure 2). Type III endplates consist of either a superior or inferior endplate with greater than 10% concavity as compared to the total vertebral body height.
Previous cadaveric studies have estimated the height of the normal lumbar vertebral body to be roughly 30 mm.24 A 10% concavity would correlate with a 3 mm void in the central region of the endplate, which is the area in which the implant's central keel is placed to gain implant-bone fixation and ultimate bony ingrowth. The keel on the ProDisc-L is 6.5 mm in height. With these dimensions, a Type III endplate would lack approximately 50% of the surface area for endplate fixation and eventual bony ingrowth.

![Figure 2](http://ijssurgery.com//https://www.ijssurgery.com/content/ijss/2/1/16/F2.medium.gif)

[Figure 2](http://ijssurgery.com//content/2/1/16/F2)

Figure 2 (A) Preoperative Type II endplate. (B) Postoperative Type II endplate with anterior positioning of superior L5 endplate due to posterior L5 hook anatomy.

Type IV endplates have an anterior convexity of greater than 20% of the vertebral body depth. The average vertebral body depth has been estimated to be approximately 35 mm.24 A 20% convexity corresponds to roughly 7 mm; therefore, this type of endplate would offer approximately 28 mm of flat endplate to secure device fixation. The ProDisc-L is available in 2 sizes with depth measurements of 30 and 27 mm. Endplates with more than 20% anterior convexity may have an insufficient area on which to implant a prosthesis (Figure 3). Type V endplates consist of any combination of Type II-IV morphologic changes.

![Figure 3](http://ijssurgery.com//https://www.ijssurgery.com/content/ijss/2/1/16/F3.medium.gif)

[Figure 3](http://ijssurgery.com//content/2/1/16/F3)

Figure 3 (A) Preoperative Type IV endplate. (B) Type IV endplate with convex S1 endplate and subsequent lack of endplate coverage.

We have shown this classification system to be reliable and reproducible, with kappa values between observers in the moderate range (κ = 0.34-0.51) and kappa values within individual observers in the substantial range (κ = 0.65-0.69). The results of our analysis are comparable to previously reported orthopaedic classifications that are commonly used. The Neer Classification for proximal humerus fractures has been widely used and analyzed. Sidor et al. reported intraobserver kappa values of 0.5-0.83 depending on the expertise and experience of the observer and interobserver values of 0.43-0.58.21 Spine classifications have shown similar results. Cummings et al. analyzed the King classification of scoliosis and showed intraobserver kappa values of 0.44 to 0.72, with an interobserver value of 0.44.20 Finally, Grauer et al. recently offered a modification of the Anderson and D'Alonzo classification for odontoid fractures with intraobserver and interobserver kappa values of 0.41-0.6 and 0.48, respectively.19

We found a difference in the interobserver reliability with the experienced observer compared to the less experienced observers, but found no difference in the reliability between the 2 less experienced observers. This difference is likely due to bias from the more experienced observer. The experienced observer (who helped develop the classification) was likely influenced by his baseline knowledge in assessing patients for total disc arthroplasty and was less likely to grade endplates strictly according to measurements obtained in the review. The less experienced observers were more likely to grade the endplates based solely on the measurements obtained in their review. This bias is a reasonable explanation for the difference in the kappa values between the observers. This explanation is further supported in reviewing the summated ratings from each observer (Table 2). Type II has been suggested by the classification developers to be quite important to identify because the posterior endplate extension, if not identified and addressed at the time of surgery, can lead to great difficulty in properly positioning the implant. Type II is radiographically similar to Type III. Operative experience would likely bias an observer to identify more Type II endplates, as these would be more important to identify preoperatively so proper planning could be carried out. We noticed such bias as shown in Table 2. Observer 1 consistently identified more Type II endplates, whereas the less experienced observers consistently identified more Type III endplates.

Despite some variation in the interobserver reliability, we found a high degree of reliability in the intraobserver calculations for our classification, regardless of clinical experience. Kappa values were found to be 0.65 to 0.69, which is considered substantial agreement. The high level of agreement for this classification is likely due to the relative simplicity of the type definitions. This allows for more consistent analysis of the endplates with objective standards, rather than relying on subjective assessment with each observation. A substantial amount of agreement was even seen for the most experienced surgeon, despite suggestions of subjective bias in his observations in the interobserver analysis. The intraobserver reliability is most important in classifications used for preoperative assessment of patients. A high intraobserver value allows surgeons to make consistent treatment decisions, which likely are more important for patient outcome than the interobserver reliability of a classification.

Like all classifications and their analysis, ours has its limitations. We did find evidence of observational bias by the most experienced physician in our study. Despite this limitation, the experienced physician had a high degree of intraobserver reliability in his measurements, suggesting this classification system is reliable even when exposed to subjective assessment. A potential limitation of our classification is the analysis of vertebral endplate morphology (a 3-dimensional structure) based on a single 2-dimensional radiograph.
Although this is not ideal, we believe that if we could demonstrate the validity of our classification system using plain radiographs, then the use of additional, more advanced imaging such as sagittal CT reconstructions would only add to the accuracy and clinical usefulness of the system. Our classification system is simple and based on a standard radiograph that is routinely obtained in all chronic spinal conditions.

Finally, our classification may be limited by its underlying definitions of the different types. We have attempted to define our classification based on findings that may impact clinical decision making or surgical planning. We have based our measurements on reported averages of vertebral body measurements. Certainly variations exist in the dimensions of vertebral bodies, and as discussed above, we have tried to make our class definitions in such a way as to include any subjects with potentially clinically significant findings. This may predispose our classification to interpret radiographic findings as more clinically important than they may be. In classification systems used to assess patients for surgical procedures, we believe it is important to err on the side of increased sensitivity. This allows for the most thorough preoperative planning and ensures that potentially clinically significant variations in anatomy are not missed.

We have introduced a classification for lumbar vertebral endplate morphology which may be useful in preoperative patient assessment prior to total disc arthroplasty. This classification appears to be reproducible and reliable across a range of clinical experience levels. Use of this classification system will be helpful in the preoperative planning for patients undergoing TDR, especially in preparing for the critically important endplate contouring aspect of the procedure.
The information provided with this classification may eventually be useful in patient selection for total disc arthroplasty, as well as providing a rationale for implant selection, based on different base plate fixation designs and implant size options. Further prospective analysis is needed to validate this system's clinical usefulness and its potential to predict patient outcome.

## Footnotes

* Protocol approved by Yale Human Investigation Committee.
* Received October 14, 2007.
* Accepted January 7, 2008.
* Copyright SAS - Spine Arthroplasty Society 2008

This is an Open Access article distributed under the terms of the Creative Commons Attribution-Noncommercial 3.0 Unported License, permitting all non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

## REFERENCES

1. Siepe CJ, Wiechert K, Khattab MF, Korge A, Mayer HM (2007) Total lumbar disc replacement in athletes: clinical results, return to sport and athletic performance. Eur Spine J. 11(Supplement 2):S131–S136.
2. Siepe CJ, Mayer HM, Wiechert K, Korge A (2006) Clinical results of total lumbar disc replacement with ProDisc II: three-year results for different indications. Spine. 31(17):1923–1932.
3. Regan JJ, McAfee PC, Blumenthal SL, et al. (2006) Evaluation of surgical volume and the early experience with lumbar total disc replacement as part of the investigational device exemption study of the Charite Artificial Disc. Spine. 31(19):2270–2276.
4. Putzier M, Funk JF, Schneider SV, et al. (2006) Charite total disc replacement-clinical and radiographical results after an average follow-up of 17 years. Eur Spine J. 15(2):183–195.
5. Leivseth G, Braaten S, Frobin W, Brinckmann P (2006) Mobility of lumbar segments instrumented with a ProDisc II prosthesis: a two-year follow-up study. Spine. 31(15):1726–1733.
6. Huang RC, Tropiano P, Marnay T, Girardi FP, Lim MR, Cammisa FP Jr. (2006) Range of motion and adjacent level degeneration after lumbar total disc replacement. Spine J. 6(3):242–247.
7. Herkowitz HN (2006) Total disc replacement with the CHARITE artificial disc was as effective as lumbar interbody fusion. J Bone Joint Surg Am. 88(5):1168.
8. Freeman BJ, Davenport J (2006) Total disc replacement in the lumbar spine: a systematic review of the literature. Eur Spine J. 15(Supplement 15):439–447.
9. Chung SS, Lee CS, Kang CS (2006) Lumbar total disc replacement using ProDisc II: a prospective study with a 2-year minimum follow-up. J Spinal Disord Tech. 19(6):411–415.
10. Bertagnoli R, Yue JJ, Kershaw T, et al. (2006) Lumbar total disc arthroplasty utilizing the ProDisc prosthesis in smokers versus nonsmokers: a prospective study with 2-year minimum follow-up. Spine. 31(9):992–997.
11. Bertagnoli R, Yue JJ, Fenk-Mayer A, Eerulkar J, Emerson JW (2006) Treatment of symptomatic adjacent-segment degeneration after lumbar fusion with total disc arthroplasty by using the prodisc prosthesis: a prospective study with 2-year minimum follow up. J Neurosurg Spine. 4(2):91–97.
12. Huang RC, Girardi FP, Cammisa FP Jr., Lim MR, Tropiano P, Marnay T (2005) Correlation between range of motion and outcome after lumbar total disc replacement: 8.6-year follow-up. Spine. 30(12):1407–1411.
13. Shuff C, An HS (2005) Artificial disc replacement: the new solution for discogenic low back pain? Am J Orthop. 34(1):8–12.
14. Shedid D, Ugokwe KT, Benzel EC (2005) Lumbar total disc replacement compared with spinal fusion: treatment choice and evaluation of outcome. Nat Clin Pract Neurol. 1(1):4–5.
15. McAfee PC, Cunningham B, Holsapple G (2005) A prospective, randomized, multicenter Food and Drug Administration investigational device exemption study of lumbar total disc replacement with the CHARITE artificial disc versus lumbar fusion: part II: evaluation of radiographic outcomes and correlation of surgical technique accuracy with clinical outcomes. Spine. 30(14):1576–1583, discussion E1388-1590.
16. Bertagnoli R, Kumar S (2002) Indications for full prosthetic disc arthroplasty: a correlation of clinical outcome against a variety of indications. European Spine Journal. 11(Suppl 2):S131–136.
17. Landis JR, Koch GG (1977) The measurement of observer agreement for categorical data. Biometrics. 33(1):159–174.
18. Garbuz DS, Masri BA, Esdaile J, Duncan CP (2002) Classification systems in orthopaedics. Journal of the American Academy of Orthopaedic Surgeons. 10(4):290–297.
19. Grauer JN, Shafi B, Hilibrand AS, et al. (2005) Proposal of a modified, treatment-oriented classification of odontoid fractures. Spine Journal: Official Journal of the North American Spine Society. 5(2):123–129.
20. Cummings RJ, Loveless EA, Campbell J, Samelson S, Mazur JM (1998) Interobserver reliability and intraobserver reproducibility of the system of King et al. for the classification of adolescent idiopathic scoliosis. Journal of Bone & Joint Surgery - American Volume. 80(8):1107–1111.
21. Sidor ML, Zuckerman JD, Lyon T, Koval K, Cuomo F, Schoenberg N (1993) The Neer classification system for proximal humeral fractures. An assessment of interobserver reliability and intraobserver reproducibility. Journal of Bone & Joint Surgery - American Volume. 75(12):1745–1750.
22. Harrington J Jr., Sungarian A, Rogg J, Makker VJ, Epstein MH (2001) The relation between vertebral endplate shape and lumbar disc herniations. Spine. 26(19):2133–2138.
23. Moorman CT 3rd., Johnson DC, Pavlov H, et al. (2004) Hyperconcavity of the lumbar vertebral endplates in the elite football lineman. American Journal of Sports Medicine. 32(6):1434–1439.
24. McLain RF, Yerby SA, Moseley TA (2002) Comparative morphometry of L4 vertebrae: comparison of large animal models for the human lumbar spine. Spine. 27(8):E200–206.