Introduction

Lumbar degenerative disc disease (LDDD) is a major cause of chronic low-back pain with lumbar segmental instability, and surgical intervention is required when conservative treatment fails. Lumbar fusion has been developed over several decades and is regarded as the gold standard for treating symptomatic LDDD. A mechanically stable fusion of the involved lumbar segments should reduce pain [1, 2]. However, this method is not perfect, not only because of complications, but also because increased stress on the adjacent segments may cause new instability and pain [3–5]. A variety of surgical fusion techniques have been performed during the last century, and radiographically confirmed fusion rates have exceeded 96.7 % [6]. However, this has not translated into improved clinical outcomes [7, 8], which suggests that fusion is not the parameter that determines clinical success. In recent years, the idea of nonfusion treatment has gradually been accepted by spine surgeons and patients and is widely used in clinical practice.

Artificial total disc replacement (TDR), as an alternative to spinal fusion, has been increasingly used for the surgical treatment of LDDD [9, 10]. It was postulated that the patient's normal intervertebral segment motion might be restored and maintained while the adjacent level was protected from nonphysiologic loading, thereby relieving pain [11–13]. Previous studies that compared the clinical effects of TDR with fusion for treating LDDD provided ambiguous results [14–16]. Therefore, it remained uncertain whether TDR was more effective and safer than fusion. The objective of this study was to systematically compare the efficacy and safety of TDR with fusion for the treatment of LDDD.

Materials and methods

Study selection

All randomized controlled trials comparing TDR with fusion for the treatment of LDDD were identified. We searched PubMed Central, MEDLINE (from 1966), EMBASE (from 1980), BIOSIS (from 2004), ClinicalTrials.gov, and the FDA trials register. These sources were searched up to 30 January 2013. The search strategy consisted of a combination of keywords concerning the technical procedure (lumbar degenerative disc disease, lumbar disc replacement, artificial total disc replacement, lumbar arthroplasty, lumbar fusion, prosthesis, implantation, and randomized controlled trial) and keywords regarding the anatomical features and pathology (lumbar vertebrae). These keywords were used as MeSH headings and free text words. In addition, a search was performed using the specific names of the prostheses. All relevant review articles and other potentially eligible studies were screened, and the search was limited to randomized controlled trials (RCTs) published in English. Studies eligible for the review were randomized controlled trials published in a peer-reviewed journal as a full article; grey literature and conference proceedings were excluded.

Two authors (W.J.B. and S.Y.M.) independently checked all titles and abstracts from the databases that met our search terms and reviewed the full publications. A third author (L.S.) acted as referee and independently selected articles from the list of identified references whenever necessary. The reference sections of all primary studies were inspected for additional references, and only those reporting the results of a randomized controlled trial (RCT) were included in this analysis. This review was conducted under the suggested QUOROM guideline standards [17]. If studies did not report the actual number or the standard deviation, but presented the data only in graph format, the authors were contacted. Most authors responded but were not able to provide additional clarification because of personal circumstances or because the data presented were preliminary and not available for scientific research. The main characteristics of the studies included in the meta-analysis are presented in Table 1.

Table 1 Main characteristics of studies included in the meta-analysis

Data extraction

Two authors (W.J.B. and S.Y.M.) independently extracted relevant data from the included studies regarding design, age, gender, type of disc prosthesis, type of fusion intervention, and follow-up period. The outcomes pooled in this analysis include visual analogue scale (VAS), Oswestry disability index (ODI), intra-operative blood loss, operating time, proportion of full-time and part-time work, complication, reoperation rate, and the range of motion (ROM).

Assessment of heterogeneity

Clinical homogeneity was assessed from the characteristics of the participants, including age, sex, clinical manifestation, pain, baseline functional status, and so on. Surgical technique homogeneity was assessed by comparing the type of artificial lumbar disc and fusion method, the follow-up period, and the measurement method. The chi-squared test was performed to identify heterogeneity. The I² statistic, which is calculated directly from the Q statistic, describes the percentage of variation across the studies that is due to heterogeneity rather than chance. I² ranges from 0 to 100 %, with 0 indicating the absence of any heterogeneity. When I² was <50 %, low heterogeneity was assumed, the variation was thought to be due to chance, and the fixed-effect model was used. Conversely, when I² was >50 %, heterogeneity was thought to exist and the random-effects model was used.
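The relationship between the Q statistic and I² described above can be sketched in a few lines of Python. This is a minimal illustration, not the RevMan implementation: the function names and the inverse-variance weighting are assumptions, and the text does not say how a value of exactly 50 % is handled, so the sketch treats it as low heterogeneity.

```python
def cochran_q(effects, variances):
    """Cochran's Q and the inverse-variance pooled estimate.

    effects: per-study effect estimates (e.g. mean differences)
    variances: per-study variances of those estimates
    """
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    return q, pooled


def i_squared(q, k):
    """I^2 in percent from Q and the number of studies k (df = k - 1)."""
    df = k - 1
    if df <= 0 or q <= df:
        return 0.0  # negative values are truncated to zero
    return 100.0 * (q - df) / q


def choose_model(i2, threshold=50.0):
    """Model choice following the 50 % rule stated in the text.

    Assumption: I^2 exactly equal to the threshold counts as low heterogeneity.
    """
    return "random-effects" if i2 > threshold else "fixed-effect"
```

With this rule, an outcome with I² = 0 % would be pooled with the fixed-effect model and one with I² = 63 % with the random-effects model.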

Assessment of risk of bias

Inadequate methodology in randomized controlled trials carries a risk of overestimating intervention effects [18], so assessment of the risk of bias in the included studies was necessary. The criteria used to assess the risk of bias of the included studies are presented in Table 2. Controlled trials were assessed using a criteria list recommended by the Cochrane Back Review Group [19]. The criteria were scored yes, no, or unsure; criterion 12 was scored not applicable because we consider compliance not relevant for surgical interventions. If a study met at least 6 of the 12 items, its risk of bias was considered low.
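The scoring rule can be written out as a short Python sketch. The function name, the string labels, and the "high" label for studies below the cut-off are assumptions made for illustration; the text only states when the risk of bias is considered low.

```python
def risk_of_bias(scores, min_yes=6):
    """Classify a study from its 12 criterion scores.

    scores: list of 12 strings, each "yes", "no", "unsure" or "n/a"
            ("n/a" is used here for criterion 12, compliance).
    Returns "low" when at least `min_yes` criteria are met.
    The "high" label is an assumption; the text defines only "low".
    """
    if len(scores) != 12:
        raise ValueError("expected scores for 12 criteria")
    met = sum(1 for s in scores if s == "yes")
    return "low" if met >= min_yes else "high"


# Hypothetical study: 7 criteria met, 3 not met, 1 unsure, criterion 12 n/a
example = ["yes"] * 7 + ["no"] * 3 + ["unsure", "n/a"]
print(risk_of_bias(example))
```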

Table 2 Criteria for risk of bias assessment

Statistical analysis

Attempts were made to statistically pool the data of homogeneous studies in order to obtain the primary and the secondary outcomes. Following the recommendations of the Cochrane Collaboration, the meta-analyses were performed using RevMan software (version 5.0) provided by the organization. The results were expressed as odds ratios (OR) with 95 % confidence intervals (95 % CI) for dichotomous outcomes, and as mean differences (MD) with 95 % CI for continuous outcomes. When the same continuous outcome was measured on different scales, the standardized mean difference (SMD) and 95 % CI were calculated. If an outcome was reported as dichotomous data in some studies and as continuous data in others, odds ratios would be re-expressed as standardized mean differences to allow dichotomous and continuous data to be pooled together. We performed a sensitivity analysis for the measured effects by omitting any study that might largely influence the clinical results. When I² was >50 %, only the random-effects model was considered appropriate.
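The text does not state which formula was used to re-express odds ratios as standardized mean differences. A commonly used approximation, assumed here for illustration only, multiplies the log odds ratio by √3/π (which rests on an underlying logistic distribution). The function names are hypothetical.

```python
import math

# Conversion factor from a log odds ratio to an SMD (logistic approximation).
LOGIT_TO_SMD = math.sqrt(3) / math.pi


def or_to_smd(odds_ratio):
    """Re-express an odds ratio as a standardized mean difference."""
    return math.log(odds_ratio) * LOGIT_TO_SMD


def se_log_or_to_se_smd(se_log_or):
    """Convert the standard error of ln(OR) to the standard error of the SMD."""
    return se_log_or * LOGIT_TO_SMD
```

An odds ratio of 1 maps to an SMD of 0, so the direction and the null value of the effect are preserved by the conversion.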

Results

The process of identifying relevant studies is summarized in Fig. 1. From the selected databases, 382 references were obtained. By screening the titles and abstracts, 366 references were excluded as irrelevant to this topic. Of the 16 potentially relevant references, 10 were omitted according to the conditions listed in Fig. 1. The remaining six reports were taken for a comprehensive evaluation. These reports were based upon six independent randomized clinical trials, reporting different follow-up periods or containing separate results. The six RCTs with the relevant information, involving 1,603 patients, were eventually included [20–25]. All of these studies reported two-year follow-up results.

Fig. 1 Flowchart showing the identification, inclusion, and exclusion of the randomized controlled trials (RCTs)

Description

The characteristics of the six included studies are summarized in Table 1. In all included studies, patients with symptomatic LDDD were recruited, with sample sizes ranging from 78 to 577 patients. Berg et al. [20] performed artificial disc replacement with one of three devices, CHARITÉ, ProDisc-L, or Maverick (Medtronic, Memphis, TN), compared with posterolateral fusion (PLF) with autologous bone graft or posterior lumbar interbody fusion (PLIF) with two carbon fiber cages. Blumenthal et al. [21] performed CHARITÉ artificial disc (DePuy Spine, Raynham, MA) replacement compared with anterior lumbar interbody fusion (ALIF) with BAK cages. In two studies [22, 23], the ProDisc-L artificial disc (Synthes Spine, West Chester, PA) was compared with circumferential fusion. Delamarter et al. [24] reported that the prosthesis consisted of two end plates manufactured from cobalt-chromium-molybdenum (CoCrMo) alloy and a convex ultra-high molecular weight polyethylene (UHMWPE) insert. Fixation was provided by a central keel and titanium plasma spray coating on each end plate. In addition, Gornet et al. [25] performed MAVERICK Disc (Medtronic, Memphis, TN) replacement compared with anterior interbody fusion with rhBMP-2 on an absorbable collagen sponge (INFUSE Bone Graft, Medtronic) and tapered fusion cages (LT-CAGE Lumbar Tapered Fusion Device, Medtronic).

Risk of bias

The outcomes are presented in Table 3. A fixed blocking method of randomization with six assignments per block was described in five studies [20–24]. The sealed envelope technique for allocation concealment was applied in three studies [20, 22, 23]. In three studies, the participants remained blinded until the operation was finished. All of the participants in the six studies had been followed up for two years, and a follow-up rate of more than 89 % was obtained in four of these studies [20–23]. None of the included studies provided information on intention-to-treat (ITT) analysis. Overall, these trials were classified as being of relatively high methodological quality.

Table 3 Risk of bias assessment of included studies

Visual analogue scale (VAS)

The scales measuring the duration and intensity of pain have scores ranging from 0 to 10, with a lower score representing a better condition. The composite back and/or leg pain score was derived by multiplying the intensity and duration scores, so the composite score could range from 0 to 100. At two years, VAS pain scores recorded for back and/or leg pain indicated statistically significant improvement from preoperative levels regardless of treatment. Only five trials reported the continuous outcome measures in the form of mean ± SD, and so they were included in the meta-analysis [20–24]. They enrolled 1,603 patients, with 1,081 patients assigned to the TDR group and 522 patients assigned to the fusion group. For VAS scores, the test for heterogeneity demonstrated that no significant heterogeneity existed across the five studies (P = 0.80; I² = 0 %), and the fixed-effect model was used. Compared with patients treated with fusion, patients treated with artificial TDR showed a significant decrease (SMD, −3.18; CI, −5.74 to −0.63; P = 0.01) (Figs. 2 and 3). At 2-year follow-up, pain measured by VAS in the TDR group was better than that in the fusion group, with statistical significance.

Fig. 2 Results of the meta-analysis for the visual analogue scale (VAS) for TDR and fusion groups at 2-year follow-up

Fig. 3 Results of the meta-analysis for the Oswestry disability index (ODI) for TDR and fusion groups at 2-year follow-up

Oswestry disability index (ODI)

The ODI is a validated questionnaire that assesses a patient's disability during activities of daily living. ODI success was defined as 15 % improvement from baseline. Regardless of treatment, all patients showed statistically significant improvement in ODI scores at two years compared with baseline. Six trials reported the continuous outcome measures as mean ± standard deviation (SD), and they were included in the meta-analysis. The six trials enrolled 1,603 patients, with 1,081 patients assigned to the TDR group and 522 patients assigned to the fusion group. The test for heterogeneity demonstrated that no significant heterogeneity existed across the six studies (P = 0.88; I² = 0 %), that is, heterogeneity was low; the random-effects model was used. Overall, at two-year follow-up, patient function measured by ODI in the TDR group (SMD, −5.13; CI, −7.35 to −2.90; P < 0.0001) was better than that in the fusion group, with statistical significance.

Intraoperative blood loss and operating time

Both intra-operative blood loss and operating time were reported in five trials [20, 21, 23–25]. These five trials enrolled 1,525 patients, with 1,025 patients assigned to the TDR group and 500 patients assigned to the fusion group. The tests for heterogeneity demonstrated significant heterogeneity for both outcomes (I² = 95 % and I² = 98 %, respectively). Most of the heterogeneity came from the different surgical approaches used in the fusion group; therefore, the trials were divided into two subgroups (posterior and anterior) according to surgical approach for the meta-analysis, and the random-effects model was used. Overall, patients treated with TDR showed no significant difference in intra-operative blood loss compared with patients treated with fusion in either subgroup (SMD, −92.39; CI, −309.05 to 124.27; P = 0.40 and SMD, 70.53; CI, −75.87 to 216.94; P = 0.35, respectively) (Figs. 4 and 5). For operating time, there was a significant difference in the anterior group (SMD, −81.16; CI, −143.60 to −18.71; P = 0.01) (Fig. 6), whereas in the posterior group there was no difference between TDR and fusion (SMD, 12.49; CI, −13.85 to 38.83; P = 0.35) (Fig. 7).

Figs. 4, 5 Results of the meta-analysis for intraoperative blood loss for TDR and fusion subgroups at 2-year follow-up

Figs. 6, 7 Results of the meta-analysis for operating time for TDR and fusion subgroups at 2-year follow-up

Proportion of full-time and part-time work

Five trials [20, 21, 23–25] reported the proportion of patients in full-time and part-time work. They enrolled 1,525 patients, with 1,025 patients assigned to the TDR group and 500 patients assigned to the fusion group. The rate was 72.7 % (745 of 1,025) in the TDR group and 70.2 % (351 of 500) in the fusion group. The test for heterogeneity demonstrated that no significant heterogeneity existed across the five studies (P = 0.62; I² = 0 %), and the fixed-effect model was used. Patients treated with TDR showed no significant difference (OR 1.14; 95 % CI [0.89–1.44]; P = 0.30) compared with patients treated with fusion (Fig. 8). Overall, patient function measured by the proportion of full-time and part-time work in the TDR group was equivalent to that in the fusion group at two-year follow-up.

Complication

Complications in the studies included device failures necessitating reoperation, revision, or removal, major vessel injury, neurologic damage, nerve root injury, death, and so on. The complication rate was reported in five trials [21–25]. These five trials enrolled 1,525 patients, with 1,025 patients assigned to the TDR group and 500 patients assigned to the fusion group. The complication rate was 5.8 % (59 of 1,025) in the TDR group and 10.8 % (54 of 500) in the fusion group. The test for heterogeneity demonstrated significant heterogeneity (P = 0.03; I² = 63 %), and the random-effects model was used. Patients treated with TDR showed a significant decrease in the complication rate (OR 0.57; 95 % CI [0.38–0.84]; P = 0.005) compared with patients treated with fusion (Fig. 9). Overall, at two-year follow-up, the complication outcome in the TDR group was better than that in the fusion group, with statistical significance.

Overall reoperation rate

Secondary surgical procedures were defined as any revision, removal, or reoperation of the implant or supplemental fixation. The overall reoperation rate was reported in five trials [20, 22–25]. These five trials enrolled 1,525 patients, with 1,025 patients assigned to the TDR group and 500 patients assigned to the fusion group. The overall reoperation rate was 5.2 % (53 of 1,025) in the TDR group and 6 % (30 of 500) in the fusion group. The test for heterogeneity demonstrated heterogeneity (P = 0.07; I² = 51 %), and the random-effects model was used. Patients treated with TDR showed no difference in the overall reoperation rate (OR, 0.91; CI, 0.57 to 1.46; P = 0.71) compared with patients treated with fusion (Fig. 10).
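As a rough plausibility check on the three dichotomous outcomes, a crude (unstratified) odds ratio can be computed directly from the pooled event counts reported in the text. This is only a sketch under stated assumptions: the function name is hypothetical, the confidence interval uses the standard log (Woolf) method, and a crude odds ratio ignores the per-study stratification of the Mantel-Haenszel and random-effects estimates, so it is not expected to reproduce the reported pooled values exactly.

```python
import math


def crude_or_ci(events_t, n_t, events_c, n_c, z=1.96):
    """Crude odds ratio (TDR vs fusion) with a Woolf-type 95 % CI."""
    a, b = events_t, n_t - events_t  # TDR: events, non-events
    c, d = events_c, n_c - events_c  # fusion: events, non-events
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Pooled counts as reported in the text: (TDR events, TDR n, fusion events, fusion n)
reported = {
    "full/part-time work": (745, 1025, 351, 500),
    "complications": (59, 1025, 54, 500),
    "reoperation": (53, 1025, 30, 500),
}

results = {name: crude_or_ci(*counts) for name, counts in reported.items()}
for name, (odds_ratio, lower, upper) in results.items():
    print(f"{name}: crude OR {odds_ratio:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

The crude values point in the same direction as the pooled estimates: close to 1 for work status and reoperation, and below 1 for complications.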

Fig. 8 Results of the meta-analysis for proportion of full-time and part-time work for TDR and fusion groups at 2-year follow-up

Fig. 9 Results of the meta-analysis for complication for TDR and fusion groups at 2-year follow-up

Fig. 10 Results of the meta-analysis for reoperation rate for TDR and fusion groups at 2-year follow-up. Abbreviations: TDR, artificial total disc replacement; CI, confidence interval; M-H, Mantel-Haenszel; SD, standard deviation

The range of motion (ROM)

Angular motion in the sagittal plane was measured by comparing lateral flexion and extension radiographs. Because there is no consistent criterion for ROM after lumbar surgery, the lumbar ROM results reported by the five RCTs for the two groups were analysed only descriptively. By the FDA definition, failures would include a patient with 7° ROM before surgery at the index level who maintains 7° at 24 months, or a hypermobile segment returned to a normal functional ROM, whereas successes would include improvement from 1° ROM before surgery to a nonfunctional 2° ROM after surgery. Delamarter et al. [22] followed up patients at 6 months and found that the difference at operated L4-5 segments was statistically significant, while that at operated L5-S1 segments was not. Berg et al. [20] reported that ROM in the TDR group was better than in the fusion group at the operated L4-5 and/or L5-S1 levels; the difference between pre- and postoperative index means was not statistically significant after two years, in other words, TDR retained the normal range at the operated level. Zigler et al. [23] found that ROM was maintained within a normal functional range in 93.7 % of TDR patients at two-year follow-up. Delamarter et al. [24] reported that ROM in the TDR group averaged 7.8° ± 5.3° at the level of the superior disc and 6.2° ± 4.1° at the level of the inferior disc at two years. Gornet et al. [25] found that, in the TDR group, mean angular motion was 7.0° before surgery and 9.4° and 9.5° at 12 months and two years, respectively, demonstrating not only maintenance of but also an increase in segmental motion after TDR. Overall, ROM was maintained within normal ranges after TDR, which demonstrates the benefit of motion preservation for slowing adjacent level degeneration.

Discussion

Safety

Five trials [20, 22–25] assessed safety from four aspects: intra-operative blood loss, operative time, surgical complications, and the reoperation rate. In Berg et al. [20], both operation time and hospital stay were shorter in the TDR group than in the fusion group, but complications and reoperations were similar in both groups. Blumenthal et al. [21] reported no difference between the two groups with respect to operative time, blood loss, and complications, but the hospital stay was significantly shorter in the TDR group. Moreover, they believed that safety and efficacy would significantly increase when the procedure was performed by a skilled surgeon. In contrast, Zigler et al. [23] showed that operative time, estimated blood loss, and length of hospital stay were statistically significantly lower in the TDR group. In addition, retrograde ejaculation was reported in two patients (1.2 %) in the TDR group, and three patients developed deep venous thrombosis after surgery (two TDR and one fusion). Gornet et al. [25] found that the mean operative time was approximately 24 min longer and blood loss was higher in the TDR group, both statistically significant differences, but the average lengths of hospital stay for TDR and fusion patients were similar, and overall adverse event rates showed no statistical difference. Delamarter et al. [24] reported intraoperative data showing that operative time, estimated blood loss, and length of hospital stay were significantly lower in the TDR group. Complications included one dural tear in the TDR group (one of 165) and three dural tears in the fusion group (three of 72). One of the two patients in the TDR group sustained an iliac artery tear, whereas the other patient in the TDR group and the two patients in the fusion group had excessive oozing from the decompression, decorticated bone, and graft sites. Postoperatively, deep venous thrombosis was reported in two (1.2 %) of 165 patients in the TDR group and two (2.8 %) of 72 in the fusion group. According to the meta-analysis results, intraoperative blood loss and operating time in the posterior group showed no significant difference compared with the fusion group, whereas operating time in the anterior group and complications were significantly reduced compared with the fusion group. Thus, the authors conclude that safety in the TDR group was better than in the fusion group at two-year follow-up.

Efficacy

Most of the included studies showed clinically relevant improvement in VAS, ODI, proportion of full-time and part-time work, and ROM. In the study of Berg et al. [20], there were no differences between groups in ODI success, satisfaction with treatment, or full- or part-time work at one and two years. They also believed that strict selection of surgical indications could improve the efficacy of TDR. Blumenthal et al. [21] reported that the improvements in ODI and VAS and the employment rate showed statistically significant differences favoring the TDR patients compared with the fusion group at 24 months. Overall clinical success was significantly higher in the TDR group than in the fusion group. In Zigler et al. [23], except for SF-36, improvements in ODI, VAS patient satisfaction, and recreation status showed a statistically significant difference favoring the TDR patients compared with the fusion group at 24 months. Using the FDA definition, 53.4 % of TDR and 40.8 % of fusion patients were successful, a statistically significant difference favoring the TDR group. Gornet et al. [25] observed that at 24-month follow-up the ODI, the mean improvement in low back pain and leg pain, and SF-36 scores after surgery were significantly better than the values for the fusion group. In addition, the percentage of working participants in the TDR group was slightly higher than that in the fusion group. Overall, a significantly higher percentage of TDR patients than fusion patients showed satisfactory outcomes. Delamarter et al. [24] reported that the mean improvement in ODI from baseline, the SF-36 score, and the VAS score for patient satisfaction showed a significant difference in favor of the TDR group at all follow-up time points. The percentage of patients participating in work and recreation had increased in both groups at 24 months, but there was no significant difference between the TDR group and the fusion group. Jacobs et al. [26] concluded that TDR seems to be effective in treating low back pain in selected patients, and in the short term was at least equivalent to fusion surgery. In addition, the meta-analysis results showed statistically significant improvement in VAS and ODI compared with the fusion group at 24 months, while the proportion of full-time and part-time work was equivalent. Therefore, it could be said that efficacy in the TDR group was better than in the fusion group.

Deficiency

The trials showed that TDR had significant safety and efficacy compared with fusion at short-term follow-up, but long-term, high-quality RCTs were lacking, so the evidence was not sufficient to evaluate the efficacy and safety of TDR compared with fusion for the treatment of LDDD. The current literature reported long-term follow-up only for TDR. Both medium-term follow-up (from five to ten years) [27–30] and long-term follow-up (>10 years) [31, 32] showed significant postoperative clinical improvement and, radiologically, preserved motion in the surgical segment and a lower rate of adjacent segment degeneration. No special complications were noted. In addition, the authors believe that long-term follow-up should focus on potential failure of TDR and adjacent segment degeneration.

The potential failures of disc replacement include early wear, malposition, and prosthesis loosening, which depend upon numerous factors related to implant design, surgical technique, and the patient. Such failures require revision. At two-year follow-up, Blumenthal et al. [21] and Berg et al. [20] reported four (5 %) and five (2.4 %) patients, respectively, requiring prosthesis revision, but did not mention the surgical method. David [31] reported a mean follow-up time of 13.2 years, with eight (7.5 %) patients requiring posterior fusion. In Putzier et al. [32], six (11 %) patients needed reoperation during a follow-up of 17 years. The authors believed that an organized approach reduces operative time, minimizes risks, decreases stress, and increases the success rate. In early failures, the TDR prosthesis can be removed and revision to another artificial disc can be considered, provided that the bony structure of the segment has not been destroyed. Conversely, once the bony structure of the lumbar segment has been destroyed, lumbar fusion must be performed, including posterolateral fusion or 360° fusion with cages or allograft bone. On the choice of surgical approach, Pimenta et al. [33] believed that anterior revision approaches were associated with significant risks due to scarring and adhesions resulting from the primary procedure, which make mobilization of the vessels very difficult, especially at the L4-5 bifurcation; thus, primary revision of a failed TDR could be planned as a posterior fusion.

With increasing age, lumbar disc degeneration results in lumbar disc space collapse and corresponding stiffening of spinal segments. This is a normal physiological process, although it differs between individuals. Adjacent segment degeneration after spinal fusion, thought to be related to altered mechanics or loss of motion from the fusion, is a controversial problem. Motion-preserving TDR may theoretically decrease or prevent adjacent segment disease, but it is difficult to judge whether postoperative adjacent disc degeneration results from physiological processes or from TDR. In some reports with more than ten years of follow-up after TDR, the incidence of adjacent segment degenerative disease was less than 2.8 %, although longer follow-up was not found. Despite the lack of comparability, these long-term follow-up results were encouraging.

The literature takes a positive view of the clinical and radiological results of TDR at long-term follow-up. Despite the lack of comparison with a fusion group, TDR was found to be at least a safe and effective treatment for LDDD.

Limitation

Meta-analysis is a statistical analysis of data collected from several different studies and surveys on the same problem, pooling outcomes in order to arrive at a more unbiased and scientific conclusion [34, 35]. The purpose of this study was to systematically compare the efficacy and safety of TDR with fusion for the treatment of LDDD. Although we identified six RCTs in this meta-analysis, and five of the six trials were regarded as high-quality, the results still have limited application because of several problems. The first is methodologic quality. Two of the trials [23, 25] had no allocation concealment. Some scholars [36] report that trials with inadequate or unclear allocation concealment often exaggerate the treatment effect by 30–41 %. Patients were blinded to the intervention in two trials [21, 23] and surgeons in three trials [21–23], which may produce measurement bias. In addition, the trial of Zigler et al. [23] was rated as having a relatively high risk of bias with respect to allocation concealment, patient blinding, and surgeon blinding taken together. Overall, the trials were classified as being of relatively high methodological quality. Second, the results are affected by heterogeneity caused by random sampling. For example, the results for intra-operative blood loss and operating time presented significant heterogeneity. Third, the studies used different models of artificial disc and different interventions, and thus there may be implementation bias. Fourth, most of the retrieved documents were in English, so there may be language bias. The results of these trials tend to report the superior efficacy of lumbar disc replacement, so this systematic review has a higher risk of publication bias.

Conclusion

The results showed that TDR has significant safety and efficacy, comparable to lumbar fusion at two-year follow-up. Although superiority over fusion could not be proved, TDR was considered safe and effective because clinical symptoms were relieved, motion was preserved, and the reoperation rate was low during long-term follow-up of TDR. Therefore, the authors suggest adoption of TDR on a large scale; if TDR fails, interbody fusion can be performed.