Introduction

In all types of clinical studies, the ultimate goal is to have complete follow-up data for all included patients. Attrition by non-response at follow-up may threaten the validity of registers, which tend to have rather large proportions of non-respondents [1]. The Swespine national register systematically collects information on patients undergoing spine surgery. It has good coverage and completeness, but approximately 25% of patients do not respond to the one-year follow-up [2]. Do these non-respondents jeopardize the validity of Swespine data?

The validity of spine registers in the context of patient dropout has been tested before in Denmark, Norway and Sweden [3,4,5,6]. In these studies, the outcome for non-respondents was assessed either in phone interviews or by postal questionnaires. The sample size varied between 318 and 633. The Norwegian study concluded that non-respondents could, in statistical terms, be treated as missing at random [4]. In the Danish study, there was no statistical difference in satisfactory results (measured with transition questions) between respondents and non-respondents at the 1-year follow-up [3]. Elkan et al. compared disc hernia patients registered in Swespine to a smaller cohort studied separately. Despite a missing data rate of more than 40% at the outcome variable level, the Swespine cohort showed similar results to the separate group, which had a follow-up rate of 98%. Thus, it was concluded that patient-reported outcome data in Swespine were representative for large cohorts [5]. Endler et al. studied 351 patients registered in Swespine as operated for degenerative conditions in the lumbar spine. A total of 148 did not return the Swespine 2-year or 5-year follow-up questionnaire, but responses could be obtained from 115 of these through repeated postal or phone attempts. This “delayed response group” showed a significantly lower perceived change in back pain compared to the first-time respondents, but since other patient-reported outcomes (ODI, VAS and EQ-5D) were similar, the authors concluded that loss to follow-up did not hamper the interpretation of Swespine data [6].

Pushing participants who are reluctant to respond to the initial follow-up to respond through repeated attempts by mail or telephone may induce bias. The present paper seeks to avoid this problem by using a predictive model. In contrast to the previous studies, the current paper examines a large nationwide sample and a larger set of factors, through the linkage of Swespine data to several other registers, and the outcome of non-respondents is estimated in an algorithmic multiple regression model. An increasing unwillingness to respond to surveys, together with the lack of an accepted minimal response rate, calls for additional register studies that investigate the possible outcome of participants lost to follow-up [7].

The aims of this study were to explore if, and how, the outcome data reported by patients in a national quality register was affected by loss to follow-up and to assess if the earlier results could be replicated.

Patients and methods

Study population and data sources

All individuals who underwent degenerative lumbar spine surgery 2008–2012 (according to Swespine and the regional patient administrative system) and who answered the baseline questionnaire from Swespine were eligible for inclusion. Data were anonymized by Statistics Sweden and each observation received a unique identity number used for linkage. Observations were linked on patient level between data from Swespine and patient administrative systems for seven Swedish regions (home to 65% of the Swedish population), administrative data from the National Patient Register and Statistics Sweden (socioeconomics) as well as the Swedish Social Insurance Agency (sick leave and disability pension). Non-respondents were defined as patients who had not returned the one-year follow-up questionnaire.

All analyses were performed on three subgroups defined by a combination of diagnosis (ICD 10) and procedure (Nomesco classification of surgical procedures, NCSP) codes as described in Table 1. The subgroups were: Lumbar Spinal Stenosis (LSS) operated with decompression or decompression + fusion; Lumbar Disc Herniation (LDH) operated with endoscopic, microscopic or open discectomy; Degenerative Disc Disorder (DDD) operated with fusion (all types) or disc arthroplasty. Cases with pseudarthrosis as main diagnosis were excluded.

Table 1 Definition of the subgroups LSS (Lumbar Spinal Stenosis), LDH (Lumbar Disc Herniation) and DDD (Degenerative Disc Disorder) by a combination of main diagnosis code and any of the intervention codes

Patient characteristics and other factors used in the prediction models

The explanatory variables used in the regression analyses are listed in Table 2. They were defined in accordance with Swespine or based on diagnosis or procedure codes from the administrative systems.

Table 2 Independent variables included in the regression analyses

The explanatory variables were included based on clinical relevance and to assess potential impact of any sociodemographic variables. Previous sick leave or disability pension were included as explanatory factors for the disc herniation and the DDD subgroups. These explanatory factors were not applied in regression analyses for spinal stenosis as only 45% of the subgroup sample was below 65 years old at surgery and hence below the general retirement age in Sweden at the time.

The variable unexpected event included ICD-codes for recurrent disc herniation, bleeding, leakage of CSF, restenosis, unspecified adverse event during surgery, infection, unspecified surgery-related damage, pain, cauda equina syndrome, wound rupture, complications associated with vascular injury, mechanical complications associated with disc prosthesis, decubitus, as well as reintervention codes registered during the first twelve months after surgery.

Statistical analysis

Descriptive data are presented for baseline characteristics within each subgroup. In addition to crude values, a formal statistical analysis was performed for each variable between respondents and non-respondents, to recognize any statistically significant systematic differences between the groups (two-sided tests on a 5% significance level). The Chi-squared test was applied for dichotomous and ordinal variables, the Kruskal–Wallis test was used for count variables, and the t-test was used for continuous variables. Descriptive statistics of data in the one-year follow-up questionnaire are also presented.
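As an illustration of the comparison of a dichotomous baseline variable between respondents and non-respondents, the following is a minimal sketch of a Pearson chi-squared test for a 2x2 table, written with the Python standard library only. The function name and the counts are hypothetical and are not taken from the study data; the sketch applies no continuity correction, and the software actually used for the analyses is not stated in the text.

```python
import math

def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for a 2x2 table.

    Rows: respondents / non-respondents.
    Columns: characteristic present / absent.
    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("each row and column needs at least one observation")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom the upper-tail probability has a closed form.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Hypothetical counts: smokers / non-smokers among respondents (first row)
# and non-respondents (second row).
stat, p = chi_squared_2x2(120, 880, 70, 230)
significant = p < 0.05  # two-sided test at the 5% significance level
```

With more than two categories, or with small expected counts, a general-purpose statistics package would be the more appropriate tool.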

Two sets of regression analyses were computed. First, logistic regression analyses were performed with non-response as outcome, showing the degree of association between baseline variables and non-response at follow-up. Second, regression analyses were calculated, with patient-reported outcome measures (PROMs) at one-year follow-up as dependent variables. To enable comparisons of predicted PROM values for respondents and non-respondents, respectively, all explanatory variables included were used to predict the outcome of surgery, for the individuals who answered the follow-up questionnaire as well as those who did not.
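The second step can be sketched as follows: a model is fitted on respondents only, since only they have follow-up values, and is then used to predict the outcome for both groups. This is a deliberately simplified illustration with a single, hypothetical predictor (baseline ODI) and invented numbers; the models in the study include the full set of explanatory variables in Table 2.

```python
from statistics import mean

def fit_simple_ols(x, y):
    """Closed-form least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def predict(model, x):
    """Apply a fitted (intercept, slope) pair to new predictor values."""
    intercept, slope = model
    return [intercept + slope * xi for xi in x]

# Hypothetical data: baseline ODI is known for everyone,
# follow-up ODI only for respondents.
resp_baseline = [30, 40, 50, 60]
resp_followup = [15, 22, 24, 31]
nonresp_baseline = [50, 60, 70]

model = fit_simple_ols(resp_baseline, resp_followup)   # fitted on respondents only
pred_resp = mean(predict(model, resp_baseline))        # predicted mean, respondents
pred_nonresp = mean(predict(model, nonresp_baseline))  # predicted mean, non-respondents
```

In this toy example the non-respondents have worse baseline scores, so their predicted follow-up score is also worse. The prediction can only reflect differences in the measured explanatory variables.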

The primary outcome measure was Global Assessment (GA). It is an ordinal parameter of perceived pain (in back or leg) after surgery compared to preoperative pain, reported retrospectively. Successful outcome of surgery was dichotomized with GA = 1 (pain free) or GA = 2 (much better) referred to as GASUCCESS; and GA = 3 (somewhat better), GA = 4 (unchanged) or GA = 5 (worse) as not successful. GA = 0 (no pain before surgery) was excluded from analysis. The observed and predicted proportion of GASUCCESS (logistic regression), as well as values of the secondary measures Oswestry Disability Index (ODI), Visual Analogue Scale for back and leg pain, respectively (VASBACK/LEG), and EuroQol-5-dimensions index (EQ-5D) at the one-year follow-up (ordinary least squares regression) were calculated in the response group. All outcome measures have previously been well described [8].
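The dichotomization described above can be written out as a small sketch. The function names are hypothetical; the mapping itself follows the definition in the text (GA 1 or 2 counts as success, GA 3 to 5 as not successful, GA 0 is excluded).

```python
def ga_success(ga):
    """Map a Global Assessment code to the dichotomized outcome.

    Returns True for GA 1-2, False for GA 3-5, and None for GA 0
    (no pain before surgery), which is excluded from analysis.
    """
    if ga == 0:
        return None
    if ga in (1, 2):
        return True
    if ga in (3, 4, 5):
        return False
    raise ValueError(f"unexpected GA code: {ga!r}")

def success_proportion(ga_codes):
    """Proportion of successful outcomes among the included GA codes."""
    outcomes = [ga_success(g) for g in ga_codes]
    included = [o for o in outcomes if o is not None]
    return sum(included) / len(included)

# Hypothetical responses; the single GA = 0 is dropped before the
# proportion is computed.
prop = success_proportion([1, 2, 2, 3, 5, 0, 4, 1])
```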

Since the regression model excluded individuals with any independent variable(s) missing, it would not be accurate to compare the predicted outcome of the non-respondents to the observed outcome of all respondents. Instead, the comparison is made with respondents displaying complete sets of explanatory variables.
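A minimal sketch of this complete-case restriction is shown below. The record layout and variable names are hypothetical and only serve to show that the same filter is applied to respondents and non-respondents before the two groups are compared.

```python
def complete_cases(records, variables):
    """Keep records in which every listed explanatory variable is present."""
    return [r for r in records if all(r.get(v) is not None for v in variables)]

# Hypothetical records; None marks a missing value.
records = [
    {"id": 1, "age": 71, "smoker": False, "responded": True},
    {"id": 2, "age": None, "smoker": True, "responded": True},
    {"id": 3, "age": 64, "smoker": True, "responded": False},
    {"id": 4, "age": 58, "smoker": None, "responded": False},
]

usable = complete_cases(records, ["age", "smoker"])
respondents = [r for r in usable if r["responded"]]
non_respondents = [r for r in usable if not r["responded"]]
```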

The overall quality of the logistic regression models is reported as the area under the receiver operating characteristic curve (AUC), which indicates the ability of the model to distinguish patients with a successful outcome (GASUCCESS) from patients without one. For a model that performs at least as well as chance, the AUC ranges from 0.5 to 1, where 1 indicates that the model correctly predicts 100% of the observations, while 0.5 means that the model is no better than chance. An AUC > 0.7 was considered acceptable precision.
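For readers less familiar with the AUC, it can be read as the probability that a randomly chosen patient with a successful outcome receives a higher predicted probability than a randomly chosen patient without one. The sketch below computes it directly from that pairwise definition; the function name and the data are hypothetical, and a statistics package would normally be used for samples of the size analyzed here.

```python
def auc(scores, labels):
    """Probability that a randomly chosen success is scored above a
    randomly chosen non-success, counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predicted probabilities and observed outcomes (1 = success).
scores = [0.9, 0.8, 0.6, 0.55, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]

model_auc = auc(scores, labels)
acceptable = model_auc > 0.7  # threshold used in the text
```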

Adjusted R2 values were calculated for the least squares regression models, which indicate the proportion of the variation in the outcome variables (i.e., final scores of the ODI, VASBACK/LEG, and EQ-5D) that can be explained by the model.
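The adjusted R2 follows the usual textbook definition, in which the ordinary R2 is penalized for the number of predictors in the model. The sketch below is an illustration with hypothetical values and a hypothetical function name, not the code used in the study.

```python
def adjusted_r2(observed, predicted, n_predictors):
    """Adjusted R-squared for a least squares model with an intercept."""
    n = len(observed)
    if n - n_predictors - 1 <= 0:
        raise ValueError("too few observations for this number of predictors")
    y_bar = sum(observed) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_tot = sum((y - y_bar) ** 2 for y in observed)
    r2 = 1 - ss_res / ss_tot
    # Penalize R-squared for the number of predictors in the model.
    return 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)

# Hypothetical follow-up scores and model predictions, one predictor.
observed = [15, 22, 24, 31]
predicted = [15.5, 20.5, 25.5, 30.5]
adj = adjusted_r2(observed, predicted, n_predictors=1)
```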

To further report on the accuracy of the models, the observed outcome (i.e., the outcome for respondents) was also compared to the outcome for (the same) respondents predicted by the models.

Ethical considerations

The Regional Ethical Review Board in Stockholm approved the study protocol (reference number 2013/1542-31/5).

Results

Baseline and follow-up characteristics, and health care consumption

All analyses were performed on each one of the three diagnosis groups. To avoid too large a number of tables, only the LSS cohort is presented in the main document and the results for the remaining groups can be found as an online resource (ESM_1). The number of individuals in each group is found in the flowchart in Fig. 1.

Fig. 1

Flowchart showing the number of patients in each diagnosis group on the left, and on the right, the number of cases in the regression analyses, with non-response as dependent variable, after the exclusion of observations lacking one or several independent variables. LDH Lumbar Disc Herniation cohort; LSS Lumbar Spinal Stenosis cohort; DDD Degenerative Disc Disorder cohort

Baseline variables, expressed as percentages or means, are given in Table 3. In all three diagnosis groups, many variables significantly differed between respondents and non-respondents. In absolute numbers, the differences were small. Health care consumption is presented in Table 4, showing that non-respondents consume a slightly higher amount of health care. Significant differences between respondents and non-respondents at the 95% confidence level are shown in bold. Descriptive statistics of one-year follow-up data are found in Table 5.

Table 3 Baseline characteristics of patients operated for spinal stenosis
Table 4 Health care consumption in the spinal stenosis cohort
Table 5 Descriptive statistics of observed 1-year follow-up data in the spinal stenosis cohort (respondents)

Variables predicting non-response

A younger age predicted non-response in all three diagnosis groups (for LSS, see Table 6), as did male sex and being born outside of the EU. Low disposable income, living alone and smoking predicted non-response in the LSS and LDH groups. A low level of education predicted non-response in the LDH and the DDD groups.

Table 6 Regression analysis on the spinal stenosis cohort with non-response to the 1-year follow-up as dependent variable

There were no significant differences in baseline PROM values, apart from EQ-5D in the LSS cohort which indicated that a low preoperative quality of life score increased the probability of non-response.

Among the variables expressing health care consumption, a higher comorbidity, previous spine surgery and unexpected events increased the probability of non-response.

Variables predicting outcome

Variables predicting a successful outcome (according to the GALEG) for patients operated for spinal stenosis are presented in Table 7. The results for the LDH and DDD cohorts are shown in detail in the electronic supplementary material.

Table 7 Regression analysis on the spinal stenosis cohort with the dichotomized patient self-assessment (GASUCCESS) as dependent variable

Male sex and living alone were factors associated with a decreased likelihood of successful outcome in the LSS group but not in the other groups. An age above 70 (reference age: < 40), smoking, a baseline VASBACK > 40 (ref < 20) and a higher Elixhauser comorbidity index decreased the probability in the LSS and the LDH groups. A higher level of education was associated with a higher probability of successful outcome in the LSS and LDH groups. Having the highest level of disposable income as well as being born inside the EU were variables associated with successful outcome in all three groups.

Having an unexpected event decreased the probability of successful outcome in the LSS and DDD cohorts. Previous back surgery decreased the likelihood in the DDD group.

Predicted outcome

Predicted outcome for respondents and non-respondents measured with GABACK/LEG is depicted in Fig. 2. The observed outcome for respondents 12 months after surgery and the predicted outcome for respondents and non-respondents, respectively, are presented in Table 8. The predicted outcome of non-respondents was significantly lower than both the predicted and the observed outcomes of respondents in all three diagnosis groups. The AUC, expressing the predictive ability of the model, was 0.73 for LDH; 0.69 for LSS; and 0.72 for DDD.

Fig. 2

Predicted proportions of a successful outcome according to Global Assessment (i.e., participants self-assessing as “pain free” or “much improved” at the one-year follow-up after the operation) for respondents and non-respondents, respectively. LDH Lumbar Disc Herniation (n = 5,192); LSS Lumbar Spinal Stenosis (n = 12,132); DDD Degenerative Disc Disorder (n = 1,587). Error bars indicate the 95% confidence intervals

Table 8 Percentage of patients self-assessing as having a successful outcome at the one-year follow up compared to the outcome for respondents and non-respondents, respectively, as predicted by the regression model

Significant regression equations were also found for the secondary outcome measures (ODI, VASBACK/LEG, and EQ-5DINDEX). The quality of the models is summarized in Table 9. As was the case for the GA regression models, the observed outcome was highly consistent with the predicted outcome (presented in detail in Table 3 of the electronic supplementary material). The secondary measures displayed the same pattern as the GA, depicted in Fig. 3a–c, indicating that non-respondents are predicted to fare somewhat worse than respondents.

Table 9 Predicted capability of least squares regression models with scores at the 1-year follow-up of four different PROMs as dependent variable
Fig. 3

a–c Predicted mean scores for respondents and non-respondents of the one-year follow-up questionnaires for three diagnosis groups. LDH cohort n = 5,192; LSS cohort n = 12,132; DDD cohort n = 1,587; ODI Oswestry Disability Index; VAS Visual Analogue Scale for back and leg pain; EQ-5D Euroqol-five-dimensional index. Error bars indicate the 95% confidence intervals

Discussion

The results in this study suggest that patients undergoing degenerative lumbar spine surgery and who are lost to follow-up in a national quality register demonstrate a somewhat worse outcome than follow-up respondents.

Similar findings have previously been described in a hip replacement cohort and in a longitudinal follow-up of rotator-cuff tears [9, 10]. The opposite conclusion, that non-respondents do not influence the result and can be ignored in register studies, has, however, been presented several times. In two such studies on spine register data, response was obtained by repeated attempts to reach dropouts, either by mail or by telephone interviews up to two years after the original follow-up [3, 4]. However, collecting data this way through interviews and telephone calls may induce bias [11]. Furthermore, not all patients could be reached in these studies. In the study by Hojmark et al. [3], the total number of non-respondents was only 48 and the non-response rate was 12%. Solberg et al. [4] studied 138 non-respondents, and the non-response rate was 22%. The current study analyzed a larger set of explanatory variables for a larger population, which might explain the inferential disparity. Elkan et al. [5] compared differences in outcome between the Swespine disc hernia population and a cohort of disc hernia patients from a single hospital. In the latter group, the response rates were considerably higher than in the Swespine register. After adjustments for covariates, the authors found no differences in outcome between the two cohorts. However, individuals exhibiting characteristics with a potential effect on non-response, such as previous spine surgery, linguistic difficulties and comorbidity, were excluded. Further, only disc hernia patients were studied, while in the present study, larger differences were seen between non-respondents and respondents in the other diagnosis groups. The largest disparity in outcome was seen in the DDD group, suggesting that surgery for lumbar pain, as opposed to surgery for leg pain, is affected by losses to follow-up to a larger extent.

In a recent study [6], the differences in Global Assessment for back pain between respondents and non-respondents were of similar size to those reported in the present study. Still, the authors concluded that non-response could be ignored since both groups improved equally according to minimal important change (MIC) values in ODI, NRSBACK/LEG and EQ-5D. However, putting too much trust in score changes may, in the context of lost data, cause a relaxed attitude that neglects the importance of other outcome variables. Nor is the reference to MIC theoretically self-evident in this context of final outcome, since this value is based on calculation of the clinical importance of individual score changes.

A consequence of the differences between individuals lost to follow-up and respondents is that subgroups with certain characteristics such as differences in socioeconomic status or adverse events are not fully recognized, leading to a decrease in external validity which threatens generalizability. This would mean that the register captures a selected subgroup that does not represent the whole target population. According to the current study, respondents are older, more often born inside the EU, and have a higher socioeconomic status, a better baseline health status and fewer adverse events, compared to non-respondents.

The differences between respondents and non-respondents in age, sex, smoking, level of education and income confirm the results of previous studies [3, 12,13,14]. Younger individuals were more likely to be non-respondents, which may reflect a lack of time because of work and family life.

Being born outside of the EU was significantly associated with non-response. This may reflect that cultural barriers negatively affect not just the outcome but also compliance with given information and instructions.

Non-respondents were more extensive consumers of health care. They made more visits to specialized outpatient care units during the first postoperative year, most likely because of remaining symptoms. As adverse events and previous back surgery were associated with non-response, it suggests that the register may fail to cover a subgroup that burdens the health care system and indirectly all taxpayers. It is important to capture this group of patients since, otherwise, an increase in adverse events over time or at specific centers/hospitals may not be recognized, but just reflected as a higher number of non-respondents. In addition, were these variables to have an influence on the outcome, it could be a threat to the internal validity of the register as it puts the inferences drawn from outcome data at risk. In other words, the outcome of degenerative lumbar spine surgery may be misinterpreted and overrated.

Although not the main purpose of the current study, variables predicting outcome were presented in addition to variables predicting non-response (Table 7 for spinal stenosis cohort, and Tables 1e and 2e for disc hernia and degenerative disc disorder cohorts in the electronic supplementary material). Some risk factors predicted non-response as well as poor outcome (i.e., GASUCCESS = no). In the LSS cohort, these variables were male sex; living alone; born outside the EU; smoking; a higher Elixhauser comorbidity index; having had an adverse event during the first post-op year; and having had previous back surgery. In the LDH group, the corresponding variables were an educational level below university/college; born outside the EU; smoking; and comorbidity. In the DDD cohort, the single risk factor for both non-response and poor outcome was previous back surgery. These variables may be of particular interest in future case-mix adjusted outcome reports.

The current study had the advantage of access to several other registers and could thereby present a higher number of explanatory factors than previous studies. Although this adds some additional knowledge about non-response and its effects on the outcome, it is likely that there are still unknown factors. Furthermore, as presented in Fig. 1, there was a loss of observations also in the present study, due to missing values for the independent variables used for regression analyses.

Participants lost at the recruitment stage (i.e., patients not filling out the preoperative baseline questionnaire) were not included in this study, since they lack most baseline data necessary for analysis. Although we cannot exclude the possibility that they represent a subgroup with reluctance to fill in both baseline and follow-up questionnaires, it is our experience that the major cause of absent preoperative registration is logistic difficulties in certain hospitals.

Our data suggest that it is reasonable to assume that losses to follow-up in registers are missing not at random [16, 17]. What remains is to carefully reflect on the information that we do have to increase the awareness of what impact missing data might have and also suggest steps toward improvements that can increase the response to follow-up [11, 17].

The ability of the logistic regression models to predict a successful outcome, expressed as AUCs, was 0.69–0.72. A level of 0.70 was considered acceptable [15]. In a multiple imputation model, the uncertainty is incorporated in the model itself. The increased knowledge on the features of non-respondents presented in this study could be useful in future studies aiming to evaluate possible advantages of predicting outcome using multiple imputation.

An increased follow-up rate may be achieved by offering digital questionnaires in addition to the postal ones, sending reminders via text messages and shortening the questionnaire. Information in other common languages about the purpose of data collection and the preservation of anonymity, provided in conjunction with the hospital stay and/or on the register website, may increase participation.

Conclusion

Outcome at the 1-year assessment of degenerative lumbar surgery is not expected to be the same for respondents and non-respondents. The pattern was seen for patients operated for disc herniation as well as for spinal stenosis and degenerative disc disorder. It is important to keep a closer eye on factors strongly predicting non-response and unsuccessful outcome, such as unexpected adverse events, since the overall results may otherwise be overrated. The role of loss to follow-up in longitudinal register-based studies should not be underestimated.