NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Donahue KE, Gartlehner G, Schulman ER, et al. Drug Therapy for Early Rheumatoid Arthritis: A Systematic Review Update [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2018 Jul. (Comparative Effectiveness Review, No. 211.)

Appendix D. Risk of Bias Ratings and Rationales for Included Studies

Appendix Table D-1. Risk of bias ratings for randomized controlled trials

| Study | ROB Rating(s) | Rationale for Rating(s) |
|---|---|---|
| AGREE, 2009-15 [31, 129-131] | Low (ACR response, DAS28 remission, LDAS, radiographic outcomes, AEs); Medium (HAQ-DI, SF-36) | A Low rating applies to ACR response, DAS28 remission, LDAS, radiographic outcomes, and AEs. To handle missing data, NRI was used for ACR response, DAS28 remission, and LDAS; multiple imputation was used for radiographic outcomes; and modified ITT was used for harms, such that all patients receiving one or more ABA dose were analyzed. A Medium rating applies to HAQ-DI and SF-36 outcomes because they were measured using as-observed data, but missing data were minimal for both. |
| ASPIRE, 2004-9 [17, 106, 107, 157] | Medium | ITT analysis probably not used; only patients with data after week 30 were included. However, overall attrition was fairly low at 15%. |
| AVERT, 2015 [7] | Medium | Attrition not described, and unable to tell if ITT was used. |
| BARFOT Study #1, 2003 [27] | High | Treatment contamination across groups; PNL arm could have received PNL alone or PNL + MTX, and SSZ arm could have received SSZ alone or SSZ + PNL. No reporting of how findings may have differed following monotherapy vs. combination treatment within treatment arms. High overall and differential attrition also raise concern about ROB. Also, large baseline between-group differences in RF-positivity and Larsen score, such that T1 (the PNL arm) was significantly more likely than T2 (the SSZ arm) to be RF-positive and have greater radiographic damage at baseline. Statistical analyses did nothing to adjust for these differences or determine whether they could have affected the study findings. |
| BARFOT Study #2, 2005-14 [78, 97, 138, 140] | Medium (1, 2, and 10-year outcomes [KQs 1-3]); High (4-year outcomes [KQs 1-3]) | A Medium rating applies to 1, 2, and 10-year outcomes (KQs 1-3). Open-label design introduced ROB because patients could have switched treatments based on knowledge of randomized assignments. Only radiographic outcomes measured blindly. Choice of DMARDs prescribed was similar between PNL and no-PNL arms, despite being left up to treating physicians. The significant between-group differences in NSAID and intra-articular injection use over the study’s first 2 years are probably not a ROB concern, but more likely reflect differences in treatment effectiveness. LOCF ITT analysis used for efficacy outcomes, except radiographic outcomes, for which completers analysis was used because investigators deemed amount of missing data minimal [78]. No-PNL group was significantly older than the PNL group, but statistical analysis adjusted for age as a covariate. A High rating applies to 4-year outcomes (KQs 1-3) [97] because of potential bias from high overall attrition (40%) resulting from investigator exclusion of patients and self-selection of patients into 2-year continuation study, plus attrition between 2-4 years. Baseline characteristics of the retained 4-year sample appear similar to the original study sample’s, but risk of attrition bias is still high. |
| BeSt, 2005-16 [79-91] | Low (1-5 year outcomes); Medium (10 year outcomes) | Open-label design with blinded assessment for all outcomes. ITT method not specified except for DAS at 4-year timepoint and all 10-year outcomes (multiple imputation and GEE). Protocol deviation of 70 patients (14% overall) as a potential source of ROB seems unlikely because between-group differences in deviation were not significant (p=0.11), and these patients were still included in ITT analysis [86]. Low overall and differential attrition at 1-5 year timepoints, but high enough to introduce attrition bias at 10-year timepoint (overall: 38%; differential: 3.3% to 16.5%). Therefore, a Low rating applies to outcomes measured at 1 to 5 years, while a Medium rating applies to all outcomes at the 10-year timepoint. |
| C-EARLY, 2017 [38, 39] | Medium; High (KQ 2 WPS-RA work productivity outcomes) | High overall attrition for all outcomes, but especially high for work productivity outcomes that apply to KQ 2 and only reported on CT.gov (work days missed, work days with reduced productivity, interference with work productivity) due to limited availability of baseline data. Therefore, a High ROB rating applies only to KQ 2 work productivity outcomes. LOCF ITT and NRI can account for this. Potential selective outcome reporting bias affecting KQ 2-eligible PROs (e.g., fatigue, work productivity, household productivity), which were not mentioned at all in published article and only reported on CT.gov. |
| CAMERA-II, 2012 [94] | Medium | 28% attrition is fairly high, but study not fatally flawed. |
| CARDERA, 2008 [93] | Medium | NR whether or not care providers were masked. |
| CareRA, 2015-7 [95, 98, 99] | Medium | No masking. |
| COBRA, 1997-2010 [24, 100, 141] | Medium (56 week, 5 year, and most 11 year outcomes); High (11 year radiographic outcomes) | A Medium rating applies to all relevant outcomes at 56 week, 5-year, and most 11-year timepoints. High differential attrition. A High rating applies to the following 11-year outcomes: mTSS and other radiographic measures (because data only available for 112 out of 155 total patients). |
| COBRA-light, 2014-5 [25, 105] | Medium | 24% protocol violations in COBRA and 7% in COBRA light. |
| COMET, 2008-14 [12, 108, 109, 154-156] | Medium | Moderate level of overall attrition. Missing outcome data were handled with LOCF for clinical outcomes and HAQ, and linear extrapolation for radiographic outcomes. |
| Conaghan et al., 2016 [29] | Medium | ITT not stated, high overall and differential attrition. |
| C-OPERA, 2016-7 [13, 153] | Medium (24 week outcomes); High (52 week and 2 year outcomes, except discontinuation) | High ROB rating applies to 52 weeks and 2 years. At 24 weeks, rating would be Medium because attrition is much lower. Only outcomes at 24 weeks make sense; afterwards people could switch to rescue medication and dropout rates were very high. |
| Dougados et al., 1999-2003 [21, 104] | Medium | 4 patients removed before randomization, but too small a number to affect outcome. |
| Durez et al., 2007 [18] | Medium | Small study (N=44) with no more than 15 patients in any one arm, which could pose problems in terms of statistical power. Baseline clinical characteristics differed significantly between groups in terms of RF and anti-CCP positivity, but this did not affect findings in the sensitivity analyses conducted by authors and may have resulted simply because of small sample size. Potential selective outcome reporting bias affecting KQ 2-eligible PRO (i.e., VAS-measured pain), which was not reported in the article or on CT.gov. |
| Enbrel ERA, 2000-6 [14, 110-112] | Medium | High overall attrition at 2-year timepoint, and moderate overall attrition at 1-year timepoint. Also moderate differential attrition at the 2-year timepoint. Blinded outcome assessment for radiographic outcomes, but unclear if this was the case for all other eligible outcomes. Also, details about randomization were NR. |
| FIN-RACo, 1999-2013 [22, 101, 102, 142-145] | Medium | Open label study. Minimal attrition. ITT used. |
| FUNCTION, 2016-7 [32, 134] | Medium (1 year outcomes); High (2 year outcomes) | High overall attrition at 1 year, and much higher attrition at 2 years (47%) when taking into account the patients who were switched to rescue therapy. High ROB rating for all outcomes’ 2-year data because of attrition bias. |
| GUEPARD, 2009 [92] | Medium (12 week outcomes); High (52 week outcomes) | Open-label RCT in which only radiographic outcomes were assessed by a blind rater. Some overall attrition, but LOCF ITT analyses used to account for missing data. A Medium ROB rating applies to 12-week outcomes, but a High ROB rating for all outcomes at 52-week timepoint due to risk of contamination bias. Treatment adjustments were a potential source of contamination bias for both arms at the 52 week timepoint, since patients could be switched to different dosing and treatment regimens when low disease activity was achieved at 12 weeks and beyond (e.g., ADA+MTX --> MTX alone) or in cases of insufficient response (e.g., ADA+MTX 40 mg every other week --> ADA+MTX 40 mg/week --> ETN). Total use of ETN in average doses was similar between arms, but between-group differences between 12-52 weeks were likely artificially lower as a result. |
| Haagsma et al., 1997 [23] | Medium | Unclear randomization description, unclear allocation concealment. |
| HIT HARD, 2013 [34] | Medium (DAS28, ACR response, HAQ-DI, SF-36); High (mTSS, SHS erosion) | A Medium rating applies to DAS28, ACR response, HAQ-DI, and SF-36 outcomes. Factors contributing to increased ROB include overall and differential attrition at 52 weeks (with lower attrition rates at 24 weeks) and a statistically significant baseline difference between groups in age. There were also baseline differences in SF-36 physical score and SHS JSN score. A High ROB rating applies to mTSS and SHS erosion score outcomes because radiographic data were only available for 59% of ADA + MTX patients and for 55% of MTX-only patients. In fact, investigators found evidence that patients with missing radiographs differed significantly from those with complete data (for example, higher DAS28 disease activity in those with missing radiographs). Blinded outcome assessment for radiographic outcomes, but this does not attenuate ROB. |
| HOPEFUL 1, 2014 [35, 150] | Medium | Some overall attrition during 26 weeks of double-blind phase, but no evidence that group similarity was unbalanced as a result. Study dosage of MTX was much lower than current approved U.S. FDA dose because this is a Japanese study done 7-8 years ago, but it seems unlikely this would have affected the magnitude of effect observed in the findings. DAS28-CRP score difference was analyzed as post-hoc outcome, but the direction and magnitude of effect seem to match those of the pre-specified DAS28(ESR) score difference. ITT methods were NRI for binary outcomes of interest (ACR20/50/70 response, DAS28(ESR) remission, % radiographic progression, HAQ-DI response, and AEs) and modified LOCF ITT for continuous outcomes (DAS28(ESR) and DAS28(CRP) scores, mTSS scores, and HAQ-DI scores). |
| IDEA, 2014 [96] | Medium | Unclear if allocation concealment was used. |
| IMAGE, 2011-2 [30, 132, 133] | Low | |
| IMPROVED, 2013-6 [9, 158] | High | Only the trained research nurses conducting the DAS assessment were blinded for treatment allocation; they were not blinded for other outcome assessment. High attrition rate; ITT analysis is stated, but it’s not mentioned how missing data were handled. |
| Marcora et al., 2006 [113] | Medium | Open-label RCT using a completers analysis with a very small sample (N=24). Still, small attrition rate (n=2 patients, or 7.7%). Unclear if outcome assessment was blinded for DAS28 change from baseline. Also unclear if arms similar in terms of erosive disease or Sharp scores. |
| Montecucco et al., 2012 [3] | Medium | Open label, authors report using both ITT and per-protocol analyses. |
| NEO-RACo, 2013-5 [40, 127, 128] | Low | |
| OPERA, 2013-7 [36, 160-163] | Medium | Low attrition rates. Study design details were well-reported and indicate a well-designed RCT. However, increased ROB from Type 2 error (i.e., potential for failing to detect a between-group difference when one really exists) because study was underpowered for DAS28-CRP disease response and, therefore, for all other outcomes. Treatment blinding was terminated after 1 year, and patients had their treatments reassessed based on clinician judgment through year 2. Still, similar proportions of patients were switched to triple synthetic DMARD therapy or received intra-articular injections in addition to randomized treatments. |
| OPTIMA, 2013-6 [37, 151, 152] | Low | |
| ORBIT, 2016 [8] | High | Non-blinding of participants, outcome assessors, care providers; no ITT analysis performed. |
| PREMIER, 2006-15 [15, 103, 115-119, 149] | Medium | High overall attrition. Also moderate differential attrition, but that was attributable mainly to difference in attrition because of lack of efficacy. ITT was used to account for missing data, although the specific type of ITT is not described. Blinded outcome assessment used for radiographic outcomes, but unclear if this was the case for other outcomes. |
| PROWD, 2008 [16, 152] | Medium (16 week outcomes); High (56 week outcomes, except discontinuation) | A High rating applies to 56 week outcomes, except study withdrawal, because of very high overall attrition and moderate differential attrition, but a Medium rating applies to 16 week outcomes, including withdrawal. Missing data were handled using LOCF for continuous outcomes, and NRI for job loss/imminent job loss. |
| Quinn et al., 2005 [41] | Medium | Type 2 error affected radiographic outcomes and possibly disease activity and QOL outcomes because study was only statistically powered for MRI bone erosions and because of small sample size (only 10 in each arm). Method of handling dropouts not described. |
| SWEFOT, 2009-17 [10, 121-126, 168] | Medium | Open-label design of this RCT creates an increased ROB in that patients were more likely to discontinue conventional treatments in favor of treatment with biologics. In fact, discontinuation in conventional arm was significantly greater than in the biologic arm, “accounted for mostly by participants who discontinued prematurely because of lack of effectiveness” [10]. Overall attrition exceeded 20% at 1 year timepoint, but the use of conservative NRI analysis and also modified ITT (for comparison) accounted for missing data and treatment switches. Larger overall attrition increases ROB to a borderline High level at 2 year timepoint, but statistical analyses help manage any elevated ROB. |
| TEAR, 2012-13 [20, 159] | High | High attrition and modified ITT analysis not sufficient to account for attrition bias. |
| Todoerti et al., 2010 [6] | Medium | Main flaw of this study is its open-label design, which could have introduced information bias that differentially affected how outcomes were measured between groups. Randomization method unclear. Otherwise, no notable methodological issues or potential sources of bias. |
| tREACH, 2013-16 [4, 146-148] | Medium | Single-blinded. |
| U-Act-Early, 2016-7 [33, 135] | Medium | High overall attrition, but ITT analyses applied to account for resulting bias. Results of ITT vs. per-protocol analyses were similar for study’s primary outcome: sustained remission. Unclear how well powered the study was to detect differences in outcomes besides sustained remission (study’s primary outcome). ITT methods included NRI and multiple imputation. Treatment arms were mostly similar at baseline, although male vs. female distribution differed by as much as 15% between groups. Note that 52-week data for remission only reported on study’s CT.gov page. |

ABA = Abatacept; ACR = American College of Rheumatology (20/50/70 = 20%/50%/70% improvement); ADA = Adalimumab; AE = Adverse event; CT.gov = ClinicalTrials.gov; DMARD = disease-modifying antirheumatic drug; DAS = Disease Activity Score (CRP = C-reactive protein; ESR = erythrocyte sedimentation rate; L = low; 28 = score based on 28 joints); FDA = United States Food and Drug Administration; GEE = generalized estimating equations; HAQ = Health Assessment Questionnaire (DI = Disability Index); ITT = Intention to treat; KQ = Key Question; LOCF = Last observation carried forward; mg = milligrams; mTSS = modified Total Sharp/van der Heijde score; MTX = methotrexate; NR = Not reported; NRI = Non-Responder Imputation; NSAID = Nonsteroidal anti-inflammatory drug; PNL = prednisolone; PRO = patient-reported outcome; QOL = quality of life; RCT = Randomized controlled trial; RF = Rheumatoid factor; ROB = risk of bias; SF-36 = Short Form 36 Health Survey; SHS = Sharp/van der Heijde Score; SSZ = sulfasalazine; VAS = visual analogue scale; vs. = versus; yr = year

Appendix Table D-2. ROB ratings for observational studies

| Study | Design | ROB Rating(s) | Rationale for Rating(s) |
|---|---|---|---|
| Bili et al., 2014 [11] | Retrospective cohort study | High | Not possible to draw valid conclusions from study findings because of how medication use classified. Medication use evaluated as “exposure periods”, and individual patients could contribute data to multiple exposure periods for different drugs. Furthermore, MTX group included MTX monotherapy and combination therapies. |
| ERAN, 2013 [137] | Prospective cohort study | High | High risk of bias from classification of interventions. Comparisons of treatment use vs. no use provides insufficient information to draw clear usable conclusions because no-use patients would have taken at least one of seven alternative treatments (Table 1). No information on which alternative treatments patients switched to after discontinuing initial DMARD treatment. |
| Nijmegen RA Inception Cohort, 2009 [26] | Prospective cohort study | High | High risk of selection bias for treatment discontinuation. High risk of attrition bias at 6 months (overall: 24.3%) and 12 months (overall: 41.3%; differential: 16.1%). High risk of confounding from indication. |
| NOR-DMARD analysis, 2012 [28] | Retrospective cohort study | High | High ROB from confounding by indication, from time-varying reduction in patients being prescribed SSZ in favor of MTX, and from unbalanced use of concomitant PNL (use in MTX arm exceeded use in MTX arm). |

DMARD = disease-modifying antirheumatic drug; MTX = methotrexate; PNL = prednisolone; RA = rheumatoid arthritis; ROB = risk of bias; SSZ = sulfasalazine; TNF = tumor necrosis factor; TNFi = TNF inhibitor(s)
