NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

New Statistical Methods to Compare the Effectiveness of Adaptive Treatment Plans

, PhD, , MD, MSCE, , PhD, , MPH, , PhD, , MS, , PhD, , PhD, , BPharm, PhD, , MD, MPH, and , MD, MSc, MBA.

Structured Abstract

Background:

During routine clinical care, treatments are adaptive to patients' responses to previous treatment assignments. However, methods for comparative effectiveness research (CER) are predominately designed for nonadaptive treatments. This project aimed to evaluate the comparative effectiveness of patient-centered adaptive treatment strategies (PCATS) at the initiation of treatment and over the course of the disease progression. As a case in point, despite many medication options, polyarticular-course juvenile idiopathic arthritis (pcJIA) is often refractory and requires better adaptive treatment strategies (ATS).

Objectives:

Aim 1. To develop, refine, and disseminate bayesian causal inference methods for evaluating clinical effectiveness and for informing better PCATS.

Aim 2. To evaluate the clinical effectiveness of the recommended ATS for patients with pcJIA using real-world data.

Methods:

We propose the GPMatch method, a nonparametric full bayesian doubly robust causal inference method that uses Gaussian process (GP) prior as a matching tool. We performed simulation studies to evaluate its performance compared with that of some widely used causal inference methods: propensity score subclassification, augmented inverse treatment probability weighting, regression adjustment, and bayesian additive regression trees (BART), under dual-misspecification settings. We extended both GPMatch and BART methods for ATS and applied them to electronic medical record (EMR) data to compare 2 consensus treatment plans (CTPs) that began with a disease-modifying antirheumatic drug (DMARD) at different times in treating children with pcJIA: the early-combination plan uses biologic and nonbiologic DMARDs (b+nbDMARD) soon after diagnosis, while the step-up plan starts with an nbDMARD first and then introduces bDMARDs later. The primary end points were Clinical Juvenile Arthritis Disease Activity Score (cJADAS10, with a cutoff at 10 for active joint count) results at 6 and 12 months, and the secondary end point was the Pediatric Quality of Life Inventory (PedsQL) score at 12 months.

Results:

Simulation studies demonstrated that GPMatch, followed by BART, performed as well as or better than some commonly used non-bayesian causal inference methods for comparing both nonadaptive treatment strategies and ATS, as measured by the root mean square error (RMSE) and median absolute error (MAE). The pcJIA CER suggests that by 6 months, the early-combination plan reduced disease activity on average by 2.0 points (95% CI, 0.4-3.6 points) more than the step-up plan as measured by the cJADAS10. By 12 months, the early-combination plan remained more effective than the step-up plan: The average improvement in cJADAS10 was 2.6 points (95% CI, 0.6-4.6 points) if the first-line treatment was continued or reduced and 2.2 points (95% CI, 0.3-4.14 points) if the treatment was escalated. Both CTPs were effective in improving the PedsQL score by 12 months, reporting improvements of 74.8 ± 2.0 and 80.4 ± 3.7 points for the step-up and early-combination CTPs, respectively. If treated on the early-combination plan, patients were expected to achieve an average of 5.61 (95% CI, −3.89 to 15.12) more points on the PedsQL than were patients treated on the step-up plan.

Conclusions:

The GPMatch method accomplishes matching and flexible modeling in the same step and has well-calibrated frequentist properties. It is doubly robust in the sense that the average treatment effects are correctly estimated when either of the following conditions is satisfied: (1) the GP mean function correctly specifies the potential outcome model, or (2) the covariance function correctly specifies the matching structure. The pcJIA CER study suggests that the early-combination plan is more effective in reducing disease activity 1 year later. We developed a user-friendly online R Shiny application with a graphical interface, “PCATS,” which makes the GPMatch and BART methods accessible to general CER investigators.

Limitations:

The GPMatch method is computationally intensive and not yet extended to nonnormally distributed outcomes. The PCATS online app assumes no missing data and single time-dependent confounding. Missing data are an inherent feature of EMR data, and our CER study addressed missingness at the design, data management, and data analysis steps. Nevertheless, the study results may be limited by the missing data handling procedures. We assume that the EMR captures important treatment considerations from the clinician's perspective, but not from the patient's perspective. Sensitivity analyses were performed to account for missing potential confounders from the patient's perspective. Finally, the CER study only analyzed 2 of the 3 CTPs.

Background

Motivation

Using causal inference methods, observational data can be used to provide evidence of clinical effectiveness.1 However, most methods are designed to evaluate causal treatment effects from treatment assignments made at a single time point. Furthermore, the current statistical causal inference methodology for adaptive treatment strategies (ATS) may perform poorly under model misspecification.2 This is a serious limitation because, with data collected from the real world, it is nearly impossible to specify a model correctly. We proposed a novel bayesian causal inference method that is designed to lessen the model misspecification problem in evaluating the comparative effectiveness of ATS.

In clinical practice, physicians adjust treatment plans over time and adapt them to patients' responses to previous treatments and disease progression. This is an example of an ATS. Not all time-varying treatments are ATS: a nonadaptive treatment strategy is a predetermined plan that does not adapt to patients' responses to previous assignments. The use of ATS is ubiquitous in clinical care practices,3 yet currently, clinical decisions are predominantly based on evidence provided by parallel-arm randomized clinical trials, whose results do not necessarily apply to adaptive treatment. Although a sequential multiple assignment randomized trial (SMART) could be used, it may not be feasible in rare-disease or low-prevalence settings.

This project was motivated by the need to evaluate the comparative effectiveness of ATS that are common in treating patients with chronic disease. One of the specific aims of the project was to evaluate a set of time-varying adaptive consensus treatment plans (CTPs) recommended for children with juvenile idiopathic arthritis (JIA).4

JIA is one of the most common types of rheumatologic disease in children. The cause of childhood arthritis is unknown, and current understanding of the disease etiology and pathogenesis is limited.5 The prevalence rate of JIA is approximately 19.4 per 100 000 for girls and 11.0 per 100 000 for boys.6 JIA is a heterogeneous group of diseases. Systemic JIA presents distinct features and requires more distinctive treatment approaches than do other types of JIA. Nonsystemic JIA includes polyarthritis, oligoarthritis, psoriatic arthritis, enthesitis-related arthritis, and undifferentiated arthritis, collectively referred to as polyarticular-course JIA (pcJIA). pcJIA is often refractory to treatment, and patients with pcJIA alternate between relapse and remission.7

Various treatment options have been made available for JIA in the past 2 decades.8 The advent of biologic disease-modifying antirheumatic drugs (bDMARDs) and nonbiologic disease-modifying antirheumatic drugs (nbDMARDs) has revolutionized treatment, making it possible to set inactive disease as the treatment target.9 However, it is unknown at the time of initial treatment which treatment strategy will be the most effective to induce remission for a given individual. Additionally, for a patient who does not respond to previous treatment, the next best option is often unknown. Such poorly guided treatment strategies produce inferior treatment outcomes.4 Despite advanced medical treatment, half of patients experience suboptimal health-related quality of life (HRQOL).10 As the first step towards informing better medical decision-making to help optimize patient outcomes, a panel of JIA experts developed 3 CTPs (Figure 1): step up, early combination, and biologic only for patients newly diagnosed with pcJIA.4

Figure 1. CTPs for Patients With pcJIA.

In this project, we intended to answer the following specific comparative effectiveness research (CER) questions:

  1. At the time of pcJIA diagnosis, is early combination of biologic and nonbiologic DMARD (b+nbDMARD) treatment more effective in improving the Clinical Juvenile Arthritis Disease Activity Score (cJADAS10, with a cutoff at 10 for active joint count) at 6 months than the commonly used nbDMARD treatment?
  2. After completion of the first stage of treatment at 6 months, given patients' responses to the previous treatment assignment, is it more effective to adapt or not to adapt treatment following the CTPs?
  3. Over 12 months of treatment following the CTPs, what are the differences in the effectiveness of ATS?

Need for Methods Development

Causal inference using observational data rests critically on the concept of the potential outcomes. To understand the potential outcome, we may imagine a counterfactual world, where the same patient could be treated multiple times using different treatments started at the same time when a treatment decision needs to be made. The outcome that we would observe if we could have access to this counterfactual world, under all potential treatment choices, is the potential outcome. For example, if there are 2 potential treatment choices, then there are 2 potential outcomes, Y, which we may denote by Y(0) and Y(1). Then, we could easily identify the best treatment choice for the patient by comparing Y(1) against Y(0). However, the counterfactual world is not real. In the real world, we could only observe 1 of the potential outcomes, that is, Y = Y(0) if treated with the comparator, and Y = Y(1) if treated with the intervention under investigation. The fundamental challenge in causal inference is to uncover the missing potential outcomes.
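The gap between the counterfactual world and the real world can be illustrated with a small simulation. The sketch below is not part of the project's analysis: the variable names, the treatment effect of -2, and the assignment model are hypothetical choices made only for illustration. It generates both potential outcomes for each simulated patient, reveals only the one corresponding to the treatment received, and compares the true average treatment effect with the naive difference in observed group means.

```python
import math
import random

TRUE_EFFECT = -2.0  # hypothetical treatment effect, chosen for illustration


def simulate_patient(rng):
    """Draw one hypothetical patient with both potential outcomes."""
    severity = rng.gauss(0.0, 1.0)                     # baseline confounder
    y0 = 10.0 + 2.0 * severity + rng.gauss(0.0, 1.0)   # Y(0): outcome under the comparator
    y1 = y0 + TRUE_EFFECT                              # Y(1): outcome under the intervention
    # Sicker patients are more likely to be treated (confounding by indication).
    p_treat = 1.0 / (1.0 + math.exp(-1.5 * severity))
    a = 1 if rng.random() < p_treat else 0
    y_obs = y1 if a == 1 else y0                       # only one potential outcome is observed
    return {"y0": y0, "y1": y1, "a": a, "y": y_obs}


def mean(values):
    return sum(values) / len(values)


rng = random.Random(2024)
patients = [simulate_patient(rng) for _ in range(5000)]

# Available only in the counterfactual world: both potential outcomes for everyone.
true_ate = mean([p["y1"] - p["y0"] for p in patients])

# Available in the real world: one outcome per patient, compared across groups.
naive_diff = (mean([p["y"] for p in patients if p["a"] == 1])
              - mean([p["y"] for p in patients if p["a"] == 0]))

print(f"true ATE = {true_ate:.2f}, naive difference = {naive_diff:.2f}")
```

Because treatment assignment in this toy setup depends on severity, the naive difference is pulled away from the true effect, which is the bias that causal inference methods aim to remove.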

For ATS, we are interested in comparing treatment sequences. Specifically, for comparing the early-combination vs step-up CTPs presented in Figure 1, we are comparing 4 possible treatment sequences: (1) b+nbDMARD followed by nbDMARD, (2) nbDMARD followed by b+nbDMARD, (3) b+nbDMARD throughout, and (4) nbDMARD throughout. At the 6-month time point following the first stage of treatment, we have 2 potential outcomes: Y(0) from treatment with nbDMARD at baseline, and Y(1) from treatment with b+nbDMARD at baseline. At the 12-month time point following the decision made at 6 months about the second stage of treatment, we have 4 potential outcomes, Y(00) and Y(11) from treatment with nbDMARD and the b+nbDMARD throughout both stages, respectively; and Y(10) and Y(01) from treatment with bDMARD early in stage 1 and later in stage 2, respectively. As this example shows, the fundamental challenge of causal inference is even more difficult in the ATS setting.

The complexity of ATS requires advanced causal inference methods. Because of the adaptive assignment process, patients who respond better (or worse) are likely to end up in the same arm of ATS; thus, the treatment effects are confounded by the posttreatment time-varying intermediate outcomes and covariates, in addition to the baseline covariates. The propensity of treatment assignment differs at every decision point, based on time-varying covariates, treatment history, and disease progression. Any misspecification in these propensity scores could be propagated. In addition, the number of potential outcomes exponentially increases with the increased stages of ATS, leading to increasingly sparse data and underpowered causal inference. Therefore, model misspecification is particularly challenging under the ATS setting. To ensure that CER produces valid, reliable, and reproducible results for the ATS, it is critical that causal inference methods be robust to model misspecification and able to address time-varying confounding.
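The growth in the number of potential outcomes can be made concrete by enumerating treatment sequences. In this minimal sketch, the function name and the 0/1 coding (0 for nbDMARD, 1 for b+nbDMARD) are illustrative assumptions; each sequence corresponds to one potential outcome, such as Y(01).

```python
from itertools import product


def treatment_sequences(n_stages, options=(0, 1)):
    """List every possible treatment sequence over n_stages decision points."""
    return list(product(options, repeat=n_stages))


# Two stages with a binary choice at each stage give the four sequences
# corresponding to Y(00), Y(01), Y(10), and Y(11).
two_stage = treatment_sequences(2)

# The count doubles with every additional stage.
counts = {k: len(treatment_sequences(k)) for k in range(1, 6)}

print(two_stage)
print(counts)
```

With a fixed sample size, each additional stage halves the average number of patients observed under any one sequence, which is why sparsity becomes a concern.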

The most widely used causal inference method is the propensity score (PS) method. A Google Scholar search of the keyword “propensity score” alone revealed >170 000 scholarly works since 2010. Despite the theoretical appeal of PS approaches, the validity of CER results hinges on the correct specification of the PS.11 Many other causal methods have been proposed, which fall into 3 categories: design based (eg, PS and matching methods), model based (eg, bayesian nonparametric), and a combination of design- and model-based methods. Most of these methods rely on strong causal inference assumptions, such as no unmeasured confounders and correct model specification. Comprehensive evaluations of the existing methods suggest poor operating characteristics of some commonly used causal inference methods. For example, increasing sample size could lead to more biased results under the most realistic setting of dual misspecification (ie, we do not know the true model behind the data-generating processes of either the exposure or the outcome).11,12 As a result, CER studies of ATS could lead to inconsistent results and misleading conclusions.

Doubly robust (DR) approaches attempt to address model misspecification. When the PS model is misspecified, a DR estimator produces valid causal inference if the outcome model is correctly specified. Most DR methods use some combination of PS and outcome modeling. The most widely adopted method is the augmented inverse probability treatment weighting (AIPTW), which augments the inverse probability treatment weighting (IPTW) by separate outcome modeling. Incorporating the PS or a function of the PS into the outcome regression modeling as a covariate is another approach.13 Comprehensive studies by Gutman and Rubin12 suggest that widely adopted methods, such as PS matching, may have poor operating characteristics (ie, the average treatment effect [ATE] estimates do not approach the truth as sample size increases). Considering a single confounding variable X, the authors suggest that regression of the outcome Y on X after matching on X provides a better solution. However, there are many different matching approaches that can vary in performance. Many matching procedures require arbitrary decisions on the caliper, a parameter to determine whether a match is achieved, as well as whether matching is done with or without replacement and the matching ratio. Sometimes, the matching procedures lead to discarding a large percentage of individuals, which can limit the generalizability of the results and reduce study power.
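To show how AIPTW combines the two models, the sketch below implements the standard textbook form of the estimator for a binary treatment. It is an illustration only, not the implementation evaluated in this project; the function name and the toy inputs are hypothetical, and the PS and outcome-model predictions are assumed to have been estimated elsewhere.

```python
def aiptw_ate(y, a, ps, mu0, mu1):
    """Augmented inverse probability of treatment weighting estimate of the ATE.

    y   : observed outcomes
    a   : treatment indicators (0 or 1)
    ps  : estimated propensity scores P(A = 1 | covariates), strictly between 0 and 1
    mu0 : outcome-model predictions under control for each unit
    mu1 : outcome-model predictions under treatment for each unit
    """
    total = 0.0
    for yi, ai, pi, m0, m1 in zip(y, a, ps, mu0, mu1):
        if not 0.0 < pi < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        outcome_part = m1 - m0
        # Weighted residuals correct the outcome model where it misses.
        augmentation = ai * (yi - m1) / pi - (1 - ai) * (yi - m0) / (1.0 - pi)
        total += outcome_part + augmentation
    return total / len(y)


# Toy data (hypothetical): four units, two treated, constant propensity of 0.5.
y = [3.0, 5.0, 2.0, 6.0]
a = [0, 1, 0, 1]
ps = [0.5, 0.5, 0.5, 0.5]

# With outcome predictions that fit the observed data exactly, the residual terms vanish.
ate_good_outcome_model = aiptw_ate(y, a, ps, [3.0, 3.0, 2.0, 4.0], [5.0, 5.0, 4.0, 6.0])

# With an uninformative outcome model (all zeros), the estimator reduces to IPTW.
ate_null_outcome_model = aiptw_ate(y, a, ps, [0.0] * 4, [0.0] * 4)
```

The two calls show the two halves of the estimator at work: the first relies entirely on the outcome model, and the second relies entirely on the weights.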

The performance of existing methods for ATS has not been adequately evaluated under realistic problem setups. In the current literature, studies evaluating the performance of causal inference methods have been predominately performed under nonadaptive treatment settings. Only a few studies have evaluated ATS. Daniel et al14 provide a tutorial and comparison of different ATS methods under the randomized trial setting (ie, the SMART setting). Newsome et al2 presented a comparative study of existing ATS methods in an observational study setting. These studies did not consider potential model misspecification for outcome models. Furthermore, we lack general and easy-to-use analytic tools for ATS studies that do not demand specialized statistical programming skills and that are accessible to the general research community.

Causal Assumptions

Most existing causal inference methods rely on 3 fundamental causal assumptions13:

  • Stable unit treatment value assumption (SUTVA). The potential outcomes of 1 experimental unit do not change regardless of how the treatment was assigned and are not related to the treatment received by the other experimental units.
  • Strong ignorable treatment assignment assumption. The treatment assignment (A) is independent of the potential outcomes (Y(0), Y(1)) given the measured confounders. In other words, there are no unmeasured confounders in the study.
  • Positivity assumption. This assumption ensures that every patient has a nonzero probability of being assigned to each of the treatment arms.
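For a binary treatment, these assumptions are commonly written as follows. The notation is an assumption of this sketch: X stands for the vector of measured confounders, and the first line states the consequence of SUTVA that the observed outcome equals the potential outcome under the treatment actually received.

```latex
\begin{align*}
\text{SUTVA (consistency):}\quad & Y_i = A_i\,Y_i(1) + (1 - A_i)\,Y_i(0), \\
\text{Strong ignorability:}\quad & \bigl(Y(0),\, Y(1)\bigr) \perp\!\!\!\perp A \mid X, \\
\text{Positivity:}\quad & 0 < \Pr(A = 1 \mid X = x) < 1 \quad \text{for all } x .
\end{align*}
```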

These causal assumptions are required within the PS framework, but they are widely considered overly strong. It is conceivable that outcome measures may be subject to measurement error. In addition, many factors may influence the treatment effect, such as presurgery procedures, timing of treatments, and concomitant medications. These factors are likely to subject the observed outcomes to additional measurement error. The no-unmeasured-confounder assumption, in particular, is widely acknowledged as overly strong for observational studies. Alternative frameworks for causal inference have been proposed under the directed acyclic graph (DAG) framework15 and more recently under a stratified-sampling framework.16 In the Methods section, we present an alternative and weaker (ie, more easily met) set of causal assumptions.

Advantages and Challenges of Bayesian Causal Inference

Bayesian modeling is particularly suitable for CER because it can easily incorporate existing knowledge into a prior distribution, synthesize data evidence from different sources, account for model uncertainties, and inform optimal decisions.17 It produces a posterior distribution that offers more information beyond the traditional point estimate and 95% CIs. The parameter-rich bayesian modeling techniques provide a powerful tool for addressing potential model misspecifications. A full bayesian approach offers a coherent and versatile framework to address time-varying adaptive treatment assignment and time-varying confounding.

Bayesian approaches to causal inference have primarily taken the direct modeling approach. The direct modeling of outcomes allows us to use the many well-developed regression modeling techniques, including parametric to nonparametric approaches, and to address complex data types and structures. The bayesian approach allows for incorporating prior knowledge and synthesizing information from different sources and thus can be used for tackling complex problems involving encouragement trials,18 dynamic treatment regimes,19 and treatment noncompliance.20,21 More recently, bayesian nonparametric approaches have been increasingly used. Bayesian additive regression trees (BART) have been shown to produce more accurate estimates of ATEs than do PS matching, propensity-weighted estimators, and regression adjustment in the nonlinear setting, and they perform as well in the linear setting.22,23 Gustafson suggested the use of the weighted average of the answers from a parametric and a nonparametric bayesian model.24 Others have advocated for the use of bayesian model averaging, including Cefalu et al25 and Zigler and Dominici.26 More recently, bayesian modeling with Gaussian process (GP) and Dirichlet process priors has been used to address heterogeneity of treatment effects (HTE),27 dynamic treatment assignment,28 and missing data.29 These parameter-rich models could mitigate concerns over potential model misspecification. However, by not accounting for confounding-by-indication in the analyses, a parameter-rich model could suffer from overfitting, which subsequently may introduce bias in estimating causal treatment effects.

A better bayesian causal inference method is needed to account appropriately for bias due to confounding-by-indication in observational studies. Currently, the bayesian approach in causal inference is predominately model based, imputing the missing potential outcomes. However, causal inference presents challenges in addition to the conventional missing data problem.30 In particular, because no more than 1 potential outcome can be observed for a given individual, the missing data are highly structured such that the correlations between the 2 potential outcomes are nonidentifiable. Consequently, different analyses could arrive at different inferential results; this is the issue of “inferential quandary.”31 Confounding-by-indication and time-dependent confounding are additional challenges.30 Ignoring these challenges may lead to biased estimates of causal effect.32 Many investigators have been actively searching for ways to correct for treatment selection bias in bayesian causal inference. Including the PS as a covariate in the outcome regression model is 1 option. However, joint modeling of outcome and treatment selection models leads to a “feedback” issue, in which the information from the outcomes plays an important role in the estimation of PS such that it defeats the role of PS as a balancing score, and this subsequently leads to biased causal inference estimates. To overcome the feedback issue, a 2-stage approach was suggested.33,34 Saarela et al35 proposed an approximate bayesian approach incorporating inverse treatment assignment probabilities as importance sampling weights in Monte Carlo integration. This approach offers a bayesian version to AIPTW. More recently, Hahn et al32 suggested introducing the PS into the formulation of a prior as a way of regularizing outcome modeling. These methods all require a 2-stage approach, and their performance may suffer if the PS is not correctly specified.

Objectives

Motivated by the need to fill the gap in the understanding of the effectiveness of CTPs in patients with pcJIA and recognizing the need for methodology development in bayesian causal inference, we proposed the patient-centered adaptive treatment strategies (PCATS) project. This project had 2 specific aims:

Aim 1. To develop, refine, and disseminate bayesian causal inference methods for evaluating clinical effectiveness and for informing better PCATS.

Aim 2. To evaluate the clinical effectiveness of the newly recommended ATS for patients with pcJIA via an analysis of real-world data.

Significance of the Project

Routine clinical approaches in treating patients with chronic or prolonged disease conditions are time dependent and adaptive to patient responses to the previous treatment, yet few causal inference methods have been developed for ATS. A robust and efficient causal inference method that can be used for evaluating the comparative effectiveness of ATS is essential to inform better treatment strategies for patients with chronic or prolonged disease conditions. No user-friendly statistical software tool is available for evaluating causal treatment effect for ATS. This project is important because it filled this methods gap. We developed a novel bayesian DR causal inference method, GPMatch. We also offer a user-friendly online application with a graphic user interface (GUI), “PCATS,” which makes advanced bayesian causal inference methods (both GPMatch and BART) for both nonadaptive treatment strategies and ATS available and accessible to the general public. In addition, the project applied these advanced causal inference methods to evaluate the CER of 2 ATS in treating children with newly diagnosed pcJIA. The CER study is significant, as it offers, for the first time, real-world evidence of the effectiveness of the early-combination plan vs the step-up treatment plan.

Patient and Stakeholder Engagement

As a methodology development project, we aimed to improve the outcomes of patient-centered care by improving statistical analysis methods that could be used to conduct CER. Figure 2 presents an overview of patients and stakeholders (hexagons) with whom we actively engaged through different channels (circles). We partnered with the Pediatric Rheumatology Care and Outcomes Improvement Network (PR-COIN) to engage with patients, health care providers, and researchers; with the Cystic Fibrosis Foundation to engage with health care providers, researchers, and statisticians in cystic fibrosis research; with the Center for Clinical and Translational Science and Training to engage with health care providers, researchers, statisticians, and students/trainees in medical statistics and biostatistics; and with the PCATS Stakeholder Advisory Panel (SAP) to engage with patients/parents, health care providers, and researchers. Through other professional communities and conferences, we were able to broaden our reach to a greater number of patients and stakeholders.

Figure 2. Patient and Stakeholder Engagement.

During the patient and stakeholder engagement, rheumatologists had questions about when and whether to use an aggressive treatment approach in the treatment of patients with JIA, and pulmonologists raised concerns about the treatment burden among patients with cystic fibrosis and questioned when and whether they should withdraw patients from an ongoing therapy. These questions were acknowledged as important in other long-term or chronic diseases in general. Statistical methodologies that evaluate ATS using real-world data can help clinicians choose a treatment based on a patient's disease status at different stages.

Partnering With PR-COIN

We actively engaged patients and stakeholders through PR-COIN (https://pr-coin.org/), a member of PCORnet. PR-COIN, which currently has 18 participating clinical centers across North America, holds semiannual learning sessions for patients/parents and health care providers to share ideas and learning experiences for improving outcomes in patients with JIA. We shared the PCATS study results and received input from patients and health care providers during these learning sessions. We also discussed topics such as study design, rigorous statistical analyses, and data quality issues. Additionally, this partnership has helped PR-COIN with improving data quality and data collection forms.

Stakeholder Advisory Panel

The SAP consisted of 2 parents of patients, 2 clinicians, and 1 health policy researcher. The primary mission of the SAP was to ensure that the project was patient centered—in particular, to confirm that the methods development could adequately answer the specific patient-centered questions and that the methods development was understandable, meaningful, and generalizable to broader patient populations and stakeholders. The secondary mission of the SAP was to monitor and advise on project progress, provide approval, and facilitate dissemination of the project results. The principal investigator, project manager, and 5 SAP members had 1 in-person meeting or 2 conference calls every year.

Methods

The methods are presented separately for each of the 2 specific aims: methods development (aim 1) and CER study in patients with pcJIA (aim 2). Detailed methods are reported in Appendix A and Appendix B.36

GP Covariance as a Matching Function

Matching experimental units on their pretreatment assignment characteristics helps remove bias by ensuring similarity or balance between the experimental units in the 2 treatment groups. Matching methods impute the missing potential outcome with the value from the nearest match or the weighted average of the values within the nearby neighborhood defined by a chosen caliper value. Matching on multiple covariates can be challenging when the dimension of the covariates is large. For this reason, matching is often performed using the estimated PS or the Mahalanobis distance (MD). Under the no-unmeasured-confounder setting, matching induces balance in covariates between the treated and untreated groups. Therefore, it serves to transform a nonrandomized study into a pseudorandomized study. There are many different matching techniques.37 A recent simulation study by King and Nielsen38 compared PS matching with MD matching. Their study showed that, under some settings, PS matching can result in more biased and less accurate estimates of the average causal treatment effect as the precision of matching improves, while MD matching showed improved accuracy.38 A common practice in matching, particularly in 1:m matching, is to discard individuals without a match. Such a practice may lead to a sample that is no longer representative of the target population and might reduce study power. A user-specified caliper is often required, but it is not immediately clear how best to choose the caliper. Furthermore, matching on a misspecified PS may lead to biased causal inference results. A better approach to matching should be nonparametric and avoid arbitrary decisions on the caliper or on discarding data points.

GP prior has been widely used to describe biological, social, financial, and physical phenomena, due to its ability to model highly complex dynamic systems and its many desirable mathematical properties. Recent literature (eg, Choi and Woo39 and Choi and Schervish40) has established posterior consistency for bayesian partially linear GP regression models. Bayesian modeling with a GP prior can be viewed as a marginal structural model where the potential outcome under the no-treatment condition is modeled nonparametrically. It allows for predicting the missing potential outcomes of a given patient by a weighted sum of the observed data from his or her matched neighbors, with larger weights assigned to those neighbors in closer proximity and smaller weights to those neighbors further away from the patient, much like a matching procedure.

To make causal inference more robust to model misspecification and to allow bayesian causal inference to account for confounding-by-indication within a full bayesian framework, we propose GPMatch, a bayesian nonparametric causal inference method. GPMatch uses a GP prior as the matching tool: the prior is formulated so that, for each patient i in the sample, it allocates a weight ranging from 0 to 1 to the outcome observed from each other patient j in the data set, based on their similarity as defined by the squared-exponential (SE) covariance function:

$$
K(v_i, v_j) = \sigma_f^2 \exp\left(-\sum_{k=1}^{q} \frac{|v_{ki} - v_{kj}|^2}{\phi_k}\right).
$$

Here, $v_{ki}$ and $v_{kj}$ are the observed values of the k-th confounding variable for the i-th and j-th patients, respectively. The length-scale parameters $\phi_k$ and the variance $\sigma_f^2$ determine the smoothness and shape of the biologic mechanism and are estimated from the observed data.
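A minimal sketch of this covariance function follows. It assumes that each length-scale parameter enters as a divisor of the corresponding squared difference; the function names and example values are hypothetical, and in GPMatch the parameters are estimated from the data rather than fixed by hand.

```python
import math


def se_covariance(v_i, v_j, phi, sigma_f2=1.0):
    """Squared-exponential covariance between two patients' confounder vectors.

    v_i, v_j : sequences of q confounder values
    phi      : sequence of q positive length-scale parameters, one per confounder
    sigma_f2 : variance parameter
    """
    if not (len(v_i) == len(v_j) == len(phi)):
        raise ValueError("v_i, v_j, and phi must have the same length")
    scaled_sq_dist = sum((x - y) ** 2 / p for x, y, p in zip(v_i, v_j, phi))
    return sigma_f2 * math.exp(-scaled_sq_dist)


def similarity_weights(v_i, others, phi):
    """Similarity of patient i to each other patient, on a 0-to-1 scale."""
    return [se_covariance(v_i, v_j, phi, sigma_f2=1.0) for v_j in others]


# Hypothetical example: one target patient and three comparison patients,
# each described by two confounders.
target = [0.0, 0.0]
others = [[0.0, 0.0], [0.5, 0.5], [3.0, 3.0]]
weights = similarity_weights(target, others, phi=[1.0, 1.0])
```

With the variance set to 1, a patient with identical confounder values receives a weight of exactly 1, and the weight decays smoothly toward 0 as the patients become less similar.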

The SE covariance function is used for its ability to fit a smooth response surface. By including confounding variables (denoted by V) in the covariance function, the GP prior specifies that patients with the same values of all confounding variables are matched completely, that is, assigned a weight of 1. The matching utility of the GP prior can be considered as a matching process performed for each i-th patient. $K(v_i, v_j)$ determines how similar or dissimilar the j-th patient is to the i-th patient. It assigns a larger weight to patients who are similar, and it assigns a smaller or 0 weight to patients who are less similar or sufficiently different. As a consequence, the GP prior accomplishes “matching” for each individual patient. Of note, although only a small subset of the data is used for matching the i-th patient, different subsets are used for different patients. Collectively, no data are discarded, and all data are used for estimating the causal treatment effect.
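The way a GP turns similarity into weights on observed outcomes can be sketched with ordinary zero-mean GP regression on a single confounder. This is a deliberate simplification and not the GPMatch model itself (which also includes treatment terms and estimates its parameters within a bayesian model); the data, the fixed length scale, and the noise variance below are hypothetical. Unlike the raw kernel similarities, the resulting regression weights are not constrained to lie between 0 and 1 and can be slightly negative.

```python
import math


def se_kernel(x, z, phi=1.0):
    """Squared-exponential similarity for a single confounder (unit variance)."""
    return math.exp(-((x - z) ** 2) / phi)


def solve(matrix, rhs):
    """Solve matrix @ w = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (aug[r][n] - tail) / aug[r][r]
    return w


def gp_weights(v_new, v_obs, noise_var, phi=1.0):
    """Weights that the GP posterior mean places on each observed outcome."""
    n = len(v_obs)
    gram = [[se_kernel(v_obs[i], v_obs[j], phi) + (noise_var if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    k_new = [se_kernel(v_new, v, phi) for v in v_obs]
    # The Gram matrix is symmetric, so solving gram @ w = k_new gives k_new^T gram^{-1}.
    return solve(gram, k_new)


v_obs = [0.0, 1.0, 2.0, 5.0]      # hypothetical confounder values
y_obs = [1.0, 2.0, 3.0, 10.0]     # hypothetical outcomes
weights = gp_weights(0.9, v_obs, noise_var=0.1)
prediction = sum(w * y for w, y in zip(weights, y_obs))
```

In this example the patient at 1.0, the nearest neighbor of the new point at 0.9, receives the largest weight, while the distant patient at 5.0 contributes almost nothing, yet no observation is formally discarded.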

After matching, GPMatch estimates the expected potential outcomes for a given patient by using data from other matched patients who are sufficiently similar. The matching, weighting, and estimation processes are accomplished in a single step of bayesian GP regression modeling. The GPMatch method can easily incorporate different types of treatments. For example, a continuous treatment and its higher-order terms could be included to model the treatment effect as a nonlinear function of a continuous variable, and heterogeneous treatment effects can be evaluated by including treatment-by-covariate interactions.

The SE distance can be considered as an alternative metric to the MD,

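$$MD_{ij} = \begin{cases} (v_i - v_j)^{T} S^{-1} (v_i - v_j) & \text{if } |v_{ik} - v_{jk}| < c \text{ for } k = 1, 2, \ldots, q; \\ \infty & \text{otherwise,} \end{cases}$$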

where $c \in \mathbb{R}^+$ is the caliper and S is the sample variance-covariance matrix of the confounding variables v. Of note, MD matching requires specification of a caliper. A smaller c leads to more precise matching but often results in a serious reduction in sample size after matching. Compared with MD matching, GPMatch does not require arbitrary specification of a caliper; instead, the length-scale parameters ($\phi_k$), which govern the extent to which the data points are matched, are estimated from the data. We allow different length-scale parameters for different confounding variables to acknowledge that some confounders may play a more important role in matching than others. The variables with larger values of $\phi_k$ parameters are considered more important than those with smaller values. GPMatch produces valid causal inference results under some important causal assumptions.
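The trade-off between the caliper and the number of matched units can be seen in a small sketch. The code below computes the caliper-restricted MD for simulated covariates and counts how many units have at least 1 admissible match. The function name `caliper_md`, the simulated data, and the caliper values are hypothetical and serve only to illustrate the definition above.

```python
import numpy as np

def caliper_md(V, c):
    """Pairwise Mahalanobis distance, set to infinity outside the caliper c."""
    V = np.asarray(V, dtype=float)
    S_inv = np.linalg.inv(np.cov(V, rowvar=False))        # inverse sample covariance
    diff = V[:, None, :] - V[None, :, :]                   # (n, n, q) differences
    md = np.einsum("ijk,kl,ijl->ij", diff, S_inv, diff)    # (v_i - v_j)' S^-1 (v_i - v_j)
    within = (np.abs(diff) < c).all(axis=2)                # |v_ik - v_jk| < c for every k
    return np.where(within, md, np.inf)

rng = np.random.default_rng(0)
V = rng.normal(size=(50, 2))                               # hypothetical confounders

matched = {}
for c in (0.1, 0.5, 2.0):
    md = caliper_md(V, c)
    np.fill_diagonal(md, np.inf)                           # ignore self-matches
    matched[c] = int(np.isfinite(md).any(axis=1).sum())
    print(f"caliper {c}: {matched[c]} of {len(V)} units have at least one match")
```

Because a pair that is admissible under a small caliper remains admissible under a larger one, the number of units with a match can only stay the same or grow as c increases, which is the sample-size cost of tight calipers noted above.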

Causal Assumptions

Nonadaptive Treatment

Considering 1-time-point treatment assignment, the causal assumptions are depicted in the DAG (Figure 3), where rectangular nodes are observed variables and oval nodes are latent or unobserved variables. For simplicity of presentation, we describe the assumptions by considering a binary treatment assignment A = 0/1, where 0 indicates control and 1 indicates intervention. The method, however, is applicable to any type of treatment. Corresponding to A = (0, 1), the potential outcomes (Y(0), Y(1)) are 2 latent variables. The unobserved covariates are denoted by U. The potential outcome Y(0) under the control condition is determined jointly by X, a p-dimensional vector, and V, a q-dimensional vector, of the observed covariates plus an unobserved covariate, U0. Thus, (X, V, U0) are prognostic variables. Similarly, the potential outcome Y(1) under the intervention condition is determined jointly by the observed covariates (X, V) and the unobserved covariates (U0, U1). The observed outcome Y is a noisy version of the corresponding potential outcomes, with error term (ε). The treatment is assigned according to an unknown PS, which is determined by the baseline covariates: observed V and unobserved U2. The observed baseline covariates (X, V) could be overlapping sets; different symbols are used to distinguish their roles in the biologic mechanisms driving the potential outcomes and in the treatment assignment process. For example, X could include patient age, sex, genetic makeup, family disease history, and past and current medication use, as well as laboratory results and other disease characteristics, which are directly related to the prognosis of the disease. The V could include the above-mentioned X variables, as well as other considerations in the treatment decision, including patient insurance, socioeconomic status, education, and clinical center. Most of these important X and V covariates are available in disease registries or electronic medical records (EMRs).
Other factors, such as patient and clinician personal preferences, cultural beliefs, and past experiences, may play a role in treatment decisions. However, these factors are almost never recorded and are collectively referred to as U2. The DAG could also include additional paths among (U0, U1, X, V) to allow for correlations among them. These paths are not included in Figure 3 to provide a clearer visual presentation. The direction of an arrow in the DAG indicates the direction from a cause or determinant to an effect. For example, the observed Y is directly determined by the treatment assignment (A) and the potential outcomes Y(0) and Y(1). As another example, the V → A path suggests that V encompasses all the observed direct determinants of the treatment decision. A bidirectional arrow indicates a correlated relationship.

Figure 3. DAG Presentation of the Counterfactual World-Problem Setup.

A side-by-side comparison of the DAG with the 3 widely adopted causal assumptions laid out by Rosenbaum and Rubin13 should be helpful to see the differences between the two.

Causal Assumption 1

Instead of the SUTVA, we make the stable unit treatment value expectation assumption (SUTVEA). Specifically:

  • The consistency assumption by Rosenbaum and Rubin13 requires that the observed outcome is an exact copy of the potential outcome, that is, $Y_i = Y_i(0)(1 - A_i) + Y_i(1)A_i$. Instead, we consider the observed outcome as a noisy version of the potential outcome, where the expectation of the observed outcome is $E(Y_i) = Y_i(0)(1 - A_i) + Y_i(1)A_i$.
  • The no-interference assumption by Rosenbaum and Rubin13 requires that the potential outcomes of 1 unit are not affected by the treatment of other units. Instead, we assume that the observed outcomes from different units are conditionally independent given the observed covariates, $Y_i(a) \perp Y_j(b) \mid X, V$.

The SUTVEA assumption acknowledges the existence of residual random error in the outcome measure. The observed outcomes may differ from the corresponding true potential outcomes due to some measurement error. In addition, the observed outcomes could differ when the treatment received deviates from the intended version of treatment. For example, outcomes could differ by the timing of the treatment, the presurgery preparation procedure, or the concomitant medication.

A hypothetical example may help illustrate the subtle difference in the 2 no-interference assumptions. Jane (i) and Joe (j) are a wife and husband who live together. Jane always cooks. Jane takes a certain medication (Ai = 1) for hormonal changes (U2), but Joe does not (Aj = 0). The hormonal changes in Jane lead her to crave fatty food (V). The fatty diet may causally increase cholesterol (Y) for both Jane and Joe. This is an example of the no-interference assumption being violated because the cholesterol level in Joe is related to medication use by Jane. Here, the cholesterol levels in Jane and Joe are correlated, due to the unobserved U2. However, U2 had an effect on the potential outcomes only via X (eg, sex, cohabitation, and race) and V. Here, $(Y_i(a), Y_j(b))$ are correlated, while $Y_i(a) \perp Y_j(b) \mid X, V$. In other words, the correlation between Jane's and Joe's potential outcomes is determined by their diet and other covariates X. Conditional on (X, V), the treatment received by Jane and, subsequently, her potential outcome, is independent from the potential outcome for Joe.

Causal Assumption 2

Approaching causal inference as a missing potential outcome problem, we require a missing-at-random (MAR) assumption for the joint distribution of the outcomes; that is,

$$[Y(0), Y(1) \mid A = 1, X, V] = [Y(0), Y(1) \mid A = 0, X, V], \text{ and}$$
$$[Y \mid Y(0), Y(1), A = 1, X, V] = [Y \mid Y(0), Y(1), A = 0, X, V].$$

Jointly, we may write

$$[Y, Y(0), Y(1) \mid A = 1, X, V] = [Y, Y(0), Y(1) \mid A = 0, X, V].$$

Here the [.] notation indicates the joint distribution. This is equivalent to the MAR assumption that is widely adopted in the missing data context. The assumption is necessary to ensure that the causal effect is identifiable. It does not require the no-unmeasured-confounder assumption. Rather, it requires only that the minimum sufficient set be observed following the DAG. It allows for 3 types of confounders (U0, U1, U2). Although (U0, U1, U2) are correlated with both (Y) and (A), we can see from the DAG that their existence does not affect the identifiability of the causal treatment effect and thus is admissible. If U2 is null, then the assumption is equivalent to the strongly ignorable treatment assignment assumption.

Causal Assumption 3
Positivity Assumption

As with Rosenbaum and Rubin,13 we assume every sample unit has a nonzero probability of being assigned to either of the treatment arms, that is, $0 < \Pr(A_i \mid V_i) < 1$.

This assumption is adopted to ensure the equipoise of the causal inference. Because we can never tell whether the lack of overlap in covariates is a manifestation of data sparsity or a lack of equipoise, we assume that positivity is ensured at the design stage rather than at the analytical stage.

Time-Varying Adaptive Treatment

For time-varying adaptive treatments, causal assumptions 1 to 3 are extended. For simplicity of presentation, without loss of generality, the causal assumptions are presented for a 2-stage setting.

Causal Assumption 1ATS

(SUTVEA) The observed outcomes are noisy versions of the corresponding potential outcomes, $E(Y_{1i}) = Y_{1i}(0)(1 - A_{0i}) + Y_{1i}(1)A_{0i}$ and $E(Y_{2i}) = (1 - A_{0i})[Y_{2i}(00)(1 - A_{1i}) + Y_{2i}(01)A_{1i}] + A_{0i}[Y_{2i}(10)(1 - A_{1i}) + Y_{2i}(11)A_{1i}]$. The observed outcomes from different units are conditionally independent given the observed covariates, $Y_{1i}(a) \perp Y_{1j}(b) \mid X_0, V_0$ and $Y_{2i}(a_0, a_1) \perp Y_{2j}(b_0, b_1) \mid X_0, V_0, X_1, Y_1$.

Causal Assumption 2ATS

(Sequential MAR assumption for the outcomes) The joint distribution of the observed and potential outcomes following the k-th treatment assignment is independent from the actual k-th treatment assignment.

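$$[Y_2, Y_2(A_0, 0), Y_2(A_0, 1) \mid A_1 = 1, A_0, X_0, V_0, Y_1, X_1] = [Y_2, Y_2(A_0, 0), Y_2(A_0, 1) \mid A_1 = 0, A_0, X_0, V_0, Y_1, X_1]$$
$$[Y_1, Y_1(0), Y_1(1) \mid A_0 = 1, X_0, V_0, X_1] = [Y_1, Y_1(0), Y_1(1) \mid A_0 = 0, X_0, V_0, X_1]$$
$$[X_1, X_1(0), X_1(1) \mid A_0 = 0, X_0, V_0] = [X_1, X_1(0), X_1(1) \mid A_0 = 1, X_0, V_0]$$
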
Causal Assumption 3ATS

(Sequential positivity assumption) Every sample unit has a nonzero probability of being assigned to either of the treatment arms at all treatment decision points, that is, $0 < \Pr(A_{0i} \mid V_i) < 1$ and $0 < \Pr(A_{1i} \mid A_{0i}, V_i, X_{1i}, Y_{1i}) < 1$.

Estimating Average Causal Treatment Effect

Considering the 1-stage setting, GPMatch fits the outcome Y by a marginal structural model:

$$Y_i = f(v_i) + A_i\tau(x_i) + \varepsilon_i, \tag{1}$$

for i = 1, …, n, where $f(\cdot) \sim GP(0, \cdot)$ and $\varepsilon_i \overset{iid}{\sim} N(0, \sigma_0^2)$. Without loss of generality, we assume that covariates X are a subset of V. We may let $\tau(x_i) = x_i^{T}\beta$, which allows for an estimation of the conditional average causal treatment effect (CATE) given X. Letting Y = (Y1, …, Yn), we may rewrite equation 1 by a multivariate representation of

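$$Y \mid X, V, \beta \sim MVN(m, \Sigma), \tag{2}$$

where $m = (A_i \times X_i^{T}\beta)_{n \times 1}$, $\Sigma = (\sigma_{ij})_{n \times n}$, $\sigma_{ij} = K(v_i, v_j) + \sigma_0^2\delta_{ij}$,

$$K(v_i, v_j) = \sigma_f^2 \exp\left(-\sum_{k=1}^{q} \frac{|v_{ki} - v_{kj}|^2}{\phi_k}\right), \tag{3}$$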

for i, j = 1, …, n. The (ϕ1, ϕ2, …, ϕq) are the length-scale parameters for each of the covariates v. The δij is the Kronecker delta, δij = 1 if i = j and 0 otherwise. The covariance function K(v, v′) = Cov(f(v), f(v′)) models the covariance between the function values at any 2 inputs (v, v′). The 2 data points are increasingly correlated as they move closer together in the covariate space.

To answer the CER questions that motivated the study, we focused on estimating the following average causal treatment effects:

  1. The ATE at stage 1 (ATE@stage1): $\hat{E}(Y_1(1) - Y_1(0) \mid X_0)$
  2. The ATE at stage 2 conditional on the past treatment assignment and the patient's responses (CATE@stage2): $\hat{E}(Y_2(a_0, 1) - Y_2(a_0, 0) \mid X_0, X_1(a_0) = x_1, Y_1(a_0) = y_1)$
  3. The marginal ATE (MATE) at the end point, marginalized over the intermediate responses (MATE@stage2): $E(Y(a_0, a_1) - Y(a_0, a_0) \mid X_0)$

Box 1 presents the GPMatch algorithm for estimating the average causal treatment effect and the potential outcomes Yi(a). Further technical details can be found in our methodology manuscript (Appendix B).

Box 1. GPMatch Algorithm for 1-Time-Point Treatment Assignment

  1. Initialize τ and the covariance matrix Σ, where the GP covariance function is defined by the squared-exponential function of baseline covariates $X_i$.
  2. (Matching step) Calculate the matching weights $w_{ij}$ based on the GP covariance function, and estimate $\tilde{Y}_i(0) = \sum_{j \in M_i} w_{ij}(Y_j - A_j\hat{\tau})$ and $\tilde{A}_i = \sum_{j \in M_i} w_{ij}A_j$, where $M_i$ indicates the matching set for the i-th unit. The matching weights, $w_{ij} = k(v_j)'\Sigma^{-1}$ with $k(v_j) = (k(v_j, v_i))_{n \times 1}$, vary from individual to individual and are determined by the matching distance metric defined previously.
  3. Estimate the treatment effect by solving the estimating equation $\sum_{i=1}^{n}(Y_i - \tilde{Y}_i(0) - A_i\tau)(A_i - \tilde{A}_i) = 0$. If the mean function is modeled, the β coefficients of the mean function are estimated in this step at the same time as τ.
  4. Update parameter estimates, including the length-scale parameters, for covariance matrix $\hat{\Sigma}$.
  5. Repeat steps 2 to 4 via Gibbs sampling.
  6. Generate posterior Markov chain Monte Carlo (MCMC) samples for all model parameters, then estimate the posterior of $[\hat{Y}_i(0), \hat{Y}_i(1) \mid X_i]$ for each patient.
  7. Estimate the ATE@stage1 for all patients.
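The matching and estimation steps of Box 1 can be sketched with the covariance parameters held fixed, which removes the Gibbs updates of steps 4 to 6. The code below is not the GPMatch implementation. The simulated data, the treatment-assignment model, the fixed values of $\sigma_f^2$, $\phi_k$, and $\sigma_0^2$, the use of all n units as the matching set, and the closed-form solution of the estimating equation are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
V = rng.uniform(-2, 2, size=(n, 2))                       # two confounders
A = rng.binomial(1, 1 / (1 + np.exp(-0.5 * V[:, 0])))     # hypothetical assignment model
Y = 3 + 5 * A + V[:, 0] ** 3 + rng.normal(size=n)         # simulated true effect is 5

# Fixed (not estimated) hyperparameters, chosen only for this sketch.
sigma_f2, phi, sigma_02 = 25.0, np.array([1.0, 1.0]), 1.0

diff = V[:, None, :] - V[None, :, :]
K = sigma_f2 * np.exp(-(diff ** 2 / phi).sum(axis=2))     # SE covariance (phi as divisor)
Sigma = K + sigma_02 * np.eye(n)
W = K @ np.linalg.inv(Sigma)                              # row i holds the weights w_ij

# Step 2: smoothed treatment, A~_i = sum_j w_ij A_j.
A_tilde = W @ A

# Step 3: with Y~_i(0) = sum_j w_ij (Y_j - A_j tau), the estimating equation
# sum_i (Y_i - Y~_i(0) - A_i tau)(A_i - A~_i) = 0 is linear in tau.
rA = A - A_tilde                                          # (I - W) A
rY = Y - W @ Y                                            # (I - W) Y
tau_hat = (rA @ rY) / (rA @ rA)

# Plug tau_hat back in to confirm that it solves the estimating equation.
Y0_tilde = W @ (Y - A * tau_hat)
score = np.sum((Y - Y0_tilde - A * tau_hat) * rA)
print(f"tau_hat = {tau_hat:.2f}, estimating equation = {score:.1e}")
```

With fixed hyperparameters, the estimate reduces to a regression of the smoothed-out outcome residual on the smoothed-out treatment residual. The full algorithm instead updates the covariance parameters and propagates their uncertainty through the posterior draws.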

For evaluating the causal treatment effects of time-varying adaptive treatments, the GPMatch approach can be easily extended following the bayesian g-computation formula approach. Similarly, the BART41 method could be extended for the ATS. Under the causal assumptions described above, the g-computation formula factorizes the joint likelihood of all outcomes into a product of multiple conditional likelihoods of outcome models at each of the follow-up time points, given the history of treatment and covariates, up to the final study end point, for k = 1,2,…,K. Similarly, the g-computation formula can be used in conjunction with BART for the ATS.42 The estimates of the bayesian nonparametric model are used to predict the missing potential outcomes at each decision point in a sequential generative model. Thus, the potential outcomes for any given treatment history are predicted, and the ATE is estimated by the contrast between an intervention and the comparator ATS at the final study end point. Finally, the optimal ATS can be identified by maximizing the potential outcomes. Box 2 outlines the algorithm used for a 2-time-point treatment assignment.
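For the 2-stage case, the g-computation step described above can be written out as follows. This is the standard g-formula expressed in the notation of this section, offered as a sketch rather than as the exact expression used in Appendix B. It assumes that causal assumptions 1ATS to 3ATS hold, lets $X_1$ stand for all intermediate responses (including $Y_1$), and uses $p(\cdot)$ for a generic conditional density, which is notation introduced here.

$$E\{Y_2(a_0, a_1) \mid X_0, V_0\} = \int E(Y_2 \mid A_0 = a_0, A_1 = a_1, X_0, V_0, X_1 = x_1)\, p(x_1 \mid A_0 = a_0, X_0, V_0)\, dx_1 .$$

The marginal effect of one ATS relative to another is then the difference between 2 such integrals evaluated at the 2 treatment histories.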

Box 2. GPMatch Algorithm for 2-Time-Point Treatment Assignment

  1. Stage 1 Modeling:
    1. Fit the GPMatch model for all the observed intermediate outcomes $X_{i1}$ immediately before the second treatment decision point. Here, $X_{i1}$ includes the outcome of interest (eg, cJADAS10 at 6 months) and other disease progression measurements (eg, active joint count [AJC], limited range of motion [LOM], and erythrocyte sedimentation rate [ESR] measures at 6 months) to assess how well patients responded to the first treatment assignment. GPMatch matches patients on their baseline variables, that is, before the first treatment assignment.
    2. Generate posterior MCMC samples for all model parameters, then estimate the posterior of $[\hat{X}_{i,1}(0), \hat{X}_{i,1}(1) \mid \text{Stage 1 Data}]$ for each patient following the 1-time-point algorithm. Save the predicted $\hat{X}_{i,1}(0)$ and $\hat{X}_{i,1}(1)$ for the later g-computation step.
    3. Estimate the ATE@stage1 for the intermediate outcome and for all intermediate treatment response covariate measures.
  2. Stage 2 Modeling:
    1. Fit the GPMatch model for the final outcome $Y_i$. In the pcJIA CER study, $Y_i$ is the cJADAS10 at 12 months. GPMatch matches patients on their baseline treatment ($A_0$), baseline covariates ($X_{i0}$), and the treatment responses ($X_{i1}$) measured at the end of the first stage. This is because the second-stage treatment assignment ($A_1$) is determined adaptively in response to the patients' first-stage assignment, initial disease status, and responses to the initial treatment assignment. GPMatch estimates the treatment effect from the second stage.
    2. Generate posterior MCMC samples for all model parameters from the second stage and estimate the posterior of $[\hat{Y}_i(00), \hat{Y}_i(01), \hat{Y}_i(10), \hat{Y}_i(11) \mid \text{Stage 2 Data}]$.
    3. Estimate the conditional ATE, CATE@stage2, for the treatment outcome at the end of stage 2, conditional on the treatment history and patient responses at the end of stage 1.
  3. G-computation:
    1. Integrate out the intermediate responses, then estimate the posterior $[\hat{Y}_i(00), \hat{Y}_i(01), \hat{Y}_i(10), \hat{Y}_i(11) \mid X_{i,0}]$.
    2. Estimate the marginal ATE, MATE@stage2 for all patients.
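The 3 parts of Box 2 can be illustrated with a deliberately simplified sketch in which ordinary least squares stands in for the GPMatch (or BART) fit at each stage, so that only the g-computation logic is shown. The data-generating model, coefficient values, sample size, and function names below are hypothetical and are not those of the simulation studies or the pcJIA analysis.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
X0 = rng.normal(size=n)                                    # baseline covariate
A0 = rng.binomial(1, 0.5, size=n)                          # stage 1: randomized
L1 = 0.5 * X0 - 1.0 * A0 + rng.normal(size=n)              # intermediate response
A1 = rng.binomial(1, 1 / (1 + np.exp(-0.8 * L1)))          # stage 2: adapts to L1
Y = 1 + 0.5 * X0 + L1 - 2.0 * A0 - 1.5 * A1 + rng.normal(size=n)

def ols(design, y):
    """Least-squares coefficients; a stand-in for the GP or BART fit."""
    return np.linalg.lstsq(design, y, rcond=None)[0]

ones = np.ones(n)
# Stage 1 model: intermediate response given the baseline covariate and A0.
D1 = np.column_stack([ones, X0, A0])
b1 = ols(D1, L1)
s1 = np.std(L1 - D1 @ b1)                                  # residual SD
# Stage 2 model: final outcome given the history (X0, A0, L1) and A1.
b2 = ols(np.column_stack([ones, X0, A0, L1, A1]), Y)

def potential_mean(a0, a1, draws=20):
    """Average predicted Y(a0, a1), integrating over simulated L1(a0)."""
    total = 0.0
    for _ in range(draws):
        l1 = b1[0] + b1[1] * X0 + b1[2] * a0 + s1 * rng.normal(size=n)
        total += np.mean(b2[0] + b2[1] * X0 + b2[2] * a0 + b2[3] * l1 + b2[4] * a1)
    return total / draws

mate = potential_mean(1, 1) - potential_mean(0, 0)
# By construction, the true contrast in this toy model is -2 - 1 - 1.5 = -4.5.
print(f"estimated MATE for (1,1) vs (0,0): {mate:.2f}")
```

The intermediate response is simulated under the assigned first-stage treatment rather than held at its observed value, which is what allows the first-stage treatment to act on the final outcome through the intermediate response.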

Simulation Studies for 1-Time-Point Treatment Assignment

Under the dual-misspecification setting, 3 sets of simulation studies with a 1-time-point binary treatment assignment design were considered to (1) evaluate the frequentist performance of GPMatch, (2) compare GPMatch with the MD matching method, and (3) compare GPMatch with other widely adopted methods. Below, we present a brief summary of the results and simulation settings. The detailed results are reported in Appendix B.

Setting 1: Well-Calibrated Frequentist Performance

To calibrate with frequentist performance, we considered a single covariate, $x \sim N(0, 1)$. The potential outcome was generated by $y(a) = e^{x} + (1 + U) \times a + U_0$ for a = 0, 1, where the true treatment effect was $1 + U_i$ for the i-th individual unit. The $(U, U_0)$ are unobserved covariates. The treatment was selected for each individual following $\operatorname{logit}(P(A = 1 \mid X)) = -0.2 + (1.8X)^{1/3}$. The observed outcome was generated by $y \mid x, a \sim N(y(a), \sigma_0^2)$. We considered 4 parameter setups that involved 4 different random errors in (1) the potential outcome Y(0), $U_0 \sim N(0, \gamma_0^2)$; (2) the treatment effect Y(1) − Y(0), $U_1 \sim N(0, \gamma_1^2)$; (3) the treatment probability, $U_2 \sim N(0, \gamma_2^2)$; and (4) the observed outcome Y, $U_3 \sim N(0, \gamma_3^2)$. Setup 1 included random error in both the potential outcome Y(0) and the observed outcome, and setup 2 included random error in the potential outcome Y(0) and the treatment effect. Setups 3 and 4 added to setups 1 and 2 another random effect in the treatment propensity. The simulation study shows that the approach performed well with respect to frequentist properties (Figure 4). It performed better than the AIPTW, propensity score subclassification by quintiles, and g-estimation methods, and it performed as well as linear regression modeling with spline fit propensity score adjustment and BART (Figure 5). The additional results, including a comparison of the GPMatch results to the gold standard (Table 1 in Appendix B) and other widely used methods (Tables S1-S4, Figures S1-S4), are presented in Appendix B.

Figure 4. Distribution of the GPMatch Estimate of ATE by Different Sample Sizes Under the Single Covariate Simulation Study Setting.

Figure 5. Comparisons of RMSE and MAE of the ATE Estimates by Different Methods Across Different Sample Sizes.

Setting 2: GPMatch Compared With MD Matching

To compare the performances of MD matching and GPMatch, we considered a simulation study with 2 independent covariates, $x_1$ and $x_2$, from the uniform distribution U(−2, 2). Treatment was assigned by letting $A_i \sim Ber(\pi_i)$, where

logitπi=x1x2.

The potential outcomes were generated by

$$y_i(a) = 3 + 5a + x_{1i}^3, \qquad Y_i \mid X_i, A_i \sim N(y_i(A_i), 1).$$

The true treatment effect was 5. Three different sample sizes were considered, N = 100, 200, and 400. For each setting, 100 replicates were performed, and the results were summarized.

Figure 6 presents the bias of the ATE estimate results comparing GPMatch (the horizontal short dashed line is the averaged ATE, and the 5th and 95th percentiles are the long dashed lines) with the MD match (circles are the point estimate, and vertical lines are the 95% CI estimates corresponding to different choices of caliper). The results clearly demonstrate better accuracy and efficiency using GPMatch.

Figure 6. Simulation Study Results From Comparison of GPMatch With MD Matching.

Setting 3: Performance Under Dual Misspecification

Following the well-known Kang and Schafer dual-misspecification simulation setting,11 covariates $z_1$, $z_2$, $z_3$, and $z_4$ were independently generated from the standard normal distribution N(0, 1). Treatment was assigned by $A_i \sim Ber(\pi_i)$, where

$$\operatorname{logit}\pi_i = -z_{i1} + 0.5z_{i2} - 0.25z_{i3} - 0.1z_{i4}.$$

The potential outcomes were generated for a = 0, 1 by

$$y_i(a) = 210 + 5a + 27.4z_{i1} + 13.7z_{i2} + 13.7z_{i3} + 13.7z_{i4}, \qquad Y_i \mid A_i, X_i \sim N(y_i(A_i), 1).$$

The true treatment effect was 5. To assess the performance of the methods under dual misspecification, the transformed covariates $x_1 = \exp(z_1/2)$, $x_2 = z_2/(1 + \exp(z_1)) + 10$, $x_3 = (z_1z_3/25 + 0.6)^3$, and $x_4 = (z_2 + z_4 + 20)^2$ were used in the model instead of $z_i$.
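A short data-generation sketch can make the dual-misspecification idea concrete. The code below simulates 1 data set with the outcome model and covariate transformations given above. The treatment-assignment coefficients are those of the original Kang and Schafer design and are an assumption here, as are the sample size, the seed, and the helper `ate_by_ols`, which is a plain linear regression and not one of the methods compared in this report.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000
z = rng.normal(size=(n, 4))
z1, z2, z3, z4 = z.T

# Treatment assignment (coefficients of the original Kang and Schafer design; assumed).
pi = 1 / (1 + np.exp(-(-z1 + 0.5 * z2 - 0.25 * z3 - 0.1 * z4)))
A = rng.binomial(1, pi)
# Outcome with a true treatment effect of 5.
Y = 210 + 5 * A + 27.4 * z1 + 13.7 * (z2 + z3 + z4) + rng.normal(size=n)

# Transformed covariates that the analyst actually observes.
x = np.column_stack([
    np.exp(z1 / 2),
    z2 / (1 + np.exp(z1)) + 10,
    (z1 * z3 / 25 + 0.6) ** 3,
    (z2 + z4 + 20) ** 2,
])

def ate_by_ols(covariates):
    """Coefficient of A in a linear regression of Y on A and the covariates."""
    design = np.column_stack([np.ones(n), A, covariates])
    return np.linalg.lstsq(design, Y, rcond=None)[0][1]

naive = Y[A == 1].mean() - Y[A == 0].mean()
print(f"unadjusted difference: {naive:.2f}")
print(f"OLS on true z:         {ate_by_ols(z):.2f}")
print(f"OLS on transformed x:  {ate_by_ols(x):.2f}")
```

Regression on the true covariates z is correctly specified and should land close to 5, whereas the unadjusted difference is heavily confounded. Regression on the transformed covariates x is the misspecified model that an analyst would fit in practice, and the methods compared in this setting are judged on how well they cope with it.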

We compared GPMatch with many widely adopted causal inference methods. For GPMatch, we considered 2 different modeling strategies. In GPMatch1, the mean function included only the treatment effect. In GPMatch2, the mean function also included $X_1$ to $X_4$. The results (Figure 7) suggest that GPMatch clearly outperformed the other methods.

Figure 7. RMSE and MAE of ATE Estimates Using Different Methods Under the Kang and Schafer Simulation Setting.

Simulation Studies for ATS

The primary goal of these studies was to evaluate the performance of 2-stage BART and GPMatch for estimating the ATE for ATS. Four simulation designs resembled a SMART, in which the treatment is randomly assigned at the first stage and the second-stage treatment is assigned adaptively according to the patient's responses to the previous treatment; these designs also included an HTE setting. A fifth design resembled an observational study. The 5 sets of simulation studies are specified below.

Setting 1: SMART Nonlinear Model

This simulation resembles a SMART setting, where the initial treatment is randomly assigned (Figure 8). The stage 1 treatment has no effect on the disease progression at the end of first-stage treatment. Both the treatment assignment model at the second stage and the potential outcome model are nonlinear functions of the end-of-first-stage response $L_1$:

$$\begin{aligned}
& X \sim \text{Bernoulli}(0.4), \quad A_0 \sim \text{Bernoulli}(0.5), \quad L_1(a_0) \sim N(0, 1), \quad L_1 = L_1(a_0), \\
& A_1 \mid L_1, A_0, X \sim \text{Bernoulli}\left(\text{expit}\left(0.2 - 0.2A_0 + L_1^{1/3}\right)\right), \\
& Y(a_0, a_1) \sim N\left(2 + 2.5a_0 + 3.5a_1 + 0.5a_0a_1 - 3\exp(L_1(a_0)),\ sd = 1\right).
\end{aligned}$$

Figure 8. RMSE and MAE of ATE Estimates Using Different Methods Under a SMART Nonlinear Model Setting.

Setting 2: SMART Linear Model, Unmeasured Covariate

Following a setting used by Daniel et al,14 this simulation considered an unmeasured confounder, U0 (Figure 9). Specifically, the data are simulated according to the following setup:

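$$\begin{aligned}
& U_0 \sim \text{Bernoulli}(0.4), \quad A_0 \sim \text{Bernoulli}(0.5), \\
& L_1(a_0) \sim \text{Bernoulli}\left(\text{expit}(0.25 + 0.3a_0 - 0.2U_0 - 0.05a_0U_0)\right), \\
& L_1 = A_0L_1(1) + (1 - A_0)L_1(0), \\
& A_1 \sim \text{Bernoulli}\left(\text{expit}(0.4 + 0.5A_0 - 0.3L_1 - 0.4A_0L_1(a_0))\right), \\
& Y(a_0, a_1) \sim N(0.25 - 0.5a_0 - 0.75a_1 + 0.2a_0a_1 - U_0,\ 0.2).
\end{aligned}$$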

Figure 9. RMSE and MAE of ATE Estimates Using Different Methods Under a SMART Linear Model, Unmeasured Covariate Setting.

Setting 3: SMART Kang and Schafer Dual Misspecification

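We extended the well-known Kang and Schafer11 simulation to a 2-stage setting (Figure 10). As in a SMART, the first treatment is assigned at random, and the outcome is a linear function of the baseline covariates $Z_1$ to $Z_4$.

$$\begin{aligned}
& Z_1, Z_2, Z_3, Z_4 \sim N(0, 1), \quad A_0 \sim \text{Bernoulli}(0.5), \\
& L_1(a_0) \sim N(1.5 + 27.4Z_1 + 13.7Z_2 + 0Z_3 + 0Z_4 - 3a_0,\ 1), \\
& L_1 = A_0L_1(1) + (1 - A_0)L_1(0), \\
& A_1 \sim \text{Bernoulli}\left(\text{expit}\left(0.25A_0 + (0.1)L_1^{1/3} + 0.75(Z_1 - 0.5Z_2 + 0.25Z_3 + 0.1Z_4)\right)\right), \\
& Y(a_0, a_1) \sim N(210 + L_1(a_0) + 13.7Z_3 + 13.7Z_4 - 5a_0 - 3a_1 - 2a_0a_1,\ 1).
\end{aligned}$$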

Figure 10. RMSE and MAE of ATE Estimates Using Different Methods Under a SMART Kang and Schafer Dual-Misspecification Setting.

As in Kang and Schafer,11 only the transformed covariates are observed: $x_1 = \exp(z_1/2)$, $x_2 = z_2/(1 + \exp(z_1)) + 10$, $x_3 = (z_1z_3/25 + 0.6)^3$, and $x_4 = (z_2 + z_4 + 20)^2$.

Setting 4: HTE SMART

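It is expected that the outcome at the first stage may modify the effect of the next-stage treatment (Figure 11). Here, we consider a simple additive interaction effect:

$$\begin{aligned}
& X \sim \text{Bernoulli}(0.4), \quad A_0 \sim \text{Bernoulli}(0.5), \quad L_1(a_0) \sim N(0, 1), \\
& A_1 \mid L_1, A_0, X \sim \text{Bernoulli}\left(\text{expit}\left(0.2 - 0.2A_0 + L_1^{1/3}\right)\right), \\
& Y(a_0, a_1) \mid X \sim N\left(2 + 2.5a_0 + 3.5a_1 + 0.5a_0a_1 - 3\exp(L_1(a_0)) + a_1L_1(a_0),\ sd = 1\right).
\end{aligned}$$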

Figure 11. RMSE and MAE of ATE Estimates Using Different Methods Under an HTE SMART Setting.

Setting 5: Observational Study Adaptive Treatment Subgroup Treatment Effect

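This simulation implemented a modified version of the simulation design used in Schulte et al (Figure 12).43 It considered a 3-level categorical baseline covariate, X ∈ {−5, 0, 5}, with a multinomial distribution, $X \sim \text{Multinomial}(\tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3})$. The causal treatment effect at both stages varied by the baseline covariate.

$$\begin{aligned}
& A_0 \sim \text{Bernoulli}\left(\text{expit}(0.3 - 0.05X)\right), \\
& L_1(a_0) \sim N(0.75X - 0.75a_0 - 0.25a_0X,\ 1), \\
& L_1 = A_0L_1(1) + (1 - A_0)L_1(0), \\
& A_1 \sim \text{Bernoulli}\left(\text{expit}(0.05X + 0.2A_0 - 0.05L_1 - 0.1L_1A_0 - 0.01L_1^2)\right), \\
& Y(a_0, a_1) \mid L_1(a_0) \sim N\left(3 + 0.5a_0 + 0.4a_0X - L_1(a_0) - L_1(a_0)^2 + 2a_1 - a_0a_1 + a_1L_1(a_0),\ 1\right).
\end{aligned}$$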

Figure 12. RMSE and MAE of ATE Estimates Using Different Methods Under an Observational Study Adaptive Treatment Subgroup Treatment Effect Setting.

The simulation compared the 2-stage BART and GPMatch against existing causal inference methods reviewed in a recent article by Newsome et al,2 including a history-adjusted structural nested model, g-computation formula, and g-estimation.

The simulation results are summarized for each of the 7 causal treatment effects of ATS in both the first and second stages. All simulation results are summarized over 200 replicates. The root mean square error (RMSE) and median absolute error (MAE) are summarized over all replicates. For GPMatch and BART, the histograms of the posterior estimates are plotted.
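The 2 summary measures can be computed from the replicate estimates as sketched below. The function names and the 5 example estimates are hypothetical and are not results from the simulation studies.

```python
import numpy as np

def rmse(estimates, truth):
    """Root mean square error of the replicate estimates."""
    err = np.asarray(estimates, dtype=float) - truth
    return float(np.sqrt(np.mean(err ** 2)))

def median_abs_error(estimates, truth):
    """Median absolute error of the replicate estimates."""
    err = np.asarray(estimates, dtype=float) - truth
    return float(np.median(np.abs(err)))

# Hypothetical ATE estimates from 5 replicates with a true effect of 5.
estimates = [4.0, 5.0, 6.0, 5.0, 8.0]
print(rmse(estimates, 5.0), median_abs_error(estimates, 5.0))
```

In this toy example, the single poor replicate (8.0) raises the RMSE well above the median absolute error, which is why reporting both gives a fuller picture of a method's performance.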

The simulation results suggested BART and GPMatch performed consistently better than other ATS methods, and GPMatch performed better than BART under the nonlinear model setting. Additional results for ATS are reported in Appendix C.

Developing an R Shiny Online Application

The PCORI methodology standard recommends the use of statistical causal inference methods for conducting CER. Most statistical causal inference software implements PS methods or matching methods. Software packages such as R (R Foundation for Statistical Computing), Stata (StataCorp), and SAS (SAS Institute) have steep learning curves, making them less accessible to the general research community. To our knowledge, no existing software can handle complex types of treatment, such as ATS. Therefore, one of the aims of this project was to develop an online application using R Shiny that implements the advanced bayesian nonparametric causal inference methods of GPMatch and BART. R Shiny consists of a suite of tools for creating R online applications.13 For this online application, we built a GUI at the front end, which allows users to access the full function of the statistics computational engine without requiring any knowledge of the R programming language. Our online application (https://pcats.research.cchmc.org/) is open to the public and can be accessed via a web browser without installation.

CER in Patients With pcJIA

Study Design

This observational CER study used a DMARD-naive inception cohort to provide real-world evidence of the comparative effectiveness of the early-combination vs step-up CTPs in treating children with pcJIA, as recommended by the Childhood Arthritis and Rheumatology Research Alliance (CARRA).4 The study inclusion and exclusion criteria (Table 1) and data elements (Table 2) were designed to closely follow the CARRA recommendations. Study design details are reported in Appendix A.

Table 1. Inclusion and Exclusion Criteria for CER in Patients With pcJIA.

Table 2. Data Elements for CER in Patients With pcJIA.

Data sources

The primary data source was EMRs from the Cincinnati Children's Hospital Medical Center (CCHMC), collected in routine clinical care at a pediatric rheumatology clinic between January 1, 2009, and December 31, 2017. The secondary data source was an NIH-funded prospective follow-up study that collected detailed information on these patients and their families, such as patient-reported HRQOL, Child Health Assessment Questionnaire (CHAQ) score, and other patient-reported outcomes (PROs).10 Most of the participants in this NIH-funded research study were also being cared for at the CCHMC during the same time period. Thus, the CCHMC subset of the secondary data source was used (1) as quality control data to ensure the data quality extracted from the EMR, (2) to augment PRO data, and (3) to offer data used for sensitivity analyses.

Intervention and comparator treatment

The CTPs for children with newly diagnosed pcJIA involve 3 time-varying adaptive treatment plans. The treatments at the first stage were identified based on DMARD prescriptions (nbDMARD, bDMARD, b+nbDMARD) at baseline (ie, at the time of pcJIA diagnosis). The treatment at the second stage was identified at the 6-month follow-up visit. Following the CTP, the treatment decisions at the 3- and 9-month follow-ups were also evaluated. Because most patients continue with their previous treatment assignment, the CER study considered 2-staged ATS, first at the initial diagnosis and then at the 6-month follow-up. Patients treated with the biologic-only plan were clearly older and more likely to have large joints affected; therefore, they might represent a somewhat different patient population. To maximize equipoise and verisimilitude between the compared groups, the CER study compared the early-combination vs the step-up CTP only. Patients on the biologic-only plan were excluded. The step-up CTP is currently the most commonly adopted approach. It is expected that patients who receive the early-combination plan are likely to find matching patients who receive the step-up plan. Thus, the step-up CTP was chosen as the comparator group.

Primary and secondary outcomes

The primary outcome was cJADAS10 at the 6- and 12-month follow-up visits. The cJADAS10 is a summary score derived from the physician global assessment of disease activity (range, 0-10), patient/parent global assessment of well-being (range, 0-10), and AJC truncated at 10 (Consolaro et al44), reflecting different perspectives of disease activity recorded in routine clinic care. The cJADAS10 is bounded between 0 and 30, with a higher score indicating more disease activity. The secondary outcome, HRQOL at the 12-month follow-up, was assessed using the Pediatric Quality of Life Inventory (PedsQL) generic module. The PedsQL generic total score is bounded between 0 and 100, with a higher score indicating better quality of life (QOL).45
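The score construction described above can be sketched as a small function. It assumes that the cJADAS10 is the simple sum of its 3 components, which is consistent with the stated component ranges and the 0-to-30 bound; the function and argument names are hypothetical.

```python
def cjadas10(physician_global, parent_global, active_joint_count):
    """cJADAS10 = physician global + patient/parent global + min(AJC, 10)."""
    for name, value in (("physician_global", physician_global),
                        ("parent_global", parent_global)):
        if not 0 <= value <= 10:
            raise ValueError(f"{name} must be between 0 and 10")
    if active_joint_count < 0:
        raise ValueError("active_joint_count must be non-negative")
    # The active joint count is truncated at 10 before it is added.
    return physician_global + parent_global + min(active_joint_count, 10)

print(cjadas10(3.5, 2.0, 14))   # 14 active joints contribute only 10 points
```

Truncating the joint count keeps the 3 components on the same 0-to-10 scale, so no single component can dominate the total.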

Study conduct: modification from the proposed research plan

In the original research proposal, we planned to use both multicenter registry data and single-center EMR data for the CER study. However, due to a large amount of missing data in the multicenter registry data, we were only able to conduct the CER study using the single-center EMR data. Great efforts were spent to ensure the quality of the data extracted from the EMR.

In the original proposed aim 2, the primary outcome was the American College of Rheumatology-90 (ACR90) criteria. The ACR90 is a clinical trials outcome measure calculation based on 6 core disease activity variables: (1) ESR, a blood test measure of inflammation; (2) number of joints with LOM; (3) number of joints with swelling; (4) physician's global assessment of disease activity; (5) patient/parent's global assessment of patient overall well-being activity; and (6) CHAQ score. At the time of the proposal, it was widely accepted as the best study outcome for clinical trials. However, 2 factors compelled us to make the revision:

  • The CHAQ can take up to 10 minutes to complete, and there is a wide range of variation in child and parental proxy reports of function, so it is less likely to be collected at every clinic visit. In the existing data collected from real-world clinical encounters, the CHAQ score was almost never collected. As a result, ACR90 outcome could not be computed for most patients.
  • The JADAS is a new disease activity measure that has been recently recommended, validated, and widely adopted as the clinical outcome measure in JIA.44,46-49 This composite measure summarizes 4 core disease activity measures: (1) physician's global assessment of disease activity, (2) patient/parent's global assessment of patient overall well-being, (3) number of joints with swelling, and (4) ESR. The JADAS measures ongoing disease activity at a single point in time and allows a comparison of disease activity between patients.46,50 The cJADAS10, with a cutoff at 10 for active joint count, is a clinical abbreviated version of the JADAS, which has been recommended and used in studies implementing observational data.47,48,51 After discussions with the stakeholder board and clinical collaborators, we revised the primary study outcome from the JADAS to the cJADAS10. This is the same outcome used in the PR-COIN project.

These revisions were communicated to and approved by a PCORI program officer.
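As a rough illustration of how a cJADAS10 score is assembled, the sketch below sums the 3 clinical components (physician's global assessment, patient/parent's global assessment, and active joint count) and caps the joint count at 10, omitting ESR. The function name, argument names, and the 0-10 range check on the 2 global assessments are assumptions made for illustration; they are not taken from the study's code.

```python
def cjadas10(physician_global: float, parent_global: float, active_joints: int) -> float:
    """Sum the 3 clinical components, counting at most 10 active joints."""
    # Assumption: both global assessments are recorded on a 0-10 scale.
    for name, value in (("physician_global", physician_global),
                        ("parent_global", parent_global)):
        if not 0 <= value <= 10:
            raise ValueError(f"{name} must be on a 0-10 scale, got {value}")
    if active_joints < 0:
        raise ValueError("active_joints must be non-negative")
    return physician_global + parent_global + min(active_joints, 10)


# Hypothetical patient: 14 active joints contribute only 10 points.
example_score = cjadas10(physician_global=4.0, parent_global=3.5, active_joints=14)
print(example_score)  # 17.5
```

Under these assumptions the score ranges from 0 to 30, and the cap keeps patients with very high joint counts from dominating the scale.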

Data Quality Assurance

EMR systems are primarily designed for clinical care purposes, rather than research. Our secondary data source captured a subset of the study cohort and collected overlapping data between the 2 treatment groups. Thus, a comparison of the abstracted data from the EMR with the quality assurance (QA) data allowed us to take several steps to ensure data quality:

  1. Identify the data in both the EMR extract data set and the QA data set to compare across both data sources.
  2. Perform manual EMR reviews if abnormalities (eg, large amount of missing information, multiple dates of diagnosis) or outliers were found. We tried to diagnose the causes of abnormalities and find solutions, with the help of the data quality team (biomedical informatics, Epic experts, and clinicians who use Epic).
  3. Implement numerous data query algorithms to generalize the findings and ensure reproducibility for each data extraction.

These efforts significantly improved the completeness, correctness, currency, plausibility, and concordance of the data. Details are reported in Appendix A.

Results

PCATS Online R Shiny Application

The GPMatch and extended BART methods have been implemented in an easy-to-use publicly available online application (https://pcats.research.cchmc.org/). It allows users to upload their own data and specify outcome, treatment, confounding, and prognostic variables. The outputs are presented in a table comparing the 2 treatment arms side by side on the selected confounding and prognostic variables, and ATE estimates and predicted potential outcomes are presented in both tables and figures. If the HTE option is selected, then the HTE estimates will also be presented. A detailed user's guide has been developed (Appendix E) to provide examples and step-by-step instructions for some commonly encountered CER problem settings, including continuous, multilevel, categorical, and mixed composite types of treatment, either adaptive or nonadaptive. These examples facilitate better user experiences. The process flow for the PCATS application is presented in Figure 13.

Figure 13. Process Flow for the PCATS Online Application.

Overview of the PCATS Application

In the upper-left corner of the PCATS online application (Figure 14), users have an option to log in using Gmail accounts, which allows them to save their projects. The PCATS application accepts data files in CSV or Excel formats. Once users upload their data (Figure 15), they can review the data, define data types of variables (numerical vs categorical), choose a treatment type (adaptive vs nonadaptive), and set advanced parameters (ie, number of MCMC samples). The models for ATS (Figure 16B) are different from those for nonadaptive treatment strategies (Figure 16A) because users need to specify the outcome and covariates for each stage in the models. While building the model, users can also select factor(s) to test for HTEs. Once users finish building their models and click “Run Model,” the PCATS application will analyze the data using the GPMatch, extended BART, or BART methods, depending on the types of outcomes and treatment. The analysis results are presented in tables and interactive figures within the online application. An example of the outputs is presented in Figure 17. For step-by-step instructions on how to use the PCATS application, please refer to Appendix E.

Figure 14. PCATS Application Webpage.

Figure 15. Review Data Using the PCATS Application.

Figure 16. Build Statistical Models Using the PCATS Application.

Figure 17. Example of Analysis Outputs Using the PCATS Application.

CER in Patients With pcJIA

Patient Eligibility Screening

Out of 1750 patients with JIA captured in the EMR, only 530 patients met the eligibility criteria of being aged 1 to 19 years, newly diagnosed, DMARD naive, and diagnosed with pcJIA. Out of these 530 eligible patients, 47 patients had at least 1 of the specified comorbid conditions (ie, celiac disease, trisomy 21, and inflammatory bowel disease [IBD]) and were excluded. Additionally, 76 patients were excluded because they were following a biologic-only CTP. As described earlier, this is for the purpose of equipoise and verisimilitude. Detailed patient eligibility screening is summarized in Figure 18.

Figure 18. Eligibility Screening in Patients With JIA.

Primary Outcome: cJADAS10 at 6 Months and 12 Months

At baseline, cJADAS10 results were available for 194 (65.55%) and 91 (73.39%) patients in the step-up and early-combination groups, respectively (Table 3). Because the EMR data were collected retrospectively, the number of missing cJADAS10 results was higher at follow-up visits than at baseline. Imputation of missing values was performed and is presented in Appendix D.

Table 3. Nonmissing cJADAS10 Results at Study Visits.

After 6 months of treatment, patients on the early-combination CTP demonstrated a greater reduction in disease activity than patients on the step-up CTP. The GPMatch result suggested that both CTPs were effective in improving disease activity. It predicted that the expected mean ± SD cJADAS10 scores at 6 months would be 6.7 ± 0.48 points and 4.7 ± 0.66 points for the step-up and early-combination approaches, respectively. The early-combination CTP led to a significantly greater reduction in cJADAS10 scores, at −1.98 (95% CI, −3.55 to −0.40).
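To make the link between predicted potential outcomes and the reported ATE concrete, the following sketch shows one way a posterior mean and an equal-tailed 95% interval could be summarized from paired posterior draws. It is not the study's implementation. The draws are synthetic, generated independently from normal distributions that only borrow the reported means and SDs, and the function and variable names are hypothetical.

```python
import random
import statistics


def summarize_ate(draws_treated, draws_control):
    """Posterior mean and equal-tailed 95% interval of treated minus control."""
    if len(draws_treated) != len(draws_control):
        raise ValueError("both arms need the same number of posterior draws")
    diffs = [t - c for t, c in zip(draws_treated, draws_control)]
    # n=40 yields cut points at 2.5%, 5%, ..., 97.5%; keep the first and last.
    cuts = statistics.quantiles(diffs, n=40)
    return statistics.fmean(diffs), cuts[0], cuts[-1]


# Synthetic draws (an assumption): real MCMC output would carry its own
# dependence between the 2 potential outcomes within each iteration.
rng = random.Random(2024)
step_up = [rng.gauss(6.7, 0.48) for _ in range(4000)]
early_combination = [rng.gauss(4.7, 0.66) for _ in range(4000)]

ate_mean, ate_lower, ate_upper = summarize_ate(early_combination, step_up)
print(f"ATE {ate_mean:.2f} (95% interval, {ate_lower:.2f} to {ate_upper:.2f})")
```

An interval that excludes 0, as in the reported result, is what supports describing the difference as significant.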

The causal inference analyses estimate the potential outcomes for each of the 4 ATS. Depending on the disease progression at the end of the first-line treatment, the medication might be adjusted, and then the cJADAS10 outcome was measured at the end of the second-line treatment. After 6 months of treatment, patients on the step-up CTP might escalate treatment (ie, escalate their initial nbDMARD, change to a different nbDMARD, or add a bDMARD); alternatively, they might de-escalate/maintain initial treatment (ie, stop the DMARD or continue taking the same prescription). Patients on the early-combination CTP might escalate/maintain the initial treatment approach (ie, change DMARDs or continue taking the same combination of DMARDs), or they might de-escalate treatment (ie, stop the bDMARD and/or nbDMARD). The GPMatch method estimated the posterior distribution of cJADAS10 outcome and also whether the patient had demonstrated improvement at each ATS. The GPMatch-predicted results (mean [95% CI]) of potential cJADAS10 outcomes are presented in Figure 19.

Figure 19. Estimated cJADAS10 Outcome Using the GPMatch Method.

Overall, the results suggested both CTPs were effective in improving disease activity (Figure 19). Early-combination treatment, on average, produced a significant 2-point reduction in the cJADAS10, with an ATE of −1.98 (95% CI, −3.55 to −0.40) by 6 months, which was sustained up to 12 months. The study did not identify HTE by JIA subtypes or baseline cJADAS10.

Secondary Outcome: PedsQL at 12 Months

Due to the limited data available for the PedsQL generic scores recorded in the EMR, the ATE of 5.6 (95% CI, −3.9 to 15.1) is associated with large variance. The study estimated expected potential scores of 74.8 and 80.4 by the end of 12 months if treated with the step-up and early combination, respectively.

There were 182 patients with PedsQL generic scores at baseline, 46 patients with PedsQL generic scores at 6 months of follow-up, and 117 patients with PedsQL generic scores at 12 months of follow-up. Because patients were asked to complete the PedsQL generic module on an annual basis, only 9 patients had both baseline and 6-month scores. Given the large amount of missing data for the 6-month outcome, comparative effectiveness analyses could be performed only for the 12-month PedsQL outcome on the first line of treatment. Missing data were handled under a MAR assumption. The GPMatch result suggested that both CTPs were effective in improving the PedsQL score, with predicted scores of 74.8 ± 2.0 and 80.4 ± 3.7 by 12 months if treated on the step-up CTP and early-combination CTP, respectively. The GPMatch method estimated a treatment effect of a 5.6-point difference (95% CI, −3.89 to 15.12 points) in PedsQL generic scores in a comparison of the early-combination CTP with the step-up CTP. Even though there was no statistically significant difference between groups, patients' QOL improved significantly from baseline under both CTPs: early combination, 12.95 (95% CI, 5.6-20.09); step-up, 7.34 (95% CI, 3.41-11.54). Consistent results were found in the sensitivity analyses (Appendix D).

Discussion

Summary of the Project

This project is motivated by the need to evaluate the comparative effectiveness of ATS used routinely in treating patients with chronic or prolonged disease conditions. We recognized some important limitations and a need to improve statistical causal inference methods for ATS. These limitations may threaten the validity of the CER results, which could lead to inconsistent, confusing, or even misinformed conclusions.

The results of this project offered some important improvements to statistical causal inference methods. First, we have developed a novel full bayesian DR causal inference method, GPMatch.36 This method can be considered an extension to the bayesian marginal structural model. By using GP covariance functions as a matching tool, GPMatch combines the virtues of matching and bayesian nonparametric modeling within a single step of the full bayesian framework. Requiring relatively weaker causal assumptions, the GPMatch method produces DR average causal treatment effects, enjoys well-calibrated frequentist properties, and outperforms many existing causal inference methods under the most realistic dual-misspecification setting (ie, neither the true outcome-generating process nor the treatment selection process is known). Appendices B and C provide detailed results.

Second, the widely adopted 3 causal assumptions initially proposed for the theory of PS are widely acknowledged as overly strong.52,53 Alternatively, DAGs15 and other frameworks have been proposed, such as in Iacus et al.16 Here, we presented the causal assumptions with a DAG within a counterfactual world setting. The DAG explicitly acknowledges 4 sources of potential unmeasured confounders and therefore presents a set of relatively weaker versions of causal assumptions that are more suitable for real-world data. For example, we relaxed the SUTVA to SUTVEA. Instead of requiring the observation to be an exact copy of the corresponding potential outcomes, that is, $Y_i = A_i Y_i(1) + (1 - A_i) Y_i(0)$, GPMatch allows a noisy version of the potential outcome, that is, $Y_i = A_i Y_i(1) + (1 - A_i) Y_i(0) + \varepsilon_i$. When it comes to real-world data, it is almost always true that the outcome measures are subject to measurement error, and that the treatment effect could differ by some unobserved factors, such as presurgery procedures, timing and seasonal effects, concomitant medications, and food or drink taken together with the medications. Relaxing the causal assumptions and explicitly acknowledging the unmeasured source of random effect helps ensure the validity of the results.

Third, we have created a GUI application, PCATS (https://pcats.research.cchmc.org/), implementing the GPMatch and BART methods for CER. This R Shiny application allows users to upload their data, provides step-by-step GUI instructions, and reports tables and figures of estimates of ATE and the predicted potential outcomes. The figures are interactive, allowing users to hover their cursor over any point of interest, and the application will provide the corresponding probability estimates. The application is open to the general public, offering solutions to some important challenges in CER: (1) complex types of treatment schedules such as ordinal, mixed types of categorical and ordinal, and time-varying adaptive or nonadaptive treatment; (2) conditional causal treatment effects; and (3) bounded summary score outcomes, such as patient-reported HRQOL and many other patient-reported and clinical summary score outcomes. With the PCATS app, researchers can perform CER using state-of-the-art bayesian causal inference methods without the need to learn R or other data analytic languages. Therefore, it facilitates more timely, rigorous, and reproducible CER.

Last, the second aim of this project evaluated the effectiveness of the early-combination CTP in children with newly diagnosed pcJIA, compared with the more conventional approach of the step-up CTP, on clinical and QOL outcomes at 6 and 12 months of treatment. To our knowledge, this is the first study that applies causal inference methods to evaluate the comparative effectiveness of early-combination vs step-up CTPs for patients with pcJIA using EMR data.

As a full bayesian method, GPMatch is highly flexible and versatile and can be extended to consider many additional challenges inherent to analyzing real-world data. The online application makes available these advanced bayesian causal inference methods that are directly applicable to CER. In conclusion, the project has made a significant contribution to methods improvement for CER, and it has the potential to contribute significantly to the advancement of evidence-based health care.

GPMatch Method

The proposed GPMatch method offers a full bayesian causal inference approach that can effectively address the unique challenges inherent in causal inference. First, using GP prior covariance function to model the covariance of the observed data, GPMatch can estimate missing potential outcomes much like matching methods do, while avoiding the pitfalls of many matching methods. No data are discarded, and no arbitrary caliper is required. Instead, the model allows empirical estimation of the length-scale and variance parameters. The squared-exponential covariance function of GP prior offers an alternative distance metric, which closely resembles MD. It matches individuals by the degree of matching proportional to the squared-exponential distance without requiring specification of caliper. For this reason, GPMatch can use data more effectively than usual matching procedures can. Different length-scale parameters are considered for different covariates used in defining squared-exponential covariance function. This allows the data to select the most important covariates to be matched on and acknowledges that some variables are more important than others. Although the idea of using GP prior for bayesian causal inference is not new, using the GP covariance function as a matching device is a unique contribution of this study. The matching utility of GP covariance function is presented analytically by considering a setting when the matching structure is known. We show that GPMatch enjoys DR properties in the sense that it correctly estimates the ATE when 1 of the following conditions is true: (1) The mean function of the GPMatch correctly specifies the prognostic function of the potential outcome; or (2) the GP prior covariance function correctly specifies matching structure. 
We show that GPMatch estimates the treatment effect by inducing independence between 2 residuals: the residual from the treatment propensity estimate and the residual from the outcome estimate, much like the g-estimation method. Unlike the 2-stage g-estimation, the parameters of the covariance function and the mean function for GPMatch are estimated simultaneously. Therefore, the GPMatch regression approach, which integrates the benefits of the regression model and the matching method, offers a natural way for bayesian causal inference to address challenges unique to causal inference. The robust and efficient properties of GPMatch are well supported by the simulation results designed to reflect the most realistic settings (ie, no knowledge of the matching or functional form of the outcome model is available).36
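The matching role of a squared-exponential covariance function with covariate-specific length scales can be illustrated with a small sketch. The parameterization below (no factor of 1/2 in the exponent), the function name, and the example length scales are assumptions chosen for illustration; the exact form used by GPMatch is given in the cited preprint.36

```python
import math


def se_covariance(x, y, length_scales, variance=1.0):
    """Squared-exponential covariance between 2 covariate vectors.

    Each covariate has its own length scale: a short length scale makes a
    mismatch on that covariate reduce the covariance quickly.
    """
    if not (len(x) == len(y) == len(length_scales)):
        raise ValueError("x, y, and length_scales must have the same length")
    scaled_sq_dist = sum(((a - b) / ell) ** 2
                         for a, b, ell in zip(x, y, length_scales))
    return variance * math.exp(-scaled_sq_dist)


# Hypothetical patients described by 2 standardized covariates.
patient_a = (0.0, 0.0)
patient_b = (1.0, 0.0)  # differs from A on covariate 1 only
patient_c = (0.0, 1.0)  # differs from A on covariate 2 only
scales = (0.5, 5.0)     # covariate 1 matters far more for matching

print(se_covariance(patient_a, patient_b, scales))  # close to 0: poor match
print(se_covariance(patient_a, patient_c, scales))  # close to 1: good match
```

In this sketch, the covariance acts as a continuous degree of matching, so no pair is discarded and no caliper is needed. Pairs that differ on an important covariate simply contribute very little to each other's predictions.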

Limitations

GP regression is a very flexible modeling technique, but it is computationally expensive. The time cost associated with GP regression increases with the sample size (n) at an $n^3$ rate; thus, it can be challenging with large sample sizes. The bayesian Gibbs sampling algorithm we used makes it even more demanding in computational resources. Some authors, such as Banerjee et al,56 have offered solutions for applying GP to large data sets. Alternatively, one may consider using bayesian kernel regression as an approximation. Our simulation studies had relatively small numbers of covariates and numbers of observations. Future studies are needed to improve the computational efficiency for larger data sets and higher-dimensional data.
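A back-of-the-envelope sketch shows what cubic scaling implies in practice. It considers only the leading cubic term and ignores constants, memory use, and the number of MCMC iterations. The function name and the reference sample size of 500 are hypothetical.

```python
def relative_cubic_cost(n_reference: int, n_new: int) -> float:
    """Cost multiplier for a cubic-time step when the sample size changes."""
    if n_reference <= 0 or n_new <= 0:
        raise ValueError("sample sizes must be positive")
    return (n_new / n_reference) ** 3


# Relative to a hypothetical reference analysis of 500 patients:
for n_new in (1000, 2000, 5000):
    print(n_new, relative_cubic_cost(500, n_new))
# 1000 8.0
# 2000 64.0
# 5000 1000.0
```

Doubling the sample size therefore multiplies the dominant cost by about 8, which is why approximations become attractive for larger data sets.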

In this project, we focused on evaluating the ATE at each decision point (both nested and marginal). The model could be used to estimate conditional treatment effects given specified treatment modifiers, such as disease subtype. Thus, when the treatment modifiers are known, the model could be used to identify optimal treatment decisions for a patient at any given decision point. To evaluate the model performances under HTEs, more simulation studies are required. Future studies should investigate the performance of GPMatch for identifying optimal individual-level treatment effects compared with A-learning, Q-learning, and other reinforcement learning methods.55

CER in Patients With pcJIA

This study compared the effectiveness of the early-combination CTP in children with newly diagnosed pcJIA with that of the more conventional step-up CTP on clinical and QOL outcomes after 6 and 12 months of treatment. To our knowledge, this is the first study that applies causal inference methods to evaluate the comparative effectiveness of early-combination vs step-up CTPs using EMR data. Within an established EMR system, patient encounters can be tracked from the first date of diagnosis throughout the course of disease progression and treatment, particularly for patients with chronic conditions. Therefore, the EMR is an invaluable data source for evaluating the effectiveness of alternative treatment choices, understanding potential HTE, and subsequently guiding evidence-based treatment decisions. This study demonstrates that the EMR can be used to better understand treatment effectiveness.

Limitations and Generalizability

This study design had some limitations. First, the treatments were determined by medication prescriptions recorded in the EMR. Records of actual medication dispensing and treatment adherence were not available. The treatment effect may vary by medication dose, formulation, and route. Such information could not be considered in the current study due to the limitations of the EMR data. Second, patients in routine clinical care do not necessarily follow a predetermined schedule of follow-up, making it challenging to evaluate the CER at given time points. In this study, most patients continued using the same treatment for up to 6 months before changing to a different treatment, which allowed us to emulate a clinical trial with quarterly follow-up visits based on their treatment courses and recorded clinic visits. Third, the number of missing primary outcomes increased at follow-up visits, which could have biased the estimates if left unaddressed. Thus, we handled the missing data by imputation using a hierarchically coupled mixture model with a local dependence structure. Furthermore, the missing primary outcomes at designated time points were assigned using the clinic records from the closest visits within a 1-month window. The analysis results might be sensitive to the specification of the time window. Additional sensitivity analyses may consider varying time windows and other decision rules for using EMR data for research purposes. Although this study involved sensitivity analyses evaluating the effect of not accounting for QOL measures at baseline as a treatment-by-indication confounder, other unmeasured confounders may exist. Our sensitivity analysis results suggested that including additional covariates in these causal inference analyses yielded results nearly identical to those of the primary analyses.
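The closest-visit rule described above can be sketched as follows. This is a minimal illustration and not the study's extraction code. Treating "1 month" as 30 days, breaking ties in favor of the earlier visit, and all names and example values are assumptions.

```python
from datetime import date


def closest_visit_value(visits, target, window_days=30):
    """Return the outcome from the visit nearest to `target`, or None.

    `visits` is a list of (visit_date, value) pairs. Only visits within
    `window_days` of the target date (before or after) are eligible.
    """
    eligible = [(abs((visit_date - target).days), visit_date, value)
                for visit_date, value in visits
                if abs((visit_date - target).days) <= window_days]
    if not eligible:
        return None  # outcome stays missing and is left to imputation
    # Assumption: when 2 visits are equally close, prefer the earlier one.
    return min(eligible, key=lambda item: (item[0], item[1]))[2]


# Hypothetical patient with a 6-month target date of July 1, 2020.
visits = [(date(2020, 6, 10), 8.0),   # 21 days before the target
          (date(2020, 7, 12), 5.5),   # 11 days after the target
          (date(2020, 9, 1), 3.0)]    # outside the window
print(closest_visit_value(visits, date(2020, 7, 1)))  # 5.5
```

Rerunning such a rule with different values of `window_days` is one simple way to carry out the sensitivity analysis on the time window mentioned above.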

This study was also limited to data from a single medical center. Patients from different centers may represent somewhat different populations in their demographics and disease subtypes. Clinicians from different centers may also engage in different practices in treatment assignment. However, our study did not find significant subgroup treatment effects, which suggests that the results may be generalizable. Physician global assessment and patient/parent global assessment of well-being could be subject to individual and center variations; thus, the effect size may differ by clinical center. Future studies should consider using multicenter data.

Conclusions

Method Development

GPMatch offers robust, accurate, and efficient estimation of ATE. As a full bayesian approach, it is flexible and general and can be extended to address more complex data types and structures inherent in real-world settings. In this study, the model has been extended to a time-varying ATS setting, with consideration of time-dependent confounding. An online application implemented BART and GPMatch for both adaptive and nonadaptive treatments, making the advanced bayesian causal inference method directly accessible to the general research community. Further development of the GPMatch method and the application can address additional challenges, such as multilevel or cluster data structure, missing data, and HTE.

CER in Patients With pcJIA

This observational study offers the first real-world evidence on the comparative effectiveness of CTPs. We found that both early-combination and step-up CTPs are effective in reducing disease activity, and that the early-combination CTP is more effective, leading to better disease activity scores (ie, cJADAS10) after 6 and 12 months of treatment. This finding signifies the need for future studies to investigate the comparative effectiveness of CTPs on long-term outcomes such as cJADAS10, inactive disease, and HRQOL.

References

1.
Malla L, Perera-Salazar R, McFadden E, Ogero M, Stepniewska K, English M. Handling missing data in propensity score estimation in comparative effectiveness evaluations: a systematic review. J Comp Eff Res. 2018;7(3):271-279. doi:10.2217/cer-2017-0071 [PMC free article: PMC6478118] [PubMed: 28980833] [CrossRef]
2.
Newsome SJ, Keogh RH, Daniel RM. Estimating long-term treatment effects in observational data: a comparison of the performance of different methods under real-world uncertainty. Stat Med. 2018;37(15):2367-2390. doi:10.1002/sim.7664 [PMC free article: PMC6001810] [PubMed: 29671915] [CrossRef]
3.
Wallace MP, Moodie EEM. Personalizing medicine: a review of adaptive treatment strategies. Pharmacoepidemiol Drug Saf. 2014;23(6):580-585. doi:10.1002/pds.3606 [PubMed: 24700536] [CrossRef]
4.
Ringold S, Weiss PF, Colbert RA, et al. Childhood Arthritis and Rheumatology Research Alliance consensus treatment plans for new-onset polyarticular juvenile idiopathic arthritis. Arthritis Care Res (Hoboken). 2014;66(7):1063-1072. doi:10.1002/acr.22259 [PMC free article: PMC4467832] [PubMed: 24339215] [CrossRef]
5.
Beukelman T, Kimura Y, Ilowite NT, et al. The new Childhood Arthritis and Rheumatology Research Alliance (CARRA) registry: design, rationale, and characteristics of patients enrolled in the first 12 months. Pediatr Rheumatol Online J. 2017;15(1):30. doi:10.1186/s12969-017-0160-6 [PMC free article: PMC5392971] [PubMed: 28416023] [CrossRef]
6.
Thierry S, Fautrel B, Lemelle I, Guillemin F. Prevalence and incidence of juvenile idiopathic arthritis: a systematic review. Joint Bone Spine. 2014;81(2):112-117. doi:10.1016/J.JBSPIN.2013.09.003 [PubMed: 24210707] [CrossRef]
7.
Ravelli A, Consolaro A, Horneff G, et al. Treating juvenile idiopathic arthritis to target: recommendations of an international task force. Ann Rheum Dis. 2018;77(6):819-828. doi:10.1136/annrheumdis-2018-213030 [PubMed: 29643108] [CrossRef]
8.
Stoll ML, Cron RQ. Treatment of juvenile idiopathic arthritis: a revolution in care. Pediatr Rheumatol Online J. 2014;12:13. doi:10.1186/1546-0096-12-13 [PMC free article: PMC4003520] [PubMed: 24782683] [CrossRef]
9.
Kessler EA, Becker ML. Therapeutic advancements in juvenile idiopathic arthritis. Best Pract Res Clin Rheumatol. 2014;28(2):293-313. doi:10.1016/J.BERH.2014.03.005 [PubMed: 24974064] [CrossRef]
10.
Seid M, Huang B, Niehaus S, Brunner HI, Lovell DJ. Determinants of health-related quality of life in children newly diagnosed with juvenile idiopathic arthritis. Arthritis Care Res (Hoboken). 2014;66(2):263-269. doi:10.1002/acr.22117 [PMC free article: PMC5264493] [PubMed: 23983144] [CrossRef]
11.
Kang JDY, Schafer JL. Demystifying double robustness: a comparison of alternative strategies for estimating a population mean from incomplete data. Stat Sci. 2007;22(4):523-539. doi:10.1214/07-STS227 [PMC free article: PMC2397555] [PubMed: 18516239] [CrossRef]
12.
Gutman R, Rubin DB. Estimation of causal effects of binary treatments in unconfounded studies. Stat Med. 2015;34(26):3381-3398. doi:10.1002/sim.6532 [PMC free article: PMC4782596] [PubMed: 26013308] [CrossRef]
13.
Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70(1):41-55.
14.
Daniel RM, Cousens SN, De Stavola BL, Kenward MG, Sterne JAC. Methods for dealing with time-dependent confounding. Stat Med. 2013;32(9):1584-1618. doi:10.1002/sim.5686 [PubMed: 23208861] [CrossRef]
15.
Pearl J. Causality: Models, Reasoning and Inference. 2nd ed. Cambridge University Press; 2009.
16.
Iacus SM, King G, Porro G. A theory of statistical inference for matching methods in causal research. Polit Anal. 2019;27(1):46-68.
17.
Berry DA. Bayesian approaches for comparative effectiveness research. Clin Trials. 2012;9(1):37-47. doi:10.1177/1740774511417470 [PMC free article: PMC4314707] [PubMed: 21878446] [CrossRef]
18.
Hirano K, Imbens GW, Rubin DB, Zhou X-H. Assessing the effect of an influenza vaccine in an encouragement design. Biostatistics. 2000;1(1):69-88. [PubMed: 12933526]
19.
Zajonc T. Bayesian inference for dynamic treatment regimes: mobility, equity, and efficiency in student tracking. J Am Stat Assoc. 2012;107(497):80-92. doi:10.1080/01621459.2011.643747 [CrossRef]
20.
Imbens GW, Rubin DB. Bayesian inference for causal effects in randomized experiments with noncompliance. Ann Stat. 1997;25(1):305-327.
21.
Baccini M, Mattei A, Mealli F. Bayesian inference for causal mechanisms with application to a randomized study for postoperative pain control. Biostatistics. 2017;18(4):605-617. doi:10.1093/biostatistics/kxx010 [PubMed: 28369188] [CrossRef]
22.
Hill JL. Bayesian nonparametric modeling for causal inference. J Comput Graph Stat. 2011;20(1):217-240. doi:10.1198/jcgs.2010.08162 [CrossRef]
23.
Hill J, Su Y-S. Assessing lack of common support in causal inference using bayesian nonparametrics: implications for evaluating the effect of breastfeeding on children's cognitive outcomes. Ann Appl Stat. 2013;7(3):1386-1420. doi:10.1214/13-AOAS630 [CrossRef]
24.
Gustafson P. Double-robust estimators: slightly more bayesian than meets the eye? Int J Biostat. 2012;8(2). doi:10.2202/1557-4679.1349 [PubMed: 22499730] [CrossRef]
25.
Cefalu M, Dominici F, Arvold N, Parmigiani G. Model averaged double robust estimation. Biometrics. 2017;73(2):410-421. doi:10.1111/biom.12622 [PMC free article: PMC5466877] [PubMed: 27893927] [CrossRef]
26.
Zigler CM, Dominici F. Uncertainty in propensity score estimation: bayesian methods for variable selection and model averaged causal effects. J Am Stat Assoc. 2014;109(505):95-107. doi:10.1080/01621459.2013.869498 [PMC free article: PMC3969816] [PubMed: 24696528] [CrossRef]
27.
Roy J, Lum KJ, Daniels MJ. A bayesian nonparametric approach to marginal structural models for point treatments and a continuous or survival outcome. Biostatistics. 2017;18(1):32-47. doi:10.1093/biostatistics/kxw029 [PMC free article: PMC5255048] [PubMed: 27345532] [CrossRef]
28.
Xu Y, Müller P, Wahed AS, Thall PF. Bayesian nonparametric estimation for dynamic treatment regimes with sequential transition times. J Am Stat Assoc. 2016;111(515):921-935. doi:10.1080/01621459.2015.1086353 [PMC free article: PMC5175473] [PubMed: 28018015] [CrossRef]
29.
Roy J, Lum KJ, Zeldow B, Dworkin JD, Lo Re V III, Daniels MJ. Bayesian nonparametric generative models for causal inference with missing at random covariates. Biometrics. 2018;74(4):1193-1202. doi:10.1111/BIOM.12875 [PMC free article: PMC7568223] [PubMed: 29579341] [CrossRef]
30.
Ding P, Li F. Causal inference: a missing data perspective. Stat Sci. 2018;33(2):214-237. doi:10.1214/18-STS645 [CrossRef]
31.
Dawid AP. Causal inference without counterfactuals (with discussion). J Am Stat Assoc. 2000;95(450):407-424. doi:10.1080/01621459.2000.10474210 [CrossRef]
32.
Hahn PR, Carvalho CM, Puelz D, He J. Regularization and confounding in linear regression for treatment effect estimation. Bayesian Anal. 2018;13(1):163-182. doi:10.1214/16-BA1044 [CrossRef]
33.
McCandless LC, Douglas IJ, Evans SJ, Smeeth L. Cutting feedback in bayesian regression adjustment for the propensity score. Int J Biostat. 2010;6(2):Article 16. doi:10.2202/1557-4679.1205 [PubMed: 21972431] [CrossRef]
34.
Zigler CM, Watts K, Yeh RW, Wang Y, Coull BA, Dominici F. Model feedback in bayesian propensity score estimation. Biometrics. 2013;69(1):263-273. doi:10.1111/j.1541-0420.2012.01830.x [PMC free article: PMC3622139] [PubMed: 23379793] [CrossRef]
35.
Saarela O, Belzile LR, Stephens DA. A bayesian view of doubly robust causal inference. Biometrika. 2016;103(3):667-681. doi:10.1093/biomet/asw025 [CrossRef]
36.
Huang B, Chen C, Liu J. GPMatch: a bayesian doubly robust approach to causal inference with Gaussian process covariance function as a matching tool. arXiv. Preprint posted online January 29, 2019. https://arxiv.org/abs/1901.10359
37.
Stuart EA. Matching methods for causal inference: a review and a look forward. Stat Sci. 2010;25(1):1-21. doi:10.1214/09-STS313 [PMC free article: PMC2943670] [PubMed: 20871802] [CrossRef]
38.
King G, Nielsen R. Why propensity scores should not be used for matching. Polit Anal. 2019;27(4):435-454.
39.
Choi T, Woo Y. On asymptotic properties of bayesian partially linear models. J Korean Stat Soc. 2013;42(4):529-541.
40.
Choi T, Schervish MJ. On posterior consistency in nonparametric regression problems. J Multivar Anal. 2007;98(10):1969-1987. doi:10.1016/j.jmva.2007.01.004 [CrossRef]
41.
Hill JL. Bayesian nonparametric modeling for causal inference. J Comput Graph Stat. 2011;20(1):217-240. doi:10.1198/jcgs.2010.08162 [CrossRef]
42.
Keil AP, Daza EJ, Engel SM, Buckley JP, Edwards JK. A bayesian approach to the g-formula. Stat Methods Med Res. 2018;27(10):3183-3204. doi:10.1177/0962280217694665 [PMC free article: PMC5790647] [PubMed: 29298607] [CrossRef]
43.
Schulte PJ, Tsiatis AA, Laber EB, Davidian M. Q- and A-learning methods for estimating optimal dynamic treatment regimes. Stat Sci. 2014;29(4):640-661. doi:10.1214/13-STS450 [PMC free article: PMC4300556] [PubMed: 25620840] [CrossRef]
44.
Consolaro A, Giancane G, Schiappapietra B, et al. Clinical outcome measures in juvenile idiopathic arthritis. Pediatr Rheumatol Online J. 2016;14(1):23. doi:10.1186/s12969-016-0085-5 [PMC free article: PMC4836071] [PubMed: 27089922] [CrossRef]
45.
Varni JW, Burwinkle TM, Seid M, Skarr D. The PedsQL 4.0 as a pediatric population health measure: feasibility, reliability, and validity. Ambul Pediatr. 2003;3(6):329-341. doi:10.1367/1539-4409(2003)003<0329:TPAAPP>2.0.CO;2 [PubMed: 14616041] [CrossRef]
46.
Consolaro A, Ruperto N, Bazso A, et al. Development and validation of a composite disease activity score for juvenile idiopathic arthritis. Arthritis Rheum. 2009;61(5):658-666. doi:10.1002/art.24516 [PubMed: 19405003] [CrossRef]
47.
McErlane F, Beresford MW, Baildam EM, et al. Validity of a three-variable Juvenile Arthritis Disease Activity Score in children with new-onset juvenile idiopathic arthritis. Ann Rheum Dis. 2013;72(12):1983-1988. doi:10.1136/annrheumdis-2012-202031 [PMC free article: PMC3841758] [PubMed: 23256951] [CrossRef]
48.
Swart JF, van Dijkhuizen EHP, Wulffraat NM, de Roock S. Clinical Juvenile Arthritis Disease Activity Score proves to be a useful tool in treat-to-target therapy in juvenile idiopathic arthritis. Ann Rheum Dis. 2018;77(3):336-342. doi:10.1136/annrheumdis-2017-212104 [PMC free article: PMC5867401] [PubMed: 29138257] [CrossRef]
49.
Bulatovic Calasan M, de Vries LD, Vastert SJ, Heijstek MW, Wulffraat NM. Interpretation of the Juvenile Arthritis Disease Activity Score: responsiveness, clinically important differences and levels of disease activity in prospective cohorts of patients with juvenile idiopathic arthritis. Rheumatology (Oxford). 2014;53(2):307-312. doi:10.1093/rheumatology/ket310 [PubMed: 24162034] [CrossRef]
50.
McErlane F, Beresford MW, Baildam EM, Thomson W, Hyrich KL. Recent developments in disease activity indices and outcome measures for juvenile idiopathic arthritis. Rheumatology (Oxford). 2013;52(11):1941-1951. doi:10.1093/rheumatology/ket150 [PubMed: 23630368] [CrossRef]
51.
Otten MH, Anink J, Prince FH, et al. Trends in prescription of biological agents and outcomes of juvenile idiopathic arthritis: results of the Dutch national Arthritis and Biologics in Children Register. Ann Rheum Dis. 2015;74(7):1379-1386. doi:10.1136/annrheumdis-2013-204641 [PubMed: 24641940] [CrossRef]
52.
Rubin DB. Bayesian inference for causal effects: the role of randomization. Ann Stat. 1978;6(1):34-58. doi:10.1016/S0169-7161(05)25001-0 [CrossRef]
53.
Cole SR, Frangakis CE. The consistency statement in causal inference: a definition or an assumption? Epidemiology. 2009;20(1):3-5. doi:10.1097/EDE.0b013e31818ef366 [PubMed: 19234395] [CrossRef]
54.
Berger JO, De Oliveira V, Sansó B. Objective bayesian analysis of spatially correlated data. J Am Stat Assoc. 2001;96(456):1361-1374.
55.
Chakraborty B, Moodie EE. Statistical Methods for Dynamic Treatment Regimes. Springer New York; 2013.
56.
Banerjee S, Gelfand AE, Finley AO, Sang H. Gaussian predictive process models for large spatial data sets. J R Stat Soc Ser B Stat Methodol. 2008;70(4):825-848. https://doi.org/10.1111/j.1467-9868.2008.00663.x [PMC free article: PMC2741335] [PubMed: 19750209]

Related Publications

•.
Sivaganesan S, Müller P, Huang B. Subgroup finding via Bayesian additive regression trees. Stat Med. 2017;36(15):2391-2403. doi:10.1002/sim.7276 [PubMed: 28276142] [CrossRef]
•.
Huang B, Szczesniak R. Can't see the wood for the trees: confounders, colliders and causal inference—a statistician's approach. Thorax. 2019;74(4):323-325. doi:10.1136/thoraxjnl-2018-212780 [PubMed: 30733328] [CrossRef]
•.
Huang B, Chen C, Liu J. GPMatch: a bayesian doubly robust approach to causal inference with Gaussian process covariance function as a matching tool. arXiv. Preprint posted online January 29, 2019. https://arxiv.org/abs/1901.10359
•.
Huang B, Qiu T, Chen C, et al. Timing matters: real-world effectiveness of early combination of biologic and conventional synthetic disease-modifying antirheumatic drugs for treating newly diagnosed polyarticular course juvenile idiopathic arthritis. RMD Open. 2020;6(1):e001091. doi:10.1136/rmdopen-2019-001091 [PMC free article: PMC7003379] [PubMed: 32396520] [CrossRef]
•.
Huang B, Morgan E, Chen C, et al. Comparing effectiveness of early initiation of biologic treatment for newly diagnosed juvenile idiopathic arthritis using a novel statistics causal inference method applied to observational data. Poster presented at: 2017 ACR/ARHP Annual Meeting; November 7, 2017; San Diego, CA.
•.
Huang B, Liu J, Chen C, Sivaganesan S. Comparative effectiveness study of a new bayesian's causal inference method. Value Health. 2017;20(9):PA769. https://doi.org/10.1016/j.jval.2017.08.2200
•.
Huang B, Morgan E, Chen C, et al. Is early initiation of biologic treatment more effective than a step up treatment approach for new onset juvenile idiopathic arthritis?—using a causal inference approach to achieve a double robust estimate. Value Health. 2016;19(7):PA394. https://doi.org/10.1016/j.jval.2016.09.268
•.
Huang B, Qiu T, Chen C, et al. Comparative effectiveness research using electronic health records data: ensure data quality. In SAGE Res Methods Cases. 2020. https://www.doi.org/10.4135/9781529726480
•.
Performance of different causal inference methods in comparative effectiveness study of early aggressive treatment in polyarticular course of juvenile idiopathic arthritis. In preparation.

Acknowledgments

The research herein was funded through a PCORI award (ME-1408-19894) and a Process and Method award from the Center for Clinical and Translational Science and Training, the National Center for Advancing Translational Sciences of the NIH, under award number 5UL1TR001425-03.

Research reported in this report was funded through a Patient-Centered Outcomes Research Institute® (PCORI®) Award (#ME-1408-19894). Further information is available at: https://www.pcori.org/research-results/2015/new-statistical-methods-compare-effectiveness-adaptive-treatment-plans

Appendices

Appendix A.

PCATS Study Protocol (PDF, 1.0M)

Appendix B.

PCATS Aim 1a: GPMatch Methodology Manuscript (PDF, 905K)

Supplemental Tables and Figures to “GPMatch: A Bayesian Doubly Robust Approach to Causal Inference with Gaussian Process Covariance Function As a Matching Tool”

Table S1. Results of ATE Estimates from the Single Covariate Simulation Study Setting 1: {γ0, γ1, γ2, γ3} = {0.5, 0, 0, 0.75} (PDF, 160K)

Table S2. Results of ATE Estimates from the Single Covariate Simulation Study Setting 2: {γ0, γ1, γ2, γ3} = {1, 0.15, 0, 0} (PDF, 159K)

Table S3. Results of ATE Estimates from the Single Covariate Simulation Study Setting 3: {γ0, γ1, γ2, γ3} = {0.5, 0, 0.7, 0.75} (PDF, 158K)

Table S4. Results of ATE Estimates from the Single Covariate Simulation Study Setting 4: {γ0, γ1, γ2, γ3} = {1, 0.15, 0.7, 0} (PDF, 179K)

Figure S1. Comparisons of root mean square error (RMSE) and median absolute error (MAE) of the ATE Estimates by Different Methods Across Different Sample Sizes under Simulation Setting 1: {γ0, γ1, γ2, γ3} = {0.5, 0, 0, 0.75} (PDF, 142K)

Figure S2. Comparisons of root mean square error (RMSE) and median absolute error (MAE) of the ATE Estimates by Different Methods Across Different Sample Sizes under Simulation Setting 2: {γ0, γ1, γ2, γ3} = {1, 0.15, 0, 0} (PDF, 94K)

Figure S3. Comparisons of root mean square error (RMSE) and median absolute error (MAE) of the ATE Estimates by Different Methods Across Different Sample Sizes under Simulation Setting 3: {γ0, γ1, γ2, γ3} = {0.5, 0, 0.7, 0.75} (PDF, 92K)

Figure S4. Comparisons of root mean square error (RMSE) and median absolute error (MAE) of the ATE Estimates by Different Methods Across Different Sample Sizes under Simulation Setting 4: {γ0, γ1, γ2, γ3} = {1, 0.15, 0.7, 0} (PDF, 92K)

Figure S5. Distribution of the Estimated ATE from GPMatch by Different Sample Sizes under the Kang and Schafer Dual Misspecification Setting. The upper panel presents the results of GPMatch with only the treatment effect in the mean function model; the lower panel presents the results of GPMatch with the treatment effect and X1-X4 in the mean function model. Both included X1-X4 in the covariance function (PDF, 119K)

Figure S6. Distributions of key covariates in unweighted and weighted samples using inverse probability weighting of propensity scores for the case study (PDF, 109K)

Appendix E.

PCATS Online Application User Guide (PDF, 2.0M)

Institution Receiving Award: Cincinnati Children's Hospital Medical Center
Original Project Title: Patient Centered Adaptive Treatment Strategies (PCATS) Using Bayesian Causal Inference
PCORI ID: ME-1408-19894
ClinicalTrials.gov ID: NCT02524340

Suggested citation:

Huang B, Morgan EM, Chen C, et al. (2020). New Statistical Methods to Compare the Effectiveness of Adaptive Treatment Plans. Patient-Centered Outcomes Research Institute (PCORI). https://doi.org/10.25302/11.2020.ME.140819894

Disclaimer

The views, statements, and opinions presented in this report are solely the responsibility of the author(s) and do not necessarily represent the views of the Patient-Centered Outcomes Research Institute® (PCORI®), its Board of Governors or Methodology Committee.

Copyright © 2020. Cincinnati Children's Hospital Medical Center. All Rights Reserved.

This book is distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs License, which permits noncommercial use and distribution provided the original author(s) and source are credited. (See https://creativecommons.org/licenses/by-nc-nd/4.0/)

Bookshelf ID: NBK594557
PMID: 37651558
DOI: 10.25302/11.2020.ME.140819894