NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Creating and Testing Methods to Estimate Treatment Effect in Observational Studies with Three or More Treatments

, PhD, , PhD, , MD, , PharmD, PhD, and , PhD.

Abstract

Background:

Determining the correct design and analysis of nonrandomized studies to estimate the effects of treatments is important in patient-centered outcomes research (PCOR). PCOR is meant to enable patients to make informed health care decisions based on their personal conditions, characteristics, and preferences. Increasingly, patients and physicians must choose from more than 2 treatment options. Methods based on propensity scores are popular for estimating causal effects of binary treatments from observational studies. Use of propensity score methods for more than 2 treatment options requires advanced techniques but has received limited attention in the literature.

Objectives:

This research consists of 2 objectives. The first is aimed at development, testing, and guidance for estimation of effects of multiple treatment options that are either ordinal (eg, ≥3 possible doses of a drug) or categorical (eg, ≥3 possible drugs). Specifically, we will concentrate on different matching procedures. In the second objective, we use the developed methods to estimate the effects of multiple add-on, noninsulin antihyperglycemic treatments on major adverse cardiovascular events (MACE) or death.

Methods:

Methods based on the generalized propensity score (GPS), which relates to the probabilities of receiving each of the possible treatment options, have been proposed to address estimation of causal effects with more than 2 interventions. However, the relative benefits of different GPS models remain only partially identified, and the type of exposure (ordinal vs categorical) may influence this choice. Moreover, the identification of appropriate estimation methods (eg, weighting, matching) has been inadequately investigated. We develop the theoretical background for estimating causal effects in studies with multiple interventions. In addition, we propose new matching methods and use simulation analysis to compare their bias, variance, and mean squared error with currently used methods. For the second objective, we apply the newly developed methods to observational cohort data from the 2007-2015 Clinical Practice Research Datalink (CPRD).

Results:

We provide general guidelines and describe different statistical methods that can be used to estimate treatment effects with observational data when comparing multiple interventions. In the analysis of the CPRD data set, we found that using metformin plus sulfonylureas (gliclazide) increased the 3-year risk of MACE over metformin plus thiazolidinediones (pioglitazone). In addition, the former combination increased the 3-year risk of mortality over metformin plus dipeptidyl peptidase-4 inhibitors (sitagliptin) and metformin plus thiazolidinediones.

Conclusions:

First, this study showed that simple extensions of procedures aimed at estimating the causal effects with binary treatment may produce misleading results when applied to the multiple-treatment setting. Special attention should be given to the different assumptions being made. Second, we developed multiple matching procedures and a multiple-imputation procedure. Among matching procedures, we found that matching on the Mahalanobis distance of the GPS with or without caliper provides the largest reduction in covariate bias. However, as the number of treatments increases, fuzzy matching provides the largest reduction in bias. Finally, we found that metformin plus gliclazide results in increased risk of MACE and mortality compared with metformin plus pioglitazone. In addition, metformin plus gliclazide results in increased risk of mortality compared with metformin plus sitagliptin.

Limitations:

All the procedures we examined were based on the assumption that the assignment mechanism is strongly unconfounded. When this assumption is violated, the causal estimates may be biased. Sensitivity analysis with multiple interventions is an area of future research.

Background

One objective of causal inference for patient-centered outcomes research (PCOR) is to identify which among multiple interventions affects an outcome the most. However, the standards for causal inference methodology for PCOR in the PCORI methodology report1 are primarily based on methods for binary treatment (treatment A vs treatment B). This report provides guidance and describes statistically valid methods for comparing the effects of 3 or more treatments in nonrandomized (observational) studies.

Methods for comparing multiple treatments in nonrandomized studies have numerous applications across a spectrum of human diseases. We illustrate the utility of these methods in a specific study that examines clinical management of type 2 diabetes mellitus (T2DM). Convincing evidence suggests that blood glucose control in T2DM reduces the risk of microvascular complications, such as retinopathy.2 However, the impact of glucose lowering itself and the more specific effects of individual glucose-lowering medications on macrovascular complications, such as cardiovascular disease, remain unclear. A fundamental challenge in identifying the macrovascular effects of antihyperglycemic treatment (AHT) regimens is the complicated pattern of treatment involving multiple alternative agents, a problem that can be effectively addressed with the methods developed in this report.

Moreover, in observational studies, the decision whether a certain patient receives a certain treatment is generally confounded with other variables that may influence the outcome. Thus, it is difficult to attribute differences in the occurrence of outcomes to the treatment because these other variables may also influence the outcome. Suppose that a researcher is interested in examining the effect of a new, lower-cost diabetes drug compared with a more expensive, established option. It may be that people with better health insurance (typically wealthier people) would have a higher propensity to receive the more expensive treatment, whereas people who do not have extensive health insurance would have a higher propensity to receive the lower-cost treatment. In general, people who are wealthier tend to have better outcomes. Thus, if we found that outcomes with the established treatment were better on average than outcomes with the new treatment, we could not be certain whether this result was attributable to the higher-priced drug or to other factors related to the patients' wealth. When only wealth influences the probability of receiving a certain drug, it is straightforward to identify for each person using the new drug a comparable patient (ie, a patient with the same level of wealth) who received a different treatment. However, this task is more complicated when the number of covariates increases. The propensity score (PS),3,4 that is, the probability of receiving the treatment of interest relative to comparators given the covariates, was proposed to summarize all covariates into a scalar. This scalar can then be used to find patients with different treatments who have similar covariates on average. Although the PS is not the only methodology that reduces the dimensions from the space of all covariates into a scalar value, it has been shown in theory5,6 and in application7-11 to provide statistically valid causal inferences.

PS methods were proposed for binary treatment (ie, comparing 2 treatments or a treatment to a control). Thus, most of the literature on PS estimation and its use focuses on comparing only 2 treatments.12 This shortcoming leads researchers to dichotomize continuous, ordinal, and categorical treatments13-16 or to abandon PS methods. Dichotomizing the nonbinary treatment limits causal claims to the effects of the artificially dichotomized treatment, and it may suffer from loss of power, residual confounding, and possible bias in estimates.17

Current methods for estimating causal effects when comparing multiple treatments attempt to reduce potential bias by balancing covariates across treatment groups. In some cases, these methods are supplemented with regression adjustments to further reduce the bias. These methods may suffer from possible deficiencies such as nontransitivity of the estimated effects (eg, treatment 1 may display better outcomes than treatment 2, and treatment 2 may display better outcomes than treatment 3, but treatment 1 will have worse outcomes than treatment 3) and nonsufficient reduction in bias because of leftover covariate imbalances.

This report addresses 4 methodological and practical gaps by employing the following approaches:

  1. Develop the theoretical basis for comparing multiple treatments in nonrandomized studies.
  2. Compare the effectiveness of current methodologies for estimation of causal effects with more than 2 treatments in balancing covariates and the total bias reduction using extensive simulation analysis. All the current methodologies require the definition of certain parameters (eg, PS link function, variables to include in the PS). This report examines various settings of these parameters and provides advice to investigators on the optimal method for different configurations.
  3. Propose new methods that attempt to overcome some of the deficiencies recognized in the current methods, and describe the configurations in which 1 method is preferable to another.
  4. Apply these findings to examine the effects of noninsulin second-line AHT regimens on the occurrence of major adverse cardiovascular events (MACE) in patients with T2DM.

Participation of Patients and Other Stakeholders

The PCORI Funding Announcement for Methods Reports states that for some applications in the area of analytic methods, no patient and stakeholder plan is necessary. We believe that this project falls into this category. However, we describe how we engaged researchers as stakeholders. The goal of our dissemination plan was to give other researchers information about the methods studied and to solicit their feedback through traditional scientific channels. We anticipated that publication of our work in the professional, peer-reviewed literature would result in discussions of the methods among the core stakeholder groups: methodologists and applied researchers. Indeed, we have presented our work at academic conferences through posters, invited talks, and short courses. The feedback we received entered into our final products. In addition, we have published part of our work in various applied statistical journals and received positive feedback from other researchers. When analyzing the second-line noninsulin AHT regimens, we engaged 2 experts, Dr Robert A. Smith and Dr Andrew R. Zullo, throughout the process. Dr Smith is a highly experienced diabetes physician researcher, and Dr Zullo is a pharmacoepidemiologist whose dissertation work investigates the effects of noninsulin AHT in the elderly. Their involvement throughout the project ensured that the study team, while focused on the details of the methodology, did not lose perspective on what clinicians and patients need to make informed health care decisions.

Methods

The research strategy for this project consisted of 4 major parts. The first part involved examining the extensive literature on comparison of binary treatments in nonrandomized settings and amending it to address important issues that arise when the number of treatments increases. The second part consisted of researching currently available methods for estimating the causal effects when multiple interventions are involved and identifying possible shortcomings of the current methods. The third part included development of new matching techniques and comparing existing and novel methods in extensive simulation analysis. The last part involved analysis of the cardiovascular safety of multiple AHT classes as add-on medication to metformin in treating T2DM using the Clinical Practice Research Datalink (CPRD). In the sections that follow, we provide background on and detail the methodology implemented in each part.

Theoretical Background for Comparing Multiple Treatments in Nonrandomized Settings

Ideally, to inform patients' decisions, we would like to observe the outcomes of all possible treatments applied to patients at the same moment in time. The concept of potential outcomes for different treatments—that is, the counterfactual ideal of being able to observe the outcomes for each possible treatment simultaneously—is the heart of the Neyman-Rubin framework for causal inference.18-20 For Z possible treatments, the framework postulates that patient i has {Yi(1), …, Yi(Z)} potential outcomes. Throughout, we assume the expanded version of the Stable Unit Treatment Value Assumption (SUTVA21,22) so that this notation is functionally well defined. Using these assumptions, Table 1 depicts the observed data set for Z = 3.

Table 1. Observed Data Set for Estimation of Causal Effects for 3 Treatments.

To make an informed decision, patients need only compare the different $Y_i(t)$, $t \in \{1, \dots, Z\}$, based on their conditions and preferences. However, because it is impossible to observe simultaneously all possible outcomes under different treatments (ie, the occurrence of myocardial infarction under treatment 1, treatment 2, etc), we must attempt to approximate the potential outcomes with contemporaneous comparisons of different people, and additional assumptions are needed. The primary additional assumption required posits that we can predict the outcome of treatment $t_1$ for a patient who received treatment $t_2$ ($t_2 \neq t_1$) by identifying patients with similar observed characteristics who received treatment $t_1$. This assumption consists of 2 parts: first, that every patient has a positive probability of receiving each treatment, and second, that the assignment mechanism is unconfounded.22,23 Let $X_i$ and $T_i$ be a set of pretreatment covariates and treatment assignment indicators for unit $i$, respectively. The assignment mechanism, the probability of unit $i$ receiving treatment $t \in \{t_1, \dots, t_Z\}$, is defined as $f_{T \mid Y(t_1), \dots, Y(t_Z), X}(t \mid Y_i(t_1), \dots, Y_i(t_Z), X_i, \phi)$, where $\phi$ is a set of parameters. The positive probability assumption is expressed as $0 < f_{T \mid Y(t_1), \dots, Y(t_Z), X}(T_i = t \mid Y_i(t_1), \dots, Y_i(t_Z), X_i, \phi) < 1$ and the unconfoundedness assumption as $f_{T \mid Y(t_1), \dots, Y(t_Z), X}(T_i = t \mid Y_i(t_1), \dots, Y_i(t_Z), X_i, \phi) = f_{T \mid X}(t \mid X_i, \phi) = r(t, X_i)$.

The vector R(X) = (r(t1, Xi), …, r(tZ, Xi)) is commonly referred to as the generalized propensity score (GPS). Based on these assumptions, it is possible to estimate unbiased unit-level causal effects between those at different treatment assignments with equal R(X).24,25
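As a minimal sketch of what the GPS vector looks like in practice, the following computes R(X_i) from a multinomial logit model. The function name `generalized_propensity_scores` and the coefficient values are hypothetical and stand in for parameters that would normally be estimated from data; the checks at the end correspond to the positive probability assumption.

```python
import numpy as np

def generalized_propensity_scores(X, coef, intercept):
    """Return an (n, Z) array whose row i is R(X_i) = (r(t_1, X_i), ..., r(t_Z, X_i)).

    Assumes a multinomial logit model with hypothetical, already-estimated
    parameters: coef has shape (Z, P) and intercept has shape (Z,).
    """
    linear = X @ coef.T + intercept                        # (n, Z) linear predictors
    linear = linear - linear.max(axis=1, keepdims=True)    # numerical stability
    expo = np.exp(linear)
    return expo / expo.sum(axis=1, keepdims=True)          # softmax over treatments

# Illustrative inputs only: 5 units, P = 2 covariates, Z = 3 treatments.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))
coef = np.array([[0.0, 0.0], [0.5, -0.3], [-0.4, 0.8]])
intercept = np.array([0.0, 0.1, -0.2])
gps = generalized_propensity_scores(X, coef, intercept)

# Each row sums to 1 and every component lies strictly between 0 and 1.
rows_sum_to_one = np.allclose(gps.sum(axis=1), 1.0)
positivity_holds = bool(((gps > 0) & (gps < 1)).all())
```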

Two contrasts that may be of interest with multiple treatments are

$$\mathrm{PATE}_{t_1,t_2} = E\big(Y_i(t_1) - Y_i(t_2)\big), \qquad \mathrm{PATT}_{w \mid t_1,t_2} = E\big(Y_i(t_1) - Y_i(t_2) \mid T_i = w\big).$$

For additional discussion and generalizations of these contrasts, see Lopez and Gutman.22 For PATTw|t1t2, the choice of the reference treatment w should be based on the scientific question being investigated. For example, in some instances, one would like to compare all active treatments with a control or a standard treatment.26

Previously Proposed Methods

Four classes of methods for comparing more than 2 treatments have been discussed in the literature: GPSs,24,25,27,28 series of binary comparisons (SBCs),29-33 common referent patient matching (CRPM),34 and within-trio matching.35

GPS With k-Means Clustering

GPS methods estimate a vector of PS based on comparison of all of the treatment's categories simultaneously. The GPSs are generally unknown and are commonly estimated using multinomial logit/probit models.22 Several methods have been proposed to estimate the treatment effects when using GPS. Tu et al36 proposed k-means clustering (KMC) to create like groups using the GPS. We refer to this procedure as KMC throughout. Applications using KMC of nominal treatments are limited.

Inverse Probability Weighting

A different analytic approach is based on the inverse probability weighting (IPW) method. The inverse of the predicted probabilities from a multinomial model or a proportional odds model can be used as weights, where subjects are weighted by the reciprocal of their GPS at the treatment they received.23,37 Specifically, the weights for each unit are $w_t(X_i) = 1/\hat{r}(t, X_i)$, where $\hat{r}(t, X_i)$ is the estimated GPS for treatment $t$ calculated using a model for multiple treatments.
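The weight formula can be illustrated with a short sketch. It assumes that treatments are coded 0, …, Z − 1 so that they index the columns of an estimated GPS matrix, and it uses a normalized weighted mean as the outcome summary; both choices, along with the function names, are assumptions made for illustration rather than details taken from the report.

```python
import numpy as np

def ipw_weights(gps, treatment):
    """w_i = 1 / r_hat(T_i, X_i): reciprocal of the GPS at the received treatment."""
    gps = np.asarray(gps, dtype=float)
    treatment = np.asarray(treatment)
    received = gps[np.arange(len(treatment)), treatment]
    return 1.0 / received

def weighted_mean_outcome(y, treatment, weights, t):
    """Normalized weighted mean of the outcomes among units that received t."""
    y = np.asarray(y, dtype=float)
    mask = np.asarray(treatment) == t
    return np.sum(weights[mask] * y[mask]) / np.sum(weights[mask])

# Illustrative values only.
gps = np.array([[0.50, 0.25, 0.25],
                [0.20, 0.70, 0.10],
                [0.10, 0.10, 0.80],
                [0.25, 0.50, 0.25]])
treatment = np.array([0, 1, 2, 0])
y = np.array([1.0, 0.0, 1.0, 0.0])

weights = ipw_weights(gps, treatment)
mean_t0 = weighted_mean_outcome(y, treatment, weights, t=0)
```

A unit with a small estimated probability of its received treatment gets a large weight (here the fourth unit has weight 4), which is how extreme weights arise.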

Recently, 2 other weighting methods have been proposed: matching weights38 and overlap weights.39 These weights are more robust in the sense that they are less susceptible to extreme weights. This robustness is achieved by changing the estimand of interest.

Series of Binary Comparisons

SBC is a 2-stage method that uses only binary PS estimates that are calculated for 2 treatments at a time. In the first stage, one calculates the PS for each possible pair of treatments; in the second stage, units are matched within each pair. For example, let Z = 5, then SBC compares each of the 10 possible pairings of treatments independently. For each pair of treatments $t_j, t_{j'}$ with $t_j \neq t_{j'} \in \{1, \dots, 5\}$, the binary PSs are estimated for units that received either $t_j$ or $t_{j'}$. These PSs are then used for matching units within these pairs of treatments, and the treatment effect is calculated by comparing the matched groups within the pairs. This method was first proposed by Lechner,32 who compared the multinomial probit model to SBC using binary probit models. He found little difference in balance, as measured by the within-pair standardized bias, and in treatment effect estimates when comparing these models. Lechner32 advocated for the use of SBC over the multinomial probit model because doing so is less computationally intensive and less sensitive to model misspecification. SBC estimates the average treatment effect across all exposure pairs.
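To make the pairing step concrete, the sketch below enumerates the treatment pairs and selects the units that enter one binary comparison. The helper names are hypothetical, and the PS estimation and matching within each pair are omitted.

```python
from itertools import combinations
import numpy as np

def treatment_pairs(z):
    """All unordered pairs of treatments labelled 1, ..., z."""
    return list(combinations(range(1, z + 1), 2))

def units_for_pair(treatment, pair):
    """Indices of units that received either treatment in the pair."""
    return np.flatnonzero(np.isin(treatment, pair))

pairs = treatment_pairs(5)                    # 10 pairs for Z = 5
treatment = np.array([1, 2, 3, 4, 5, 1, 3])   # illustrative assignments
subset = units_for_pair(treatment, (1, 3))    # units used for the (1, 3) comparison
```

Because each comparison uses a different subset of units, each estimated effect refers to a different subpopulation.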

Common Referent Patient Matching

CRPM is also based on the binary PS method, but it attempts to create sets of patients such that each set includes 1 individual from each treatment group. For 3 treatments, {t1, t2, t3}, the steps of a CRPM approach are as follows:

  1. Pick a reference treatment, t1. Within each pair of treatments {t1, t2} and {t1, t3}, use logistic or probit regression to estimate the PS between treatments t1 and t2, ê1,2(X) and the PS between treatments t1 and t3, ê1,3(X).
  2. Match pairs of subjects with overlapping PSs on treatment t1 and t2 using ê1,2(X). Match pairs of subjects with overlapping PSs on treatments t1 and t3 using ê1,3(X).
  3. Construct 1:1:1 matched triplets using the patients on reference treatment t1 who were matched to both a patient on treatment t2 and a patient on treatment t3, along with their associated matches on treatments t2 and t3.

Matched pairs from treatments t1 and t2 for which the patient receiving the reference treatment was not matched with a subject on treatment t3 are discarded. Likewise, matched pairs of subjects on treatments t1 and t3 are discarded if there was no match for the reference subject in t2. Publications on CRPM have not specified how to use the triplets to estimate causal effects, although they refer to traditional methods of analyzing matched data sets. Canhão et al40 formed matched triplets using CRPM and estimated the effect of 3 treatments on a binary response using χ2 tests.
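Step 3 and the discarding rule can be sketched as an intersection of two sets of matched pairs. The dictionaries below are hypothetical outputs of the two pairwise matching steps, keyed by the identifier of the reference patient.

```python
def build_triplets(matches_12, matches_13):
    """Keep reference patients matched in both pairwise steps and form 1:1:1 triplets.

    matches_12 maps a reference (t1) patient to the matched t2 patient;
    matches_13 maps a reference (t1) patient to the matched t3 patient.
    """
    shared = sorted(set(matches_12) & set(matches_13))
    return [(ref, matches_12[ref], matches_13[ref]) for ref in shared]

# Illustrative identifiers only.
matches_12 = {"a1": "b7", "a2": "b3", "a4": "b9"}
matches_13 = {"a1": "c2", "a3": "c5", "a4": "c1"}
triplets = build_triplets(matches_12, matches_13)
# "a2" and "a3" are discarded because each was matched in only one of the two steps.
```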

Within-Trio Matching

With Z = 3, Rassen et al35 proposed within-trio matching to form triplets of subjects. Within-trio matching uses the KD-tree algorithm41 to optimize triplet similarities based on units' GPSs for treatment t1 and treatment t2, using a distance function between all possible pairs of triplets.42 Using simulations, Rassen et al35 found that triplets produced using within-trio matching generally yielded lower standardized covariate bias compared with CRPM and SBC.

Limitations of Current Methods

The above-described methods may be appropriate with certain data or for a specific causal question, but each has its limitations. Subclass weighted means using GPSs may not eliminate the bias resulting from differences in covariates. Because this method does not support an adjustment for residual bias within subclasses, the estimated treatment effects are dependent on the GPS density and less likely to be consistent compared with methods using regression adjustment.43 In addition, some subclasses may not include units from all treatment groups, which will require extrapolation to that subclass.

Weighting for multiple exposures using GPS is limited because an increase in the number of treatments results in higher propensity for extreme weights. Extreme weights result in erratic estimates with large variances.44,45 Trimming units with r(t, X) that are close to 0 or 1 has been proposed for binary treatments. However, for multiple treatments, this process may drop units with different covariates' distributions, which could ultimately increase the bias.22

SBC yields causal effects conditional on a subject receiving 1 of 2 treatments, yielding a set of causal effects that are unlikely to be transitive: treatment t1 may display better outcomes than treatment t2, and treatment t2 may display better outcomes than treatment t3, but treatment t1 will have worse outcomes than treatment t3. This lack of transitivity with SBC was demonstrated in theory22 and in practice.46 As a result, it is difficult to determine an optimal treatment level or to make generalizations across a population when using SBC. Similar issues may arise when using CRPM to compare treatments that do not include the reference treatment.

Newly Proposed Methods

Matching algorithms have been proposed as tools to estimate causal effects since the mid-1900s, but they have mostly been used to estimate causal effects between 2 treatment groups. We developed 2 main matching algorithms with a few variations. These variations involve the choice of distance measure, use of a caliper, and use of an initial clustering phase. We have also proposed imputation algorithms based on approximate Bayesian bootstrap (ABB47) that resemble matching algorithms and are valid for noncontinuous outcomes. The matching algorithms can be classified into 2 main types: basic matching and vector matching (VM).

Basic Matching

The basic matching with replacement algorithm identifies for unit i the units with the shortest distances from each of the other treatment groups. Because this algorithm identifies matches to all the units, some matches may not be close in terms of the distance measure. A possible solution is to restrict all matches to only those that have a distance smaller than a predefined threshold (a caliper). We summarize the algorithm for reference treatment t ∈ {t1, …, tZ} as follows:

  1. Estimate the GPS, R(Xi), i = 1,…, n using a multinomial logistic regression model.
  2. Drop units outside the common support region (see Lopez and Gutman22 for possible regions), and refit the model once.
  3. For all t′ ≠ t, match with replacement those receiving t to those receiving t′ using a prespecified distance measure and a caliper.
  4. Units receiving t that were matched to units receiving all treatments t′ ≠ t, along with their matches receiving the other treatments, make up the final matched cohort.
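Steps 3 and 4 can be sketched as follows, taking an estimated GPS matrix as given. The Euclidean distance on the logit of the GPS is only one possible distance measure and is an assumption of this sketch, as are the function names and the toy data.

```python
import numpy as np

def logit(p):
    return np.log(p / (1.0 - p))

def basic_match(gps, treatment, reference, caliper=np.inf):
    """Match each reference unit, with replacement, to its nearest unit in every
    other treatment group; keep reference units matched in all groups."""
    gps = np.asarray(gps, dtype=float)
    treatment = np.asarray(treatment)
    scores = logit(gps)
    others = [t for t in np.unique(treatment) if t != reference]
    cohort = {}
    for i in np.flatnonzero(treatment == reference):
        matched = {}
        for t in others:
            cand = np.flatnonzero(treatment == t)
            dist = np.linalg.norm(scores[cand] - scores[i], axis=1)
            best = dist.argmin()
            if dist[best] <= caliper:       # the caliper rejects distant matches
                matched[int(t)] = int(cand[best])
        if len(matched) == len(others):     # step 4: matched to all other groups
            cohort[int(i)] = matched
    return cohort

# Illustrative GPS values for 6 units and 3 treatments coded 0, 1, 2.
gps = np.array([[0.60, 0.20, 0.20],
                [0.20, 0.20, 0.60],
                [0.58, 0.22, 0.20],
                [0.10, 0.80, 0.10],
                [0.60, 0.19, 0.21],
                [0.21, 0.20, 0.59]])
treatment = np.array([0, 0, 1, 1, 2, 2])

no_caliper = basic_match(gps, treatment, reference=0)
with_caliper = basic_match(gps, treatment, reference=0, caliper=0.5)
```

Without a caliper every reference unit is retained, even when its nearest match is far away; with the caliper the second reference unit is dropped because no unit on treatment 1 is close to it.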

Vector Matching

The basic matching algorithm relies on a distance measure that aggregates individual component differences over the entire vector. In some cases, it may result in some components of the matched vector that are far apart while other components are relatively close. VM refers to a set of algorithms for matching units in observational studies with multiple treatments that addresses this limitation.22,48 We summarize the algorithm for reference treatment t ∈ {t1, …, tZ} as follows:

  1. Estimate R(Xi), i = 1,…, n using a multinomial logistic regression model.
  2. Drop units outside the common support region (see Lopez and Gutman22 for possible regions) and refit the model once.
  3. For all t′ ≠ t:
    1. Partition all units using a clustering algorithm on $\operatorname{logit}(\hat{R}_{t,t'}(X)) \equiv (\operatorname{logit}(\hat{r}(w, X)),\ w \neq t, t')$. This forms K strata of units with relatively similar Z − 2 components of the GPS vectors.
    2. Within each K stratum, match those receiving t to those receiving t′ on 1 of the different measures described in Scotina and Gutman,48 with replacement either with or without a caliper.
  4. Units receiving t that were matched to units receiving all treatments t′ ≠ t, along with their matches, make up the final matched cohort.

The original VM algorithm performed step 3b with nearest neighbor matching. This approach ensures that each unit will be paired with its best match, but it could leave units outside the final matched cohort, resulting in larger sampling errors.48 A possible extension is to match each unit with more than 1 unit. We have examined the performance of VM when each unit in the reference group is matched to 2 units in the other treatment groups (VM2). We summarize and explicate the configurations of each matching algorithm in Table 2.
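A compact sketch of steps 3 and 4 of VM with nearest neighbor matching is given below. It assumes that the GPS columns follow the sorted treatment labels, uses a plain k-means routine written here for illustration, and matches within strata on the logit of the GPS component for the reference treatment; the distance measures in Scotina and Gutman48 could be substituted. All names and data are hypothetical.

```python
import numpy as np

def kmeans_labels(points, k, n_iter=50):
    """Plain Lloyd's algorithm with deterministic starting centres (illustrative)."""
    points = np.asarray(points, dtype=float)
    order = np.argsort(points[:, 0])
    start = np.linspace(0, len(points) - 1, k).astype(int)
    centres = points[order[start]].copy()
    for _ in range(n_iter):
        dist = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):         # keep the old centre if a cluster empties
                centres[c] = points[labels == c].mean(axis=0)
    dist = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
    return dist.argmin(axis=1)

def vector_match(gps, treatment, reference, k=2, caliper=np.inf):
    gps = np.asarray(gps, dtype=float)
    treatment = np.asarray(treatment)
    scores = np.log(gps / (1.0 - gps))
    labels = list(np.unique(treatment))     # assumed to match the GPS column order
    ref_col = labels.index(reference)
    ref_idx = np.flatnonzero(treatment == reference)
    others = [t for t in labels if t != reference]
    matches = {int(i): {} for i in ref_idx}
    for t in others:
        # Step 3a: cluster on the Z - 2 components not involving reference or t.
        rest = [labels.index(w) for w in labels if w not in (reference, t)]
        strata = kmeans_labels(scores[:, rest], k)
        # Step 3b: within each stratum, nearest neighbor with replacement.
        for i in ref_idx:
            cand = np.flatnonzero((treatment == t) & (strata == strata[i]))
            if cand.size == 0:
                continue
            dist = np.abs(scores[cand, ref_col] - scores[i, ref_col])
            best = dist.argmin()
            if dist[best] <= caliper:
                matches[int(i)][int(t)] = int(cand[best])
    # Step 4: keep reference units matched to every other treatment.
    return {i: m for i, m in matches.items() if len(m) == len(others)}

gps = np.array([[0.60, 0.20, 0.20],
                [0.20, 0.20, 0.60],
                [0.58, 0.22, 0.20],
                [0.10, 0.80, 0.10],
                [0.60, 0.19, 0.21],
                [0.21, 0.20, 0.59]])
treatment = np.array([0, 0, 1, 1, 2, 2])

two_strata = vector_match(gps, treatment, reference=0, k=2)
one_stratum = vector_match(gps, treatment, reference=0, k=1)
```

In this toy example the second reference unit has no unit on treatment 1 in its stratum when k = 2, so it leaves the matched cohort even though no caliper is used.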

Table 2. List of Matching Algorithms.

Imputation Based on ABB

The theoretical justification for matching estimators has been described only when the outcome is continuous and the estimand of interest is the difference in means.49-52 In cases where these restrictions are not fulfilled, there are currently no statistically valid estimates for causal effects with multiple treatments. We describe an ABB approach to impute the missing potential outcomes. The method results in multiple data sets in which all potential outcomes are observed for each unit. Analysis is conducted in each data set separately, and the final estimate is obtained through the common multiple-imputation rules.53 The ABB procedure is based on a missing data imputation procedure in clinical trials54 and proceeds as follows:

  1. Estimate $R(X_i)$, $i = 1, \dots, n$, using a multinomial logistic regression model.
  2. Drop units outside the common support region (see Lopez and Gutman22 for possible regions) and refit the model once.
  3. Use KMC to partition patients into K subclasses based on $\operatorname{logit}(\hat{R}(X))$.
  4. Within each cluster k:
    1. Let $O_{wk}$ be the set of patients in subclass $k$ who received treatment $w$, and let $n_{wk} = |O_{wk}|$ be the cardinality of $O_{wk}$. For each treatment $w \neq t$, $w \in \{t_1, \dots, t_Z\}$, draw $n_{wk}$ values from $O_{wk}$ with replacement. The result is the donor pool, $\tilde{O}_{wk}$, for each treatment group $w$.
    2. For each $w$, draw $n_{tk}$ values from the donor pool to impute the missing $Y_i(w)$.
  5. Repeat steps 4(a) and 4(b) M = 25 times to generate M complete data sets.

This double resampling step ensures that the estimates are statistically valid.55
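The double resampling in step 4 can be sketched for a single cluster as follows. The sketch assumes that every treatment has at least 1 observed outcome in the cluster, and it combines the M estimates only through their mean (the multiple-imputation point estimate); the function name and data are illustrative.

```python
import numpy as np

def abb_impute_cluster(y, treatment, treatments, reference, rng):
    """One ABB imputation within a single cluster k.

    Returns an (n_tk, Z) array of potential outcomes for the units that received
    the reference treatment: observed in the reference column, imputed elsewhere.
    Assumes every treatment has at least one observed outcome in the cluster.
    """
    y = np.asarray(y, dtype=float)
    treatment = np.asarray(treatment)
    ref_mask = treatment == reference
    n_tk = int(ref_mask.sum())
    completed = np.empty((n_tk, len(treatments)))
    for col, w in enumerate(treatments):
        if w == reference:
            completed[:, col] = y[ref_mask]              # observed outcomes
            continue
        observed = y[treatment == w]                     # O_wk
        donors = rng.choice(observed, size=observed.size, replace=True)  # donor pool
        completed[:, col] = rng.choice(donors, size=n_tk, replace=True)  # imputations
    return completed

# Illustrative binary outcomes in one cluster; treatments coded 0, 1, 2.
y = np.array([1, 0, 1, 1, 0, 1, 1])
treatment = np.array([0, 0, 1, 1, 1, 2, 2])
rng = np.random.default_rng(1)

M = 25
datasets = [abb_impute_cluster(y, treatment, [0, 1, 2], 0, rng) for _ in range(M)]
# Risk difference between treatment 0 and treatment 2 in each completed data set.
estimates = [d[:, 0].mean() - d[:, 2].mean() for d in datasets]
point_estimate = float(np.mean(estimates))
```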

Simulation Design

We relied on extensive simulation analysis to compare most of the previously proposed methods and the different matching algorithms. All the simulations had a similar form, but different configurations were used when comparing the previous methods with VM algorithms and when comparing the different matching algorithms because the results showed that the VM algorithm is generally superior to the previously proposed methods.

Simulation configurations were based on factors that are either known to the investigator or can be estimated from the data without examining any outcome values. A P-dimensional X was generated for $n = n_1 + \dots + n_Z$ units receiving 1 of $Z \in \{3, 5, 10\}$ treatments, $W = \{1, \dots, Z\}$. For Z = 3, we generated sample sizes such that $n_2 = n_1$ and $n_3 = \gamma n_1$. For Z = 5, we generated similar sample sizes for $n_1$, $n_2$, and $n_3$ as for Z = 3, and we set $n_4 = n_2$ and $n_5 = n_3$. For Z = 10, the treatment group sample sizes for $n_1, \dots, n_5$ are the same as for Z = 5, and the sizes of treatment groups 6 to 10 were $n_{i+5} = n_i$, $i = 1, \dots, 5$. The values of X were generated from multivariate skew-t distributions such that

$$X_i \mid \{W_i = w\} \sim \text{Skew-}t_{df,P}(\mu_w, \Sigma_w, \eta).$$

For $Z \in \{3, 5\}$, $\mu_w = \operatorname{vec}(1_P b_w)$, where $1_P$ is a $P \times 1$ vector of 1s and $b_w$ is the $Z \times 1$ vector whose $w$th entry is equal to $b$ and whose other entries are zero. In addition, the covariance matrices $\Sigma_w$, $w \in \{1, \dots, 5\}$, were equicorrelation matrices. Matrix $\Sigma_1$ has diagonal elements of 1 and $\lambda$ elsewhere. Matrices $\Sigma_w$, $w \in \{2, \dots, 5\}$, have respective diagonal entries of $\sigma_2^2$, $\sigma_3^2$, $\sigma_2^2$, and $\sigma_3^2$ and off-diagonal entries of $\lambda$.

For Z = 10, $b_w$ is the $10 \times 1$ vector whose $w$th entry is equal to $b$ and whose other entries are zero, and $\Sigma_w = I_P$, where $I_P$ is the $P \times P$ identity matrix. This was done to reduce the running time when dealing with a large number of treatments. The simulation design assumes a regular assignment mechanism23 that depends on the factors listed in Table 3. For each configuration, 100 replications were produced. For all Z ∈ {3, 5, 10}, we discarded configurations with P = 20 and n1 = 600 because of the small number of units that can be matched across all treatment arms. After discarding these configurations, there were 5184 simulation configurations for Z ∈ {3, 5} and 576 simulation configurations for Z = 10.
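The sample-size rules of the simulation design can be written out as a small helper. The function name is hypothetical, the value γ = 2 is used only as an example, and rounding γn1 to an integer is an assumption of this sketch.

```python
def group_sizes(n1, gamma, z):
    """Treatment group sizes n_1, ..., n_Z under the stated design rules."""
    if z not in (3, 5, 10):
        raise ValueError("the design is described only for Z in {3, 5, 10}")
    base = [n1, n1, round(gamma * n1)]      # n_2 = n_1, n_3 = gamma * n_1
    if z == 3:
        return base
    five = base + [base[1], base[2]]        # n_4 = n_2, n_5 = n_3
    if z == 5:
        return five
    return five + five                      # n_{i+5} = n_i for i = 1, ..., 5

sizes_3 = group_sizes(600, 2, 3)
sizes_10 = group_sizes(600, 2, 10)
```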

Table 3. Simulation Factors.

To summarize the simulation results, we relied on 3 metrics. Two metrics measured the overall bias; the third examined the number of units in the reference group that we retained in the study. Let ψiw be the number of times unit i serves as a match to other units in treatment group w, and let nwm be the number of units from treatment group w in the matched sample, including units that are used as a match more than once. The weighted mean of covariate p = 1, …, P at treatment w, is defined as

$$\bar{X}_{pw} = \frac{1}{n_{wm}} \sum_{i=1}^{n} X_{pi} T_{iw} \psi_{iw},$$

where Tiw is an indicator function that is equal to 1 when Wi = w and otherwise zero.

The standardized bias at each covariate p for pair of treatments j and k is defined as

$$SB_{pjk} = \frac{\bar{X}_{pj} - \bar{X}_{pk}}{\delta_{pt}},$$

where δpt is the standard deviation of Xp in the full sample among units receiving the reference treatment W = t. McCaffrey et al37 and Lopez and Gutman22 estimated the maximum absolute standardized pairwise bias at each covariate:

$$\mathrm{Max2SB}_p = \max\big(|SB_{p12}|, |SB_{p13}|, |SB_{p23}|, \dots\big).$$

This metric reflects the largest discrepancy in estimated covariate means between any 2 treatments for covariate p. McCaffrey et al37 advocated a cutoff of 0.20 but maintained that larger or smaller cutoffs may be more appropriate for different studies.

Each simulation configuration was repeated 100 times, and $\mathrm{Max2SB}_p$ was recorded. We summarized the results using $\mathrm{MaxMax2SB} = \max_{p = 1, \dots, P}(\mathrm{Max2SB}_p)$ and $\overline{\mathrm{Max2SB}} = \frac{1}{P} \sum_{p=1}^{P} \mathrm{Max2SB}_p$. The results with MaxMax2SB and $\overline{\mathrm{Max2SB}}$ generally had the same trends, so we present 1 or the other.

For matching algorithms with a caliper, another metric to measure matching performance is the proportion of units from the eligible population with W = t that were included in the final matched set, PropMatched. Studies with PropMatched = 1 and low Max2SBp for all p are optimal because most of the units in the population of interest are retained, and the covariate distributions are similar on average across treatment groups. By design, matching algorithms without a caliper have PropMatched = 1.
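The balance metrics can be computed directly from their definitions. In the sketch below, `psi` holds for each unit the number of times it appears in the matched sample for its own treatment group, and the reference-group standard deviation uses the sample (n − 1) formula; the function name, that choice of denominator, and the toy data are assumptions made for illustration.

```python
import numpy as np
from itertools import combinations

def max2sb(X, treatment, psi, reference):
    """Maximum absolute standardized pairwise bias for each covariate p."""
    X = np.asarray(X, dtype=float)
    treatment = np.asarray(treatment)
    psi = np.asarray(psi, dtype=float)
    groups = np.unique(treatment)
    # delta_pt: SD of each covariate among reference units in the full sample.
    delta = X[treatment == reference].std(axis=0, ddof=1)
    means = {}
    for w in groups:
        mask = treatment == w
        # Weighted mean: each unit counts psi times; n_wm is the sum of psi.
        means[w] = (X[mask] * psi[mask, None]).sum(axis=0) / psi[mask].sum()
    sb = [np.abs(means[j] - means[k]) / delta for j, k in combinations(groups, 2)]
    return np.max(sb, axis=0)

# Illustrative data: 6 units, P = 2 covariates, treatments coded 0, 1, 2.
X = np.array([[0, 1], [2, 3], [1, 1], [3, 5], [2, 2], [4, 2]])
treatment = np.array([0, 0, 1, 1, 2, 2])
psi = np.ones(6)                       # every unit used exactly once

per_covariate = max2sb(X, treatment, psi, reference=0)
max_max2sb = per_covariate.max()       # MaxMax2SB
mean_max2sb = per_covariate.mean()     # average of Max2SB_p over covariates
```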

Data Set

The CPRD is a longitudinal database that includes more than 13 million people enrolled from more than 600 general practitioners in the United Kingdom. This data source includes basic patient demographics and registration details as well as medical history events that include symptoms, signs, and diagnoses. In addition, the CPRD has data on clinical tests and the details of all issued prescriptions. Diagnostic information in the CPRD is coded using Read codes, the standard clinical terminology system used in the United Kingdom. A subset of patients was also linked to the Hospital Episode Statistics inpatient data, which include information about British National Health Service inpatient visits. Primary and additional causes of admission are coded using International Statistical Classification of Diseases, Tenth Revision (ICD-10) codes.

Exclusion and Inclusion Criteria

Using the CPRD database, we implemented an observational cohort study. Because we are not able to analyze the entire CPRD database directly, we initially obtained only CPRD patients who fulfilled the conditions described in Appendix 1. For the initial cohort of 109 616 participants received from the CPRD, we applied the following restrictions: (1) received their first-ever prescription for a treatment of interest in their therapy file between January 2007 and December 2012; (2) had a diagnosis of T2DM before the date in (1); (3) had no prescription for a treatment in the same class as the initiating drug during the 6-month baseline period; and (4) had at least 1 prescription for a noninsulin antihyperglycemic drug. We restricted the study population to patients who had received continuous metformin monotherapy for at least 60 days and who started a second glucose-lowering agent concurrent with metformin and continued this for at least 60 days. The index date was defined as 60 days after the date of the first filled prescription for the second (ie, nonmetformin) agent, and patients were grouped based on their second-line AHT regimen class: sulfonylurea (SU), dipeptidyl peptidase-4 (DPP-4) inhibitor, or thiazolidinedione (TZD). Additional inclusion criteria included being 18 to 85 years of age, having a body mass index between 18.5 and 50, and having serum creatinine between 20 and 250 µmol/L (Figure 1).
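The additional inclusion criteria and the index-date rule can be expressed as a simple filter. The sketch below is illustrative only: the `Patient` fields are hypothetical names rather than CPRD variable names, and treating the range endpoints as inclusive is an assumption because the text does not state it.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Patient:
    age: float                # years at index date (hypothetical field)
    bmi: float                # body mass index (hypothetical field)
    creatinine_umol_l: float  # serum creatinine in umol/L (hypothetical field)


def index_date(first_second_line_rx: date) -> date:
    """Index date: 60 days after the first prescription for the second agent."""
    return first_second_line_rx + timedelta(days=60)


def meets_additional_criteria(p: Patient) -> bool:
    """Age 18-85, BMI 18.5-50, creatinine 20-250 umol/L (inclusive bounds assumed)."""
    return (18 <= p.age <= 85
            and 18.5 <= p.bmi <= 50
            and 20 <= p.creatinine_umol_l <= 250)


# Hypothetical patients: only the first satisfies all three ranges.
cohort = [Patient(61, 31.2, 80), Patient(90, 27.0, 75), Patient(55, 52.0, 90)]
eligible = [p for p in cohort if meets_additional_criteria(p)]
```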

Figure 1. Flowchart of Cohort Selection.


Outcomes

We examined the cardiovascular safety of 3 AHT classes as add-on medication to metformin in treating T2DM. The 2 outcomes of interest were defined as MACE and all-cause mortality (ACM) within 3 years of initiation. In Appendix 2, we provide the ICD-10 codes used to identify MACE. These codes were previously used by Ekström et al.56

Covariates

The covariates that we included in the GPS estimation can be classified into 4 major types. The first type included demographics (eg, age, sex, socioeconomic status). The second type included clinical factors (eg, smoking status, year of the index date, diabetes duration, time on metformin before add-on initiation, hemoglobin A1c [HbA1c]). The third type was comorbidities (eg, stroke, atrial fibrillation, cancer, arrhythmia, chronic obstructive pulmonary disease). The fourth type was concurrent treatments (eg, angiotensin-converting enzyme inhibitors, angiotensin subtype II receptor blockers, calcium channel blockers, statins). For a complete list see Table 4.

Table 4. Baseline Patient Characteristics for Each Cohort Within the Region of Common Support, Defined by Second-Line Treatment Allocation.


Results

Simulation Results

We begin by comparing simple VM, weighting, a clustering algorithm (KMC), and CRPM. $\overline{\text{Max2SB}}$ exceeds a cutoff of 0.20 in 57% of combinations when using KMC, 25% when using IPW, 19% when using CRPM, and 4% when using VM. In 16 simulation configurations, IPW yields $\overline{\text{Max2SB}}$ > 1.5.22 Simple VM had PropMatched > 85% in 99% of the configurations, whereas only 37% of the configurations for CRPM reached the 85% cutoff.22 Investigating the main factors that influence the performance of the various methods showed that generally, when the distribution of the covariates is normal, all methods perform reasonably well. However, when the tails are heavier (eg, t-distribution), IPW will generally not perform as well as VM or CRPM.22

Because these results show that VM algorithms perform better in terms of covariate balance, we further compared only the different matching methods in Table 2. With 3 treatments, the best-performing method in terms of MaxMax2SB is VM with no replacement (VMnr), with only 12% of the configurations above 0.2. Matching on the Mahalanobis distance of the GPS and frequency matching (FM) follow, with 17% and 18% of the configurations above 0.2, respectively. Covariate matching with no caliper (COVnc) has the worst performance, with MaxMax2SB exceeding 0.2 in 29% of the configurations. However, among these methods, VMnr has the lowest median PropMatched, with only 64% of the reference group units matched on average. Thus, although VMnr generally yields the lowest bias in the matched cohort, it is at the expense of generalizability because the matched cohort is less representative of the original sample (ie, lower PropMatched).

When comparing the different procedures with Z = 5, VM exceeds 0.2 for the majority of the configurations. Algorithms without a caliper (kernel matching with no caliper [KMnc], FM with no caliper [FMnc], GPS with no caliper [GPSnc], COVnc) perform better with 5 treatments, with the majority of the configurations yielding MaxMax2SB below 0.20. Kernel matching and FM also perform favorably, with 55% and 57% of configurations yielding a MaxMax2SB below 0.20, respectively. The best matching algorithms with a caliper are FM, which identifies matches for >75% of the reference group units in 68% of configurations, and VM with replacement, which identifies matches for >75% of the units in the reference group in 98% of the configurations. For additional details, see Scotina and Gutman.48

We have examined only those methods without a caliper for Z = 10 because we have seen that they perform better than those with a caliper for Z = 5. The median MaxMax2SB for VM is larger than the 0.20 cutoff, and only 17% of configurations yield MaxMax2SB lower than 0.20. The median MaxMax2SB for GPS is above 0.20, and it is trending upward compared with Z = 3 and Z = 5. For FMnc, 64% of configurations yield MaxMax2SB below 0.20, with an interquartile range of 0.16 to 0.23. For additional details, see Scotina and Gutman.48

Data Analysis Results

Of the 21 976 patients taking metformin included in the study, 13 816 (63%) had initiated dual therapy with an SU, 5860 (27%) with a DPP-4 inhibitor, and 2300 (10%) with a TZD. Because the majority of patients initiated dual therapy with gliclazide (90% of SU users), sitagliptin (72% of DPP-4 inhibitor users), or pioglitazone (83% of TZD users), we further restricted the study population to patients who initiated dual therapy with 1 of these 3 agents. The median patient age was between 61 and 62 years for each group. Diabetes duration was <5 years for 63% of all patients and between 5 and 10 years for 20% of all patients. The median time receiving metformin monotherapy was approximately 2 years for all groups. HbA1c levels were 8.3 to 8.4 for each intervention group, reflecting similar degrees of blood glucose control.

We set metformin plus SU as the reference level because SU is the oldest, most commonly used drug for T2DM. After trimming units that did not have similar units in all the other treatment arms, we had 10 487 patients in the SU (gliclazide) group, 3180 patients in the DPP-4 inhibitor (sitagliptin) group, and 1817 patients in the TZD (pioglitazone) group. Using ABB with 7 subclasses, we estimated that the MACE rates for each drug among SU users were 0.13 (95% CI, 0.12-0.13) for SUs, 0.12 (95% CI, 0.11-0.14) for DPP-4 inhibitors, and 0.09 (95% CI, 0.07-0.11) for TZDs. Using ABB with 7 subclasses, we estimated that the ACM rates for each drug among SU users were 0.039 (95% CI, 0.035-0.042) for SUs, 0.027 (95% CI, 0.018-0.036) for DPP-4 inhibitors, and 0.035 (95% CI, 0.026-0.045) for TZDs. Table 5 provides the difference in proportions and the standard errors using ABB with 1, 3, 5, and 7 subclasses; IPW; and nearest-neighbor matching on the Mahalanobis distance of the logit GPSnc.

Table 5. Estimated 3-Year Risk Difference for MACE and ACM Among SU Second-Line Users.


We observed significantly higher proportions of MACE for SU compared with TZD and higher proportions of MACE for DPP-4 inhibitors compared with TZDs. Significantly higher 3-year ACM was observed for SU compared with DPP-4 inhibitors.

Discussion

Context

Many PCOR/comparative effectiveness research studies have a goal of selecting 1 treatment from 3 or more possible interventions. Simultaneous assessment of such multiple interventions is attractive because it facilitates identification of the best intervention without the need to perform many studies in which each pair of interventions is compared. However, even in a randomized controlled environment, multi-arm trials can be considerably more complex to design, conduct, and analyze than 2-arm, single-question trials.57 These complications include large sample size requirements, assurance of eligibility of all participants for all the interventions, challenges in defining the specific comparisons that will be made, and the summaries of those comparisons. These problems are exacerbated in nonrandomized settings.

Summary of Findings

We have evaluated several matching methods for estimating causal effects with multiple treatments in observational studies. Of these methods, matching on the Mahalanobis distance of the logit of the GPS is shown to perform better than other methods for small numbers of treatments. However, as the number of treatments increases, adding an initial clustering step using KMC or fuzzy clustering results in better matches in terms of initial covariate bias.

One possible issue with matching estimators is that their theoretical properties have been derived only for continuous outcomes. We have developed a method that views causal inference as a missing data problem and uses ABB to impute the missing potential outcomes. This method creates multiple data sets in which all potential outcomes are observed. Estimates and their standard errors are calculated within each data set separately, and final point and interval estimates are obtained using standard multiple-imputation rules. These methods are statistically valid.
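The combining step can be sketched as follows, assuming that the "standard multiple-imputation rules" are Rubin's combining rules: the point estimate is the average across imputed data sets, and the total variance adds the average within-imputation variance to an inflated between-imputation variance. The function name and the example numbers are hypothetical and are not results from this study.

```python
from statistics import mean, variance


def combine_imputations(estimates, std_errors):
    """Combine M completed-data estimates with Rubin's rules.

    Returns (point estimate, combined standard error, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need at least 2 imputations with matching lengths")
    q_bar = mean(estimates)                     # combined point estimate
    w = mean([se ** 2 for se in std_errors])    # within-imputation variance
    b = variance(estimates)                     # between-imputation variance
    total = w + (1 + 1 / m) * b                 # total variance
    if b > 0:
        df = (m - 1) * (1 + w / ((1 + 1 / m) * b)) ** 2
    else:
        df = float("inf")                       # no between-imputation spread
    return q_bar, total ** 0.5, df


# Hypothetical risk differences from 3 imputed data sets.
est, se, df = combine_imputations([0.03, 0.04, 0.05], [0.01, 0.01, 0.01])
```

An interval estimate would then use a t reference distribution with the returned degrees of freedom; that step is omitted here because the Python standard library does not provide t quantiles.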

We found that matching methods that rely on binary PSs estimated only on patients receiving 1 of 2 treatments may result in significant bias in the covariates' distributions between patients receiving the different interventions. These methods may lead to biased and nontransitive estimates and therefore should not be applied generally.

IPW methods reduce the initial bias in covariates significantly; however, in our simulations, we showed they may suffer from extreme weights that yield erratic causal estimates. This problem is exacerbated as the number of interventions increases and is especially prevalent with covariates that are not normally distributed. Simple trimming of units with GPS components that are close to 0 or 1 may result in increased bias because units that are similar on a single GPS component may differ on others. Other approaches for estimating the GPS, such as generalized boosted models, may solve this issue. However, more research is needed to derive sampling variance estimates for these procedures and to examine their behavior in a range of applications. Finally, IPW estimates are mainly suitable for estimating differences in averages and are not well suited for comparison of other estimands.
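To illustrate how extreme weights can dominate an IPW estimate, the sketch below computes a normalized weighted mean of the outcome under one treatment, weighting each unit by the inverse of its GPS component for the treatment it received. This is a generic textbook-style estimator rather than the implementation used in the simulations, and the data are hypothetical.

```python
def ipw_mean(outcomes, treatments, gps, t):
    """Normalized IPW estimate of the mean outcome under treatment t.

    outcomes:   observed outcomes, one per unit
    treatments: label of the treatment each unit received
    gps:        one dict per unit mapping treatment label -> estimated probability
    """
    num = den = 0.0
    for y, w, g in zip(outcomes, treatments, gps):
        if w == t:
            weight = 1.0 / g[t]   # inverse of the probability of the received treatment
            num += weight * y
            den += weight
    if den == 0.0:
        raise ValueError("no units received treatment " + repr(t))
    return num / den


# Hypothetical data: three SU users, one with a very small GPS component for SU.
outcomes = [0, 0, 1]
treatments = ["SU", "SU", "SU"]
gps = [
    {"SU": 0.50, "DPP-4": 0.30, "TZD": 0.20},
    {"SU": 0.50, "DPP-4": 0.25, "TZD": 0.25},
    {"SU": 0.02, "DPP-4": 0.58, "TZD": 0.40},
]
estimate = ipw_mean(outcomes, treatments, gps, "SU")
```

Here the third unit receives a weight of 50, compared with 2 for each of the others, so the estimate moves from an unweighted mean of about 0.33 to roughly 0.93 on the strength of a single observation.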

The effects of each noninsulin agent plus metformin on MACE remain uncertain, and little useful information about cardiovascular outcomes exists. Because randomized experiments comparing all noninsulin agents are often infeasible because of financial considerations, we used an observational study to estimate the effects of adding 3 antihyperglycemic medications to metformin in patients with T2DM. Comparing these second-line therapies for T2DM, we found that TZDs result in significantly lower MACE rates than SUs and DPP-4 inhibitors. In addition, DPP-4 inhibitors have significantly lower MACE rates than SUs. Finally, DPP-4 inhibitors have significantly lower ACM compared with SUs; TZDs show lower point estimates of ACM, but those estimates are not statistically significant.

Study Limitations

Because the study has both a methodologic component and a data analysis component, we address the limitations of each component separately. We have provided theoretical justifications that matching methods that rely on binary PSs (eg, SBC and CRPM) may result in significant bias in the covariates' distributions. Comparison of the other methods aimed at estimating the effects of more than 2 interventions was based on extensive simulations that do not represent the entire set of possible data sets. Thus, our results represent general trends and suggestions for selecting the most appropriate method for estimating the effects of more than 2 treatments in observational studies. A method that performed poorly (relative to others) in 1 data set may perform better than those methods in another data set. For example, in some data sets, weighting methods may outperform some of the matching methods. However, as the number of interventions increases, we find this tendency to be less plausible because of the increasing number of units with large weights for a specific intervention.

In simulations, we show that using the Mahalanobis distance on the entire set of covariates does not perform well when the number of covariates increases compared with methods that rely on the GPS. However, as was shown for the binary PS, balancing on the GPS promises only that the covariates' means are balanced. Thus, it is important to include in the GPS estimation higher-order terms and interactions to balance other moments of the covariates' distribution.

Another limitation of our method is the reliance on the unconfounded assignment mechanism assumption. When this assumption is violated, one would expect bias in causal estimates. This assumption is untestable with observed data and requires sensitivity analyses, which are outside the scope of this report.

When comparing the second-line therapies for T2DM, we have compared just 3 possible medications (ie, gliclazide, sitagliptin, and pioglitazone). One reason is that some of the more recent therapies were introduced only in the last 2 years of our study. Thus, we were unable to identify enough patients taking some of the possible therapies to provide accurate estimates of the treatment effects. Another possible limitation of our analysis is that we required patients to use the AHT second-line therapy for at least 60 days. This restriction could introduce immortal time bias because patients had to survive for at least 60 days to be included. Moreover, some patients might have experienced MACE that led them to discontinue the medication and thus to be excluded from the analysis. However, when examining the effects for patients who had used the second-line therapy for at least 30 days and had no restrictions on the number of days used, the results were very similar (Appendix 2, Table 1). Finally, the administration of a drug may differ over the years; this would violate the SUTVA assumption because it implies that there is more than 1 version of the treatment. One possibility is to compare the effects of the 3 drugs within each year separately, but doing so will result in a significant reduction in sample size.

Future Research

There are multiple directions for future research. First, multiple-imputation methods for causal inference have been shown to result in good operating characteristics.58 In this report, we described 1 such method that depends solely on the PS. New methods that rely on PS modeling as well as other covariates may result in better operating characteristics. Second, because PS balances only covariates on average, new methods that rely on optimization techniques (eg, Zubizarreta59) are also an interesting area of research. Third, an important area for future research is the development of transparent and coherent methods to examine the sensitivity of study results to the unconfounded assignment mechanism assumption.

In terms of adverse effects for second-line therapies for T2DM, examining the effects of more recent therapies such as glucagon-like peptide-1 receptor agonists and sodium-glucose cotransporter-2 inhibitors is an important research question.

Conclusions

With the proliferation of treatments for different medical conditions, finding improved methods for comparing multiple alternative treatments is an important goal. This goal applies both to health care provider decision-making and to supporting patient involvement in deciding on treatment choices. Randomized clinical trials are the gold standard for comparing multiple treatments; however, multi-arm randomized trials are often impractical for financial, logistical, and ethical reasons. In this report, we describe the required assumptions for estimating causal effects with multiple treatments in observational studies and identify the possible pitfalls that may occur when implementing currently available methods. Based on the assumption that the assignment mechanism is unconfounded, we have also provided statistically valid matching and imputation methods that applied researchers can use to compare multiple alternative treatments in nonrandomized observational studies.

References

1.
Patient-Centered Outcomes Research Institute (PCORI) Methodology Committee. PCORI Methodology Report. Published January 2019. Accessed September 3, 2020. https://www.pcori.org/sites/default/files/PCORI-Methodology-Report.pdf
2.
UK Prospective Diabetes Study (UKPDS) Group. Effect of intensive blood-glucose control with metformin on complications in overweight patients with type 2 diabetes (UKPDS 34). Lancet. 1998;352(9131):854-865. [PubMed: 9742977]
3.
Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70:41-55.
4.
Rosenbaum PR, Rubin DB. Reducing bias in observational studies using subclassification on the propensity score. J Am Stat Assoc. 1984;79:516-524.
5.
Rubin DB, Thomas N. Characterizing the effect of matching using linear propensity score methods with normal distributions. Biometrika. 1992;79(4):797-809.
6.
Rubin DB, Stuart EA. Affinely invariant matching methods with discriminant mixtures of proportional ellipsoidally symmetric distributions. Ann Stat. 2006;34(4):1814-1826.
7.
Hirano K, Imbens GW. Estimation of causal effects using propensity score weighting: an application to data on right heart catheterization. Health Serv Outcomes Res Methodol. 2001;2:259-278.
8.
Dehejia RH, Wahba S. Propensity score-matching methods for nonexperimental causal studies. Rev Econ Stat. 2002;84:151-161.
9.
Monahan KC, Lee JM, Steinberg L. Revisiting the impact of part-time work on adolescent adjustment: distinguishing between selection and socialization using propensity score matching. Child Dev. 2010;82:96-112. [PubMed: 21291431]
10.
Cooper WO, Habel LA, Sox CM, et al. ADHD drugs and serious cardiovascular events in children and young adults. N Engl J Med. 2011;365(20):1896-1904. [PMC free article: PMC4943074] [PubMed: 22043968]
11.
Segal JB, Griswold M, Achy-Brou A, et al. Using propensity scores subclassification to estimate effects of longitudinal treatments: an example using a new diabetes medication. Med Care. 2007;45:149-157. [PubMed: 17909374]
12.
Stuart EA. Matching methods for causal inference: a review and a look forward. Stat Sci. 2010;25(1):1. [PMC free article: PMC2943670] [PubMed: 20871802]
13.
Nielsen RA, Findley MG, Davis ZS, Candland T, Nielson DL. Foreign aid shocks as a cause of violent armed conflict. Am J Polit Sci. 2011;55(2):219-232.
14.
Iacus SM, King G, Porro G. Multivariate matching methods that are monotonic imbalance bounding. J Am Stat Assoc. 2011;106(493):345-361.
15.
Boyd CL, Epstein L, Martin AD. Untangling the causal effects of sex on judging. Am J Polit Sci. 2010;54(2):389-411.
16.
Kam CD, Palmer CL. Reconsidering the effects of education on political participation. J Polit. 2008;70(3):612-631.
17.
Royston P, Altman DG, Sauerbrei W. Dichotomizing continuous predictors in multiple regression: a bad idea. Stat Med. 2005;25(1):127-141. [PubMed: 16217841]
18.
Neyman J. Sur les applications de la théorie des probabilités aux expériences agricoles: essai des principes [1923]. English translation of excerpts by Dabrowska and Speed. Stat Sci. 1990;5:465-472.
19.
Rubin DB. Estimating causal effects of treatments in randomized and nonrandomized studies. J Educ Psychol. 1974;66(5):688-701.
20.
Holland PW. Statistics and causal inference (with discussion). J Am Stat Assoc. 1986;81:945-970.
21.
Rubin DB. Formal modes of statistical-inference for causal effects. J Stat Plan Infer. 1990;25(3):279-292.
22.
Lopez MJ, Gutman R. Estimation of causal effects with multiple treatments: a review and new ideas. Stat Sci. 2017;32(3):432-454.
23.
Imbens G, Rubin DB. Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction. Cambridge University Press; 2015.
24.
Imbens G. The role of the propensity score in estimating dose-response functions. Biometrika. 2000;87(3):706-710.
25.
Imai K, van Dyk DA. Causal inference with general treatment regimes. J Am Stat Assoc. 2004;99(467):854-866.
26.
Dunnett CW. A multiple comparison procedure for comparing several treatments with a control. J Am Stat Assoc. 1955;50(272):1096-1121.
27.
Joffe MM, Rosenbaum PR. Invited commentary: propensity scores. Am J Epidemiol. 1999;150(4):327-333. [PubMed: 10453808]
28.
Zanutto E, Lu B, Hornik R. Using propensity score subclassification for multiple treatment doses to evaluate a national antidrug media campaign. J Educ Behav Stat. 2005;30(1):59-73.
29.
Dorsett R. The new deal for young people: effect on the labour market status of young men. Labour Econ. 2006;13(3):405-422.
30.
Drichoutis AC, Lazaridis P, Nayga RM. Nutrition knowledge and consumer use of nutritional food labels. Eur Rev Agric Econ. 2005;32(1):93-118.
31.
Kosteas VD. The effect of exercise on earnings: evidence from the NLSY. J Labor Res. 2012:1-26.
32.
Lechner M. Program heterogeneity and propensity score matching: An application to the evaluation of active labor market policies. Rev Econ Stat. 2002;84(2):205-220.
33.
Levin L, Alvarez RM. Measuring the effects of voter confidence on political participation: an application to the 2006 Mexican election. Caltech/MIT Voting Technology Project, VTP Working Paper. 2009;75.
34.
Rassen JA, Solomon DH, Glynn RJ, Schneeweiss S. Simultaneously assessing intended and unintended treatment effects of multiple treatment options: a pragmatic “matrix design.” Pharmacoepidemiol Drug Saf. 2011;20(7):675-683. [PubMed: 21626604]
35.
Rassen JA, Shelat AA, Franklin JM, Glynn RJ, Solomon DH, Schneeweiss S. Matching by propensity score in cohort studies with three treatment groups. Epidemiology. 2013;24(3):401-409. [PubMed: 23532053]
36.
Tu C, Jiao S, Koh WY. Comparison of clustering algorithms on generalized propensity score in observational studies: a simulation study. J Stat Comput Simul. 2013;83(12): 2206-2218.
37.
McCaffrey DF, Griffin BA, Almirall D, Slaughter ME, Ramchand R, Burgette LF. A tutorial on propensity score estimation for multiple treatments using generalized boosted models. Stat Med. 2013;32(19):3388-3414. [PMC free article: PMC3710547] [PubMed: 23508673]
38.
Yoshida K, Hernandez-Diaz S, Solomon DH, et al. Matching weights to simultaneously compare three treatment groups: comparison to three-way matching. Epidemiology. 2017;28(3):387-395. [PMC free article: PMC5378668] [PubMed: 28151746]
39.
Li F. Propensity score weighting for causal inference with multi-valued treatments. 2018. arXiv preprint. https://arxiv.org/abs/1808.05339
40.
Canhão H, Rodrigues AM, Mouro AF, et al. Comparative effectiveness and predictors of response to tumour necrosis factor inhibitor therapies in rheumatoid arthritis. Rheumatology. 2012;51(11):2020-2026. [PMC free article: PMC3475979] [PubMed: 22843791]
41.
Moore A. Efficient Memory-Based Learning for Robot Control. University of Cambridge; 1991.
42.
Hott JR, Brunelle N, Myers JA, Rassen J, Shelat A. KD-Tree Algorithm for Propensity Score Matching With Three or More Treatment Groups. Division of Pharmacoepidemiology and Pharmacoeconomics, University of Virginia; 2012.
43.
Lunceford JK, Davidian M. Stratification and weighting via the propensity score in estimation of causal treatment effects: a comparative study. Stat Med. 2004;23(19):2937-2960. [PubMed: 15351954]
44.
Kang JD, Schafer JL. Demystifying double robustness: a comparison of alternative strategies for estimating a population mean from incomplete data. Stat Sci. 2007;22(4):523-539. [PMC free article: PMC2397555] [PubMed: 18516239]
45.
Little RJ. Missing-data adjustments in large surveys. J Bus Econ Stat. 1988;6(3):287-296.
46.
Lopez MJ, Gutman R. Estimating the average treatment effects of nutritional label use using subclassification with regression adjustment. Stat Methods Med Res. 2017;26(2):839-864. [PMC free article: PMC6247807] [PubMed: 25432690]
47.
Rubin DB, Schenker N. Multiple imputation for interval estimation from simple random samples with ignorable nonresponse. J Am Stat Assoc. 1986;81(394):366-374.
48.
Scotina AD, Gutman R. Matching algorithms for causal inference with multiple treatments. Stat Med. 2019;38(17):3139-3167. [PubMed: 31066079]
49.
Abadie A, Imbens GW. Bias-corrected matching estimators for average treatment effects. J Bus Econ Stat. 2011;29(1):1-11.
50.
Abadie A, Imbens GW. Large sample properties of matching estimators for average treatment effects. Econometrica. 2006;74(1):235-267.
51.
Scotina AD, Beaudoin FL, Gutman R. Matching estimators for causal effects of multiple treatments. Stat Methods Med Res. 2019:0962280219850858. [PubMed: 31138025]
52.
Yang S, Imbens GW, Cui Z, Faries DE, Kadziola Z. Propensity score matching and subclassification in observational studies with multi-level treatments. Biometrics. 2016;72(4):1055-1065. [PubMed: 26991040]
53.
Rubin DB. Multiple Imputation for Nonresponse in Surveys. Wiley-Interscience; 2004.
54.
Lavori PW, Dawson R, Shera D. A multiple imputation strategy for clinical trials with truncation of patient data. Stat Med. 1995;14(17):1913-1925. [PubMed: 8532984]
55.
Carpenter JR, Kenward MG. Multiple Imputation and Its Application. John Wiley & Sons; 2013.
56.
Ekström N, Svensson AM, Miftaraj M, et al. Cardiovascular safety of glucose-lowering agents as add-on medication to metformin treatment in type 2 diabetes: report from the Swedish National Diabetes Register. Diab Obes Metab. 2016;18(10):990-998. [PubMed: 27282621]
57.
Vermorken JB, Parmar MKB, Brady MF, et al. Clinical trials in ovarian carcinoma: study methodology. Ann Oncol. 2005;16:20-29. [PubMed: 16239233]
58.
Gutman R, Rubin DB. Estimation of causal effects of binary treatments in unconfounded studies. Stat Med. 2015;34(26):3381-3398. [PMC free article: PMC4782596] [PubMed: 26013308]
59.
Zubizarreta JR. Using mixed integer programming for matching in an observational study of kidney failure after surgery. J Am Stat Assoc. 2012;107(500):1360-1371.

Related Publications

  1. Lopez MJ, Gutman R. Matching to estimate the causal effects from multiple treatments. Stat Sci. 2017;32(3):432-454.
  2. Scotina AD, Gutman R. Matching algorithms for causal inference with multiple treatments. Stat Med. 2019;38:3139-3167. [PubMed: 31066079]
  3. Scotina AD, Beaudoin FL, Gutman R. Matching estimators for causal effects of multiple treatments. Stat Methods Med Res. 2020;29(4):1051-1066. [PubMed: 31138025]
  4. Scotina AD, Zullo AR, Smith RJ, Gutman R. Approximate Bayesian bootstrap procedures to estimate multilevel treatment effects in observational studies with application to type 2 diabetes treatment regimens. Preprint. Stat Methods Med Res. Posted online June 26, 2020. https://journals.sagepub.com/doi/abs/10.1177/0962280220928109?journalCode=smma [PubMed: 32588747]

Acknowledgment

Research reported in this report was funded through a Patient-Centered Outcomes Research Institute® (PCORI®) Award (#ME-1403-12104). Further information available at: https://www.pcori.org/research-results/2014/creating-and-testing-methods-estimate-treatment-effect-observational-studies

Institution Receiving the PCORI Award: Brown University
Original Project Title: Estimation of Multi-Treatment Effects from Observational Data with Application to Diabetes Mellitus
PCORI ID: ME-1403-12104

Suggested citation:

Gutman R, Scotina A, Smith RJ, Dore DD, Zullo AR. (2020). Creating and Testing Methods to Estimate Treatment Effect in Observational Studies with Three or More Treatments. Patient-Centered Outcomes Research Institute (PCORI). https://doi.org/10.25302/05.2020.ME.140312104

Disclaimer

The [views, statements, opinions] presented in this report are solely the responsibility of the author(s) and do not necessarily represent the views of the Patient-Centered Outcomes Research Institute® (PCORI®), its Board of Governors or Methodology Committee.

Copyright © 2020. Brown University. All Rights Reserved.

This book is distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs License which permits noncommercial use and distribution provided the original author(s) and source are credited. (See https://creativecommons.org/licenses/by-nc-nd/4.0/)

Bookshelf ID: NBK593642; PMID: 37556582; DOI: 10.25302/05.2020.ME.140312104
