NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Developing Software to Predict Patient Responses to Knee Osteoarthritis Treatments and to Identify Patients for Possible Enrollment in Randomized Controlled Trials

Structured Abstract

Background:

Although they represent a standard of evidence, randomized controlled trials (RCTs) often fall short because of insufficient or unrepresentative enrollment, and many needed trials are never conducted. This leaves gaps in evidence to inform patient care decisions and creates a need for a method to facilitate RCTs in usual care settings.

As medical therapies become increasingly less satisfactory for patients with osteoarthritis, an average of 680 886 patients receive surgical knee replacement per year in the United States. Yet, there have been no substantial comparative effectiveness RCTs of medical vs surgical total knee replacement (TKR). The question about TKR for knee osteoarthritis is suitable for exploring a method that would facilitate the conduct of comparative effectiveness RCTs by assisting discernment of patient-specific equipoise between treatments.

Clinical equipoise is a prerequisite for enrollment into an RCT; likewise, mathematical equipoise is the use of mathematical models to predict and compare patient-specific outcomes of alternative treatment options that should be considered when enrolling patients into an RCT. When the predictions are similar, suggesting equipoise, then random treatment assignment may be justified, and the patient may feel more comfortable enrolling in the RCT. When the predictions suggest one treatment is better than another, trial enrollment may be inappropriate, but the predictions still can inform clinical decision-making.

Objectives:

This project aimed to use mathematical equipoise for making patient-specific comparisons of alternative treatment outcomes of TKR vs nonsurgical treatment of knee osteoarthritis as a way to consider enrollment into a comparative effectiveness RCT.

Methods:

We first obtained the views of patient stakeholders with knee osteoarthritis to identify key pain and physical function outcomes. After creating a consolidated database from non-RCT sources of knee osteoarthritis outcomes, and adjusting for the inherent differences between the databases, we developed multivariable mathematical models that predict patient-specific pain and physical function outcomes for TKR or nonsurgical treatment. We then developed the Knee Osteoarthritis Mathematical Equipoise Tool (KOMET) user interface based on these models to discern patient-specific equipoise. We pilot tested the interface to assess usability and responsiveness to the needs of patients and physicians and its adequacy for supporting shared decision-making, both for RCT enrollment and for treatment.

Results:

We incorporated KOMET regression models into prototype KOMET decision support software, which we successfully pilot tested in a range of clinics. Patients found it very helpful in making treatment decisions, but only 7 of the 12 understood the concept of equipoise.

Conclusions:

This project demonstrated the use of mathematical equipoise as a method for providing patient-specific decision support for shared patient-physician decision-making for selecting between alternative treatments and considering enrollment into a comparative effectiveness RCT.

Limitations and Subpopulation Considerations:

Although largely accomplishing its intended objectives, as an early stage in the development of mathematical equipoise decision support, this project has limitations related to the available clinical data, the modeling methods and variables, and the prototype software. The next step will be to conduct a larger-scale test, and then to implement it for its intended use—the conduct of a comparative effectiveness trial in usual care settings.

Background

Symptomatic knee osteoarthritis has an estimated prevalence of 17% to 34% in US adults1 and is the most frequent cause of dependency in lower-limb tasks, especially in elderly patients.2 It has considerable economic and societal costs, including 68 million work-loss days per year, and is the cause for >5% of the annual retirement rate and for hundreds of thousands of hospital admissions.3-6 For many patients, as osteoarthritis progresses, medical and physical therapy become less satisfactory, making this the most frequent reason for joint replacement surgery.4

There are concerted efforts to develop drugs that retard the progression of osteoarthritis, many through preserving cartilage. Ultimately, effective intervention will require addressing the multistructure failure inherent to osteoarthritis, which includes periarticular bone as well as soft-tissue structures within the joint. Meanwhile, total knee replacement (TKR) has become the ultimate standard for treatment, now completed for an average of 680 886 patients per year in the United States, with aggregate charges >$36 billion.7

Shared patient-clinician decision-making is particularly germane to deciding between medical treatments and surgical knee replacement. Not only do patient preferences have great relevance, but the availability of treatments, their inconvenience and expense, and the accumulation of comorbidities over time are all salient.7 Compromising these decisions are gaps in patient-specific information about alternatives and their effects in different populations.8 At the time we initiated this project, we found decision aids but no explicit predictive models in the literature or published randomized controlled trials (RCTs) of medical vs surgical treatment of knee osteoarthritis. At the time of this writing, a Danish study of 100 patients with knee osteoarthritis who were eligible for unilateral total knee replacement was the only such known trial to show that TKR followed by nonsurgical treatment resulted in greater pain relief and functional improvement after 12 months vs nonsurgical treatment alone.9 However, TKR was associated with more serious adverse events than nonsurgical treatment, and most patients who were randomly assigned to nonsurgical treatment alone did not require TKR within the study's 12-month follow-up.9 Thus, the question is far from settled at this point.

The description and measurement of clinical change in knee osteoarthritis is not necessarily reliable, undermining comparisons of alternative treatments.10 Moreover, the cross-sectional US national DECISIONS survey found that more than half of patients discussing knee or hip surgery underestimated the harm from surgery, and only 28% correctly estimated the amount of pain relief following surgery.11

As clinical decision support, we previously created and tested predictive instruments based on multivariable logistic regression models that provide 0% to 100% predictions of medical diagnoses and outcomes of treatments.12-15 They have been used successfully for short-term decisions such as whether to hospitalize a patient and/or to treat for acute myocardial infarction. These emergency decisions are dominated more by physician judgment than are decisions about longer-term and more complex treatments. Decision support for more complex decisions—for which shared patient-clinician decision-making is central—has been well studied. A 2014 Cochrane systematic review of 115 RCTs found that decision aids increased patient knowledge, improved accuracy of risk perceptions when expressed in probabilities, enhanced concordance with patient values when including a values clarification exercise, and reduced decisional conflict due to feeling uninformed and unclear about personal values.16 Similar decision aid benefits have been seen for patients with osteoarthritis considering hip or knee arthroplasty.17-21

Accordingly, the objective of this project was to create the Knee Osteoarthritis Mathematical Equipoise Tool (KOMET), intended to be embedded in electronic health records (EHRs) as decision support for shared clinical decision-making about patients' choices of treatment, especially between medical treatment and TKR. Additionally, this shared decision-making is intended to identify patients for whom, based on their specific characteristics, there is insufficient evidence to favor 1 of 2 or more treatment alternatives. This situation is referred to as clinical equipoise, the ethical and scientific basis for enrolling patients in an RCT. Shared patient-clinician decision-making is important in this circumstance, when patients' personal preferences and objectives can dominate what otherwise might appear to be a toss-up treatment decision.22 By illustrating the generation and use of patient-specific equipoise, KOMET also is intended to support shared decision-making about participation in RCTs, as an example implementation of mathematical equipoise, for practical, ethical, targeted enrollment into comparative effectiveness RCTs. If successful, presumably this approach could be used in many other conditions and clinical decisions.

In developing our cardiac predictive instruments, we were fortunate to have extensive patient-level data from RCTs. A great advantage of such data is that random assignment of treatments helps avoid having treatment effects biased by the selection of treatments and their use among patients. RCT data allow the multivariable regressions to accurately reflect the effect of a treatment when used in comparable patients; however, RCTs are expensive and time-consuming, and there are many conditions and treatments for which RCT-generated data are not available. Moreover, for the circumstances in which we might want to run a new RCT—for which we would potentially use mathematical equipoise for participant selection—there often will be few or no RCTs. In this case, to create predictive models, we must use data from observational studies, registries, EHR-based data warehouses, patient-acquired data feeds, and other sources. Registries of various patient groups and populations are relatively inexpensive and common, and EHRs generate increasingly more data available in databases and data warehouses. If these non-RCT sources could be used for creating predictive models, there would be vast opportunities for the mathematical equipoise approach to facilitate the conduct of clinical effectiveness RCTs, but there are protean challenges and limitations to this.

Clinical equipoise—the ethical and scientific basis for randomly assigning patients different treatments—is considered no longer present after a pivotal clinical trial shows one treatment is better than alternatives. All patients then must be offered the most-effective known therapy. Typically, however, this is not an individual patient-centered determination; only group-based general inclusion and exclusion criteria are available. Mathematical equipoise is intended as a method by which, for a given condition, only those individuals for whom there still is uncertainty could be enrolled in a comparative effectiveness trial, while individuals for whom the question is settled would not be enrolled.22 The objective is to generate RCT evidence based on individualization of treatments consistent with the principle of equipoise. This ultimately could allow treatment that accounts for the heterogeneity of treatment effects among different individuals and groups.

If embedded in EHRs and computerized physician ordering systems, potentially, determination of mathematical equipoise could serve as a practical way in routine clinical care to detect all eligible patients for possible RCT enrollment. It also could identify those patients not suitable for enrollment, for whom it could enhance clinical care by indicating the potentially best treatment. Also, the basis for selection for a clinical study could be transparent to patients and clinicians in real time to enhance truly informed consent during clinical care.

In this project we sought to create KOMET as an example of mathematical equipoise. To represent the prevailing circumstances in which this approach would be used, we used patient-level data from existing non-RCT sources to build predictive models of treatment outcomes; these models determined the presence or absence of mathematical equipoise to inform decision-making. We sought to illuminate limitations of available data and to explore strategies for overcoming such limitations to optimize modeling. Success in using non-RCT data in this way would support the goal of widespread use of the mathematical equipoise method.

We also sought to demonstrate through this project the utility of incorporating stakeholder input to ensure relevance of the ultimate predictive models to patient-physician decision-making. Although research into the engagement of stakeholders in research is still evolving in its terminology and frameworks,23,24 the criterion we used for this project—intended for comparative effectiveness research (CER)—was “individuals, organizations, or communities that have a direct interest in the process and outcomes of a project, research, or policy endeavor.”

Patient and Stakeholder Participation

We engaged stakeholders throughout the entire project to ensure the relevance of the ultimate models and the decision support to patient-physician decision-making. Patient, researcher, and clinician stakeholders were involved in the selection of study questions, choice of study outcomes, selection of candidate variables for the modeling database and the predictive model, and development and testing of the user interface. To foster this, we used PCORI's 6 engagement principles: reciprocal relationships, co-learning, partnerships, transparency, honesty, and trust, all of which allow for effective engagement in research.25

We held quarterly in-person meetings to build reciprocal relationships among stakeholders and the research team, to educate stakeholders about the research methods being used, and to solicit patient, researcher, and clinician stakeholder input. Participating groups included (1) patients with or at risk of having knee osteoarthritis, (2) patient advocates for those with arthritis, (3) clinicians who cared for these patients, and (4) knee osteoarthritis researchers.26

We identified interested patient and advocate stakeholders through discussions with clinicians, knee osteoarthritis researchers, and the Arthritis Foundation. The patient panel included 3 women and 4 men representing people at risk for knee osteoarthritis due to existing osteoarthritis in other joints, people actively considering treatment options for their existing knee osteoarthritis, and patients who had received TKR for osteoarthritis. We recruited clinician stakeholders from primary care, orthopedics, and rheumatology. The clinician panel included 2 rheumatologists, 2 primary care physicians, 2 orthopedic surgeons, and 1 physical therapist, some of whom had a dual role representing researchers.

Selection of Outcomes

We chose the 2 outcome scales on which we built our models, the Western Ontario and McMaster Universities Arthritis Index (WOMAC) and 12-item Short Form Survey (SF-12) physical component scores, after discussions with clinician and patient stakeholders. Factors considered included the time frame of the outcome beyond surgery and the meaningfulness to someone making a decision about surgery, taking into account constraints imposed by our available data sources. Stakeholders were strongly supportive of using both the pain and functional outcome scores, as both were part of patients' decision-making processes.

Modeling Database Creation

We created a modeling database from 4 data sets, matching patients who had surgical treatment with ones who had nonsurgical treatment. To guide our choice of the variables on which they would be matched, we gathered input from clinicians on the research team, clinician and patient stakeholders, and results from prior published literature. Variable choices for these models were informed by the needs of stakeholders who would use decision support for knee osteoarthritis, focusing on their views about the representation of pain and functional outcomes.

Predictive Model Development and Results

We provided all stakeholders with an orientation to the modeling process to foster their ability to provide input on selection of candidate variables for model development. Interaction terms in the statistical models allow differences in predicted benefit for different patients, so receiving input on plausible interactions was important. The candidate primary and interaction variables included in the model selection process were those stakeholders considered important, plausible, and easily and reliably provided. We considered outcome variables based on stakeholder ranking of how much the variable would be related to pain and functional outcomes 1 year in the future.

We sought clinician and patient stakeholder input on the clinical significance of the results of predictive modeling. As the project evolved, the research team and stakeholders concluded that many of the variables under consideration were too burdensome to collect or too difficult to ascertain. To accommodate this, we adjusted the models by removing such variables when doing so did not have a significant impact on performance characteristics.

User Interface Development and Testing

Both clinician and patient stakeholders contributed extensively to the design of the user interface of the decision support application. They reviewed its presentation of outcome predictions and its usability. Their recommendations led to improvements in the wording and ordering of the questions, instructions, and display of predicted outcomes.

Methods

To develop KOMET predictive models for outcomes of TKR and of nonsurgical treatments, we created a consolidated database with treatment outcomes of knee osteoarthritis from a variety of clinical study and registry data. We selected model variables based on input from patients and clinicians about the best capture of important determinants of outcomes and measurements of the clinical outcomes as well as on variables' contributions to models' predictive performance. We incorporated these models into prototype decision support software and tested them with stakeholders, clinicians, and patients.

Selection of Data Sets and Description of Outcomes

To create the modeling database, we considered a range of knee osteoarthritis databases (briefly described below) as well as the scales used in these databases: the WOMAC (for pain) and the SF-12 (for functional status). We selected 3 of the databases (Multicenter Osteoarthritis Study [MOST], Osteoarthritis Initiative [OAI], and Canadian Osteoarthritis Research Program [CORP]) because they are large, well established, and publicly available epidemiological studies of knee osteoarthritis. The 2 additional databases are knee osteoarthritis registries (New England Baptist Hospital [NEBH] and Tufts Medical Center [TMC]) determined to have adequate cases and the required indexes, and that were available from collaborating organizations.

Multicenter Osteoarthritis Study27

MOST is an NIH-sponsored longitudinal, prospective, observational study of knee osteoarthritis in adults with osteoarthritis or at increased risk of developing osteoarthritis.27 The database includes a community-based sample of 3026 participants aged 50 to 79 years, with preexisting osteoarthritis or those at high risk for osteoarthritis based on weight, knee symptoms, or a history of knee injuries or operations. Approximately 60% are women, and 15% are African Americans. The cohort was followed for 84 months and the data were collected through clinical assessments, radiological studies, several measures and instruments, and telephone interviews. The study focused on mechanical risk factors, causes of knee symptoms and pain, and the long-term disease trajectory of knee osteoarthritis. Data used in this article were obtained from the MOST, available for public access at http://most.ucsf.edu.

Osteoarthritis Initiative28

The OAI is an NIH-sponsored multicenter, longitudinal, prospective observational study of osteoarthritis intended as a public domain research resource. Its database includes clinical evaluation data, radiological (x-ray and MRI) images, and a biospecimen repository for 4796 men and women aged 45 to 79 years who have, or are at high risk for developing, symptomatic knee osteoarthritis. Data used in this article were obtained from the OAI database available for public access at http://www.oai.ucsf.edu/.

Canadian Osteoarthritis Research Program29-31

The Women's College Hospital CORP data set includes 2200 participants of this prospective, population-based cohort with at least moderately severe knee osteoarthritis, aged 55 or older. Ultimately, because of the challenges with this data set, we did not use it for this project.

NEBH Orthopedic Surgery Registry32

The NEBH registry includes 2462 patients who have undergone TKR there since 2011. Assessments occur before surgery, at 6 weeks, and at 12 months. Data collected include demographic, vital signs, clinical measures, medications, knee examination, the Knee Society Score (KSS) pain and physical function score, the SF-12 health status score, surgical complications, and procedure outcomes. The mean age of patients is 68 years, and 57% are women.

TMC Orthopedic Surgery Registry33

The TMC registry includes 535 patients who have received TKR since 2007. Assessments occur before surgery, at 6 weeks, 12 months, and 24 months. Data collected include demographic, vital signs, clinical measures, medications, knee examination, pain and physical function (KSS), health status (SF-12), surgical complications, and procedure outcomes. The mean age of patients is 62 years, and 61% are women.

WOMAC Index34

The WOMAC, developed in 1982, is widely used in the evaluation of hip and knee osteoarthritis and is available in >100 languages. It is a self-administered questionnaire of 24 items, divided into 3 subscales: (1) pain (5 items) during walking, using stairs, in bed, sitting or lying, and standing upright; (2) stiffness (2 items) after first waking and later in the day; and (3) physical function (17 items) using stairs, rising from sitting, standing, bending, walking, getting in and out of a car, shopping, putting on and taking off socks, rising from bed, lying in bed, getting in and out of a bath, sitting, getting on and off the toilet, heavy domestic duties, and light domestic duties. We used the knee pain scale as the primary outcome in this project. In its raw form the WOMAC knee pain scale ranges from 0 to 20. To make it easier to interpret and represent in the final models, we rescaled it to 0 to 100, with 0 representing absence of pain and 100 representing extreme pain.
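
The rescaling described above is a simple linear transformation. The following Python sketch is for illustration only and is not the project's code; the function names are hypothetical, and the 0-to-4 scoring of each of the 5 pain items is an assumption chosen to be consistent with the stated 0-to-20 raw range.

```python
def womac_pain_raw(item_scores):
    """Sum the 5 WOMAC pain items into a raw 0-20 score.

    Assumption: each item is scored 0 (none) to 4 (extreme).
    """
    if len(item_scores) != 5:
        raise ValueError("expected 5 pain item scores")
    if any(not 0 <= s <= 4 for s in item_scores):
        raise ValueError("each item score must be between 0 and 4")
    return sum(item_scores)


def rescale_womac_pain(raw_score):
    """Rescale a raw 0-20 pain score to 0-100 (0 = no pain, 100 = extreme pain)."""
    if not 0 <= raw_score <= 20:
        raise ValueError("raw score must be between 0 and 20")
    return raw_score * 100 / 20
```

Under these assumptions, a patient answering 2 on every pain item has a raw score of 10, which maps to 50 on the rescaled range.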

SF-12 Health Survey

The SF-12 is a multipurpose short form generic measure of health status.35,36 It was developed to be a much shorter, yet valid, alternative to the SF-36 for use in large surveys of general and specific populations and for large longitudinal studies of health outcomes. We used its physical functioning summary score as the second predicted outcome for this project. The SF-12 scores range from 0 to 100, with higher scores indicating better function.37

Evaluation of Registry Variables

We used a consensus process involving clinician investigators and stakeholders to select variables for model development. First, clinicians were asked to rank variables based on their impact on (1) predicting prognosis for pain or function, with or without surgery, and/or (2) predicting assignment to medical or surgical treatment (ie, indications or contraindications for treatment).

They ranked each variable a priori from A to D:

  A. Variables that almost certainly must be included in the model; eg, age
  B. Variables that would be desirable to have as established risk factors for the outcome; eg, body mass index (BMI)
  C. Variables that would be desirable to have for exploratory analyses; eg, history of falls over the past 12 months
  D. Variables not likely to be needed; eg, family history of arthritis

Finally, a few variables were ranked by clinicians for importance and ease of collection using a scale of 1 to 10, with 10 being very important or very hard to collect. We collapsed the importance rankings into 3 categories: not at all important (1-3), fairly important (4-7), and very important (8-10). Clinicians ranked most of the variables as easy to collect. We included in the modeling database the final list of variables deemed as fairly important and very important.
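
The collapsing of the 1-to-10 importance rankings into 3 categories, and the rule for keeping a variable, can be sketched as follows. This Python fragment is illustrative only; the function names are hypothetical and do not come from the project.

```python
def importance_category(rank):
    """Collapse a 1-10 importance ranking into the 3 categories described above."""
    if not 1 <= rank <= 10:
        raise ValueError("rank must be between 1 and 10")
    if rank <= 3:
        return "not at all important"
    if rank <= 7:
        return "fairly important"
    return "very important"


def keep_for_modeling(rank):
    """Keep variables deemed fairly important or very important."""
    return importance_category(rank) != "not at all important"
```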

Creating the Modeling Database

The database for creating KOMET models included 2 types of registries. Two databases, MOST and OAI, had data collected on knee osteoarthritis at fixed intervals per their protocols. During the course of follow-up, some patients had TKR and continued to be followed afterward. The 2 other registries, NEBH and TMC, were from hospitals that collected baseline and follow-up data only on their patients who had TKR.

For this project, our target sample was patients who had knee osteoarthritis and had reached the clinical stage at which they would be deciding whether to have TKR. Lacking a cohort of such patients randomized to the medical or surgical options, we used data from patients who had TKR and matched them to patients (knees) who did not have TKR but who had similar characteristics. Where possible, we matched non-TKR knees to TKR knees within the same database (OAI, MOST). We matched TKR knees from the NEBH and TMC registries to non-TKR knees from MOST and OAI based on the best match. In practice, we created a database in which we used the knee as the unit of analysis, and we conducted matching based on characteristics of the knee and the patient. Thereby, we created a study sample of patients who would or could be considering this therapeutic choice.

For the MOST and OAI registries, we identified all knees that underwent TKR and then designated the data collected at the closest previous visit as the baseline visit for that TKR. We then extracted baseline data on these TKR knees from the patients' registry data, including demographics, knee characteristics, comorbidities, mental and physical function, and other clinical features. To find non-TKR control knees, we created a subdatabase of all knee visits from all patients, excluding any that occurred after a TKR. We then used a greedy matching computer algorithm38 to select control knees for each TKR knee (within the same database, OAI or MOST). It should be noted that the variables used for matching differed among the databases, based on data availability. As a guide to determine variables to use for matching, we used input from research team clinicians, stakeholders, and the literature. For matching, we converted continuous variables to categories. We loosely based categories on Riddle et al, which presented an algorithm to judge the appropriateness of TKR.39 Our research team considered the factors used in that algorithm as reasonable factors to match on where possible. Categories were ordered, and we did not allow matches beyond 1 category of difference. We did not always require exact matches because we did not want to lose patients who had TKR from the model-building sample, and we could statistically adjust for differences between the TKR and non-TKR groups in the modeling process. Thereby, we matched each TKR knee in OAI with a similar non-TKR knee in OAI based on values of matching variables at baseline. The same was true for MOST.
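
To make the matching constraint concrete, the sketch below shows a generic greedy match on ordered categories in Python. It is not the cited algorithm or the project's code. The distance measure (sum of category differences), the processing order, and matching without replacement are all assumptions made for illustration, and the variable names in the usage note are hypothetical. The only rule taken from the text is that no variable may differ by more than 1 category.

```python
def category_distance(a, b, variables, max_diff=1):
    """Total category difference between two knees, or None if any
    variable differs by more than max_diff categories."""
    total = 0
    for v in variables:
        diff = abs(a[v] - b[v])
        if diff > max_diff:
            return None  # match not allowed
        total += diff
    return total


def greedy_match(tkr_knees, control_knees, variables):
    """For each TKR knee in turn, take the closest still-unused control knee.

    Returns a dict mapping TKR index -> control index; TKR knees with no
    allowable control are left unmatched.
    """
    available = set(range(len(control_knees)))
    matches = {}
    for i, tkr in enumerate(tkr_knees):
        best_j, best_d = None, None
        for j in sorted(available):
            d = category_distance(tkr, control_knees[j], variables)
            if d is not None and (best_d is None or d < best_d):
                best_j, best_d = j, d
        if best_j is not None:
            matches[i] = best_j
            available.remove(best_j)  # assumption: no reuse of controls
    return matches
```

For example, with knees coded as `{"age": 2, "pain": 3}` (category indexes), a control coded `{"age": 2, "pain": 2}` is an allowable match, whereas one coded `{"age": 4, "pain": 3}` is not.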

Because the TMC and NEBH samples included only TKR participants, we drew their matched non-TKR controls from a pooled data set of knee visits from the OAI and MOST registries.

We established exclusion criteria based on discussions with the research team members and applied them before we performed modeling. We excluded any knee that did not have follow-up information (9 months to 5 years after the baseline visit or TKR) on the same knee in the same state (TKR vs non-TKR). If a knee visit was a candidate control but had TKR at some point between that visit and a follow-up at least 9 months later, we excluded it from the pool of non-TKR knees used for matching. If a knee had TKR but did not have pre-TKR baseline data within 12 months of the TKR, we excluded that TKR. If a knee had TKR, we excluded the contralateral knee from the pool of non-TKR knees used as controls. If a patient had TKR on 2 knees, >90 days apart, we excluded both knees; with an interval of >90 days, we were concerned that the 1-year evaluation of pain and function for the first knee could still be during the recovery period of the surgery for the second knee, which would confound the assessment of the outcome. If bilateral surgery was completed on 2 knees within 90 days of each other, we used the first knee or randomly chose 1 if both knees were completed on the same day. There was 1 exception in the MOST data for which 14 patients were counted twice, including and following each bilateral knee separately. In the full database, 104 other patients were counted 2 times (92 patients) or 3 times (12 patients). Overall, 1322 patients contributed data for 1452 matched knees for these analyses. We did allow single patients to contribute both a control and TKR knee when surgeries were far enough apart in time to allow full follow-up on each independently. We also allowed OAI and MOST control knees to be reused for the matching process for TKR knees from the NEBH and TMC registries. See Appendix A for details and limitations of this approach.
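
The rule for patients with TKR on both knees can be expressed compactly. The Python sketch below is illustrative only, covers just this one exclusion rule, and uses hypothetical function and label names; it does not reproduce the MOST exception noted above.

```python
import random
from datetime import date


def eligible_bilateral_knees(left_tkr, right_tkr, rng=None):
    """Return which knee(s) to keep for a patient with TKR on both knees.

    - Surgeries more than 90 days apart: exclude both knees.
    - Surgeries on the same day: keep 1 knee chosen at random.
    - Otherwise (within 90 days): keep the knee operated on first.
    """
    gap_days = abs((left_tkr - right_tkr).days)
    if gap_days > 90:
        return []
    if gap_days == 0:
        rng = rng or random.Random()
        return [rng.choice(["left", "right"])]
    return ["left"] if left_tkr < right_tkr else ["right"]
```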

On the matched data set, we compared baseline characteristics between knees with and without TKR, using chi-square tests and t tests. To account for missing data, we used multiple imputation, creating 10 imputed data sets for each study source. We also compared baseline characteristics on imputed data sets as we used these for model development. We adjusted P values from the analysis of the multiple imputation data set to account for imputation variability.40 We used SAS software for these analyses using the multiple imputation (MI) procedure to impute the data and MIANALYZE to process the results of analyses on the imputed data. See Appendix B.41
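
Results from multiply imputed data sets are commonly combined with Rubin's rules, which is also the approach the SAS MIANALYZE procedure applies. The Python sketch below illustrates that pooling for a single estimate; it is not the project's SAS code, and the function name is hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed data sets using Rubin's rules.

    estimates: the m point estimates (one per imputed data set)
    variances: the m squared standard errors of those estimates
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching lists with at least 2 imputations")
    pooled = mean(estimates)
    within = mean(variances)          # average within-imputation variance
    between = variance(estimates)     # between-imputation variance (m - 1 divisor)
    total = within + (1 + 1 / m) * between
    return pooled, total
```

The between-imputation term is what widens the standard errors, and hence adjusts the P values, to reflect uncertainty about the imputed values.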

Creating Predictive Models for Outcomes

We conducted analyses using SAS for Windows (Version 9.4 TS Level 1M2; SAS Institute, 2002-2012) and SAS Enterprise Guide (Version 7.13 HF3; SAS Institute, 2016).

We developed a multivariable linear regression model to predict the 1-year knee pain outcome based on the WOMAC score or, when a database lacked WOMAC items, using an estimated WOMAC score, as described in Appendix C. Our approach was to develop the model using a set of matched TKR to non-TKR knees from the OAI database and then to validate/test it on a set of matched TKR to non-TKR knees from the MOST database. We then pooled the OAI and MOST data sets and built a new model, starting with variables used in the model developed in the OAI data and tested on the MOST data. We also rederived models on a database that included all 4 data sets (OAI, MOST, NEBH, and TMC). We used a similar variable selection process but with a more limited set of candidate predictor variables because NEBH and TMC did not capture as many variables as the OAI and MOST registries. We repeated this entire process for the functional outcome (SF-12 physical component at 1 year). To create models that could provide predicted estimates of 1-year knee pain and 1-year function, with and without TKR, for any patient based on their characteristics, all models included an indicator variable for treatment type. We explored covariates and interactions of treatment type with covariates in the different phases of the modeling process. We did not adjust for matching in the linear regression during modeling because the purpose of matching was to create a reasonably balanced study sample, and covariates in the models could account for remaining imbalances between groups.42 We describe further details of our approach in Appendix D.
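
The role of the treatment indicator and its interactions can be illustrated with a small Python sketch. The coefficients and covariate names in the usage note are invented for illustration and are not the KOMET model; only the structure (main effects, a treatment indicator, and treatment-by-covariate interactions) follows the description above.

```python
def predict_outcome(coefs, patient, tkr):
    """Linear prediction with a treatment indicator and interactions.

    coefs:   dict with "intercept", "tkr", "main" (covariate -> beta) and
             "tkr_interaction" (covariate -> beta); all values hypothetical
    patient: covariate name -> value
    tkr:     1 for TKR, 0 for nonsurgical treatment
    """
    y = coefs["intercept"] + coefs["tkr"] * tkr
    for name, beta in coefs["main"].items():
        y += beta * patient[name]
    for name, beta in coefs["tkr_interaction"].items():
        y += beta * tkr * patient[name]
    return y


def predict_both(coefs, patient):
    """Predicted outcome for the same patient with and without TKR."""
    return {
        "tkr": predict_outcome(coefs, patient, 1),
        "nonsurgical": predict_outcome(coefs, patient, 0),
    }
```

With made-up coefficients such as `{"intercept": 10, "tkr": -15, "main": {"baseline_pain": 0.6, "age": 0.1}, "tkr_interaction": {"baseline_pain": -0.3}}`, the interaction term lets the predicted benefit of TKR grow with baseline pain, which is how such a model can yield different predicted benefits for different patients.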

Prototype Decision Support Software Development, Interface Design, and Usability Testing

The goal of software development and usability testing was to translate the results of the predictive models into easily understood, patient-specific reports with predictions of 1-year outcomes that could be produced in real time in the course of clinical care, for shared treatment decision-making and, if appropriate, enrollment into an RCT.

Decision Support Software Development

There were 2 KOMET development tasks, for the analytics and for the user interface. Analytics development included implementing the predictive models as reusable, multiplatform software components to generate both the current and 1-year predicted pain and function outcomes for nonsurgical and surgical treatments. In addition, the analytics software calculated the respective 95% CIs around each prediction as the basis for considering the degree of overlap that would suggest near equivalence, or equipoise. User interface development included creating a web browser-based questionnaire interface to collect patient demographics, items for computing the WOMAC pain score, the SF-12 physical functioning scale, and comorbidities. Together, the user interface and analytics component included methods for data retention and presentation of the predicted outcome results. We then incorporated the predictive models into the web-based decision support application for iterative user testing.

Interface Design

The user interface design process involved iterative prototyping of methods to collect data for the predictive models, displaying the predictions through data tables, bar charts, data plots, dynamic text descriptions, and printed reports, and determining and alerting users about mathematical equipoise. We began with image mockups and storyboards, then used online prototyping tools (www.axshare.com) to establish page layout, content placement, and workflow. Once we identified key user interface elements, we finalized general layout and content placement and conducted subsequent user interface design iterations on a live website. We implemented the analytics components and user interface on a stand-alone web-based application server using an Apache Tomcat 8 web server (Apache Software Foundation, 1999-2019).50

Usability Testing

We tested the prototype decision support application and iteratively redesigned it to address patient and clinician user needs. We conducted initial testing with 12 research institute staff members as well as members of our patient and clinician stakeholder panels. We tested the final design with 10 patients and 6 physicians in 3 clinical settings during typical clinic and research-specific visits. Testing included (1) entering demographic data and completing questionnaires to provide the information needed for the predictive models, (2) interpreting predictive model results through data displays, and (3) determining user understanding of the predictive models and mathematical equipoise and clinical trial randomization through case-based discussions. Usability testing included a “think-aloud” protocol and a usability testing script, as described in Appendix E.

A research assistant and the project director conducted testing. All sessions were recorded and transcribed. Testing with research institute staff and stakeholders was conducted virtually or in a conference room, and testing with patients and clinicians was conducted in the clinic setting. The IRB determined the project was exempt from IRB review.

Results

Study Design and Database Creation

The final database included 1452 knees (726 with TKR and 726 without) of 1322 patients. Of patients, 91% (1204) had a single knee included in the database, 8% (106) had 2 knees used or a single knee used 2 times, and 1% (12) had knees used 3 times. We matched TKR knees from OAI to control knees from OAI, and we matched TKR knees from MOST to controls from MOST. Because NEBH and TMC included only TKR knees, we drew the controls for those databases from non-TKR knees from OAI and MOST. In the final matched database, the relative contributions of TKR knees were OAI, 252; MOST, 154; NEBH, 248; and TMC, 72. For the control knees, contributions were OAI, 472, and MOST, 254. Figure 1 and Appendix F: Figures 1a to 1d provide breakdowns of how we selected the final analysis sample from each database in CONSORT-type figures.

Figure 1. Description of Final Analysis Sample Selection.

Study Sample

We compared distributions of variables used for the matching process between TKR and non-TKR knees for each data source; these results are presented in Appendix F: Table 1a. They confirmed that the matching algorithm had worked. In each database, characteristics used for matching were well balanced between the TKR and non-TKR knees. Baseline characteristics considered for the modeling process, and of interest to clinicians and stakeholders, were comparable between TKR and non-TKR knees, as presented in Appendix F: Table 1b. This also was true of the variables used in the final multivariable models using the imputed data, as shown in Appendix F: Table 1c.

Baseline characteristics and outcomes at follow-up of the matched study sample are summarized in Table 1. Approximately 40% were men, the mean age was 65 years, and the mean BMI was 31. On the 0 to 100 pain scale (100 indicating extreme pain), the mean baseline knee pain was significantly higher in the TKR group than in the non-TKR group (mean, 45.6 vs 40.5; P ≤ .01), despite efforts to match on this variable (categorized). Comparisons of mean baseline SF-12 scores between TKR and non-TKR groups showed better physical and mental function in the non-TKR groups than in the TKR groups, with the difference being significant for physical function (mean, 37.2 vs 38.6; P = .008). Overall, at follow-up there was less knee pain and better physical function in the TKR groups than in the non-TKR groups. Irrespective of significance, we used all variables listed in Table 1 in building the multivariable models of long-term (approximately 1-year) outcomes.

Table 1. Description of Pooled Study Sample Used for Model Derivation for n = 1452 Matched Knees (Imputed Data).

Model Development

We used linear regression to model the 2 outcomes, the WOMAC knee pain scale (rescaled 0-100; see Appendix C: WOMAC Knee Pain, Part II) and the SF-12 physical functioning component score.

Following the methods described above, we chose these outcomes (including timing) before building models, after repeated discussions with clinician and patient stakeholders and the research team. We chose 1 year as the target follow-up time to have a time point beyond the recovery time from surgery, estimated as up to 9 months. Stakeholders felt benefits of surgery were stable beyond that time point. To address inconsistencies and gaps, we allowed the use of data from up to 5 years past baseline when no time point closer to 1 year was available for a knee.

Stakeholders were strongly supportive of using both the pain and functional outcomes in patients' decision-making processes, although the outcomes were not of equal importance to all patient stakeholders. As the project progressed, the team continued to receive more input from patient and clinician stakeholders, which influenced modeling, an example of which is described in Appendix G.

Models Built on OAI Database and Tested on MOST Database (Appendix H: Tables 2a and 2b)

We tested the models built on the OAI database on the MOST database to check that the statistical modeling had been effective, as reflected on an independent data set. The first model built was for WOMAC knee pain at 1 year and used the matched OAI database that included 252 knees that underwent TKR and 252 knees that did not, using all knees for which there were WOMAC knee pain data available for the 1-year end point. The final model, built on the imputed data sets, included main effects for younger age (defined as <60 years old), a measure of body pain, hip pain (yes vs no), baseline WOMAC knee pain, and treatment (TKR or not). The body pain measure was derived from a homunculus, a diagram of a body on which patients indicated locations of pain; it was calculated as the percentage of sites on the diagram that had symptoms. The model also included interactions of TKR with both baseline knee pain and hip pain. The model r-square was 0.36 for WOMAC knee pain. We applied the coefficients from the OAI model to the imputed MOST data set and compared the resulting fitted values for 1-year knee pain with the observed 1-year knee pain values. There was a positive association between observed and fitted values (r-square = 0.32). We conducted a similar analysis for the 1-year physical functioning outcome. The model for the 1-year functional outcome built on the OAI data included main effects for gender, age, baseline SF-12 mental and physical components, homunculus, hip pain, depression score, and baseline knee pain in the contralateral knee. There was also a main effect for treatment and no significant interactions of treatment with any other variables. The model indicated that, on average, the 1-year physical function score (SF-12 physical component score) was 3.4 points higher for patients who had TKR than those who had not. This OAI model had an r-square of 0.42. When we applied this model to the MOST data set, the fitted values for the physical function outcome were positively associated with the observed results (some of which were imputed), although the r-square on the MOST data dropped to 0.18. Although this decline in performance was larger than we had hoped, the research team decided to combine the 2 databases and refit the model on the pooled data, with the aim of constructing a better model from the larger sample.
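The external check described above, in which coefficients from one data set are applied to another and fitted values are compared with observed values, can be sketched as follows. This sketch assumes that the reported r-square is the squared Pearson correlation between observed and fitted values; the report does not state the exact formula, and the data below are invented.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def validation_r_square(observed, fitted):
    """Squared correlation of observed outcomes with externally fitted values."""
    return pearson_r(observed, fitted) ** 2


# Invented 1-year pain scores: observed in the validation data vs values
# fitted by applying the development-data coefficients.
observed = [10.0, 20.0, 30.0, 40.0]
fitted = [12.0, 18.0, 33.0, 37.0]
r2 = validation_r_square(observed, fitted)
print(round(r2, 3))
```

A drop in this quantity between the development and validation data, as seen for the physical function model, indicates that the model explains less of the outcome variation in the new sample.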

Models for 1-Year Knee Pain Built on Pooled Databases (Appendix H: Tables 2a and 2b)

We built multivariable models on versions of the databases that included imputed values for 1-year pain outcome. We constructed the 1-year knee pain models on the combined OAI and MOST data sets (P1 model) and on the combined OAI, MOST, NEBH, and TMC (P2 model) databases. The 2 models included terms for a treatment indicator variable and for baseline knee pain and an interaction of these 2 and had similar r-square values (0.32), suggesting equivalent performance. In both models, the expected knee pain at 1 year was less for patients who had TKR than for those who did not have TKR, with the difference being greater in those who had higher knee pain levels at the start.

The P1 model also indicated worse knee pain at 1 year with younger age, more knee pain at baseline in the contralateral knee, more total body pain (on the homunculus), and higher BMI. There was also an interaction with baseline hip pain for which the benefit of TKR vs non-TKR in knee pain reduction was greater in patients who had baseline hip pain vs those who did not.

Some of the variables available in the OAI and MOST data sets were not available in the other databases (eg, pain indicated on a homunculus, pain in contralateral knee), and some variables, such as hip pain, had not been collected for the surgery databases (NEBH, TMC) in the same way as for the OAI and MOST databases. Accordingly, we did not use these variables in modeling in the larger database. The final P2 model included age as a continuous variable, with more expected knee pain at younger ages, as was seen in the P1 model. The model also included baseline SF-12 scores: higher baseline physical and mental component scores were associated with less expected knee pain at 1 year.

Models for 1-Year Physical Function Built on Pooled Databases (Appendix H: Tables 2a and 2b)

The model-building process for the 1-year physical functioning models (F1, F2) was similar to the 1-year pain models. Again, we built the F1 model on data from OAI and MOST that included many possible predictor variables. We built the F2 model on a larger database that included the same OAI and MOST data as well as data from the NEBH and TMC cohorts. This larger data set, however, included fewer predictor variables common to all 4 data sets. The final physical function models are presented in Appendix H: Table 2b. The 2 models had similar r-square values (0.34, 0.35). Both indicated better 1-year physical function for males, younger patients, higher initial physical and mental component scores, and lower BMI. The F1 model also included a main effect for baseline knee pain in the contralateral leg, with more baseline pain being associated with a worse 1-year physical function outcome. The F2 model also included interaction terms of TKR treatment with both age and the SF-12 mental score. Results from the model indicate that the estimated benefit in function at 1 year for patients treated with TKR vs standard of care is greater for younger patients and for patients with lower baseline mental health scores. The F1 and F2 models are presented in Appendix H: Tables 2a and 2b.

Summary of Multivariable Models (Table 2, Figure 2, and Appendix H: Table 2c)

Appendix H: Table 2c shows a summary of variables included in all 4 final models (P1, P2, F1, F2) and the distribution of each variable in the pooled databases. In the earlier phases of this project, we hoped our P1 and F1 models would have better performance because we had a larger pool of variables (although fewer patients) to use for the modeling process. As the project evolved, the research team realized that many of the variables under consideration were burdensome to collect and/or difficult to capture consistently. In the end we decided to use only models P2 and F2, which we built on the data sets that had more patients (OAI, MOST, NEBH, TMC) but fewer independent variables, for the development of the software. The coefficients for these models are presented in Table 2. Although neither model was validated in an independent database, we believe the models have sufficient performance and are based on variables consistent with clinical understanding and importance, such that they are reasonable for use in this demonstration project. Based on the results of testing our OAI model on the MOST data, we are optimistic the models can be useful in patients similar to those used to develop the models. These patients, who are presumably at the point of deciding whether to have TKR, have characteristics similar to those shown in Table 1.

Table 2. Final Models for 1-Year Knee Pain (P2) and SF-12 Physical Function (F2).

We used these models to estimate 1-year knee pain and physical functioning for the treatment each participant actually underwent (TKR or non-TKR) and also for their counterfactual situation, as if they received the alternative treatment. In other words, we calculated 2 predicted values for each participant in our database (that we used to make our models). One prediction assumed participants received TKR and the other prediction assumed they did not. These data allowed us to predict the difference in pain and function outcomes for each patient under 2 courses of treatment (TKR vs non-TKR). The distribution of predicted differences in pain and function with and without TKR is shown in Figure 2. The figure shows that there was a range of predicted improvement with TKR, and those patients predicted to have benefit in knee pain may not have been the same as those for whom benefit in physical functioning is predicted. In this project's database, 9% of participants had a predicted gain in function of TKR vs non-TKR of at least 8 SF-12 physical function points and a predicted reduction in knee pain of at least 20 points (on WOMAC scale of 0-100). At the other end of the spectrum, 6% had predicted gains in physical function of <4 points and reduction of knee pain of <10 points. Only 2% had larger gains in physical function and smaller improvements in pain. Figure 2 also shows sample participants from each of the 9 combinations of estimated knee pain and physical function change. Examples of participants with the most-, mid-, and least-estimated reduction of pain as well as their gain in function with 95% prediction intervals for the estimates are shown in Table 3. Participants with higher baseline knee pain had the largest predicted reductions in knee pain with TKR vs non-TKR. Younger patients with lower SF-12 scores had the largest predicted benefits in physical function with TKR vs not having TKR. 
These differences in estimated benefits between participants arise from the interaction terms included in the multivariable models.
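The grouping of participants into combinations of predicted pain reduction and predicted function gain can be sketched as below. The cut points of 10 and 20 points for pain and 4 and 8 points for function are taken from the percentages quoted above; treating the middle bands as 10 to <20 and 4 to <8 is an assumption, as are the band labels and the function names. The sample differences are invented.

```python
from collections import Counter


def band(value, low, high):
    """Classify a predicted benefit as small (< low), moderate, or large (>= high)."""
    if value < low:
        return "small"
    if value < high:
        return "moderate"
    return "large"


def mosaic_cell(pain_reduction, function_gain):
    """Return the (pain band, function band) cell for one participant.

    pain_reduction: predicted non-TKR pain minus predicted TKR pain (0-100 scale)
    function_gain:  predicted TKR function minus predicted non-TKR function (SF-12 points)
    """
    return band(pain_reduction, 10, 20), band(function_gain, 4, 8)


def cell_shares(differences):
    """Share of participants falling in each observed cell."""
    counts = Counter(mosaic_cell(p, f) for p, f in differences)
    total = len(differences)
    return {cell: n / total for cell, n in counts.items()}


# Invented (pain reduction, function gain) pairs for 4 participants.
sample = [(25.0, 9.0), (5.0, 2.0), (15.0, 6.0), (22.0, 3.0)]
shares = cell_shares(sample)
print(shares)
```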

Figure 2. Mosaic Plot Showing Distribution of Predicted Differences (TKR vs Non-TKR) for 1-Year Knee Pain and SF-12 Physical Function in Pooled Data (n = 1452).

Table 3. Estimated Outcomes for a Sample of Cases.

We encountered statistical questions regarding the use of the proposed linear model with 1-year outcomes, specifically knee pain, for which the scores are not normally distributed. Even after adjustment for covariates, the model residuals (the differences between predicted and observed values) remained skewed. We explored alternative nonlinear models, which produced little gain in model performance, and ultimately used the linear form of the model. See Appendix I.

Prototype Decision Support Software Development, Interface Design, and Usability Testing

The KOMET development process resulted in the creation of 1 web-based application for clinicians (http://medicalequipoise.com/tkrclinician) and 1 for patients (http://medicalequipoise.com/tkrpatient). Both applications are built on an analytics software library that also could be embedded into an EHR system.

The applications underwent user testing to assess the ease of data collection through the web-based questionnaire and users' ability to understand the outcome predictions when presented in data tables, graphs, and as dynamic text. We also tested depictions of prediction uncertainty and definitions of mathematical equipoise.

All users were able to easily enter demographic data and complete the questionnaire with only minor questions or comments. We initially presented users with a table and bar graphs describing current and predicted pain and function outcomes (Appendix J: Figure 1). After initial testing, we refined the report to provide a dynamic text description (Appendix J: Figure 2). This change improved users' ability to identify their current pain and function scores and the predicted 1-year outcome scores with surgical and nonsurgical treatments.

The combined pain and function plot proved to be less intuitive. Many users immediately understood that the single data point represented both the pain and function outcome predictions, but others struggled to describe the data represented by the graph (Appendix J: Figure 3).

User testing led to improvements in the way predicted outcome uncertainty was communicated. The degree of uncertainty around the predicted pain and function outcomes, initially represented by whiskers on the bar chart (Appendix J: Figure 1), was not understood by users. We changed the chart by using shading within the bar that faded at the edges and added a dynamic text explanation describing the range of possible values (Appendix J: Figure 2). This improved user understanding. Analogously, for the combined pain and function plot, we changed the uncertainty around the prediction from a dotted circle around a data point (Appendix J: Figure 3) to a shaded circle (Appendix J: Figure 4). A limitation of our depiction of the results, not of the interface per se, is that our methodology made separate statistical models for 1-year knee pain and physical function; in reality, these 2 outcomes are likely related. Therefore, our uncertainty regions may still not accurately capture, and likely overestimate, the joint-prediction areas. The true uncertainty region would be a subset of the circle if pain and functioning were dependent.

Based on these uncertainty estimates regarding the predictions and based on the mathematical equipoise approach, we used KOMET to identify patients for whom enrollment in a randomized clinical trial might be appropriate. For the purpose of demonstration, we defined mathematical equipoise as a condition in which the pain and functioning outcome predictions with nonsurgical care and TKR are relatively close and fall within each other's zones of uncertainty; that is, their circles of uncertainty overlap. These circles are created when the pain and function outcome predictions are presented as point estimates on a 2-dimensional graph with pain on the vertical axis (y) and function on the horizontal axis (x). The uncertainty circle is defined by the shaded area extending around each of the point estimates and represents the uncertainty associated with the predictions. In Appendix J: Figure 1, the blue diamond represents the outcome prediction point estimate for nonsurgical care and the green circle represents the point estimate for TKR. The large shaded blue and green overlapping circles are drawn around the 95% confidence intervals of the pain and function point estimates and represent the uncertainty associated with the predictions. When we computed the mathematical distance between the nonsurgical and TKR predictions, the resulting distance between the 2 coordinates on the pain and function graph was 43. Empirically, based on the input of rheumatologists, orthopedists, and primary care clinician stakeholders, and after reviewing the 95% CI for a sample of cases, we selected a distance of less than or equal to 20 to flag the possible presence of equipoise during the usability testing.

When mathematical equipoise was present, an alert appeared on the user interface's results page and a patient contact and screening form was made available to the clinician. The form could be used to begin the clinical trial recruitment process.
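The distance rule described above can be sketched as a small check. The threshold of 20 comes from the text; the use of straight-line (Euclidean) distance on the unscaled function and pain axes is an assumption, because the report says only "mathematical distance", and the function names and example predictions are hypothetical.

```python
from math import hypot

EQUIPOISE_THRESHOLD = 20.0  # from the text: distance <= 20 flags possible equipoise


def prediction_distance(nonsurgical, tkr):
    """Euclidean distance between two (function, pain) prediction points.

    Assumption: both axes are used on their native scales, without rescaling.
    """
    return hypot(tkr[0] - nonsurgical[0], tkr[1] - nonsurgical[1])


def in_equipoise(nonsurgical, tkr, threshold=EQUIPOISE_THRESHOLD):
    """True when the two predictions are close enough to flag possible equipoise."""
    return prediction_distance(nonsurgical, tkr) <= threshold


# Hypothetical (SF-12 function, WOMAC pain) predictions for two patients.
close_case = ((35.0, 50.0), (45.0, 40.0))
far_case = ((30.0, 60.0), (45.0, 20.0))
for nonsurgical, tkr in (close_case, far_case):
    print(round(prediction_distance(nonsurgical, tkr), 1), in_equipoise(nonsurgical, tkr))
```

In an application, a True result would correspond to showing the equipoise alert and making the screening form available; a False result would leave the predictions to inform the treatment discussion only.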

We asked participants about usefulness of the information for decision-making. Each stated that the tool was helpful or somewhat helpful. All wanted to discuss the results with their physicians.

We used the combined pain and function predicted outcomes plot to discuss the idea of equipoise in the context of random assignment of treatments in an RCT. We showed users 3 sample graphs depicting the predicted outcomes of the 2 potential treatments, with small, moderate, and large amounts of overlap between the 2 circles that depicted uncertainty around the predicted point value (see Appendix L). We assumed that if patients perceived greater overlap in the predicted outcomes between the 2 treatments, then they would be more likely to consider being randomized to 1 of the treatments. Only 7 of 12 users shown these scenarios responded that they understood the concept of equipoise. Because of their personal preferences, some users rejected the option of surgery despite predictions suggesting dramatic reductions in pain and improvement in physical functioning. Other patients indicated they would consider randomization only if the burden of surgery promised a far better outcome than nonsurgical treatment. Overall, users did not respond to the depiction of the circles of uncertainty scenarios as we expected.

We conducted patient and clinician user testing to understand KOMET use in the clinical setting during regularly scheduled clinic visits and research-specific visits. There were significant challenges in allocating adequate time for the patient to complete the decision support tool and for the clinician and patient to discuss the results and implications for decision-making. We determined that future dissemination should include patients completing the tool before their visit to allow the patient and clinician more time during the visit to discuss the results, the patient's priorities and choices, and treatment options.

Through the efforts of our research team, stakeholders, and design consultants, we were able to develop a software program that users found helpful in shared clinical decision-making. Although the final prototype seemed attractive and easy to use, there will need to be further refinements for routine clinical care use and enrollment into RCTs.

Discussion

Study Results in Context

In deciding between treatment options and deciding whether to participate in a clinical study, the patient is the ultimate decision maker. Ideally, these determinations will be made with ample consultation and support by relevant clinicians. In this context, methods to share information and support a shared conversation about these decisions can be very helpful. Decision aids explicitly intended for shared patient-clinician decision-making have been shown to improve patient knowledge, patients' satisfaction with decision-making, and agreement between choices for treatment and their health outcome preferences, among other positive effects.43 The same kind of shared decision-making is justified in the decision as to whether to participate in an RCT. In developing KOMET, we sought to develop decision support that could support both a clinical decision and a decision to participate in an RCT—in this case, the decision between surgical (TKR) and nonsurgical treatment of knee osteoarthritis.

There are 2 general contexts for the results of this project: (1) the development of mathematical equipoise as a basis for decision support and (2) the state of evidence for treatment decisions for knee osteoarthritis. We discussed the latter of these 2 in the introduction; although knee replacement surgery for osteoarthritis is very common in this country, until a relatively small RCT was conducted as this project was being completed there were no RCT data to directly inform this treatment choice.9 Although KOMET does not provide new data, it presents those data available at the time of its development in a potentially helpful way. More to the general point of this project, KOMET is intended to help generate the needed RCT data for knee osteoarthritis to add to extant evidence. Thus, the main context for the results of this project is for the development of the mathematical equipoise method.

Mathematical equipoise is based on the use of mathematical models that serve as clinical predictive instruments to predict patient-specific outcomes of treatment options, which then can be compared. By doing so, in a sense we are discerning patient-specific equipoise. When the predictions are not discernibly or importantly different, which can suggest equipoise between options, enrollment in an RCT that compares the treatments can be considered. When the predictions suggest one treatment is likely to have better outcomes, trial enrollment would not be appropriate. In that case, however, the identification of a potentially superior treatment can inform patient-clinician decision-making. The method thereby constitutes an approach for enrolling RCT participants that also supports clinical decision-making for those not to be enrolled in an RCT.

Our original examples of this approach used predictive model outcomes of acute myocardial infarction that were built using RCT data, which are ideal sources of data for making predictive models. However, for many treatments there are no prior RCTs—and indeed these are the very conditions for which RCTs and, in particular, clinical effectiveness trials are needed.

For the widespread use of mathematical equipoise to help fill in gaps in RCTs, predictive models for these conditions will need to be built on data from clinical registries, EHRs, and other non-RCT data. Therefore, the purpose of this project was to further develop the method by applying it to an important clinical treatment question for which there were essentially no prior RCTs. With >680 886 TKRs done each year in the United States for knee osteoarthritis,44 we considered this an important question for patients and society, and a good opportunity to test whether this approach could work in this challenging but common situation.

To do this, we created a consolidated database from non-RCT sources on knee osteoarthritis outcomes on which we created predictive models of the outcomes of surgical knee replacement and nonsurgical treatments. The choices for variables for these models were informed by the needs of stakeholders who would use decision support for knee osteoarthritis, with focus on their views on the representation of pain and functional outcomes. We then developed multivariable mathematical models that predict patient-specific outcomes of surgical and nonsurgical treatment, using statistical and analytic methods to adjust, to the extent possible, for the inherent biases in the databases. We also performed a variety of analyses to understand how to best model and represent the predicted outcomes. We incorporated these models into a stakeholder-informed prototype decision support software for potential incorporation into EHRs. KOMET exemplifies a tool that can provide shared decision support for both RCT enrollment and clinical care and that is responsive to the perspectives and needs of patients and clinicians.

We believe the impact of such a method on the field of CER could be substantial. The impact of CER is based on evidence generation, which leads to evidence synthesis, interpretation, application, dissemination, implementation in widespread practice, and then feedback for the generation of new evidence. This entire chain rests on having unbiased, generalizable (ideally RCT) evidence. Were there such a method for patient-centered enrollment into RCTs that could be incorporated into EHRs, far more targeted comparative effectiveness trials could be conducted, more diverse clinical sites could be included, and more representative patients could be enrolled. This would lead to results that are applicable to more special groups and to more care settings. This would also facilitate clinicians' and the public's understanding of, and enrolling into, clinical trials, which could help improve the public's engagement in the biomedical research enterprise. Additionally, clinical trial duration, a scientifically and financially important component of drug development pipeline time, could be much shorter. If, instead of only 10% of eligible patients being enrolled, the enrollment pace were increased by up to tenfold, trials would finish much faster. All these advantages should result in better clinical trials and greater impact on the public's health.

If this method is shown to be applicable to the many important conditions for which RCTs have not yet been conducted, providing onsite real-time decision support for trial enrollment, it could transform how comparative effectiveness research is conducted across the spectrum of health care. This would address the failure of current clinical trial approaches to enroll sufficient numbers of patients, help meet the need to identify and ethically handle treatment of all potential participants, and foster a conversation between clinician and patient based on data specific to that patient. Thereby, it could have use in broad areas of clinical care and could help enable the great promise of CER in improving clinical care.

Uptake of Study Results

In this project, based on input from multiple stakeholders and potential users, we implemented prototype KOMET software and tested it in clinical settings. Although it is not ready for widespread implementation in its user interface, content, and connectability to EHRs, it did function as intended and thus is an important step in the ultimate goal of clinical use. We believe its promise is sufficient to warrant further development with the explicit intent of being a tool that can be implemented in clinical settings and linked to EHRs, to serve both treatment and RCT enrollment purposes. Toward that end, we will seek further opportunities to move toward that goal.

Study Limitations

Although largely accomplishing its intended objectives, as an early stage in the development of mathematical equipoise for shared clinical decision-making, this project has many limitations related to the available data, the modeling methods, the model variables, and the prototype software.

An important limitation of our approach is that the models were created on potentially biased data. Although we sought data from studies that included both surgical and medical treatment of knee osteoarthritis, only 2 of our sources met that requirement; the 2 other sources were registries of only 1 treatment (surgery). Both types of sources pose challenges for creating comparable groups of patients who underwent the 2 treatments, which is needed to make accurate models of the 2 treatments. In contrast, our prior clinical predictive models, including the first examples of mathematical equipoise, used data from RCTs. This allowed for representation of the alternative treatment effects based on comparable samples undergoing the treatment choices, providing confidence that the effects and outcomes represented by the multivariable models would reflect the actual treatment effects and not differences in the underlying characteristics of patients receiving the alternative treatments. However, for mathematical equipoise to serve its intended purpose of facilitating RCTs of treatments for which none have yet been conducted, its models will need to be built on non-RCT data. Therefore, for this project, we intentionally chose a condition for which our only data were from registries that posed challenges for making models that could be based on comparable patients receiving the 2 treatments. We undertook many checks to maximize the comparability and to accurately represent effects despite the likely biased samples. For example, we chose to use matching for our study design. We acknowledge that, although 1:1 matching improves control of confounding and enabled us to create a hypothetical sample of non-TKR patients who could be considered TKR eligible, this approach does not use data from all available participants and may therefore come at the cost of less precision. Although we believe KOMET models have very good performance despite the challenge of the available data, additional sources of data for this approach should be developed.
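
The trade-off between comparability and precision in 1:1 matching can be illustrated with a minimal sketch of greedy nearest-neighbor matching without replacement. This is not the GMATCH macro cited in the references, nor the project's actual matching procedure; the function name, the use of a single matching score per patient, the caliper, and all patient IDs and values are assumptions made for illustration only.

```python
from typing import Dict, List, Optional, Tuple


def greedy_match(
    treated: Dict[str, float],
    controls: Dict[str, float],
    caliper: Optional[float] = None,
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Greedy 1:1 nearest-neighbor matching without replacement.

    treated / controls map a hypothetical patient ID to one matching score.
    Returns the matched pairs and the control IDs that were never used.
    """
    available = dict(controls)  # copy so the caller's data is not modified
    pairs: List[Tuple[str, str]] = []
    for t_id, t_score in treated.items():
        if not available:
            break
        # Pick the closest control that is still unmatched.
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if caliper is not None and abs(available[c_id] - t_score) > caliper:
            continue  # no acceptable control; this treated patient is dropped
        pairs.append((t_id, c_id))
        del available[c_id]
    return pairs, sorted(available)


# Hypothetical scores, for illustration only.
tkr = {"T1": 0.62, "T2": 0.35}
non_tkr = {"C1": 0.30, "C2": 0.60, "C3": 0.90, "C4": 0.10}
pairs, unused = greedy_match(tkr, non_tkr, caliper=0.1)
print(pairs)   # [('T1', 'C2'), ('T2', 'C1')]
print(unused)  # ['C3', 'C4']
```

In this toy example, two of the four non-TKR patients are never used, which is the loss of information (and hence precision) that the limitation above refers to.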

The modeling methods also have limitations. Multivariable regression methods such as those we used have advantages over some more computer-intensive methods, like those used for machine learning, including clearly interpretable variable coefficients and greater resistance to overfitting.45 Nonetheless, larger databases on which more corrections might be made (eg, via the use of propensity scores) and newer computational methods might improve the models that can be created. Indeed, we believe this approach will benefit greatly from such advances in modeling.

Another limitation is that neither model was validated in an independent database; we simply did not have sufficient data to support model development and to still have enough for a test database. However, we believe the performance of the models and their variables are consistent with clinical understanding and importance and are reasonable for use in this demonstration project. Nonetheless, testing on an independent data set will be an important future priority.

Beyond the methods, the modeling variables we used have limitations. In general, given the important variables collected in the available databases, published clinical evidence, and input from stakeholders, we believe we used very credible variables to represent independent and dependent (treatment outcome) variables, but there is one about which we have reservations. The functional outcome we used, chosen because we wished to capture the patient's overall physical function, was based on the SF-12 functional scale. In looking at the results of KOMET predictions, we noticed that pain is often substantially changed by surgery but function tends to show a relatively modest improvement. In discussing this with patients, we wondered whether we would have better captured their meaningful knee functional improvement if we had used a more knee-specific measure of function rather than overall physical functioning. We hoped to address this limitation by performing further analyses of the treatment outcomes for the subsample of patients in our databases for whom we have a knee-specific functional scale, the KSS, as a treatment outcome. The results of these exploratory analyses, presented in Appendix M, suggest that WOMAC knee pain tracks well with other measures of knee pain and symptoms and, in particular, with Knee injury and Osteoarthritis Outcome Score (KOOS) knee pain. The SF-12 physical component score, while positively correlated, does not track as strongly with other knee-related quality of life and function variables. These results are somewhat to be expected in that, although physical function and knee-related function may overlap, they are not the same thing. Our meetings with stakeholders suggest both overall and knee-related function are important, and we have come to believe future work to develop predictions of the more specific knee-related function would be useful to both patient and clinical stakeholders.

Certainly, as a prototype the KOMET software has limitations. The creation of full-featured, user-friendly, robust software is beyond the scope of this project. Our prototype has significant distances to go in these and other dimensions before it could be used in routine care. Nonetheless, we believe it is quite attractive and functional and, in the context of its intended role in this project, a successful product of this project.

Finally, in putting the use of KOMET in the context of clinical decision-making, this approach does not consider how the patient might feel about the outcome states (pain and function). Doing so would involve translating the WOMAC and SF-12 scores into terms familiar to patients and making sure that the idea that overlapping predictions suggest equivalence is understood. It would also include ensuring that these features are readily incorporated into patients' understanding and decision-making for their own and shared consideration. Beyond these user issues, patients would also need information about the downsides and potential complications of surgery, delays of surgery, and adverse consequences of other treatments. Thus, while KOMET provides an important foundation for the shared decision-making process, additional work is needed in many dimensions to provide complete and optimal support.

Future Research

The limitations listed above suggest areas for future research. Approaches must be developed that lessen the biases inherent in clinical registry data. Although having more data, such as might be obtained from EHR data warehouses and other wide sources of clinical data, will not eliminate biases, finding ways to mitigate the biases using selection and sampling methods and other approaches will be extremely important for work on mathematical equipoise, as well as for many other efforts to harvest clinically important insights from clinical data. Beyond developing such methods, validation of these approaches will be crucial.

In future efforts of this type, we would like to have more complete accounting for ancillary issues and complications. For example, in the 1 RCT done to date,9 serious adverse events were more common in the TKR group than in the nonsurgical treatment group (8 vs 1 involving the index knee [P = .05] and 24 vs 6 overall [P = .005]); the 2 most common serious adverse events involving the index knee were deep venous thrombosis (in 3 patients) and stiffness requiring brisement forcé (in 3 patients). In that Danish study,9 adverse events that occurred before the 12-month follow-up were identified in hospital records, by self-report at follow-up visits, and by the physiotherapist and were then categorized. Unfortunately, we did not have access to such information in the databases available to us. In future work exploring mathematical equipoise, we intend to collect such data methodically in an analogous way.

Modeling clinical outcomes based on data is evolving rapidly, and increasingly sophisticated computer-based methods, such as artificial intelligence and machine learning, are being applied to analysis of clinical data. Although computer-based algorithms have a tendency to overfit,45 compromising their generalizability to new populations, methods are advancing and an investigation of best methods is certainly warranted.

As indicated above, we believe the functional outcome we used for physical function might benefit from being a more knee-specific outcome variable.46,47 There are examples in other diseases in which, for specific conditions, disease-specific outcomes are more useful than more general functional outcomes, such as we used.48,49 We believe additional research that uses a more specific functional outcome would be worth conducting.

In terms of the prototype software, more research is clearly needed to produce full-featured, user-friendly, interoperable, robust software for this and similar decision support. Attractive and functional software for these purposes is badly needed.

Finally, in developing and testing such decision support software, we will need to further investigate how to better understand, make clear, and use the patient-specific determination of equipoise that could be the basis of a comparative effectiveness RCT. We believe the method has important advantages for such studies, but before it can be widely deployed and used it must be fully understood and transparent to all stakeholders. We look forward to advancing this work.

Conclusions

This project demonstrated the use of predictive instruments and mathematical equipoise to discern patient-specific equipoise, both as decision support for shared patient-physician decision-making about alternative treatments and as the basis for enrollment into comparative effectiveness trials. Based on its predictive models, KOMET, which is designed to be embedded in EHRs, provides individualized predictions of pain and functional outcomes of medical and surgical treatment of knee osteoarthritis. It can help identify patients for whom one or the other treatment seems likely to yield better outcomes based on their specific characteristics as well as patients for whom there is insufficient evidence to favor one treatment. These predictions can be part of a shared decision-making process that incorporates the patient's preferences and priorities for the outcomes the models predict (ie, pain and function but not others), and, by identifying potential clinical equipoise, they also can support enrollment into an RCT.
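
The idea of comparing patient-specific predictions for the two treatments can be sketched as follows. This is only an illustration under stated assumptions, not KOMET's actual decision rule: the `Prediction` class, the use of overlapping uncertainty intervals as the criterion for equipoise, the function names, and all numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """A predicted outcome score with lower and upper uncertainty bounds."""
    estimate: float
    lower: float
    upper: float


def intervals_overlap(a: Prediction, b: Prediction) -> bool:
    """True when the two uncertainty intervals share at least one point."""
    return a.lower <= b.upper and b.lower <= a.upper


def assess_equipoise(surgical: Prediction, medical: Prediction,
                     higher_is_better: bool) -> str:
    """Assumed rule: overlapping intervals suggest equipoise; otherwise
    report which treatment has the more favorable point estimate."""
    if intervals_overlap(surgical, medical):
        return "equipoise"
    surgical_better = (surgical.estimate > medical.estimate) == higher_is_better
    return "favors surgical" if surgical_better else "favors medical"


# Hypothetical pain scores on which lower values are better.
print(assess_equipoise(Prediction(4.0, 2.5, 5.5),
                       Prediction(5.0, 3.5, 6.5),
                       higher_is_better=False))  # equipoise
print(assess_equipoise(Prediction(3.0, 2.0, 4.0),
                       Prediction(7.0, 6.0, 8.0),
                       higher_is_better=False))  # favors surgical
```

A real implementation would also need to combine the pain and function outcomes and present the result in terms meaningful to patients, as discussed under Study Limitations.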

The next step will be to conduct a larger-scale test and then to implement it for its intended use—the conduct of a comparative effectiveness trial in usual care settings in which KOMET would support patient-clinician shared decision-making about treatment selection for knee osteoarthritis.

References

1. Lawrence RC, Felson DT, Helmick CG, et al. Estimates of the prevalence of arthritis and other rheumatic conditions in the United States. Part II. Arthritis Rheum. 2008;58(1):26-35. [PMC free article: PMC3266664] [PubMed: 18163497]
2. Guccione AA, Felson DT, Anderson JJ, et al. The effects of specific medical conditions on the functional limitations of elders in the Framingham Study. Am J Public Health. 1994;84(3):351-358. [PMC free article: PMC1614827] [PubMed: 8129049]
3. Mankin HJ. Clinical features of osteoarthritis. In: Kelley WN, Harris ED, Ruddy S, Sledge CB, eds. Textbook of Rheumatology. 4th ed. Philadelphia, PA: W.B. Saunders Co; 1993:1374-1384.
4. The Incidence and Prevalence Database for Procedures. Sunnyvale, CA: Timely Data Resources; 1995.
5. Kosorok MR, Omenn GS, Diehr P, Koepsell TD, Patrick DL. Restricted activity days among older adults. Am J Public Health. 1992;82(9):1263-1267. [PMC free article: PMC1694318] [PubMed: 1503169]
6. Kramer JS, Yelin EH, Epstein WV. Social and economic impacts of four musculoskeletal conditions. A study using national community-based data. Arthritis Rheum. 1983;26(7):901-907. [PubMed: 6223644]
7. Selten EM, Vriezekolk JE, Geenen R, et al. Reasons for treatment choices in knee and hip osteoarthritis: a qualitative study. Arthritis Care Res. 2016;68(9):1260-1267. [PubMed: 26814831]
8. Weng HH, Kaplan RM, Boscardin WJ, et al. Development of a decision aid to address racial disparities in utilization of knee replacement surgery. Arthritis Rheum. 2007;57(4):568-575. [PubMed: 17471558]
9. Skou ST, Roos EM, Laursen MB, et al. A randomized, controlled trial of total knee replacement. New Engl J Med. 2015;373(17):1597-1606. [PubMed: 26488691]
10. Eyles JP, Mills K, Lucas BR, et al. Can we predict those with osteoarthritis who will worsen following a chronic disease management program? Arthritis Care Res. 2016;68(9):1268-1277. [PubMed: 26749177]
11. Fagerlin A, Sepucha KR, Couper MP, Levin CA, Singer E, Zikmund-Fisher BJ. Patients' knowledge about 9 common health conditions: the DECISIONS survey. Med Decis Making. 2010;30(suppl 5):35S-52S. [PubMed: 20881153]
12. Kent DM, Ruthazer R, Griffith JL, et al. A percutaneous coronary intervention-thrombolytic predictive instrument to assist choosing between immediate thrombolytic therapy versus delayed primary percutaneous coronary intervention for acute myocardial infarction. Am J Cardiol. 2008;101(6):790-795. [PubMed: 18328842]
13. Selker HP, Beshansky JR, Griffith JL, et al. Use of the acute cardiac ischemia time-insensitive predictive instrument (ACI-TIPI) to assist with triage of patients with chest pain or other symptoms suggestive of acute cardiac ischemia. A multicenter, controlled clinical trial. Ann Intern Med. 1998;129(11):845-855. [PubMed: 9867725]
14. Selker HP, Griffith JL, Beshansky JR, et al. Patient-specific predictions of outcomes in myocardial infarction for real-time emergency use: a thrombolytic predictive instrument. Ann Intern Med. 1997;127(7):538-556. [PubMed: 9313022]
15. Selker HP, Beshansky JR, Griffith JL; TPI Trial Investigators. Use of the electrocardiograph-based thrombolytic predictive instrument to assist thrombolytic and reperfusion therapy for acute myocardial infarction. A multicenter, randomized, controlled, clinical effectiveness trial. Ann Intern Med. 2002;137(2):87-95. [PubMed: 12118963]
16. Stacey D, Légaré F, Col NF, et al. Decision aids for people facing health treatment or screening decisions. Cochrane Database Syst Rev. 2014;(1):CD001431. doi:10.1002/14651858.CD001431.pub4. [PubMed: 24470076]
17. Stacey D, Taljaard M, Dervin G, et al. Impact of patient decision aids on appropriate and timely access to hip or knee arthroplasty for osteoarthritis: a randomized controlled trial. Osteoarthritis Cartilage. 2016;24(1):99-107. [PubMed: 26254238]
18. Bozic KJ, Belkora J, Chan V, et al. Shared decision making in patients with osteoarthritis of the hip and knee: results of a randomized controlled trial. J Bone Joint Surg. 2013;95(18):1633-1639. [PubMed: 24048550]
19. de Achaval S, Fraenkel L, Volk RJ, Cox V, Suarez-Almazor ME. Impact of educational and patient decision aids on decisional conflict associated with total knee arthroplasty. Arthritis Care Res. 2012;64(2):229-237. [PMC free article: PMC3634330] [PubMed: 21954198]
20. Stacey D, Hawker GA, Dervin G, et al. Decision aid for patients considering total knee arthroplasty with preference report for surgeons: a pilot randomized controlled trial. BMC Musculoskelet Disord. 2014;15(54):1-10. [PMC free article: PMC3937455] [PubMed: 24564877]
21. Hip and knee osteoarthritis toolkit. Dartmouth-Hitchcock Center for Shared Decision Making website. -hitchcock.org/csdm_toolkits/hip_and_knee_osteoarthritis_toolkit.html. Published 2017. Accessed January 7, 2018.
22. Selker HP, Ruthazer R, Terrin N, Griffith JL, Concannon T, Kent DM. Random treatment assignment using mathematical equipoise for comparative effectiveness trials. Clin Transl Sci. 2011;4(1):10-16. [PMC free article: PMC3076795] [PubMed: 21348950]
23. Forsythe LP, Ellis LE, Edmundson L, et al. Patient and stakeholder engagement in the PCORI pilot projects: description and lessons learned. J Gen Intern Med. 2016;31(1):13-21. [PMC free article: PMC4700002] [PubMed: 26160480]
24. Deverka PA, Lavallee DC, Desai PJ, et al. Stakeholder participation in comparative effectiveness research: defining a framework for effective engagement. J Comp Eff Res. 2012;1(2):181-194. [PMC free article: PMC3371639] [PubMed: 22707880]
25. PCORI engagement rubric. PCORI website. -Rubric.pdf. Published 2014. Accessed October 27, 2016.
26. Concannon TW, Meissner P, Grunbaum JA, et al. A new taxonomy for stakeholder engagement in patient-centered outcomes research. J Gen Intern Med. 2012;27(8):985-991. [PMC free article: PMC3403141] [PubMed: 22528615]
27. Multicenter Osteoarthritis Study (MOST) database. San Francisco, CA: University of California; 2009. http://most.ucsf.edu. Accessed May 15, 2014.
28. Osteoarthritis Initiative (OAI) database. Bethesda, MD: National Institutes of Health; 2013. https://nda.nih.gov/oai/. Specific data sets: V 0.2.2, 1.2.1, 2.2.2, 3.2.1, 4.2.1, 5.2.1, 6.2.2, 7.2.1, 8.2.1, 9.2.1, 24, 25, and 9. Accessed June 25, 2014.
29. Hawker GA, Wright JG, Coyte PC, et al. Differences between men and women in the rate of use of hip and knee arthroplasty. New Engl J Med. 2000;342(14):1016-1022. [PubMed: 10749964]
30. Hawker GA, Wright JG, Coyte PC, et al. Determining the need for hip and knee arthroplasty: the role of clinical severity and patients' preferences. Med Care. 2001;39(3):206-216. [PubMed: 11242316]
31. Hawker GA, Wright JG, Glazier RH, et al. The effect of education and income on need and willingness to undergo total joint arthroplasty. Arthritis Rheum. 2002;46(12):3331-3339. [PubMed: 12483740]
32. New England Baptist Hospital (NEBH) orthopedic surgery registry. Boston, MA: New England Baptist Hospital; 2018. https://www.nebh.org/health-professionals/research/orthopedic-registry/. Accessed February 22, 2017.
33. Tufts Medical Center (TMC) orthopedic surgery registry. Boston, MA: Tufts University School of Medicine.
34. WOMAC osteoarthritis index. http://www.womac.org/womac/index.htm. Accessed February 22, 2017.
35. SF-12 health survey. http://www.outcomes-trust.org/instruments.htm#SF-12. Accessed February 22, 2017.
36. Ware J Jr, Kosinski M, Keller SD. A 12-Item Short-form health survey: construction of scales and preliminary tests of reliability and validity. Med Care. 1996;34(3):220-233. [PubMed: 8628042]
37. Lacson E Jr, Xu J, Lin SF, Dean SG, Lazarus JM, Hakim RM. A comparison of SF-36 and SF-12 composite scores and subsequent hospitalization and mortality risks in long-term dialysis patients. Clin J Am Soc Nephrol. 2010;5(2):252-260. [PMC free article: PMC2827595] [PubMed: 20019120]
38. Kosanke JB, Bergstralh E. GMATCH macro for greedy matching. http://bioinformaticstools.mayo.edu/research/gmatch/. Accessed February 22, 2017.
39. Riddle DL, Kong X, Jiranek WA. Factors associated with rapid progression to knee arthroplasty: complete analysis of three-year data from the osteoarthritis initiative. Joint Bone Spine. 2012;79(3):298-303. [PubMed: 21727020]
40. Gantz MG. Creating RTF tables with univariate analyses of multiply imputed data. Poster presented at: Southeast SAS Users Group (SESUG) Conference; October 8-10, 2006; Atlanta, GA.
41. SAS (for Windows) [computer program]. Version 9.4 TS Level 1M2. Cary, NC: SAS Institute; 2002-2012.
42. Stuart EA. Matching methods for causal inference: a review and a look forward. Stat Sci. 2010;25(1):1-21. [PMC free article: PMC2943670] [PubMed: 20871802]
43. Stacey D, Légaré F, Lewis K, et al. Decision aids for people facing health treatment or screening decisions. Cochrane Database Syst Rev. 2017;4:CD001431. doi:10.1002/14651858.CD001431.pub5. [PMC free article: PMC6478132] [PubMed: 28402085]
44. HCUPnet. Healthcare Cost and Utilization Project. US Department of Health & Human Services/Agency for Healthcare Research and Quality. https://hcupnet.ahrq.gov/#setup. Published 2014. Accessed April 25, 2017.
45. Selker HP, Griffith JL, Patil S, Long WJ, D'Agostino RB. A comparison of performance of mathematical predictive methods for medical diagnosis: identifying acute cardiac ischemia among emergency department patients. J Investig Med. 1995;43(5):468-476. [PubMed: 8528758]
46. Brazier JE, Harper R, Munro J, Walters SJ, Snaith ML. Generic and condition-specific outcome measures for people with osteoarthritis of the knee. Rheumatology. 1999;38(9):870-877. [PubMed: 10515649]
47. Bombardier C, Melfi CA, Paul J, et al. Comparison of a generic and a disease-specific measure of pain and physical function after knee replacement surgery. Med Care. 1995;33(suppl 4):AS131-AS144. [PubMed: 7723441]
48. Binkley JM, Stratford PW, Lott SA, Riddle DL. The lower extremity functional scale (LEFS): scale development, measurement properties, and clinical application. North American Orthopaedic Rehabilitation Research Network. Phys Ther. 1999;79(4):371-383. [PubMed: 10201543]
49. Patrick DL, Deyo RA. Generic and disease-specific measures in assessing health status and quality of life. Med Care. 1989;27(suppl 3):S217-S232. [PubMed: 2646490]
50. Apache.org Tomcat [computer program]. Version 8. Wakefield, MA: Apache Software Foundation; 1999-2019.

Acknowledgments

The authors wish to thank our patient and clinician stakeholders for their valuable contributions and guidance: Debra Band-Entrup, Kathie Bernstein, Melvin Bernstein, Jaclyn Chu, Deane Felter, William Harvey, Helen Herzer, Cristina MacDonald, Vincent MacDonald, Susan Nesci, John Richmond, Kimberly Schelling, Eric Smith, and Steven Vlad. The authors thank Kaila Dion and Rajeev Chorghade for support with scale development and data management, and Ben Hannon for user interface design. We acknowledge Brendan Harrison, Nikolai Klebanov, and Esha Sondhi for data collection, and Mary Pevear and Gary Schneider for their assistance with Orthopedic Surgery Registries.

The OAI is a public-private partnership comprising 5 contracts (N01-AR-2-2258, N01-AR-2-2259, N01-AR-2-2260, N01-AR-2-2261, and N01-AR-2-2262) funded by the NIH, a branch of the Department of Health and Human Services, and conducted by the OAI study investigators. Private funding partners include Merck Research Laboratories, Novartis Pharmaceuticals Corporation, GlaxoSmithKline, and Pfizer Inc. Private sector funding for the OAI is managed by the Foundation for the National Institutes of Health. The manuscript was prepared using an OAI public use data set and does not necessarily reflect the opinions or views of the OAI investigators, the NIH, or the private funding partners.

MOST comprises 4 cooperative grants (Felson: AG18820, Torner: AG18832, Lewis: AG18947, and Nevitt: AG19069) funded by the National Institutes of Health, a branch of the Department of Health and Human Services, and conducted by MOST study investigators. This manuscript was prepared using MOST data and does not necessarily reflect the opinions or views of MOST investigators. Recommended additional documentation describing various aspects of the design and methods of MOST is available by request sent to MOSTOnline@psg.ucsf.edu and should be paraphrased and referenced as appropriate.

Data were provided from the Ontario Hip and Knee Osteoarthritis Cohort conducted by the Canadian Osteoarthritis Research Program (CORP), led by Dr Gillian Hawker. Data provided from CORP are made possible through grants by the Canadian Institutes of Health Research and the Arthritis Society.

Research reported in this report was funded through a PCORI Award (ME-1306-02327). The views, statements, and opinions presented in this report are solely the responsibility of the authors and do not necessarily represent the views of PCORI, its Board of Governors, or its Methodology Committee.

Further information is available at: https://www.pcori.org/research-results/2013/developing-software-predict-patient-responses-knee-osteoarthritis-treatments

Appendices

Appendix A. Matching (PDF, 113K)

Appendix B. Missing Data (PDF, 105K)

Appendix E. User Interface Testing (PDF, 263K)

Original Project Title: A Method for Patient-Centered Enrollment in Comparative Effectiveness Trials: Mathematical Equipoise
PCORI ID: ME-1306-02327

Suggested citation:

Selker HP, Daudelin DH, Ruthazer R, et al. 2019. Developing Software to Predict Patient Responses to Knee Osteoarthritis Treatments and to Identify Patients for Possible Enrollment in Randomized Controlled Trials. Patient-Centered Outcomes Research Institute (PCORI). https://doi.org/10.25302/9.2019.ME.130602327

Disclaimer

The views, statements, and opinions presented in this report are solely the responsibility of the author(s) and do not necessarily represent the views of the Patient-Centered Outcomes Research Institute® (PCORI®), its Board of Governors or Methodology Committee.

Copyright © 2019. Tufts Medical Center. All Rights Reserved.

This book is distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs License, which permits noncommercial use and distribution provided the original author(s) and source are credited. (See https://creativecommons.org/licenses/by-nc-nd/4.0/)

Bookshelf ID: NBK600210; PMID: 38346132; DOI: 10.25302/9.2019.ME.130602327
