NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.


Adding Disease-Specific Concerns to Patient-Reported Outcome Measures

, PhD, MPH, , MS, , PhD, MPH, , PhD, , PhD, , PhD, and , PhD, MPH.

Author Information and Affiliations

Structured Abstract

Background:

Patient-reported outcomes, which are generic or condition-specific, are used for several reasons, including clinical care, clinical research and trials, and in national efforts to monitor the quality of health care delivery. Creating patient-reported outcome measures (PROMs) that meet different purposes without overburdening patients, health care systems, providers, and data systems is paramount.

Objectives:

The overall aim of this study was to create a methodology to incorporate condition-specific concerns in generic health assessments. We used PROMIS® as our resource for generic measures and started with 2 prevalent and high-impact conditions: heart failure (HF) and osteoarthritis of the knee (OA-K). The 3 primary objectives to reach this aim were the following:

  • Objective 1: Obtain patient and clinician expert input on the condition-specific relevance of existing and new items to develop improved brief assessments (PROMIS-PLUS) of these 2 high-impact conditions.
  • Objective 2: Validate PROMIS-PLUS in these 2 conditions to document key measurement properties of reliability, validity, patient-perceived relevance, and clinical usability.
  • Objective 3: Produce crosswalks from PROMIS to PROMIS-PLUS and selected legacy instruments for these 2 conditions and disseminate the new instruments and crosswalks.

Methods:

We used a longitudinal transformation mixed-methods study design at 13 clinical sites to develop and test the instruments. The design built on PROMIS methods and the International Society for Pharmacoeconomics and Outcomes Research's Task Force recommendations for developing PROM items. Specific methods included patient focus groups, clinician interviews, item gap analysis, new item development, cognitive interviews, draft instruments development, draft instruments testing, data analysis, and crosswalks between legacy measures and the new measures.

Results:

For HF, we identified 64 existing PROMIS items and developed 22 new items, across 18 domains. For OA-K, we identified 52 existing PROMIS items and developed 24 new items, across 14 domains. Our psychometric analyses evaluated measure reliability (internal consistency and test-retest estimates) and measure validity (convergent, divergent, and concurrent correlational analyses; and known-groups analyses, based on high vs low Global Health Physical and Mental status); these all supported measure usefulness. Our assessments of change in domain status across time (paired t tests of baseline vs follow-up status and change score comparisons of meaningful vs no/not meaningful Global Rating of Change Score status) supported the instruments' sensitivity to measuring change.

Conclusions:

We developed a methodology for creating condition-specific instruments in 2 high-impact conditions that bridge gaps in existing generic measures. The methodology creates instruments that gather the patient's perspective while allowing health systems, researchers, and other interested parties to monitor and compare outcomes over time, conditions, and populations.

Limitations:

We noted 4 primary limitations: (1) focus group participants were not representative of the diversity of educational, racial, and ethnic backgrounds in the United States; (2) we were unable to account for nonrespondents in the cross-sectional panel; (3) the final instruments may be too long for ongoing use in clinical settings; and (4) we retained less than half of respondents with HF in the longitudinal study owing to their disease burden and our limited access to patients.

Background

Calls continue to mount to establish common standards for measures of quality of health care delivery, including the incorporation of patients' health status and values via patient-reported outcomes (PROs) that “matter to patients.”1,2 Although investigators have used PROs for many years in clinical trials and some registries,3 their use is growing in clinical care settings for patient care and shared decision-making1 and for managing population health at a national level.3-7 As 1 example, the Office of the National Coordinator for Health Information Technology Meaningful Use Stage 2 rule called for providers to measure the functional status of patients undergoing total knee arthroplasty.8 As such, creating patient-reported outcome measures (PROMs) that meet different purposes in varying contexts without overburdening patients, health care systems, providers, and data systems is an important goal; achieving it can help improve care and the alignment of care with patients' goals.

When considering current PROMs, experts have 2 basic approaches to assessing patient health: generic and condition specific.9 Generic measures allow for comparability “across patients and populations,” whereas condition- or disease-specific measures tend to provide improved relevance and responsiveness, including the potential to better differentiate groups by clinically salient symptoms or concerns.6,9,10 Measures are identified as being condition specific if they include items relevant for patients with the specific clinical condition and, for some measures, actually specify the condition (eg, “Do you have trouble walking because of osteoarthritis of the knee?”).11 Although generic and condition-specific measures may be more or less relevant depending on the purpose, including both types of measures increases the potential of providing “the most complete assessment of patient-reported health”12 for uses ranging from direct patient care to “driving improvement, outcomes measurement.”2

Efforts to create comprehensive instruments focus primarily on identifying2,13 and combining existing measures.12,14,15 Some generic measures have used condition- or treatment-specific modules or subscales that can be added to core items to enhance the relevance of the measure to a specific population.16-18 PROMIS®, which provides item banks of generic measures that are applicable across populations and chronic conditions,19-21 serves as a prominent potential resource in these efforts. At the same time, PROMIS has focused on achieving more precise measures and efficiencies for individuals by using item response theory (IRT) for the development of computerized adaptive testing administration.22-24 A condition-specific assessment that builds on the comprehensive PROMIS generic framework may produce PROMs that offer both increased relevance for capturing the health burden imposed by specific problems and the ability to compare across diseases, conditions, populations, and systems.

Our overall aim in this study was to create a generalizable method to incorporate condition-specific concerns important to patients into generic health assessments, starting with 2 prevalent, high-impact conditions: heart failure (HF) and osteoarthritis of the knee (OA-K). Our 3 primary objectives to reach this aim were the following:

  • Objective 1: Obtain patient and clinician expert input on the condition-specific relevance of existing and new items to develop improved brief assessments (PROMIS-PLUS) of these 2 high-impact conditions.
  • Objective 2: Validate PROMIS-PLUS in these 2 conditions to document key measurement properties of reliability, validity, patient-perceived relevance, and clinical usability.
  • Objective 3: Produce crosswalks from PROMIS to PROMIS-PLUS and selected legacy instruments for these 2 conditions and disseminate the instruments and crosswalks.

We limited our focus to the 2 conditions, HF and OA-K, to ensure the feasibility of the study and to understand how the method would work in 2 high-impact yet different conditions. Both HF and OA-K affect millions of people in the United States, with HF at approximately 5.8 million people and OA-K at more than 9 million. HF alone is responsible for more than 12 million office visits and 6.5 million hospital days each year,25-27 while the number of OA-K patients undergoing total knee arthroplasty increased 162% between 1991 and 2010.28,29 The number of individuals with HF and OA-K is expected to increase even more because of the aging of the US population.28-30

While HF and OA-K are similar in some ways, the level of complexity for patients and their caregivers may differ in important ways, which may influence the use of PROMs. For instance, people with HF often have several other comorbid and major chronic conditions, such as chronic obstructive pulmonary disease, depression, diabetes, dementia, and hyperlipidemia.31-34 The multiplicity of disease states and symptoms that these patients typically face creates major challenges related to prioritization of treatment goals in the face of multiple treatment alternatives and self-management. Although evidence is sparse, the tracking of PROs could facilitate self-management and treatments targeted to the patient's goals.35

Patients living with OA-K face important decisions as well, but these may be more limited than for patients with HF. In the case of OA-K, patients' primary decisions relate to whether or when to have their condition treated medically (using medications, physical therapy, and weight loss) or surgically. To make a wise choice, patients and those close to them (eg, spouses) need accurate information on the risks, benefits, and outcomes of these 2 different treatment choices. Unfortunately, information on PROs is generally not available, and most Americans are unlikely to have access to state-of-the-art shared decision-making.36-39 Systematic data on meaningful PROs related to OA-K would be helpful to guide these decisions.40

In the next sections, we describe the process we developed to ensure that we met our overall aim and 3 objectives of our study. The 2 primary study phases were an instrument development phase (phase 1, objective 1) and an instrument testing phase (phase 2, objectives 2 and 3). We begin, however, by describing how we engaged patient and clinical partner stakeholders, which was critical for the conduct of this study.

Participation of Patients and Other Stakeholders in the Design and Conduct of Research and Dissemination of Findings

A key goal of our study was to incorporate patients' and stakeholders' perspectives on the design and conduct of this study. We strove to achieve this goal in several ways in each phase of the study. Patients contributed to this research through participation in the Patient and Family Advisory Council (PFAC); stakeholders, including physicians, nurse practitioners, physician assistants, and clinic staff, provided input through consultation, collaboration, and interviews.

Patient Engagement

With input from patient advisors during the grant application process, we determined that convening a PFAC would be an ideal way to obtain ongoing input from these important stakeholders. We recruited 6 participants in partnership with Dartmouth-Hitchcock's Center for Patient and Family Centered Care and the vice president for quality at Northwestern Memorial Hospital. The individuals we selected for the council either had personal experience with or supported someone with 1 of the 2 target conditions, and had experience working as patient advocates.

PFAC members participated in monthly meetings throughout the study planning and instrument development phases of the project. We consulted them periodically during data collection of the instrument testing phase. PFAC participants provided important guidance on the recruitment materials and participant incentives. They made suggestions on the approach for recruiting patient participants, edited drafts of recruitment documents, and provided guidance on recruitment protocols and incentives for participants. They reviewed data from focus groups and provided input on instrument development.

In terms of dissemination, we sent PFAC members drafts of our presentations and manuscripts but did not obtain much input in the process. We anticipate that future manuscripts, which will focus more fully on each condition and on patient goal setting in the context of PROs, may be of greater interest for them to assist with.

Stakeholder Engagement

During study development and as needed throughout study implementation, 3 physicians (1 orthopedic surgeon and 2 cardiologists) served as advisors to the research team. We consulted 2 cardiologists because the treatment and trajectory of HF is more varied than that of OA-K, which led to more complex decision-making regarding patient eligibility. The clinical advisors led decision-making in eligibility and exclusion criteria, patient identification, recruitment protocols, data collection timing, and clinical data to be collected; they also answered specific questions about patient eligibility.

During instrument testing, the site principal investigator (PI), an orthopedic surgeon or a cardiologist, and clinic staff at each of the 9 participating clinics provided substantial guidance on developing the protocols for identifying, recruiting, and retaining eligible patients in the longitudinal sample. Clinic staff (site PIs, research directors, coordinators, and assistants) participated in discussion sessions to share lessons learned and what was working well, discuss challenges or questions, and collaboratively problem solve recruitment and retention issues. They brought forth many creative ideas that improved recruitment and retention, including the timing and frequency of follow-up, phrasing of the outreach and follow-up scripts, and changes to the patient tracking database. We recruited an outside interviewer to interview site PIs and research coordinators (N = 17) at the end of the study to obtain feedback on the clinical relevance and usability of PROMIS-PLUS. We used their feedback to inform the dissemination of findings, particularly through presentations and publications. We have engaged and will continue to engage these stakeholders in future publications as advisors and coauthors.

Methods

Overview

The overall aim of our study was to create a methodology to incorporate condition-specific concerns in generic health assessments, using HF and OA-K as our test conditions. Our 3 objectives to reach this aim required the use of mixed methods to obtain input for instrument creation (instrument development phase) and to examine the measurement properties of the resulting instruments (instrument testing phase). In this section, we describe the overall research design, including data sources, population samples, and analytical approaches, that we used to achieve our overall aim.

Research Design

We used a longitudinal transformation mixed-methods design,41 in which multiple qualitative and quantitative data sources are collected, analyzed, and integrated throughout the study in an iterative manner, to develop and test the instruments. Specifically, in this study, we created and “transformed” the instruments over time based on our findings throughout the process. We built our design based on PROMIS methods and the International Society for Pharmacoeconomics and Outcomes Research Task Force recommendations for developing PROM items.19,42-44 We partnered with 13 clinics at 9 locations across the United States to conduct the study.

Table 1 describes the key study phases and steps: In phase 1 (objective 1), we focused on instrument development, including qualitative data collection, analysis, an item gap analysis, new item development, and cognitive testing of the resulting draft instruments.45 In phase 2 (objectives 2 and 3), we tested the instruments with 2 samples in each condition: a cross-sectional sample and a longitudinal sample. The study protocol was registered with the National Information Center on Health Services Research and Health Care Technology (ID: HSRP 20143587).

Table 1. Overview of Study Phases, Steps, and Protocol.


Our specific methods included patient focus groups, clinician interviews, item gap analysis, new item development, cognitive interviews, draft instrument development, draft instrument testing, data analysis, and crosswalks between legacy measures and the new measures. We chose these methods based on guidelines for developing PROMs, the need to account for existing PROM items in deciding whether a new item was warranted, and our assumptions that new items would need to be developed.

Data Sources and Data Sets

In this section, we describe the study setting and data sources for instrument development and testing phases. We used only primary data for this study.

Study Setting

We partnered with 13 clinics (see Table 2) to identify and recruit eligible patients for the focus groups and providers for interviews (phase 1) and/or to identify and recruit eligible patients for the longitudinal survey testing (phase 2). We recruited clinic partners from across the 4 US Census Bureau regions to reach diverse populations and sufficient numbers of patients.

Table 2. Participating Clinical Sites, by Target Condition and Study Phase.


Phase 1

Patient focus groups

To identify domains that are important to patients, and subsequently identify gaps in coverage from existing instruments, we conducted 8 focus groups for each condition at 4 partner sites (16 groups in total). Study coordinators at the partner sites identified potential participants via medical record review or physician recommendation. Our inclusion criteria for HF included being under the care of a primary care provider or cardiologist for HF, at least 18 years old, English speaking, and not having a left ventricular assist device or heart transplant. Our inclusion criteria for OA-K included being diagnosed with OA-K, being a candidate for elective total joint arthroplasty (TJA) or having undergone TJA within the previous 12 months, at least 18 years old, and English speaking. Our exclusion criteria for both samples included patients with dementia, patients with a serious mental disorder (eg, schizophrenia), or patients residing in a nursing home or extended care facility. A study coordinator contacted potential patient participants either by phone or in clinic. Participants received a $40 gift card in appreciation for their time. We recorded and transcribed focus groups for qualitative analysis.

Physician interviews

To obtain provider perspectives on gaps in coverage from existing instruments and to identify domains of importance, we conducted phone interviews with physicians, nurse practitioners, and physician assistants (N = 6 specializing in OA-K, and N = 10 specializing in HF) at clinics where focus group participants were recruited. We recorded and transcribed interviews for qualitative analysis.

Cognitive interviews

To pretest the questions we developed, we conducted structured cognitive interviews by telephone with a subset of the focus group population (N = 10 with OA-K, N = 10 with HF). We recruited participants by phone and gave them the choice of a $30 check or gift card in appreciation for their time. We recorded responses in a formatted spreadsheet for analysis.

Phase 2

We tested the instruments with 2 study samples for each condition: a cross-sectional sample, identified and recruited through an online panel recruitment company, and a longitudinal sample, identified and recruited through participating hospitals and clinics. Inclusion and exclusion criteria for both samples are described in Table 3.

Table 3. Inclusion and Exclusion Criteria by Condition and Sample.


Cross-sectional sample

To access a larger volume of respondents (N = 600 per condition) living with the target conditions, we recruited participants to complete surveys at a single point in time through the online panel company Opinions4Good (Op4G, Portsmouth, NH). We selected Op4G based on its large, diverse panel and its ability to administer the screening questions and survey instruments, enforce quotas, and administer a survey retest to a subsample of the respondents within a reasonable time frame (2 months). Op4G emailed potential participants and determined eligibility based on responses to a series of screening questions (see Table 3 for eligibility and exclusions). We specified quotas for age, sex, and race to ensure the representativeness of the respondents, and disease-specific quotas for the respective conditions (eg, 4 levels of New York Heart Association classification for respondents with HF48 and status as pre-total knee arthroplasty [TKA] or post-TKA for respondents with OA-K). Op4G recruited participants on a rolling basis until each quota was satisfied. We included eligible respondents in the study and asked them to complete the instruments. To allow us to assess test-retest reliability, we recruited a subset of the panel participants (N = 100 per condition) to complete a retest 3 to 7 days after completing the survey. We invited all respondents to participate and did not enforce quotas in the retest group. We closed the retest when the first 100 people (per condition) completed the retest.

For the cross-sectional sample, our ability to track nonrespondents and their reasons for declining to participate was limited because participant eligibility was determined through responses to screening questions. In addition, because of the nature of panel sample participation, nonrespondents were not asked or expected to provide reasons for declining the initial screening invitation.

Longitudinal sample

The purpose of longitudinal instrument testing was to test whether the instruments were able to detect real changes in patients' experience of health. To that end, we administered surveys to patients before and after receiving treatment for the target conditions. We worked with clinical experts (orthopedic surgeon and cardiologists) to develop eligibility and exclusion criteria, recruitment protocols based on specific treatments, and survey administration timelines (see Tables 3 and 4). We also consulted with the PFAC to develop the recruitment protocols and materials. We targeted the recruitment of patients whose health status was anticipated to improve between baseline and follow-up because of treatment (Table 3).

Table 4. Survey Eligibility Windows for Longitudinal Sample, by Treatment Type and Time Point.


We relied on clinical research coordinators (nurses and/or affiliated research staff) at participating clinics and hospitals to identify and recruit participants in the longitudinal sample. They used medical records and surgical lists to identify eligible patients, and recruited participants in person or by phone. The clinical research coordinators had access to patient medical records and procedure scheduling charts to determine patient eligibility, so no screening questions were necessary to identify eligible patients.

We required that all participants be 18 years or older and able to speak English; we excluded those with dementia or a serious mental disorder. We describe the full list of criteria and exclusions in Table 3. Initial total knee replacement was a single qualifying intervention for patients with OA-K. Clinic staff identified eligible patients primarily through scheduled surgeries. HF, a more complex condition, is treated in many different ways at different stages of the disease. As a result, clinic staff identified eligible patients based on their receipt of 1 of 5 different treatments for HF (see Table 3).

Participants in the longitudinal sample were identified and recruited by clinic staff serving as research coordinators at participating sites. We tailored recruitment protocols for each clinic and treatment pathway, but typically recruitment involved an initial contact with eligible patients during a scheduled appointment or hospital stay, with follow-up by phone, email, or mailed letter. Patients were allowed 2 options for completing surveys electronically at baseline and follow-up: at home via an emailed link or in clinic on an iPad provided by the study. To boost the retention of patients with HF, we offered paper surveys that could be completed at home and returned by mail.

We gave patients recruited for the longitudinal survey $20 for completing the baseline (pretreatment) survey and $30 for completing the follow-up (posttreatment) survey. Patients were offered the choice of gift card or check. We considered surveys to be complete if at least 80% of questions were answered.

Research coordinators at participating sites tracked the reasons that individuals declined to participate, including timing (related to their health condition), lack of internet access at home, the time commitment required by the study, physical ailments that precluded participation (visual impairment, hand/wrist ailments), and lack of interest. Additional individuals screened as eligible but had their surgeries or procedures canceled, making them ineligible. Many people declined to participate without providing a reason; some agreed to participate (by telephone) but did not complete the enrollment (online). We took several steps to reduce attrition in the follow-up population, including increasing incentives from baseline to follow-up, issuing a paper follow-up survey to accommodate HF patients without in-clinic follow-up appointments, and facilitating training and networking across research coordinators at participating sites to improve participant retention.

We created eligibility windows to ensure that posttreatment surveys were administered in time intervals that allowed for sufficient recovery to take place; see Table 4 for specific eligibility timelines. We programmed the online survey administration tool to enforce the eligibility windows and cutoff dates for baseline and follow-up surveys. For the HF patients who chose paper surveys instead of the online survey, clinic and research staff monitored and enforced the eligibility windows and cutoff dates.

Study Outcomes

Our primary study outcome was to create and test a generalizable methodology to incorporate condition-specific issues into generic, standardized PROMs to ensure that issues of concern to patients were included. Our 3 primary objectives (ie, secondary outcomes) were the following:

  1. Develop 2 disease-specific PROMs within a generic PROM framework, in this case PROMIS, to demonstrate the feasibility and success of the methodology.
  2. Compare measurement quality against current gold-standard legacy disease-specific measures, the Kansas City Cardiomyopathy Questionnaire (KCCQ)49 for HF and the Knee Injury and Osteoarthritis Outcome Score (KOOS)50 for OA-K.
  3. Develop crosswalks between legacy and PROMIS-PLUS measures.

Analytical and Statistical Approaches

Phase 1: Instrument Development (Steps 1-3 in Protocol)

Two of our research team members coded the focus group transcripts in Dedoose51 using a thematic analysis approach.52 They then reviewed the provider interview data to compare and contrast with focus group findings and identify any potential additional domains or items. As no new domains or items were identified, we included focus group data only in our subsequent analyses.

Once the coding was complete, 1 additional research team member reviewed the codes in collaboration with the original 2 researchers to group codes according to their breadth (ie, how much was the code/concept mentioned across focus groups) and depth (ie, how often was the code/concept discussed within and across focus groups). If a code was mentioned only in a few focus groups or was discussed only by a handful of participants, then it was not included for further consideration.

PROMIS item and domain selection

After we narrowed down the codes, we compared them with existing PROMIS domains and items and selected existing PROMIS domains and items for inclusion. The team first matched qualitative codes to PROMIS domains and identified potential item matches within the domains. The team then reviewed selected items for subjective item quality, such as clarity, sex or opportunity bias, and adequate specificity of the item. Next, the team assessed the psychometric quality of the selected PROMIS items, retaining the best-quality items while excluding lower-quality items as a means of reducing overall item counts within each domain. The team obtained each selected PROMIS item's parameters (ie, its slope and category thresholds), as estimated in the original domain item calibration testing53,54 and as currently deployed in Assessment Center (https://www.assessmentcenter.net/). The team then reviewed these item parameters and considered items with slopes (or item discrimination parameters) <2.0 to be of lower psychometric quality; it eliminated such items provided that other selected PROMIS items remained to measure the eliminated item's content. The team evaluated the remaining items to ensure content coverage in each domain and reached consensus on the final items included for quantitative testing.
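The slope-based screening step can be sketched in code. The following Python sketch is illustrative only and is not part of the study protocol: the item identifiers, domains, and slope values are hypothetical, and the logic simply applies the 2.0 cutoff with the content-coverage caveat described above.

```python
# Hypothetical sketch of slope-based item screening: drop items with IRT
# discrimination (slope) < 2.0, but only when the item's domain retains
# at least one higher-slope item to cover its content.
SLOPE_CUTOFF = 2.0

# Illustrative candidate items (names and slopes are invented).
candidate_items = [
    {"id": "PF_walk",   "domain": "Physical Function", "slope": 3.1},
    {"id": "PF_stairs", "domain": "Physical Function", "slope": 1.6},
    {"id": "FT_tired",  "domain": "Fatigue",           "slope": 2.4},
]

def screen_by_slope(items, cutoff=SLOPE_CUTOFF):
    """Split items into a kept set and an eliminated set."""
    retained = [it for it in items if it["slope"] >= cutoff]
    flagged = [it for it in items if it["slope"] < cutoff]
    # A flagged item is eliminated only if its domain keeps other coverage.
    covered = {it["domain"] for it in retained}
    eliminated = [it for it in flagged if it["domain"] in covered]
    kept_anyway = [it for it in flagged if it["domain"] not in covered]
    return retained + kept_anyway, eliminated

kept, dropped = screen_by_slope(candidate_items)
```

Here the low-slope "PF_stairs" item is eliminated because "PF_walk" still covers the Physical Function domain; an item whose domain had no remaining coverage would be kept for content reasons.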

Gap item identification

We identified potential gap codes in the patient focus group data as codes that did not appear to be covered by existing PROMIS content. We reviewed the gap codes and considered code relevance to the key condition as well as content coverage by PROMIS or other legacy instruments. If we determined that the code was not covered by PROMIS, but was relevant and measurable via patient report, we drafted an item to measure the concept. We grouped these items within existing PROMIS domains or created new domains if needed. We then tested new items in cognitive interviews with patients using the cognitive interview protocol developed by Willis55 to ascertain the following: (1) comprehension; (2) the processes used by the respondent to retrieve relevant information from memory; (3) appropriateness of response options; (4) confidence in answering the question; and (5) relevance to the condition (HF or OA-K). The research team tracked, assessed, and discussed each of these components in making decisions to retain, revise, or drop each new item. We also conducted a readability assessment using Lexile Analyzer.56

Phase 2: Instrument Testing and Finalization (Steps 4-6)

We used data from field testing (cross-sectional and longitudinal samples) to obtain item calibrations; establish the reliability, validity, and usability of new measures; and create crosswalks to legacy instruments. We used cross-sectional data for initial psychometric analyses, including test-retest reliability, and linking studies; we used longitudinal data for validation and change assessment.

Cross-sectional data: initial psychometric analyses

We conducted classical item analyses57 on all measures and summarized results per measure (eg, number of measure items, internal consistency reliability, adjusted item-total correlations). We created raw summed scores for each measure and identified minimum and maximum possible scores per measure. We graphically displayed raw summed score distributions of measures to determine their nature (ie, normal vs skewed). We calculated Pearson and Spearman correlations (for skewed score distributions) between all score pairs and reviewed them for validity evidence.
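The classical item analyses described above can be illustrated with a small sketch, assuming coefficient (Cronbach's) alpha as the internal consistency estimate and item-rest correlations as the adjusted item-total correlations; the simulated 5-category response matrix stands in for the study data.

```python
import numpy as np

# Simulate 200 respondents answering 4 items on a 0-4 scale, all loading
# on one construct (illustrative data, not study data).
rng = np.random.default_rng(0)
true_score = rng.normal(size=200)
responses = np.clip(
    np.round(true_score[:, None] + rng.normal(scale=0.8, size=(200, 4)) + 2),
    0, 4,
)

def cronbach_alpha(x):
    """Coefficient alpha: k/(k-1) * (1 - sum of item variances / total variance)."""
    k = x.shape[1]
    item_vars = x.var(axis=0, ddof=1)
    total_var = x.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def adjusted_item_total(x):
    """Correlate each item with the sum of the remaining items (item-rest)."""
    total = x.sum(axis=1)
    return np.array([
        np.corrcoef(x[:, j], total - x[:, j])[0, 1] for j in range(x.shape[1])
    ])

alpha = cronbach_alpha(responses)
rit = adjusted_item_total(responses)
```

The item-rest form avoids inflating each item's correlation with a total that includes the item itself, which is why "adjusted" item-total correlations are typically reported.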

We evaluated test-retest reliability with Pearson and Spearman correlations and with the intraclass correlation coefficient, using a subset of n = 100 cases from the cross-sectional data.
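A minimal sketch of this reliability check, assuming the two-way random-effects, absolute-agreement, single-measurement form of the intraclass correlation coefficient (ICC(2,1)); the simulated test and retest scores stand in for the n = 100 retest subsample.

```python
import numpy as np

# Simulate test-retest scores: stable true scores plus occasion noise
# (illustrative values, not study data).
rng = np.random.default_rng(1)
test = rng.normal(50, 10, size=100)
retest = test + rng.normal(0, 3, size=100)

def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement."""
    n, k = ratings.shape
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)   # per-person means
    col_means = ratings.mean(axis=0)   # per-occasion means
    ss_rows = k * ((row_means - grand) ** 2).sum()
    ss_cols = n * ((col_means - grand) ** 2).sum()
    ss_err = ((ratings - grand) ** 2).sum() - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

scores = np.column_stack([test, retest])
icc = icc_2_1(scores)
pearson = np.corrcoef(test, retest)[0, 1]
```

Unlike the Pearson correlation, the ICC is sensitive to systematic shifts between test and retest occasions, which is why both are often reported together.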

We conducted categorical confirmatory factor analysis (CCFA) to assess the dimensionality of each measure (using polychoric correlations, the Mplus weighted least squares mean- and variance-adjusted [WLSMV] estimator, and cases without missing responses). We ran a single-factor model per measure and reviewed overall model fit using published standards (eg, Comparative Fit Index [CFI] ≥0.95, Tucker-Lewis Index [TLI] ≥0.95, root mean square error of approximation [RMSEA] <0.06, and weighted root mean square residual [WRMR] <1.00).58 We estimated the CFI, TLI, and RMSEA fit indexes using each CCFA model's reported scaled Satorra-Bentler chi-square value. We conducted bifactor analyses to assess the potential impact of measure multidimensionality.
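The published fit standards cited above amount to a set of threshold checks. A hypothetical helper (not part of the study's software) makes the criteria concrete:

```python
def check_fit(cfi, tli, rmsea, wrmr):
    """Evaluate CCFA fit indexes against the excellent-fit standards used here."""
    return {
        "CFI >= 0.95": cfi >= 0.95,
        "TLI >= 0.95": tli >= 0.95,
        "RMSEA < 0.06": rmsea < 0.06,
        "WRMR < 1.00": wrmr < 1.00,
    }
```

For example, a model with CFI = 0.99, TLI = 0.98, RMSEA = 0.08, and WRMR = 0.55 meets every excellent-fit standard except the RMSEA criterion.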

Using modern measurement theory approaches, we evaluated IRT modeling assumptions (eg, local independence, monotonicity, item fit) and assessed differential item functioning (DIF).60-62 Because of the need to incorporate new items into existing measures that had been developed from graded response model (GRM) estimation, we employed the GRM for all IRT-based analyses. We conducted DIF analyses for sex, age, and education level, where sufficient subgroup sample sizes (n = 200) existed. We studied DIF score impact using naive vs purified theta estimates60 derived from lordif analysis,62 comparing thetas by median standard error and effect size criteria.60,63 Our strategy was to retain items deemed important by patients and stakeholders. If items did not entirely fit the CCFA or IRT models evaluated, they were nevertheless considered for retention within a measure or as stand-alone items.
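The GRM named above models each item's ordered response categories through cumulative boundary curves; a category's probability is the difference between adjacent boundary probabilities. A self-contained sketch for a single item (parameter values in any call would be hypothetical):

```python
import math

def grm_probs(theta, a, thresholds):
    """Graded response model category probabilities for one item.
    theta: latent trait level; a: discrimination; thresholds: ordered boundaries.
    """
    def boundary(b):
        # P(responding at or above the category boundary b)
        return 1.0 / (1.0 + math.exp(-a * (theta - b)))
    cum = [1.0] + [boundary(b) for b in thresholds] + [0.0]
    # Category probability = difference of adjacent cumulative probabilities
    return [cum[i] - cum[i + 1] for i in range(len(cum) - 1)]
```

With m thresholds the item has m + 1 categories, and the category probabilities always sum to 1; symmetric thresholds around theta give symmetric probabilities.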

Cross-sectional data: linking studies

We conducted 2 types of linking studies: (1) studies to add new items to the items of an existing measure on an established metric; and (2) studies to link scores obtained from legacy measures to scores obtained from PROMIS-based measures. For the first type, we added new PROMIS items to existing PROMIS measures, calibrating the new items to be on the same metric as the items of the existing PROMIS measures. For the second type, to facilitate comparisons between scores from legacy measures and PROMIS-based measures, we linked legacy scores to PROMIS scores, establishing a mathematical relationship between the measures' scores and thus allowing for crosswalk table construction.64,65

Although PROMIS measures may be preferred by some researchers and clinicians, legacy measures such as those of interest in this study (ie, KCCQ for HF, KOOS for OA-K) remain in use and continue to be featured in research projects and the existing literature. Therefore, using established psychometric techniques, we attempted to link scores between legacy and PROMIS measures assessing a common construct and create crosswalk tables to compare and relate scores from one measure to another. Common in education applications, linking is increasingly employed in health measurement.

Linking design and requirements

We administered legacy (KCCQ and KOOS) and PROMIS measures simultaneously to all study participants, thus achieving the highly robust linking study design known as the single-group design. We calculated Pearson correlations between all relevant pairs of raw summed scores; we gave particular attention to the correlations between legacy and PROMIS measures targeted for linking. Given the similarities of these legacy and PROMIS measures in terms of content and purpose, we anticipated that their score intercorrelations might be sufficiently high to justify either IRT-based or equipercentile linking. Correlations above 0.80 tend to support the appropriateness of IRT-based linking methods, while lower (but still at or above 0.70) correlations tend to support equipercentile linking; correlations of less than 0.70 indicate the 2 measures' scores may not be sufficiently correlated to realize successful score linkage.
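The correlation-based decision rule in this paragraph can be expressed directly. A hypothetical helper reflecting the stated thresholds:

```python
def choose_linking_method(r):
    """Select a linking approach from the legacy-PROMIS score correlation."""
    if r >= 0.80:
        return "IRT-based linking"
    if r >= 0.70:
        return "equipercentile linking"
    return "linking not recommended"
```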

Proposed linking methods

For our linking studies, we largely followed the linking methodology of the PROsetta Stone project.46,66 Our first linking method, an IRT-based approach, was fixed parameter calibration, which maintains existing item parameters at previously estimated values and calibrates only new, to-be-added items.67 With this method, we fixed existing PROMIS item parameters at their originally estimated values and then calibrated only the new items (ie, new PROMIS items or legacy items), thus placing all newly calibrated item parameters on the same metric as the existing PROMIS anchor items. As a result, we were able to calibrate the parameters of new PROMIS items and legacy items (the nonanchor items) on the original measure scale (ie, the existing PROMIS metric). Fixed parameter calibration yields 1 set of resultant item parameters: the previously estimated anchor items, supplemented by the newly calibrated items. Our second linking method, a non-IRT approach, was equipercentile linking, which we reserved for to-be-linked measure pairs that did not meet IRT-based linking requirements (eg, did not meet IRT unidimensionality requirements or did not have a sufficiently high intermeasure score correlation). In such cases, we attempted to link measure scores based on their percentile ranks.68,69 With equipercentile linking, to reduce sampling error, we first smoothed measure score frequency distributions using a polynomial log-linear method.70
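Equipercentile linking matches scores across measures by percentile rank. A minimal sketch, assuming discrete summed scores and omitting the log-linear presmoothing step the study applied:

```python
import numpy as np

def percentile_ranks(scores, possible_scores):
    """Percentile rank of each possible score: proportion below plus half the proportion at."""
    scores = np.asarray(scores)
    n = len(scores)
    ranks, below = {}, 0
    for s in possible_scores:
        at = int(np.sum(scores == s))
        ranks[s] = (below + at / 2) / n
        below += at
    return ranks

def equipercentile_link(legacy, promis, legacy_possible, promis_possible):
    """Map each legacy score to the PROMIS score with the closest percentile rank."""
    lr = percentile_ranks(legacy, legacy_possible)
    pr = percentile_ranks(promis, promis_possible)
    return {s: min(promis_possible, key=lambda p: abs(pr[p] - lr[s]))
            for s in legacy_possible}
```

With matched score distributions, each legacy score maps to the PROMIS score occupying the same percentile position.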

Crosswalk tables

If the score linking relationship is robust, applying multiple linking methods within a planned linking study is likely to produce highly comparable results. Based on these selected state-of-the-art linking methodologies (ie, fixed parameter calibration linking and equipercentile linking), we constructed a crosswalk table for each pair of successfully linked legacy-to-PROMIS measures. Each crosswalk table shows how a legacy measure's summed scores are associated with the relevant PROMIS measure's T scores (which are themselves linear transformations of the IRT-based theta score). Because we emphasized standardization of scores on the PROMIS metric, we prepared crosswalk tables in 1 specific direction (ie, from a legacy measure to a PROMIS measure). Thus, successful crosswalks allow clinicians and researchers to convert their summed legacy measure scores into comparable scores on the PROMIS measure's T score metric.
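Because a PROMIS T score is a linear transformation of theta (mean 50, SD 10 in the reference population), each crosswalk row reduces to a simple mapping. A schematic sketch, where the legacy-score-to-theta values are hypothetical placeholders for the linking output:

```python
def theta_to_t(theta):
    """PROMIS T score: linear transformation of the IRT theta estimate."""
    return 50.0 + 10.0 * theta

def build_crosswalk(legacy_to_theta):
    """One-direction crosswalk table: legacy summed score -> PROMIS T score."""
    return {raw: round(theta_to_t(th), 1) for raw, th in legacy_to_theta.items()}
```

For example, `build_crosswalk({12: -0.5, 20: 0.0, 28: 0.8})` maps legacy summed scores 12, 20, and 28 to T scores 45.0, 50.0, and 58.0.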

Longitudinal data: validation and change analyses

We employed a similar set of psychometric analyses as we conducted with the cross-sectional data to assess finalized measures (eg, classical item analysis, CCFA). We assessed preliminary validity evidence using Pearson and Spearman correlations with legacy measures (convergent validity) and with PROMIS Global Health Physical and Mental Health summary scores (concurrent validity). We interpreted the magnitude of the absolute value of a Pearson r or Spearman rho between measure scores as follows: From 0.00 to <0.10 is a negligible correlation; from 0.10 to <0.30 is a small correlation; from 0.30 to <0.50 is a medium correlation; and from 0.50 to 1.00 is a large correlation.63,71 We used known-group comparisons (at baseline and 3 months) to evaluate scores by hypothesized differentiating variable (eg, high vs low Global Health Physical).
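The interpretation bands above translate directly into a small helper (hypothetical, mirroring the cited cutoffs):

```python
def correlation_magnitude(r):
    """Classify the absolute value of r or rho using the study's cutoffs."""
    a = abs(r)
    if a < 0.10:
        return "negligible"
    if a < 0.30:
        return "small"
    if a < 0.50:
        return "medium"
    return "large"
```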

For our change analyses, we investigated within-person score changes (baseline vs 3 months paired t test) and compared change scores by low vs high assessment of self-reported change score groups (ie, high vs low global rating of change analysis of variance).
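The within-person change analysis rests on the paired t test: the t statistic is the mean of the paired differences divided by its standard error. A self-contained sketch with hypothetical scores (in practice, a statistics package would also supply the p value):

```python
import math

def paired_t(baseline, followup):
    """Paired t statistic and degrees of freedom for within-person change."""
    diffs = [b - f for b, f in zip(baseline, followup)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)  # sample variance
    t = mean_d / math.sqrt(var_d / n)
    return t, n - 1
```

For instance, `paired_t([10, 12, 11, 15, 14, 13], [12, 13, 14, 16, 15, 16])` gives t ≈ -4.57 with 5 degrees of freedom, reflecting systematically higher follow-up scores.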

Handling missing data

We collected data primarily electronically to minimize missing data and allow for real-time tracking to identify potential issues. We programmed the survey as complete-to-proceed to encourage response. We employed a combination of strategies to address issues of missingness related to loss to follow-up and nonresponse. For our item score-based analyses (eg, internal consistency reliability, dimensionality, DIF studies), we excluded cases with missing item responses on an analysis-by-analysis basis. For our total score-based analyses (eg, known-groups validity analyses, longitudinal change score analyses), we included cases with missing item responses if that missingness was because of measure-allowed “not applicable” responses and if a total score could be computed based on an established scoring algorithm that incorporated the not applicable responses (ie, a minimum required number of responses per measure had been provided per individual). In addition, IRT-based scores can be calculated even when some data are missing.

Conduct of the Study

Table 1 outlines the final study protocol that we used in this study. Overall, we made no substantive changes to the original proposed protocol, except in step 4. Originally, the test populations were to be drawn from clinical sites only. However, we soon realized that we would not be able to recruit the target number of patients for calibration in clinical sites with the time and resources provided—particularly for the HF population. We determined that recruiting a cross-sectional sample through an online panel company could be done in combination with recruiting a longitudinal sample through hospitals and clinics. This ensured that we had enough responses for item calibration within the allotted timeline and budget.

This study was approved by the IRBs of Dartmouth College, Baylor Scott & White Research Institute, Maine Medical Center, Mayo Clinic, Northwestern University, Oregon Health and Science University, and The University of Pennsylvania. The other participating clinics and hospitals established IRB Authorization Agreements with Dartmouth College. A full list of participating sites appears in Table 3.

Results

The overall aim of this study was to create a methodology to incorporate condition-specific concerns important to patients in generic health assessments using PROMIS with HF and OA-K. We achieved this aim, and in this section we present our results related to the 3 primary objectives and 6 steps outlined in Table 1.

Participant Characteristics

Overall participant characteristics are described in Table 5. We conducted focus groups for both conditions from June to August 2014. We collected cross-sectional surveys from June to July 2015. We collected longitudinal surveys (baseline and 3-month follow-up) from July 2015 to March 2017 for patients with HF, and from July 2015 to November 2016 for patients with OA-K. We describe participant flow of the longitudinal sample in Figure 1.

Table 5. Participant Characteristics, by Sample at Baseline.


Figure 1. Participant Flow, Longitudinal Sample.


Objective 1 (Steps 1-3): Input on Condition-Specific Items

A total of 129 patients participated across 16 focus groups and 20 cognitive interviews. See Table 5 for detailed recruitment and demographic information on patient participants per condition. Sixteen providers participated in the semistructured phone interviews.

Figures 2 and 3 illustrate the whole gap analysis process and results for each condition, including the numbers of existing PROMIS domains and items, gaps we identified, new items we developed, and the final draft PROMIS-PLUS instrument domains and associated number of items that remained after cognitive testing.45

Figure 2. Gap Analysis for Heart Failure.


Figure 3. Gap Analysis for Osteoarthritis of the Knee.


To briefly summarize, for HF, we identified 64 items in 10 domains from existing PROMIS item banks and developed 32 new items in 8 domains. For OA-K, we identified 52 items in 8 domains from existing PROMIS item banks and developed 30 new items in 8 domains. We then conducted cognitive testing with 10 individuals for each condition on all the new items.

After analyzing the cognitive interviews, we retained 22 gap items and 64 existing PROMIS items in 18 domains in HF for testing72 and 24 gap items and 52 existing PROMIS items in 14 domains in OA-K.45 Table 6 provides existing and new item counts by domain and condition. Appendices A and B give a detailed listing of all included domains and items.

Table 6. Existing and New Item Counts, by Domain and Condition.


Our readability assessment showed that the new items were interpretable at a fourth-grade level; existing PROMIS items aim to be at a sixth-grade or lower level. This provided us with some assurance that the instruments could be used for populations with low literacy.

Objectives 2 and 3 (Steps 4-6): Validate PROMIS-PLUS and Produce Crosswalks With Legacy Instruments

Measure Testing and Finalization

Our cross-sectional sample contained 600 HF patients and 600 OA-K patients. Our longitudinal sample contained 185 HF patients (with 78 providing follow-up responses) and 311 OA-K patients (with 238 providing follow-up responses). We used the field-testing response data from these samples for all our study's psychometric and linking analyses. Our overall strategy was to retain items that both patients and stakeholders had identified as important. Accordingly, items that did not fully fit classical item analysis standards or the evaluated CCFA or IRT models were nevertheless considered for retention within a measure or as stand-alone items.

Cross-sectional data: Initial psychometric analyses (objective 2, step 4)

We used our cross-sectional sample data for our initial psychometric analyses, including test-retest reliability. Our 18 HF domain measures ranged in number of items from 1 to 11 and included 22 new items; our 14 OA-K domain measures ranged in number of items from 1 to 13 and included 24 new items (see Table 6). The internal consistency reliability (Cronbach α) of the HF measures ranged from .52 (Life Satisfaction) to .94 (Depression); 12 of the HF measures had α values of ≥.70 (see Table 7). For the OA-K measures, the internal consistency reliability ranged from .67 (Symptoms) to .95 (Satisfaction With Social Roles and Activities [SRA], Pain Interference); 10 of the OA-K measure α values were ≥.70 (see Table 8).

Table 7. Internal Consistency Reliability of PROMIS Domains and Comparison Measures, Heart Failure Cross-sectional Sample.


Table 8. Internal Consistency Reliability of PROMIS Domains and Comparison Measures, Osteoarthritis of the Knee Cross-sectional Sample.


If a measure's internal consistency reliability estimate is ≥.70, scores from that measure are widely considered as having adequate reliability for conducting score-based group comparisons; if a measure's internal consistency reliability estimate is ≥.90, scores from that measure are then widely considered as having adequate reliability for conducting score-based individual comparisons.

The average interitem correlation for the HF measures ranged from 0.36 (Life Satisfaction) to 0.71 (Depression); for the OA-K measures, the average interitem correlation ranged from 0.42 (Independence) to 0.81 (Social Isolation; see Tables 7 and 8). Minimum and maximum adjusted item-total correlations are also presented in Tables 7 and 8 for HF and OA-K measures.

We calculated raw summed scores for each HF and OA-K measure and determined measure means, standard deviations, medians, minimums, maximums, and score distribution skewness and kurtosis. Results are presented per measure in Table 9 (HF measures) and Table 10 (OA-K measures). In these tables, “Missing” refers to the number of study participants without a particular measure's score. Although participants were encouraged to respond to all questionnaire items, they did have the option to skip an item or items they felt to be not applicable to them or to which they simply did not want to provide a response. The observed minimum and maximum scores are the lowest and highest scores observed in our sample (as opposed to the theoretical minimum and maximum scores, which are the lowest and highest scores possible to attain with a measure). For skewness, which is a measure of a distribution's asymmetry, we considered values <−1.0 or >+1.0 as indicating a substantially nonsymmetric distribution (a normal distribution has skewness = 0). For kurtosis, which is a measure of a distribution's tailedness, we considered excess kurtosis values <−1.0 as indicating a platykurtic distribution and values >+1.0 as indicating a leptokurtic distribution (a normal or mesokurtic distribution has excess kurtosis = 0).73
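The skewness and excess kurtosis statistics interpreted above are standardized moment ratios. A minimal sketch in population-moment form (without the small-sample bias corrections some statistical packages apply):

```python
def skewness(x):
    """Third standardized moment; 0 for a symmetric distribution."""
    n = len(x)
    m = sum(x) / n
    s2 = sum((v - m) ** 2 for v in x) / n
    s3 = sum((v - m) ** 3 for v in x) / n
    return s3 / s2 ** 1.5

def excess_kurtosis(x):
    """Fourth standardized moment minus 3; 0 for a normal distribution."""
    n = len(x)
    m = sum(x) / n
    s2 = sum((v - m) ** 2 for v in x) / n
    s4 = sum((v - m) ** 4 for v in x) / n
    return s4 / s2 ** 2 - 3.0
```

For example, the flat sample [1, 2, 3, 4, 5] has skewness 0 and excess kurtosis -1.3, which the ±1.0 rule above would call platykurtic.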

Table 9. Measure Statistics, Heart Failure Cross-sectional Sample.


Table 10. Measure Statistics, Osteoarthritis of the Knee Cross-sectional Sample.


As an example, for the HF Anxiety measure, all 600 study participants provided responses to each of the 5 Anxiety items; no participants had missing Anxiety scores. The mean (SD) Anxiety summed score was 13.6 (4.4), while its median score was 14.0. Observed summed scores ranged from 5 to 25, corresponding to the range of theoretical Anxiety scores possible (ie, minimum = 5; maximum = 25). The Anxiety score distribution appears largely normal, with a minimal positive skew (0.15) and a minimal excess kurtosis (−0.21).

We then graphically displayed each HF and OA-K measure raw summed score distribution via histogram to help determine, in conjunction with their previously estimated skewness and kurtosis statistics, the nature (ie, normal vs skewed) of each distribution. Histograms of HF measure score distributions are presented in Appendix C; those for OA-K measure score distributions are presented in Appendix D. The score distributions of the HF and OA-K measures were approximately normal, although we observed some evidence of slight skewness and kurtosis (see Tables 9 and 10 and Appendices C and D).

Test-retest reliability

We used a subset of 100 HF patients and 100 OA-K patients from our cross-sectional sample to evaluate test-retest reliability. We estimated both Pearson r and Spearman rho correlations for comparative purposes and to account for potential nonnormality of measure score distributions. We also estimated 2 forms of the intraclass correlation coefficient (ICC): 1 including both systematic and random error in the estimation denominator and 1 including only random error. We used a 2-way mixed-effects ICC model, in which people effects were random and measure effects were fixed. Finally, we calculated measure-specific standard errors of measurement (SEMs) based on the ICC estimates that included both systematic and random error.
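The 2 ICC forms described here differ only in whether between-occasion (systematic) variance enters the denominator. A minimal 2-way sketch (hypothetical data; rows are people, columns are test and retest occasions):

```python
import numpy as np

def icc_two_way(scores):
    """ICCs from an n-people x k-occasions score matrix.
    Returns (agreement ICC: systematic + random error;
             consistency ICC: random error only).
    """
    scores = np.asarray(scores, dtype=float)
    n, k = scores.shape
    grand = scores.mean()
    rows = scores.mean(axis=1)
    cols = scores.mean(axis=0)
    msr = k * ((rows - grand) ** 2).sum() / (n - 1)    # between-people mean square
    msc = n * ((cols - grand) ** 2).sum() / (k - 1)    # between-occasions mean square
    resid = scores - rows[:, None] - cols[None, :] + grand
    mse = (resid ** 2).sum() / ((n - 1) * (k - 1))     # residual mean square
    consistency = (msr - mse) / (msr + (k - 1) * mse)
    agreement = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    return agreement, consistency
```

When retest scores are a pure systematic shift of test scores (everyone gains exactly 1 point), consistency stays at 1.0 while agreement drops below 1.0, illustrating how the agreement form penalizes systematic error.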

For the HF measures, Pearson r ranged from 0.81 (Independence) to 0.98 (Satisfaction With SRA). Only 1 HF measure (Independence) had an r less than 0.90. For the OA-K measures, Pearson r ranged from 0.88 (Life Satisfaction, Independence) to 0.98 (Satisfaction With SRA, Pain Interference). Two OA-K measures (Life Satisfaction, Independence) had r values less than 0.90. With both the HF and OA-K measures, Pearson r and Spearman rho estimated correlations exhibited only minor differences (see Tables 11 and 12).

Table 11. Test-Retest Reliability of PROMIS Domains and Comparison Measures, Heart Failure Cross-sectional Sample.


Table 12. Test-Retest Reliability of PROMIS Domains and Comparison Measures, OA-K Cross-sectional Sample.


Our ICC estimates (systematic and random error) for the HF measures ranged from 0.90 (Independence) to 0.99 (Anxiety, Satisfaction With SRA). No HF measure had an ICC estimate of <0.90.72 For the OA-K measures, our systematic and random error ICC estimates ranged from 0.93 (Independence) to 0.99 (Satisfaction With SRA, Pain Interference). No OA-K measure had an ICC estimate of <0.90 (see Table 12). For both the HF and OA-K measures, ICC estimates (systematic + random vs random error only) exhibited little to no differences (see Tables 11 and 12), which indicates that little to no systematic error affected measurement quality.

For all HF and OA-K measures having a minimum of 4 items per measure, we assessed each measure's dimensionality by conducting a CCFA. For these CCFA analyses, we used polychoric correlations, a weighted least squares mean- and variance-adjusted (WLSMV) estimator, and cases without missing responses.74-76 Because our objective was to establish evidence supporting essential measure unidimensionality, in each CCFA we estimated a single-factor model and then reviewed overall model fit using existing standards for excellent model fit.57

Modeling results (CFI, RMSEA, TLI, weighted root mean square residual [WRMR]) per measure analyzed are presented in Table 13. The results suggested acceptable to good fit of a unidimensional model to our response data.

Table 13. CCFA Modeling Results, HF and OA-K Cross-sectional Samples.


As an example, for the HF Anxiety 5-item measure, its single-factor CCFA CFI value was 0.99 and its TLI value was 0.98, with both index values meeting the CFI and TLI ≥0.95 excellent model fit standard. The model's RMSEA value was 0.08, not meeting the <0.06 standard for excellent fit but indicative of good model fit. Finally, the model's WRMR value was 0.55, meeting the excellent fit standard of <1.00. Thus, overall single-factor model fit for the HF Anxiety measure appears to be good to excellent.

In addition to CCFA, we also conducted bifactor analyses to determine if multidimensionality was introduced into a measure with the incorporation of new, content-relevant items.77 Two measures (OA-K Physical Function, OA-K Pain Interference) had sufficient numbers of both existing items and new items to warrant such an analysis. We therefore conducted bifactor analysis on the OA-K Physical Function and OA-K Pain Interference measures, using existing items as 1 specific group factor and new items as a second. In each analysis, we observed no evidence of important multidimensionality arising via the introduction of new items into the measure (data not presented).

We employed the IRT model known as the GRM, evaluated IRT modeling assumptions, and assessed DIF. The CCFA analyses presented evidence of essential unidimensionality. We examined residual correlations from our CCFA analyses as part of testing for local item dependence and used adjusted item-total correlations as 1 test of monotonicity (ie, the assumption that overall measure total scores increase as individual item response scores increase). We observed no meaningful violations of these IRT assumptions (residual correlation data not presented).

We conducted DIF studies using 3 factors, based on the availability of sufficient subgroup sample sizes (n = 200): sex (male vs female), age (≤55 vs >55 years), and education level (completed college or not). As with the CCFA analyses, we conducted DIF analyses only on measures having a minimum of 4 items. In stage 1 of the DIF studies, during which items were flagged for potential DIF, we identified no items exhibiting DIF for the 3 studied factors in the measures analyzed (see Table 14 for overall findings and Appendices E and F for item-level details). Therefore, we conducted no DIF stage 2 analyses (ie, DIF score impact studies).

Table 14. DIF Results, HF and OA-K Cross-sectional Sample.


For HF measure validity evidence, we estimated both Pearson r and Spearman rho correlations (for comparative purposes and to address potentially non-normal measure score distributions) between HF measure scores, between HF and comparison measure scores (ie, Global Health Physical and Mental, KCCQ subscores), and between comparison measure scores. HF measure results are presented in Appendices G and H. HF measure correlations with KCCQ subscore measures indicated expected convergent validity (where r/rho > 0.60) and divergent validity (where r/rho < 0.30). HF measure score correlations with Global Health Physical and Mental measure scores also indicated expected concurrent validity (where r/rho > 0.40).72 For OA-K measure validity evidence, we also estimated both Pearson r and Spearman rho correlations: between OA-K measure scores, between OA-K and comparison measure scores (ie, Global Health Physical and Mental, KOOS subscores), and between comparison measure scores. OA-K measure results are presented in Appendices I and J. OA-K measure correlations with KOOS subscore measures indicated expected convergent and divergent validity; their correlations with Global Health Physical and Mental measure scores also indicated expected concurrent validity.

We obtained evidence of known-groups validity for the HF measures by comparing HF measure scores of patients with (1) low Global Health Physical scores (ie, raw scores from 4 to 10) vs high Global Health Physical scores (raw scores from 14 to 20) and (2) low Global Health Mental scores (raw scores 4 to 10) vs high Global Health Mental scores (raw scores 14 to 20). For the Global Health Physical group comparison, members of the high-score group had statistically significantly better domain status scores for all HF domains measured except Health Behavior (see Table 15). For the Global Health Mental group comparison, members of the high-score group had statistically significantly better domain status scores for all HF domains measured (see Table 16). Using the OA-K measures, we conducted a similar set of analyses, comparing OA-K measure scores of patients with (1) low (4 to 10) vs high (14 to 20) Global Health Physical scores and (2) low (4 to 10) vs high (14 to 20) Global Health Mental scores. For both the Global Health Physical and Mental group comparisons, members of each high-score group had statistically significantly better domain status scores for all OA-K domains measured (see Tables 17 and 18).

Table 15. ANOVA of PROMIS Domains by PROMIS GH Physical (Low vs High), Heart Failure Cross-sectional Sample.


Table 16. ANOVA of PROMIS Domains by PROMIS GH Mental (Low vs High), Heart Failure Cross-sectional Sample.


Table 17. ANOVA of PROMIS Domains by PROMIS GH Physical (Low vs High), Osteoarthritis of the Knee Cross-sectional Sample.


Table 18. ANOVA of PROMIS Domains by PROMIS GH Mental (Low vs High), Osteoarthritis of the Knee Cross-sectional Sample.


Cross-sectional Data: Linking Studies, Objective 3, Step 5

We conducted 2 types of linking studies using the cross-sectional data. In linking studies type 1, we sought to add new items to an existing measure, linking them to the existing measure's items and placing the new items on the existing measure's established metric. We conducted 2 such studies, both involving OA-K domain measures. We linked 5 new OA-K Pain Interference items to the 8 available existing items from the PROMIS Pain Interference measure. The raw and scaled score correlations between PROMIS and OA-K Pain Interference were 0.84 and 0.84, respectively. We also linked 5 new OA-K Physical Function items to the 8 available existing items from the PROMIS Physical Function measure. The raw and scaled score correlations between PROMIS and OA-K Physical Function were 0.70 and 0.71, respectively (see Table 19).

Table 19. Measure Crosswalks Using PROsetta Stone Method Linking Studies Results, HF and OA-K Cross-sectional Samples.


In linking studies type 2, we sought to link scores obtained from legacy measures (ie, KCCQ for HF, KOOS and Western Ontario and McMaster Universities Osteoarthritis Index [WOMAC]78 for OA-K) to scores from PROMIS measures and, as a product of the linking, create crosswalks between legacy and PROMIS measures, thereby facilitating comparisons between scores from legacy and PROMIS measures. We conducted 5 such studies, 1 involving an HF domain measure and 4 involving OA-K domain measures, the results of which are in Appendix K.

For the HF domain measure, we linked 6 KCCQ Physical Limitation items to the 10 available existing items from the PROMIS Physical Function measure. The raw and scaled score correlations between PROMIS Physical Function and KCCQ Physical Limitation were 0.71 and 0.75, respectively.

For the OA-K domain measures, we first linked 9 KOOS Pain items to the 8 available existing items from the PROMIS Pain Interference measure. The raw and scaled score correlations between PROMIS Pain Interference and KOOS Pain were 0.70 and 0.71, respectively. We also linked 5 WOMAC Pain items to the 8 available existing items from the PROMIS Pain Interference measure, where the raw and scaled score correlations between PROMIS Pain Interference and WOMAC Pain were 0.70 and 0.71, respectively (see Table 19).

Although we had originally proposed to link a larger set of legacy measure scores to other PROMIS measure scores, our analyses indicated that several of these proposed linkings were not recommended. The primary reason for not recommending a proposed measure score linking was low raw and scaled score correlations (ie, less than 0.70) between the targeted PROMIS and legacy measure scores. We conducted 3 such studies, 1 involving an HF domain measure and 2 involving an OA-K domain measure.

For the HF domain measure, we attempted to link 4 KCCQ Social Limitation items to the 6 available existing items from the PROMIS Ability to Engage in SRA measure. The raw and scaled score correlations between PROMIS Ability to Engage in SRA and KCCQ Social Limitation were only 0.60 and 0.63, respectively; thus, the measure-to-measure score linking was not recommended.

For the OA-K domain measure, we attempted to link 17 KOOS Activities of Daily Living (ADL) items (which are identical to the 17 items of the WOMAC ADL) to the 8 available existing items from the PROMIS Physical Function measure. Here, the raw and scaled score correlations between PROMIS Physical Function and KOOS ADL were only 0.66 and 0.62, respectively; thus, this measure-to-measure score linking was not recommended. We also attempted to link 5 KOOS Function in Sports and Recreation items to the 8 available existing items from the PROMIS Physical Function measure. The raw and scaled score correlations between PROMIS Physical Function and KOOS Function in Sports and Recreation were only 0.59 and 0.54, respectively; thus, this proposed linking was also not recommended (see Table 19).

Longitudinal Data: Validation and Change Analyses, Objective 2, Step 4

We used our longitudinal data primarily for validation and change assessment. We conducted a similar set of psychometric analyses to those performed on the cross-sectional sample data to assess our finalized measures. Cronbach α of the HF measures at baseline ranged from .62 (Symptoms) to .96 (Dyspnea); 13 of the HF measures had α values ≥.70 (see Table 20). For the OA-K measures at baseline, α values ranged from .52 (Symptoms) to .96 (Satisfaction With SRA); 11 of the OA-K measure α values were ≥.70 (see Table 21). The average interitem correlation for the HF measures ranged from 0.37 (Symptoms) to 0.79 (Pain Interference); for the OA-K measures, the average interitem correlation ranged from 0.36 (Symptoms) to 0.80 (Social Isolation; see Tables 20 and 21). Minimum and maximum adjusted item-total correlations are also presented in Tables 20 and 21 for HF and OA-K measures. Cronbach α values of the HF and OA-K measures at follow-up are also presented.

Table 20. Internal Consistency Reliability of PROMIS Domains and Comparison Measures, Heart Failure Longitudinal Sample at Baseline and Follow-up.

Table 20

Internal Consistency Reliability of PROMIS Domains and Comparison Measures, Heart Failure Longitudinal Sample at Baseline and Follow-up.

Table 21. Internal Consistency Reliability of PROMIS Domains and Comparison Measures, Osteoarthritis of the Knee Longitudinal Sample at Baseline and Follow-up.

Table 21

Internal Consistency Reliability of PROMIS Domains and Comparison Measures, Osteoarthritis of the Knee Longitudinal Sample at Baseline and Follow-up.

We calculated raw summed scores for each HF and OA-K measure and determined measure means, SDs, medians, minimums, maximums, and score distribution skewness and kurtosis; results are presented per measure in Table 22 (HF measures) and Table 23 (OA-K measures). We graphically displayed each HF and OA-K measure raw summed score distribution via histogram to help determine, along with estimated skewness and kurtosis statistics, each distribution's nature (normal vs skewed). Histograms of HF measure score distributions are presented in Appendix L; those for OA-K measure score distributions are presented in Appendix M. The score distributions of the HF and OA-K measures were approximately normal; we observed some evidence of slight skewness and kurtosis (see Tables 22 and 23, and Appendices L and M).
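The descriptive summary applied to each raw summed-score distribution can be sketched as follows; the dictionary keys are illustrative, and note that `scipy.stats.kurtosis` reports excess kurtosis, which is near 0 for a normal distribution.

```python
import numpy as np
from scipy import stats

def score_summary(scores):
    """Descriptive statistics used to judge whether a raw summed-score
    distribution is approximately normal (hypothetical helper)."""
    s = np.asarray(scores, dtype=float)
    return {
        "mean": s.mean(),
        "sd": s.std(ddof=1),
        "median": float(np.median(s)),
        "min": s.min(),
        "max": s.max(),
        "skewness": stats.skew(s),       # ~0 for a symmetric distribution
        "kurtosis": stats.kurtosis(s),   # excess kurtosis; ~0 for normal
    }

# Hypothetical raw summed scores for one measure
print(score_summary([12, 14, 15, 15, 16, 18, 20]))
```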

Table 22. Measure Statistics, Heart Failure Longitudinal Sample.

Table 22

Measure Statistics, Heart Failure Longitudinal Sample.

Table 23. Measure Statistics, Osteoarthritis of the Knee Longitudinal Sample.

Table 23

Measure Statistics, Osteoarthritis of the Knee Longitudinal Sample.

For HF and OA-K measures with a minimum of 4 items per measure, we assessed measure dimensionality via CCFA, using polychoric correlations, a WLSMV estimator, and cases without missing responses. In each CCFA, we estimated a single-factor model and evaluated overall model fit against established criteria for excellent fit. Results indicated acceptable to good fit of a unidimensional model to our response data (data not presented).

Because we employed the GRM, an IRT model, we evaluated IRT modeling assumptions. Our CCFA analyses presented evidence of essential unidimensionality. We examined residual correlations from our CCFA analyses (local dependence) and reviewed adjusted item total correlations (monotonicity). We observed no significant violations of these IRT assumptions (residual correlation data not presented).

For HF measure validity evidence, we estimated Pearson r and Spearman rho correlations between HF measure scores, between HF and comparison measure scores (ie, Global Health Physical and Mental, KCCQ subscores), and between comparison measure scores. HF measure results are presented in Appendices N and O. HF measure correlations with KCCQ subscore measures indicated expected convergent validity (where r/rho > 0.60) and divergent validity (where r/rho < 0.30). HF measure score correlations with Global Health Physical and Mental measure scores also indicated expected concurrent validity (where r/rho > 0.40). For OA-K measure validity evidence, we also estimated Pearson r and Spearman rho correlations: between OA-K measure scores, between OA-K and comparison measure scores (ie, Global Health Physical and Mental, KOOS subscores), and between comparison measure scores. OA-K measure results are presented in Appendices P and Q. OA-K measure correlations with KOOS subscore measures indicated expected convergent and divergent validity; their correlations with Global Health Physical and Mental measure scores also indicated expected concurrent validity.

We obtained evidence of known-groups validity for the HF measures by comparing HF measure scores of patients with (1) low (4 to 10) vs high (14 to 20) Global Health Physical scores and (2) low (4 to 10) vs high (14 to 20) Global Health Mental scores. For the Global Health Physical and Mental group comparisons, members of the high-score group had statistically significantly better domain status scores for all HF domains measured except Health Behavior (see Tables 24 and 25). Using the OA-K measures, we conducted a similar set of analyses, comparing OA-K measure scores of patients with (1) low (4 to 10) vs high (14 to 20) Global Health Physical scores and (2) low (4 to 10) vs high (14 to 20) Global Health Mental scores. For both the Global Health Physical and Mental group comparisons, members of each high-score group had statistically significantly better domain status scores for all OA-K domains measured (see Tables 26 and 27).
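The known-groups comparisons can be sketched as a one-way ANOVA between respondents in the low and high global-health score bands. The cut points below mirror those in the text, while the data and function names are hypothetical.

```python
import numpy as np
from scipy import stats

def known_groups_test(domain_scores, global_health_scores,
                      low=(4, 10), high=(14, 20)):
    """One-way ANOVA of a domain score between respondents with low vs
    high Global Health scores (illustrative sketch; with 2 groups this
    is equivalent to an independent-groups t test, F = t**2)."""
    gh = np.asarray(global_health_scores)
    ds = np.asarray(domain_scores, dtype=float)
    low_grp = ds[(gh >= low[0]) & (gh <= low[1])]
    high_grp = ds[(gh >= high[0]) & (gh <= high[1])]
    f, p = stats.f_oneway(low_grp, high_grp)
    return f, p, low_grp.mean(), high_grp.mean()

# Hypothetical domain scores and Global Health scores
f, p, low_mean, high_mean = known_groups_test(
    [40, 42, 41, 55, 57, 56],
    [5, 6, 8, 15, 16, 18],
)
print(f"F = {f:.1f}, p = {p:.4f}, low mean = {low_mean}, high mean = {high_mean}")
```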

Table 24. ANOVA of PROMIS Domains by PROMIS GH Physical (Low vs High), Heart Failure Longitudinal Sample at Baseline.

Table 24

ANOVA of PROMIS Domains by PROMIS GH Physical (Low vs High), Heart Failure Longitudinal Sample at Baseline.

Table 25. ANOVA of PROMIS Domains by PROMIS GH Mental (Low vs High), Heart Failure Longitudinal Sample at Baseline.

Table 25

ANOVA of PROMIS Domains by PROMIS GH Mental (Low vs High), Heart Failure Longitudinal Sample at Baseline.

Table 26. ANOVA of PROMIS Domains by PROMIS GH Physical (Low vs High), Osteoarthritis of the Knee Longitudinal Sample at Baseline.

Table 26

ANOVA of PROMIS Domains by PROMIS GH Physical (Low vs High), Osteoarthritis of the Knee Longitudinal Sample at Baseline.

Table 27. ANOVA of PROMIS Domains by PROMIS GH Mental (Low vs High), Osteoarthritis of the Knee Longitudinal Sample at Baseline.

Table 27

ANOVA of PROMIS Domains by PROMIS GH Mental (Low vs High), Osteoarthritis of the Knee Longitudinal Sample at Baseline.

We conducted paired t tests of baseline vs follow-up HF and OA-K measure scores to obtain evidence of within-person change across time. For 9 of the HF domain measures (Physical Function, Fatigue, Sleep Disturbance, Anxiety, Life Satisfaction, Satisfaction With SRA, Health Behavior, Cognitive Ability, Anger), HF patients had statistically significantly better domain status scores at follow-up compared with their baseline status scores (see Table 28).72 For the OA-K domain measures, patients had statistically significantly better domain status scores at follow-up compared with their baseline status scores, for all OA-K domains measured (see Table 29).
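The within-person change analysis above is a paired t test of follow-up against baseline scores. A minimal sketch with hypothetical scores, where a significant positive mean difference indicates improvement when higher scores mean better status:

```python
from scipy import stats

def within_person_change(baseline, follow_up):
    """Paired t test of follow-up vs baseline scores (hypothetical
    sketch); returns the t statistic, two-sided p value, and the mean
    within-person change."""
    t, p = stats.ttest_rel(follow_up, baseline)
    diffs = [f - b for f, b in zip(follow_up, baseline)]
    return t, p, sum(diffs) / len(diffs)

# Hypothetical baseline and follow-up scores for 5 respondents
t, p, mean_change = within_person_change([30, 32, 35, 33, 31],
                                         [33, 33, 40, 36, 34])
print(f"t = {t:.2f}, p = {p:.4f}, mean change = {mean_change}")
```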

Table 28. Paired t Test of PROMIS Domains (Baseline vs Follow-up), Heart Failure Longitudinal Sample.

Table 28

Paired t Test of PROMIS Domains (Baseline vs Follow-up), Heart Failure Longitudinal Sample.

Table 29. Paired t Test of PROMIS Domains (Baseline vs Follow-up), Osteoarthritis of the Knee Longitudinal Sample.

Table 29

Paired t Test of PROMIS Domains (Baseline vs Follow-up), Osteoarthritis of the Knee Longitudinal Sample.

Finally, we created change scores (follow-up score minus baseline score) for each measured domain and then compared the change scores of patients who indicated they had experienced meaningful positive change in the domain of interest (Global Rating of Change score of 2 to 4) vs patients who indicated they had experienced negative or no change (Global Rating of Change score of −4 to 1). For the HF domain measures, patients indicating meaningful positive change had statistically significantly better domain status change scores for 8 of the HF domains measured (Physical Function, Fatigue, Sleep Disturbance, Anxiety, Satisfaction With SRA, Independence, Pain Interference, Symptoms). For the OA-K domain measures, patients indicating meaningful positive change had statistically significantly better domain status change scores for all OA-K domains measured (data not presented).
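This change-score comparison can be sketched as an independent-groups test between respondents grouped by their Global Rating of Change, using the cut points in the text; the data and helper name are hypothetical.

```python
import numpy as np
from scipy import stats

def change_by_rating(baseline, follow_up, rating):
    """Compare domain change scores (follow-up minus baseline) between
    respondents reporting meaningful positive change (rating 2 to 4)
    and those reporting negative or no change (rating -4 to 1);
    illustrative sketch with hypothetical inputs."""
    change = np.asarray(follow_up, float) - np.asarray(baseline, float)
    rating = np.asarray(rating)
    improved = change[(rating >= 2) & (rating <= 4)]
    not_improved = change[(rating >= -4) & (rating <= 1)]
    t, p = stats.ttest_ind(improved, not_improved)
    return improved.mean(), not_improved.mean(), p

# Hypothetical scores and self-reported global ratings of change
imp_mean, no_mean, p = change_by_rating(
    [50, 50, 50, 50, 50, 50],
    [54, 55, 56, 50, 49, 51],
    [3, 3, 2, 0, -1, 1],
)
print(f"improved mean change = {imp_mean}, other = {no_mean}, p = {p:.4f}")
```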

Finalize PROMIS-PLUS Instruments and Disseminate, Objective 3, Step 6

After validation, in which we determined that all the items and domains demonstrated reasonable measurement properties, we conducted brief cognitive interviews with patients from the longitudinal samples (HF, n = 10; OA-K, n = 22). The purpose of the interviews was to obtain patient input on the final instruments, both to ensure the questions remained relevant to patients and to gather their perspectives on potential uses. Patients expressed no major concerns, so we retained all items: 86 items in 18 domains for the HF instrument and 76 items in 14 domains for the OA-K instrument (see Appendices A and B for the full list of included items).

We have started to disseminate the findings of this study and the resulting instruments through 2 publications45,72 and 7 presentations. We are in the process of writing 2 additional papers and will include the new domains and items in PROMIS item banks after PROMIS reviews them. Once the new measures are included in PROMIS, the instruments will be available for others to use.

Discussion

Decisional Context

In this report, we describe, and illustrate with data, a method for creating a combined generic and disease-specific PROM, using PROMIS as a calibrated generic base on which to add disease-specific content. The methodology has potential for creating PROMs that meet multiple purposes in clinical care and clinical trials, and in organization, state, and national efforts to monitor the quality of health care delivery. A well-calibrated PROM, supplemented with clinically resonant content, enables interested individuals (eg, administrators) and organizations (eg, VA) to enhance clinical relevance while reporting results on a common metric. Common metrics such as those in PROMIS can, in turn, support comparative effectiveness research and comparison of results across populations to address disparities in care. Introducing these measures into clinical practice may ultimately improve the health outcomes that matter most to patients and their significant others or caregivers.

The PROMs that resulted from this effort consisted mostly of items (74% HF; 68% OA-K) and domains (68% HF; 79% OA-K) that already exist in PROMIS. Yet a substantial amount of content important to patients (ie, gap items) was created, as were new domains, such as Independence, Life Satisfaction, and Symptoms. The result was relatively long instruments, with a large number of items and domains, that maintained the measurement properties of the original PROMIS scales. Fortunately, the modular nature of the PROMIS system allows one to select the most clinically relevant content for any given setting or context; thus, the entire questionnaire need not be administered in clinical practice. In addition, based on estimates for typical PROMIS instrument completion and the actual time it took panel respondents to complete the survey, we anticipate that the instruments will take 12 to 16 minutes to complete. Depending on the time of administration or purpose, this may be a reasonable length.

Individuals weighing the use of instruments such as PROMIS-PLUS vs legacy instruments will need to consider the instrument modifications (eg, selection of a smaller subset of items) and administration protocols (eg, computer-adaptive testing) required to make data collection feasible while retaining the potential benefits. We did not outperform PROMIS with PROMIS-PLUS, nor did we set out to do so. We have, however, demonstrated that one can maintain general measurement properties while paying better attention to the concerns patients find important.

Implementation of Study Results

The key end users for this initial work are researchers, clinicians, and health care systems interested in PRO development or use. Because PROMIS was the basis for the development and distribution, we anticipate widespread interest and access. The instruments should be of interest to departments of cardiology or orthopedics or health care systems interested in including a PRO in their day-to-day clinical care and/or as part of their quality measures to assess care across their organization. Researchers working in PRO development can review the methodology and conduct further tests of its replicability with other conditions.

One consideration for implementation that we learned from interviews with patients and partner site stakeholders is that different domains and items from the resulting instruments may be of more or less importance depending on the purpose for which they are being used. In addition, patients and stakeholders advised that the length of each instrument must be acknowledged and taken into consideration when thinking about how to best use and implement them.

One potential strategy to improve the usability of the measures, particularly for ongoing clinical care, is further exploration of computer-adaptive testing models. Another possible strategy, given that the items are in the common measurement system of PROMIS, is to have health systems and/or patients choose subsets of items of interest to track. For instance, health systems could choose a minimum set of items to keep constant but allow patients to choose additional items to track based on their preferences. This would keep measures that are important for clinicians and health systems and acknowledge what is important to the patient. Similar strategies could be used in research, depending on the focus and research questions.

Future development should test strategies such as this to reduce the number of items and maintain measurement strengths. An additional application of our study involves the crosswalks we produced between legacy and PROMIS-PLUS measures, which can be used to compare PRO results across different efforts and institutions. We recognize that the crosswalks are still limited based on our results, but we believe it is a step in the right direction to create a mechanism for a universal measurement system.

Generalizability

Our methodology for creating condition-specific PROMs is based on existing PRO development standards of the International Society for Pharmacoeconomics and Outcomes Research Task Force19,42-44 and PROMIS. The methodology should be generalizable to developing condition-specific measures for other conditions and using other generic PROMs. The primary consideration is the extent to which the reference generic PROM contains relevant, existing items, which would foster more efficient creation of the new combined PROM.

Our focus group and cognitive interview participants were not representative of the diversity of education and racial and ethnic backgrounds in the United States, which may limit applicability for some groups. Additional testing of the new items will need to include a more representative group to determine relevance. However, calibration testing for the existing PROMIS items that we selected was based on a general population sample representative of distributions of race, ethnicity, and education using 2000 US Census data,20 which provides some assurance that these instruments are sensitive to subpopulations. This calibration sample included individuals with coexisting conditions,20 which is particularly relevant for our HF instrument.

Study Limitations

Throughout this report, we have noted several limitations. These include the following: (1) our focus group participants were not representative of the diversity of education, racial, or ethnic backgrounds in the United States; (2) we were not able to account for nonrespondents in our cross-sectional panel; (3) patients and clinicians told us that the final instruments are probably too long for ongoing data collection in clinical settings; and (4) we were unable to retain most respondents with HF in the longitudinal study owing to their disease burden and inconsistent access to patients in clinic at follow-up because of varied treatment pathways across patients and institutions.72

A limitation of this methodology for creating condition-specific PROMs that are valid and responsive is the amount of time and resources needed. Our study took 4 years and more than a million dollars to complete. We urge researchers, clinicians, and funders of PRO development to develop or use existing strategies that are efficient, such as using current PROMs to reduce the need to create and test new items. Another idea is to examine the feasibility of using qualitative data from existing sources for certain conditions (eg, focus group data collected in a previous study).45 We encourage others to test alternative approaches to both create comprehensive measures and reduce the time and resources needed.

Future Research

Although we see value in creating condition-specific PROMs, some PRO researchers debate the need to create measures specific to every clinical population; they take the position that some outcome measures are “common across many diseases.”79 Our study supports this assumption in that many domains and items were already present in PROMIS and 3 of the new domains identified for OA-K were also new domains for HF. Other national and international efforts, such as the Core Outcome Measures in Effectiveness Trials Initiative80 and International Consortium for Health Outcomes Measurement Standard Sets,81 are working to define similar outcomes within specific conditions and recommend using existing measures.

However, 2 open questions remain worth testing: (1) Does the addition of condition-specific items and domains to existing general measures increase the relevance and usability of PROMs for individuals (eg, patients, clinicians, researchers, administrators), organizations, insurers, and the government? (2) Does it improve patient care, quality measures, and comparisons of outcomes across populations and systems of care? Given the imperative to incorporate patients' health status and values as part of ongoing quality care and measurement, we urge more research and work in this area.

Conclusions

In a 2016 article in the New England Journal of Medicine, Porter et al argued that “universal measurement and reporting of outcomes,” including PROs, “is an agenda whose time has come.”2 The pathway to achieve universal measurement within and across patient populations, however, is complicated and challenging. As Porter et al pointed out, its achievement is hindered by specialty and other organizations that want to create their own measures. It is also complicated by the sheer number of conditions for which PROs may be relevant and PROMs that already exist.

Our project created and tested 1 potential pathway to assist in creating a universal measurement system by bridging the divide between generic and disease-specific PROs. It also built a crosswalk to existing disease-specific measures to allow interested parties to see how existing measures could be combined to create a universal measurement system. In doing so, our work contributes to efforts such as the International Consortium for Health Outcomes Measurement's Standardized Outcome Sets2 to move toward measurement that defines and tracks health care value in a holistic and transparent manner.

References

1.
Lavallee DC, Chenok KE, Love RM, et al. Incorporating patient-reported outcomes into health care to engage patients and enhance care. Health Aff (Millwood). 2016;35(4):575-582. doi:10.1377/hlthaff.2015.1362 [PubMed: 27044954] [CrossRef]
2.
Porter ME, Larsson S, Lee TH. Standardizing patient outcomes measurement. N Engl J Med. 2016;374(6):504-506. doi:10.1056/NEJMp1511701 [PubMed: 26863351] [CrossRef]
3.
Fitzpatrick R, Davey C, Buxton M, Jones D. Evaluating patient-based outcome measures for use in clinical trials. Health Technol Assess. 1998;2(14):1-74. [PubMed: 9812244]
4.
Patient Reported Outcome Measures (PROMs). National Health Service of England; 2016. Accessed October 4, 2016. https://www​.england.nhs​.uk/statistics/statistical-work-areas/proms/
5.
Dawson J, Doll H, Fitzpatrick R, Jenkinson C, Carr AJ. The routine use of patient reported outcome measures in healthcare settings. BMJ. 2010;340:c186. [PubMed: 20083546]
6.
Black N. Patient reported outcome measures could help transform healthcare. BMJ. 2013;346:f167. doi:10.1136/bmj.f167 [PubMed: 23358487] [CrossRef]
7.
Smith PC, Street AD. On the uses of routine patient-reported health outcome data. Health Econ. 2013;22(2):119-131. doi:10.1002/hec.2793 [PubMed: 22238023] [CrossRef]
8.
Centers for Medicare & Medicaid Services. Federal Register. Part II. 45 CFR Part 170. 42 CFR Parts 412, 413, and 495. Medicare and Medicaid Programs; Electronic Health Record Incentive Program—Stage 2; 2012. [PubMed: 22946138]
9.
Cella D, Hahn EA, Jensen S, et al. Patient-Reported Outcomes in Performance Measurement. RTI Press; 2015. doi:10.3768/rtipress.2015.bk.0014.1509 [PubMed: 28211667] [CrossRef]
10.
Owolabi MO. Which is more valid for stroke patients: generic or stroke-specific quality of life measures? Neuroepidemiology. 2010;34(1):8-12. doi:10.1159/000255460 [PubMed: 19893323] [CrossRef]
11.
Ware JE, Gandek B, Allison J. The validity of disease-specific quality of life attributions among adults with multiple chronic conditions. Int J Stat Med Res. 2016;5(1):17-40. doi:10.1016/j.rasd.2014.08.015 [PMC free article: PMC4831653] [PubMed: 27087882] [CrossRef]
12.
Flynn KE, Dew MA, Lin L, et al. Reliability and construct validity of PROMIS® measures for patients with heart failure who undergo heart transplant. Qual Life Res. 2015;24(11):2591-2599. doi:10.1007/s11136-015-1010-y [PMC free article: PMC4593724] [PubMed: 26038213] [CrossRef]
13.
Rolfson O, Wissig S, van Maasakkers L, et al. Defining an international standard set of outcome measures for patients with hip or knee osteoarthritis: consensus of the International Consortium for Health Outcomes Measurement Hip and Knee Osteoarthritis Working Group. Arthritis Care Res (Hoboken). 2016;68(11):1631-1639. doi:10.1002/acr.22868 [PMC free article: PMC5129496] [PubMed: 26881821] [CrossRef]
14.
Cella D, Nowinski CJ. Measuring quality of life in chronic illness: the functional assessment of chronic illness therapy measurement system. Arch Phys Med Rehabil. 2002;83(12 Suppl 2):S10-S17. doi:10.1053/apmr.2002.36959 [PubMed: 12474167] [CrossRef]
15.
Bergland A, Thorsen H, Kåresen R. Association between generic and disease-specific quality of life questionnaires and mobility and balance among women with osteoporosis and vertebral fractures. Aging Clin Exp Res. 2011;23(4):296-303. http://www​.ncbi.nlm.nih​.gov/pubmed/22067372. [PubMed: 22067372]
16.
Fitzsimmons D, Johnson CD, George S, et al. Development of a disease specific quality of life (QoL) questionnaire module to supplement the EORTC core cancer QoL questionnaire, the QLQ-C30 in patients with pancreatic cancer. Eur J Cancer. 1999;35(6):939-941. doi:10.1016/S0959-8049(99)00047-7 [PubMed: 10533475] [CrossRef]
17.
Freeman J, Hobart J, Thompson A. Does adding MS-specific items to a generic measure (the SF-36) improve measurement? Neurology. 2001;57(1):68-74. doi:10.1212/WNL.57.1.68 [PubMed: 11445630] [CrossRef]
18.
Webster K, Cella D, Yost K. The Functional Assessment of Chronic Illness Therapy (FACIT) measurement system: properties, applications, and interpretation. Health Qual Life Outcomes. 2003;1:79. doi:10.1186/1477-7525-1-79 [PMC free article: PMC317391] [PubMed: 14678568] [CrossRef]
19.
DeWalt DA, Rothrock N, Yount S, Stone AA, PROMIS Cooperative Group. Evaluation of item candidates: the PROMIS qualitative item review. Med Care. 2007;45(5 Suppl 1):S12-S20. doi:10.1097/01.mlr.0000254567.79743.e2 [PMC free article: PMC2810630] [PubMed: 17443114] [CrossRef]
20.
Cella D, Riley W, Stone A, et al. The Patient-Reported Outcomes Measurement Information System (PROMIS) developed and tested its first wave of adult self-reported health outcome item banks: 2005-2008. J Clin Epidemiol. 2010;63(11):1179-1194. doi:10.1016/j.jclinepi.2010.04.011 [PMC free article: PMC2965562] [PubMed: 20685078] [CrossRef]
21.
Cella D, Yount S, Rothrock N, et al. The Patient-Reported Outcomes Measurement Information System (PROMIS): progress of an NIH roadmap cooperative group during its first two years. Med Care. 2007;45(5 Suppl 1):S3-S11. doi:10.1097/01.mlr.0000258615.42478.55 [PMC free article: PMC2829758] [PubMed: 17443116] [CrossRef]
22.
Beckmann JT, Hung M, Bounsanga J, Wylie JD, Granger EK, Tashjian RZ. Psychometric evaluation of the PROMIS Physical Function Computerized Adaptive Test in comparison to the American Shoulder and Elbow Surgeons score and Simple Shoulder Test in patients with rotator cuff disease. J Shoulder Elb Surg. 2015;24(12):1961-1967. doi:10.1016/j.jse.2015.06.025 [PubMed: 26321484] [CrossRef]
23.
Fries JF, Cella D, Rose M, Krishnan E, Bruce B. Progress in assessing physical function in arthritis: PROMIS short forms and computerized adaptive testing. J Rheumatol. 2009;36(9):2061-2066. doi:10.3899/jrheum.090358 [PubMed: 19738214] [CrossRef]
24.
Varni JW, Stucky BD, Thissen D, et al. PROMIS Pediatric Pain Interference scale: an item response theory analysis of the pediatric pain item bank. J Pain. 2010;11(11):1109-1119. doi:10.1016/j.jpain.2010.02.005 [PMC free article: PMC3129595] [PubMed: 20627819] [CrossRef]
25.
O'Connell JB, Bristow MR. Economic impact of heart failure in the United States: time for a different approach. J Heart Lung Transplant. 1994;13(4):S107-S112. http://www​.ncbi.nlm.nih​.gov/pubmed/7947865 [PubMed: 7947865]
26.
National Center for Health Statistics. Summary Health Statistics for U.S. Adults: National Health Interview Survey, 2011. Centers for Disease Control and Prevention, US Department of Health and Human Services. December 2012. Accessed March 6, 2013. https://www​.cdc.gov/nchs​/data/series/sr_10/sr10_256.pdf
27.
National Heart Lung and Blood Institute, National Institutes of Health. What is heart failure? US Department of Health & Human Services. Accessed March 6, 2013. http://www​.nhlbi.nih​.gov/health/health-topics/topics/hf/
28.
Cram P, Lu X, Kates SL, Singh JA, Li Y, Wolf BR. Total knee arthroplasty volume, utilization, and outcomes among Medicare beneficiaries, 1991-2010. J Am Med Assoc. 2012;308(12):1227. doi:10.1001/2012.jama.11153 [PMC free article: PMC4169369] [PubMed: 23011713] [CrossRef]
29.
Katz JN, Phillips CB, Baron JA, et al. Association of hospital and surgeon volume of total hip replacement with functional status and satisfaction three years following surgery. Arthritis Rheum. 2003;48(2):560-568. doi:10.1002/art.10754 [PubMed: 12571867] [CrossRef]
30.
Kurtz S, Ong K, Lau E, Mowat F, Halpern M. Projections of primary and revision hip and knee arthroplasty in the United States from 2005 to 2030. J Bone Jt Surg Am Vol. 2007;89(4):780-785. doi:10.2106/JBJS.F.00222 [PubMed: 17403800] [CrossRef]
31.
Hunt SA, Baker DW, Chin MH, et al. ACC/AHA guidelines for the evaluation and management of chronic heart failure in the adult: executive summary. A report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines (committee to revise the 1995 Guidelines for the Evaluation and Management of Heart Failure). J Am Coll Cardiol. 2001;38(7):2101-2113. http://www​.ncbi.nlm.nih​.gov/pubmed/11738322 [PubMed: 11738322]
32.
Remme WJ, Swedberg K. Guidelines for the diagnosis and treatment of chronic heart failure. Eur Heart J. 2001;22(17):1527-1560. doi:10.1053/euhj.2001.2783 [PubMed: 11492984] [CrossRef]
33.
Lipkin D, Canepa-Anson R, Stephens M, Poole-Wilson P. Factors determining symptoms in heart failure: comparison of fast and slow exercise tests. Br Heart J. 1986;55(5):439-445. [PMC free article: PMC1216378] [PubMed: 3707783]
34.
Dickstein K, Cohen-Solal A, Filippatos G, et al. ESC guidelines for the diagnosis and treatment of acute and chronic heart failure 2008: The Task Force for the Diagnosis and Treatment of Acute and Chronic Heart Failure 2008 of the European Society of Cardiology. Eur Heart J. 2008;29(19):2388-2442. doi:10.1093/eurheartj/ehn309 [PubMed: 18799522] [CrossRef]
35.
Reuben D, Tinetti M. Goal-oriented patient care-an alternative health outcomes paradigm. N Engl J Med. 2012;366(9):777-779. [PubMed: 22375966]
36.
Arterburn D, Wellman R, Westbrook E, et al. Introducing decision aids at Group Health was linked to sharply lower hip and knee surgery rates and costs. Health Aff (Millwood). 2012;31(9):2094-2104. doi:10.1377/hlthaff.2011.0686 [PubMed: 22949460] [CrossRef]
37.
Oshima Lee E, Emanuel EJ. Shared decision making to improve care and reduce costs. N Engl J Med. 2013;368(1):6-8. doi:10.1056/NEJMp1209500 [PubMed: 23281971] [CrossRef]
38.
Spatz ES, Spertus JA. Shared decision making: a path toward improved patient-centered outcomes. Circ Cardiovasc Qual Outcomes. 2012;5(6):e75-e77. doi:10.1161/CIRCOUTCOMES.112.969717 [PubMed: 23170005] [CrossRef]
39.
Weinstein JN. The missing piece: embracing shared decision making to reform health care. Spine (Phila Pa 1976). 2000;25(1):1-4. http://www​.ncbi.nlm.nih​.gov/pubmed/10647152 [PubMed: 10647152]
40.
Tomek IM, Sabel AL, Froimson MI, et al. A collaborative of leading health systems finds wide variations in total knee replacement delivery and takes steps to improve value. Health Aff (Millwood). 2012;31(6):1329-1338. doi:10.1377/hlthaff.2011.0935 [PubMed: 22571844] [CrossRef]
41.
Schifferdecker KE, Reed VA. Using mixed methods research in medical education: basic guidelines for researchers. Med Educ. 2009;43(7):637-644. doi:10.1111/j.1365-2923.2009.03386.x [PubMed: 19573186] [CrossRef]
42.
Forrest CB, Bevans KB, Pratiwadi R, et al. Development of the PROMIS pediatric global health (PGH-7) measure. Qual Life Res. 2014;23(4):1221-1231. doi:10.1007/s11136-013-0581-8 [PMC free article: PMC3966936] [PubMed: 24264804] [CrossRef]
43.
Patrick DL, Burke LB, Gwaltney CJ, et al. Content validity - establishing and reporting the evidence in newly developed patient-reported outcomes (PRO) instruments for medical product evaluation: ISPOR PRO good research practices task force report: part 2 - assessing respondent understanding. Value Health. 2011;14(8):978-988. doi:10.1016/j.jval.2011.06.013 [PubMed: 22152166] [CrossRef]
44.
Patrick DL, Burke LB, Gwaltney CJ, et al. Content validity - establishing and reporting the evidence in newly developed patient-reported outcomes (PRO) instruments for medical product evaluation: ISPOR PRO good research practices task force report: part 1 -eliciting concepts for a new PRO instrument. Value Health. 2011;14(8):967-977. doi:10.1016/j.jval.2011.06.014 [PubMed: 22152165] [CrossRef]
45.
Schifferdecker KE, Yount SE, Kaiser K, et al. A method to create a standardized generic and condition-specific patient-reported outcome measure for patient care and healthcare improvement. Qual Life Res. 2018;27(2):367. doi:10.1007/s11136-017-1675-5 [PubMed: 28795261] [CrossRef]
46.
Choi SW, Podrabsky T, McKinney N, Schalet BD, Cook KF, Cella D. PROsetta Stone Analysis Report: A Rosetta Stone for Patient Reported Outcomes. Vol 1. Published September 28, 2015. http://www​.prosettastone​.org/AnalysisReport​/Documents/PROsetta​%20Stone%20Analysis​%20Report_Vol%201_updated-09-28-2015​.pdf
47.
International Statistical Classification of Diseases and Related Health Problems. World Health Organization; 2016. https://apps​.who.int​/iris/handle/10665/246208
48.
The Criteria Committee of the New York Heart Association. Nomenclature and Criteria for Diagnosis of Diseases of the Heart and Great Vessels. 9th ed. Little, Brown & Company; 1994.
49.
Green CP, Porter CB, Bresnahan DR, Spertus JA. Development and evaluation of the Kansas City Cardiomyopathy Questionnaire: a new health status measure for heart failure. J Am Coll Cardiol. 2000;35(5):1245-1255. http://www​.ncbi.nlm.nih​.gov/pubmed/10758967. [PubMed: 10758967]
50.
Roos EM, Roos HP, Lohmander LS, Ekdahl C, Beynnon BD. Knee Injury and Osteoarthritis Outcome Score (KOOS)--development of a self-administered outcome measure. J Orthop Sports Phys Ther. 1998;28(2):88-96. doi:10.2519/jospt.1998.28.2.88 [PubMed: 9699158] [CrossRef]
51.
Dedoose. 2016. www​.dedoose.com.
52.
Boyatzis RE. Transforming Qualitative Information: Thematic Analysis and Code Development. SAGE Publications; 1998. https://us​.sagepub.com​/en-us/nam/transforming-qualitative-information/book7714
53.
Gershon RC, Rothrock N, Hanrahan R, Bass M, Cella D. The use of PROMIS and assessment center to deliver patient-reported outcome measures in clinical research. J Appl Meas. 2010;11(3):304-314. [PMC free article: PMC3686485] [PubMed: 20847477]
54.
Cella D, Riley W, Stone A, et al. Initial adult health item banks and first wave testing of the Patient-Reported Outcomes Measurement Information System (PROMIS) Network: 2005-2008. J Clin Epidemiol. 2010;63(11):1179-1194. doi:10.1016/j.jclinepi.2010.04.011 [PMC free article: PMC2965562] [PubMed: 20685078] [CrossRef]
55.
Willis G, Reeve BB, Barofsky I. The use of cognitive interviewing techniques in quality of life and patient-reported outcomes assessment. In: Lipscomb J, Gotay CC, Snyder C, eds. Outcomes Assessment in Cancer: Measures, Methods, and Applications. Cambridge University Press; 2005:610-622.
56.
57.
PROMIS Cooperative Group. PROMIS® Instrument Development and Validation Scientific Standards Version 2.0. Revised May 2013. https://www.mcgill.ca/can-pro-network/files/can-pro-network/promisstandards_vers2.0_final.pdf
58.
Reeve BB, Hays RD, Bjorner JB, et al. Psychometric Evaluation and Calibration of Health-Related Quality of Life Item Banks: plans for the Patient-Reported Outcomes Measurement Information System (PROMIS). Med Care. 2007;45(Suppl 1):S22-S31. doi:10.1097/01.mlr.0000250483.85507.04 [PubMed: 17443115] [CrossRef]
59.
Cella M, Knibbe C, Danhof M, Della Pasqua O. What is the right dose for children? Br J Clin Pharmacol. 2010;70(4):597-603. doi:10.1111/j.1365-2125.2009.03591.x [PMC free article: PMC2950994] [PubMed: 21087295] [CrossRef]
60.
Choi SW, Gibbons LE, Crane PK. lordif: an R package for detecting differential item functioning using iterative hybrid ordinal logistic regression/item response theory and Monte Carlo simulations. J Stat Softw. 2011;39(8):1-30. http://www.ncbi.nlm.nih.gov/pubmed/21572908 [PMC free article: PMC3093114] [PubMed: 21572908]
61.
Ponocny I. Nonparametric goodness-of-fit tests for the Rasch model. Psychometrika. 2001;66(3):437-459.
62.
Choi SW, Gibbons LE, Crane PK. lordif: logistic ordinal regression differential item functioning using IRT. Published March 3, 2016. https://cran.r-project.org/web/packages/lordif/lordif.pdf
63.
Cohen J. A power primer. Psychol Bull. 1992;112(1):155-159. http://www.ncbi.nlm.nih.gov/pubmed/19565683 [PubMed: 19565683]
64.
Chen W-H, Revicki DA, Lai J-S, Cook KF, Amtmann D. Linking pain items from two studies onto a common scale using item response theory. J Pain Symptom Manage. 2009;38(4):615-628. doi:10.1016/j.jpainsymman.2008.11.016 [PMC free article: PMC2761512] [PubMed: 19577422] [CrossRef]
65.
Dorans NJ. Linking scores from multiple health outcome instruments. Qual Life Res. 2007;16(Suppl 1):85-94. doi:10.1007/s11136-006-9155-3 [PubMed: 17286198] [CrossRef]
66.
Choi SW, Schalet B, Cook KF, Cella D. Establishing a common metric for depressive symptoms: linking the BDI-II, CES-D, and PHQ-9 to PROMIS depression. Psychol Assess. 2014;26(2):513-527. doi:10.1037/a0035768 [PMC free article: PMC5515387] [PubMed: 24548149] [CrossRef]
67.
Kim S, Lee W-C. An extension of four IRT linking methods for mixed-format tests. J Educ Meas. 2006;43(1):53-76. doi:10.1111/j.1745-3984.2006.00004.x [CrossRef]
68.
Braun HI, Holland PW. Observed-score test equating: a mathematical analysis of some ETS equating procedures. In: Holland PW, Rubin DB, eds. Test Equating. Academic Press; 1982:9-49.
69.
Kim S-H, Cohen AS. A comparison of linking and concurrent calibration under the graded response model. Appl Psychol Meas. 2002;26(1):25-41.
70.
Holland PW, Thayer DT. Univariate and bivariate loglinear models for discrete test score distributions. ETS Res Rep Ser. 1998;1998(2):i-56. doi:10.1002/j.2333-8504.1998.tb01776.x [CrossRef]
71.
Cohen J. Statistical Power Analysis for the Behavioral Sciences. 2nd ed. Lawrence Erlbaum Associates; 1988.
72.
Ahmad F, Kallen M, Schifferdecker K, et al. Development and initial validation of the PROMIS®-Plus-HF profile measure. Circ Heart Fail. 2019;12(6):e005751. doi:10.1161/CIRCHEARTFAILURE.118.005751 [PMC free article: PMC6711378] [PubMed: 31163985] [CrossRef]
73.
Hair J, Hult G, Ringle C, Sarstedt M. A Primer on Partial Least Squares Structural Equation Modeling (PLS-SEM). 2nd ed. Sage Publications; 2017.
74.
Li C-H. Confirmatory factor analysis with ordinal data: Comparing robust maximum likelihood and diagonally weighted least squares. Behav Res Methods. 2016;48(3):936-949. doi:10.3758/s13428-015-0619-7 [PubMed: 26174714] [CrossRef]
75.
Cook KF, Kallen MA, Amtmann D. Having a fit: impact of number of items and distribution of data on traditional criteria for assessing IRT's unidimensionality assumption. Qual Life Res. 2009;18(4):447-460. doi:10.1007/s11136-009-9464-4 [PMC free article: PMC2746381] [PubMed: 19294529] [CrossRef]
76.
Muthén LK, Muthén BO. Mplus 8 User's Guide. Muthén & Muthén; 2017. https://www.statmodel.com/download/usersguide/MplusUserGuideVer_8.pdf
77.
Reise SP, Morizot J, Hays RD. The role of the bifactor model in resolving dimensionality issues in health outcomes measures. Qual Life Res. 2007;16(Suppl 1):19-31. doi:10.1007/s11136-007-9183-7 [PubMed: 17479357] [CrossRef]
78.
Bellamy N, Buchanan WW, Goldsmith CH, Campbell J, Stitt LW. Validation study of WOMAC: a health status instrument for measuring clinically important patient relevant outcomes to antirheumatic drug therapy in patients with osteoarthritis of the hip or knee. J Rheumatol. 1988;15(12):1833-1840. http://www.ncbi.nlm.nih.gov/pubmed/3068365 [PubMed: 3068365]
79.
Cook KF, Bamer AM, Roddey TS, Kraft GH, Kim J, Amtmann D. A PROMIS fatigue short form for use by individuals who have multiple sclerosis. Qual Life Res. 2012;21(6):1021-1030. doi:10.1007/s11136-011-0011-8 [PMC free article: PMC5609831] [PubMed: 21927914] [CrossRef]
80.
Prinsen CAC, Vohra S, Rose MR, et al. Core Outcome Measures in Effectiveness Trials (COMET) initiative: protocol for an international Delphi study to achieve consensus on how to select outcome measurement instruments for outcomes included in a ‘core outcome set.’ Trials. 2014;15(1):247. doi:10.1186/1745-6215-15-247 [PMC free article: PMC4082295] [PubMed: 24962012] [CrossRef]
81.
Kelley TA. International Consortium for Health Outcomes Measurement (ICHOM). Trials. 2015;16(Suppl 3):O4. doi:10.1186/1745-6215-16-S3-O4 [CrossRef]

Related Publications

•
Schifferdecker KE, Yount SE, Kaiser K, et al. A method to create a standardized generic and condition-specific patient-reported outcome measure for patient care and healthcare improvement. Qual Life Res. 2018;27(2):367-378. doi:10.1007/s11136-017-1675-5 [PubMed: 28795261] [CrossRef]
•
Ahmad FS, Kallen MA, Schifferdecker KE, et al. The development and initial validation of the PROMIS®-Plus-HF profile measure. Circ Heart Fail. 2019;12(6):e005751. doi:10.1161/CIRCHEARTFAILURE.118.005751 [PMC free article: PMC6711378] [PubMed: 31163985] [CrossRef]

Acknowledgments

We are grateful to the members of the PFAC for the important perspectives they provided: Roger Arend, Carol DuBois, Jeff Gardner, Annette Jo Giarrante, David Swanz, Janet Trzaska, and Linda Wilkinson. Faraz Ahmad, MD; Jill Gelow, MD, MPH; and Wayne Moschetti, MD, MS, also provided critical guidance on this work. We also thank other members of our research team who participated in parts of the project: Anna Adachi-Mejia, PhD; Amy Eisenstein, PhD; George J. Greene, PhD; David T. Eton, PhD; and Eugene Nelson, DSc, MPH. Finally, this work would not have been possible without the participation of the PIs (named here) and research coordinators at all of our clinical sites: Alan Kono, MD, Ivan Tomek, MD, and Karl Koenig, MD, at Dartmouth-Hitchcock Medical Center; Peter McCullough, MD, MPH, at Baylor Heart and Vascular Hospital; Ritesh Shah, MD, at Illinois Bone and Joint Institute; Douglas Sawyer, MD, PhD, at Maine Medical Center; David Eton, PhD, and Shannon Dunlay, MD, MS, at Mayo Clinic; David Manning, MD, and Clyde Yancy, MD, at Northwestern Medicine; Jill Gelow, MD, MPH, Lynn Marshall, ScD, and Kathryn Schabel, MD, at Oregon Health and Science University; Marie Bakitas, DNSc, APRN, at the University of Alabama at Birmingham; and Stephen Kimmel, MD, MSCE, at the University of Pennsylvania.

Research reported in this report was [partially] funded through a Patient-Centered Outcomes Research Institute® (PCORI®) Award (#ME-1303-5928). Further information is available at: https://www.pcori.org/research-results/2013/adding-disease-specific-concerns-patient-reported-outcome-measures

Appendices

Appendix C.

Histograms of Heart Failure Measure Score Distribution for the Cross-sectional Sample (PDF, 405K)

Figure 1. Distribution of PROMIS Dyspnea domain scores among people with heart failure in the cross-sectional sample (PDF, 124K)

Figure 2. Distribution of PROMIS Physical Function domain scores among people with heart failure in the cross-sectional sample (PDF, 151K)

Figure 3. Distribution of PROMIS Fatigue domain scores among people with heart failure in the cross-sectional sample (PDF, 144K)

Figure 4. Distribution of PROMIS Sleep Disturbance domain scores among people with heart failure in the cross-sectional sample (PDF, 143K)

Figure 5. Distribution of PROMIS Depression domain scores among people with heart failure in the cross-sectional sample (PDF, 144K)

Figure 6. Distribution of PROMIS Anxiety domain scores among people with heart failure in the cross-sectional sample (PDF, 118K)

Figure 7. Distribution of PROMIS Life Satisfaction domain scores among people with heart failure in the cross-sectional sample (PDF, 143K)

Figure 8. Distribution of PROMIS Satisfaction with Social Roles and Activities domain scores among people with heart failure in the cross-sectional sample (PDF, 143K)

Figure 9. Distribution of PROMIS Ability to Participate in Social Roles and Activities domain scores among people with heart failure in the cross-sectional sample (PDF, 144K)

Figure 10. Distribution of PROMIS Social Isolation domain scores among people with heart failure in the cross-sectional sample (PDF, 143K)

Figure 11. Distribution of PROMIS Illness Burden domain scores among people with heart failure in the cross-sectional sample (PDF, 143K)

Figure 12. Distribution of PROMIS Independence domain scores among people with heart failure in the cross-sectional sample (PDF, 142K)

Figure 13. Distribution of PROMIS Pain Interference domain scores among people with heart failure in the cross-sectional sample (PDF, 143K)

Figure 14. Distribution of PROMIS Symptoms domain scores among people with heart failure in the cross-sectional sample (PDF, 143K)

Figure 15. Distribution of PROMIS Health Behavior Outcomes domain scores among people with heart failure in the cross-sectional sample (PDF, 143K)

Figure 16. Distribution of PROMIS Cognitive Function domain scores among people with heart failure in the cross-sectional sample (PDF, 143K)

Figure 17. Distribution of PROMIS Cognitive Abilities domain scores among people with heart failure in the cross-sectional sample (PDF, 143K)

Figure 18. Distribution of PROMIS Anger domain scores among people with heart failure in the cross-sectional sample (PDF, 143K)

Figure 19. Distribution of PROMIS Global Health: Physical scores among people with heart failure in the cross-sectional sample (PDF, 143K)

Figure 20. Distribution of PROMIS Global Health: Mental scores among people with heart failure in the cross-sectional sample (PDF, 143K)

Figure 21. Distribution of KCCQ: Physical Limitation scores among people with heart failure in the cross-sectional sample (PDF, 143K)

Figure 22. Distribution of KCCQ: Symptom Stability scores among people with heart failure in the cross-sectional sample (PDF, 143K)

Figure 23. Distribution of KCCQ: Symptom Frequency scores among people with heart failure in the cross-sectional sample (PDF, 118K)

Figure 24. Distribution of KCCQ: Symptom Burden scores among people with heart failure in the cross-sectional sample (PDF, 143K)

Figure 25. Distribution of KCCQ: Self-Efficacy scores among people with heart failure in the cross-sectional sample (PDF, 118K)

Figure 26. Distribution of KCCQ: Quality of Life scores among people with heart failure in the cross-sectional sample (PDF, 118K)

Figure 27. Distribution of KCCQ: Social Limitation scores among people with heart failure in the cross-sectional sample (PDF, 118K)

Appendix D.

Histograms of Osteoarthritis of the Knee Measure Score Distribution for the Cross-sectional Sample (PDF, 422K)

Figure 1. Distribution of PROMIS Physical Function domain scores among people with osteoarthritis of the knee in the cross-sectional sample (PDF, 201K)

Figure 2. Distribution of PROMIS Sleep Disturbance domain scores among people with osteoarthritis of the knee in the cross-sectional sample (PDF, 118K)

Figure 3. Distribution of PROMIS Depression domain scores among people with osteoarthritis of the knee in the cross-sectional sample (PDF, 174K)

Figure 4. Distribution of PROMIS Anxiety domain scores among people with osteoarthritis of the knee in the cross-sectional sample (PDF, 174K)

Figure 5. Distribution of PROMIS Life Satisfaction domain scores among people with osteoarthritis of the knee in the cross-sectional sample (PDF, 174K)

Figure 6. Distribution of PROMIS Satisfaction with Social Roles and Activities domain scores among people with osteoarthritis of the knee in the cross-sectional sample (PDF, 182K)

Figure 7. Distribution of PROMIS Ability to Participate in Social Roles and Activities domain scores among people with osteoarthritis of the knee in the cross-sectional sample (PDF, 174K)

Figure 8. Distribution of PROMIS Social Isolation domain scores among people with osteoarthritis of the knee in the cross-sectional sample (PDF, 118K)

Figure 9. Distribution of PROMIS Independence domain scores among people with osteoarthritis of the knee in the cross-sectional sample (PDF, 174K)

Figure 10. Distribution of PROMIS Fatigue domain scores among people with osteoarthritis of the knee in the cross-sectional sample (PDF, 175K)

Figure 11. Distribution of PROMIS Pain Interference domain scores among people with osteoarthritis of the knee in the cross-sectional sample (PDF, 175K)

Figure 12. Distribution of PROMIS Symptoms domain scores among people with osteoarthritis of the knee in the cross-sectional sample (PDF, 174K)

Figure 13. Distribution of PROMIS Pain Intensity domain scores among people with osteoarthritis of the knee in the cross-sectional sample (PDF, 174K)

Figure 14. Distribution of PROMIS Anger domain scores among people with osteoarthritis of the knee in the cross-sectional sample (PDF, 143K)

Figure 15. Distribution of PROMIS Global Health: Physical scores among people with osteoarthritis of the knee in the cross-sectional sample (PDF, 118K)

Figure 16. Distribution of PROMIS Global Health: Mental scores among people with osteoarthritis of the knee in the cross-sectional sample (PDF, 143K)

Figure 17. Distribution of KOOS: Symptoms scores among people with osteoarthritis of the knee in the cross-sectional sample (PDF, 174K)

Figure 18. Distribution of KOOS: Pain scores among people with osteoarthritis of the knee in the cross-sectional sample (PDF, 182K)

Figure 19. Distribution of KOOS: Activities of Daily Living scores among people with osteoarthritis of the knee in the cross-sectional sample (PDF, 174K)

Figure 20. Distribution of KOOS: Sports and Recreation scores among people with osteoarthritis of the knee in the cross-sectional sample (PDF, 119K)

Figure 21. Distribution of KOOS: Quality of Life scores among people with osteoarthritis of the knee in the cross-sectional sample (PDF, 143K)

Appendix L.

Histograms of Heart Failure Measure Score Distributions at Baseline for the Longitudinal Sample (PDF, 455K)

Figure 1. Distribution of PROMIS Dyspnea domain scores at baseline among patients with heart failure in the longitudinal sample (PDF, 118K)

Figure 2. Distribution of PROMIS Physical Function domain scores at baseline among patients with heart failure in the longitudinal sample (PDF, 174K)

Figure 3. Distribution of PROMIS Fatigue domain scores at baseline among patients with heart failure in the longitudinal sample (PDF, 175K)

Figure 4. Distribution of PROMIS Sleep Disturbance domain scores at baseline among patients with heart failure in the longitudinal sample (PDF, 174K)

Figure 5. Distribution of PROMIS Depression domain scores at baseline among patients with heart failure in the longitudinal sample (PDF, 174K)

Figure 6. Distribution of PROMIS Anxiety domain scores at baseline among patients with heart failure in the longitudinal sample (PDF, 174K)

Figure 7. Distribution of PROMIS Life Satisfaction domain scores at baseline among patients with heart failure in the longitudinal sample (PDF, 174K)

Figure 8. Distribution of PROMIS Satisfaction with Social Roles and Activities domain scores at baseline among patients with heart failure in the longitudinal sample (PDF, 174K)

Figure 9. Distribution of PROMIS Ability to Participate in Social Roles and Activities domain scores at baseline among patients with heart failure in the longitudinal sample (PDF, 175K)

Figure 10. Distribution of PROMIS Social Isolation domain scores at baseline among patients with heart failure in the longitudinal sample (PDF, 174K)

Figure 11. Distribution of PROMIS Illness Burden domain scores at baseline among patients with heart failure in the longitudinal sample (PDF, 174K)

Figure 12. Distribution of PROMIS Independence domain scores at baseline among patients with heart failure in the longitudinal sample (PDF, 118K)

Figure 13. Distribution of PROMIS Pain Interference domain scores at baseline among patients with heart failure in the longitudinal sample (PDF, 174K)

Figure 14. Distribution of PROMIS Symptoms domain scores at baseline among patients with heart failure in the longitudinal sample (PDF, 174K)

Figure 15. Distribution of PROMIS Health Behavior Outcomes domain scores at baseline among patients with heart failure in the longitudinal sample (PDF, 174K)

Figure 16. Distribution of PROMIS Cognitive Function domain scores at baseline among patients with heart failure in the longitudinal sample (PDF, 174K)

Figure 17. Distribution of PROMIS Cognitive Abilities domain scores at baseline among patients with heart failure in the longitudinal sample (PDF, 174K)

Figure 18. Distribution of PROMIS Anger domain scores at baseline among patients with heart failure in the longitudinal sample (PDF, 143K)

Figure 19. Distribution of PROMIS Global Health: Physical scores at baseline among patients with heart failure in the longitudinal sample (PDF, 195K)

Figure 20. Distribution of PROMIS Global Health: Mental scores at baseline among patients with heart failure in the longitudinal sample (PDF, 118K)

Figure 21. Distribution of KCCQ: Physical Limitation scores at baseline among patients with heart failure in the longitudinal sample (PDF, 174K)

Figure 22. Distribution of KCCQ: Symptom Stability scores at baseline among patients with heart failure in the longitudinal sample (PDF, 174K)

Figure 23. Distribution of KCCQ: Symptom Frequency scores at baseline among patients with heart failure in the longitudinal sample (PDF, 174K)

Figure 24. Distribution of KCCQ: Symptom Burden scores at baseline among patients with heart failure in the longitudinal sample (PDF, 174K)

Figure 25. Distribution of KCCQ: Self-Efficacy scores at baseline among patients with heart failure in the longitudinal sample (PDF, 174K)

Figure 26. Distribution of KCCQ: Quality of Life scores at baseline among patients with heart failure in the longitudinal sample (PDF, 174K)

Figure 27. Distribution of KCCQ: Social Limitation scores at baseline among patients with heart failure in the longitudinal sample (PDF, 118K)

Appendix M.

Histograms of Osteoarthritis of the Knee Measure Score Distributions at Baseline for the Longitudinal Sample (PDF, 407K)

Figure 1. Distribution of PROMIS Physical Function domain scores at baseline among patients with osteoarthritis of the knee in the longitudinal sample (PDF, 119K)

Figure 2. Distribution of PROMIS Sleep Disturbance domain scores at baseline among patients with osteoarthritis of the knee in the longitudinal sample (PDF, 164K)

Figure 3. Distribution of PROMIS Depression domain scores at baseline among patients with osteoarthritis of the knee in the longitudinal sample (PDF, 118K)

Figure 4. Distribution of PROMIS Anxiety domain scores at baseline among patients with osteoarthritis of the knee in the longitudinal sample (PDF, 174K)

Figure 5. Distribution of PROMIS Life Satisfaction domain scores at baseline among patients with osteoarthritis of the knee in the longitudinal sample (PDF, 174K)

Figure 6. Distribution of PROMIS Satisfaction with Social Roles and Activities domain scores at baseline among patients with osteoarthritis of the knee in the longitudinal sample (PDF, 174K)

Figure 7. Distribution of PROMIS Ability to Participate in Social Roles and Activities domain scores at baseline among patients with osteoarthritis of the knee in the longitudinal sample (PDF, 174K)

Figure 8. Distribution of PROMIS Social Isolation domain scores at baseline among patients with osteoarthritis of the knee in the longitudinal sample (PDF, 174K)

Figure 9. Distribution of PROMIS Independence domain scores at baseline among patients with osteoarthritis of the knee in the longitudinal sample (PDF, 174K)

Figure 10. Distribution of PROMIS Fatigue domain scores at baseline among patients with osteoarthritis of the knee in the longitudinal sample (PDF, 174K)

Figure 11. Distribution of PROMIS Pain Interference domain scores at baseline among patients with osteoarthritis of the knee in the longitudinal sample (PDF, 175K)

Figure 12. Distribution of PROMIS Symptoms domain scores at baseline among patients with osteoarthritis of the knee in the longitudinal sample (PDF, 174K)

Figure 13. Distribution of PROMIS Pain Intensity domain scores at baseline among patients with osteoarthritis of the knee in the longitudinal sample (PDF, 174K)

Figure 14. Distribution of PROMIS Anger domain scores at baseline among patients with osteoarthritis of the knee in the longitudinal sample (PDF, 144K)

Figure 15. Distribution of PROMIS Global Health: Physical scores at baseline among patients with osteoarthritis of the knee in the longitudinal sample (PDF, 174K)

Figure 16. Distribution of PROMIS Global Health: Mental scores at baseline among patients with osteoarthritis of the knee in the longitudinal sample (PDF, 174K)

Figure 17. Distribution of KOOS: Symptoms scores at baseline among patients with osteoarthritis of the knee in the longitudinal sample (PDF, 174K)

Figure 18. Distribution of KOOS: Pain scores at baseline among patients with osteoarthritis of the knee in the longitudinal sample (PDF, 175K)

Figure 19. Distribution of KOOS: Activities of Daily Living scores at baseline among patients with osteoarthritis of the knee in the longitudinal sample (PDF, 174K)

Figure 20. Distribution of KOOS: Sports and Recreation scores at baseline among patients with osteoarthritis of the knee in the longitudinal sample (PDF, 174K)

Figure 21. Distribution of KOOS: Quality of Life scores at baseline among patients with osteoarthritis of the knee in the longitudinal sample (PDF, 118K)

Original Project Title: Facilitating Patient Reported Outcome Measurement for Key Conditions
PCORI ID: ME-1303-5928

Suggested citation:

Schifferdecker KE, Carluzzo KL, Kallen MA, et al. (2019). Adding Disease-Specific Concerns to Patient-Reported Outcome Measures. Patient-Centered Outcomes Research Institute (PCORI). https://doi.org/10.25302/4.2019.ME.13035928

Disclaimer

The views, statements, and opinions presented in this report are solely the responsibility of the author(s) and do not necessarily represent the views of the Patient-Centered Outcomes Research Institute® (PCORI®), its Board of Governors, or its Methodology Committee.

Copyright © 2019. Dartmouth College. All Rights Reserved.

This book is distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs License, which permits noncommercial use and distribution provided the original author(s) and source are credited. (See https://creativecommons.org/licenses/by-nc-nd/4.0/.)

Bookshelf ID: NBK601019
PMID: 38416860
DOI: 10.25302/4.2019.ME.13035928
