NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Davies SM, Geppert J, McClellan M, et al. Refinement of the HCUP Quality Indicators. Rockville (MD): Agency for Healthcare Research and Quality (US); 2001 May. (Technical Reviews, No. 4.)

3 Results

The results chapter is divided into 5 sections. Each indicator is assigned an identification number that is used throughout the results section and in the appendices. Each section presents the results from a different perspective, so some redundancy is inevitable.

1. Summary of Evidence for Indicators (Section 3.A.) presents the literature review findings and results of the empirical evaluation for the final indicators that comprise the HCUP II QIs. This section is intended for the reader who does not wish to delve into the details of the evaluation (which are presented in section 5). There are 4 main subsections:

  • Initial Empirical Evaluation (Section 3.A.1.) provides a description of the winnowing of over 200 indicators down to 45 recommended indicators.
  • Summary of Literature Review and Empirical Evaluation (Section 3.A.2) provides a text summary of the literature review and empirical analyses findings for provider-level and then area-level indicators. The text summaries refer to the tables contained in the following two sub-sections.
  • Summary Tables by Indicator (Section 3.A.3, Tables 9-13) provides supporting evidence from the literature review and empirical analysis along with suggestions for use, by indicator, organized by indicator type.
  • Summary Tables of Evidence by Empirical Test (Section 3.A.4, Tables 14-25) provides a succinct synopsis of all empirical findings. The results are organized by the tests for the volume indicators, and of precision, minimum bias, and construct validity.

Sections 2-4 present results for specific aspects of the review and evaluation:

2. Results of Semi-Structured Interviews (Section 3.B.) provides information from a variety of organizations on the application of indicators and risk adjustment methods as well as practical suggestions for refinement of the HCUP I QIs.

3. Review of Risk Adjustment Approaches (Section 3.C.) provides the rationale for the use of APR-DRGs as a risk adjustment approach for this version of the QIs.

4. Evidence from Literature by Indicator Type (Section 3.D.) provides a broad overview of quality indicators in general. This is a literature review of the use and validity for each of the major subgroups of indicators: volume, utilization, ambulatory sensitive conditions, and mortality. Because indicators within a subgroup often have similar limitations and because subgroups of indicators, instead of specific individual indicators, are often reported in the literature, this section offers a general overview of the literature findings.

The final section provides the details of the review, organized by indicator:

5. Detailed Evidence by Indicator (Section 3.E) provides a comprehensive presentation of the literature review and the empirical evaluation for each indicator that is included in the HCUP II QIs. This section consists of detailed textual summaries of the evidence for each indicator. Each indicator summary begins with a definition of the indicator evaluated, followed by the findings of the literature review and the empirical analyses. At the end of each write-up, a discussion paragraph integrates the empirical and literature findings and provides recommendations for use.

3.A. Summary of Evidence for Indicators

This section summarizes the results on the recommended quality indicators, including both the literature review and the empirical analyses. These results are presented in four sections, described in the introduction to this chapter.

3.A.1. Initial Empirical Evaluation

As described in Section 2.C., "Literature Review Methods," over 200 indicators were identified from the literature, from databases of indicators, and through personal contact. These indicators are listed in Appendix 5. They are definable using administrative data and applicable to a large share of providers or areas (i.e., the list does not include highly specialized clinical areas, such as burn units). Seventy-one indicators were selected for initial empirical evaluation according to the criteria outlined in Section 2.C., including all non-complication HCUP I QIs. In general, these indicators related to a relatively large number of patients and/or hospitals, had adequate face validity, and provided a comprehensive examination of multiple aspects of health care. Reasons for selection and exclusion are also listed in Appendix 5. The initial empirical evaluation applied tests of precision to these 71 indicators. Indicators with low precision (less than 1.5% provider variation share or 0.01% area variation share) were excluded from further evaluation, since the interpretation of indicators without adequate precision is unclear. Table 4 lists the 23 non-complication HCUP I QIs tested and their inclusion status in HCUP II. Table 5 shows the additional non-HCUP I indicators removed from further consideration exclusively because of the precision test. Several other indicators initially tested were length of stay measures; these were excluded because experts expressed reservations about using them for quality assessment.
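The precision screen described above amounts to a threshold filter. The following is a minimal sketch, assuming each candidate indicator carries a level ("provider" or "area") and an estimated variation share in percent. The indicator names, the example values, and the `passes_precision_screen` helper are hypothetical and are not taken from the report or its accompanying software.

```python
# Minimum variation shares quoted in the text, in percent.
PROVIDER_MIN_SHARE = 1.5
AREA_MIN_SHARE = 0.01


def passes_precision_screen(level, variation_share_pct):
    """Return True if an indicator meets the minimum variation share for its level."""
    if level == "provider":
        return variation_share_pct >= PROVIDER_MIN_SHARE
    if level == "area":
        return variation_share_pct >= AREA_MIN_SHARE
    raise ValueError(f"unknown indicator level: {level!r}")


# Hypothetical indicators with made-up variation shares (illustration only).
candidates = [
    ("indicator_A", "provider", 4.2),
    ("indicator_B", "provider", 0.9),
    ("indicator_C", "area", 0.05),
    ("indicator_D", "area", 0.004),
]

# Indicators below the threshold for their level are dropped.
retained = [name for name, level, share in candidates
            if passes_precision_screen(level, share)]
```

In this sketch an indicator exactly at the threshold is retained, since the text excludes only those with a share "less than" the stated value.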

Table 4. List of HCUP I indicators, and inclusion status in HCUP II.

Table 5. List of other indicators, tested but not recommended due to low precision.

3.A.2. Summary of Results

Our evidence report recommends 25 provider-level quality indicators and 20 area-level indicators for use (see Tables 6 and 7; see Appendix 6 for full definitions of the recommended indicators). Provider-level quality indicators measure hospital or other provider quality, and are defined with a provider-level denominator. Area-level quality indicators most likely measure health system quality within an area and are defined with a population denominator. While none of these indicators is without its limitations, a considerable literature in most cases coupled with evidence of satisfactory empirical performance suggests that the recommended indicators may be useful additions to the "toolkit" of a broad range of clinical quality improvement professionals, health care managers, health policymakers, and researchers. Each of these indicators, when used with the appropriate caveats in mind, is appropriate for screening for quality problems - as a first step in identifying potential quality problems. The accompanying tables review the results of the detailed evaluations of all the indicators. The overall indicator summary tables (Tables 9, 10, and 13 for the provider indicators and Tables 11-12 for the area indicators) include major findings from the existing research literature on each indicator, as well as a consistent set of empirical evaluations of indicator performance. These empirical analyses can be replicated or extended using the software that accompanies this report. Based on the literature review and empirical evaluation, the evidence summary tables (Tables 9-13) also outline some specific guidance for using each indicator as part of a program to improve quality.

Table 6. Provider indicator list.

Table 7. Area indicator list.

Table 8. Example clinical groupings.

Provider Indicators

Provider indicators are constructed at the hospital level; they provide information related to the quality of care at individual hospitals. There are several types of indicators included:

  • Volume indicators for inpatient procedures where substantial evidence of an important volume-outcome relationship has been demonstrated. These indicators include abdominal aortic aneurysm repair volume, carotid endarterectomy volume, CABG volume, esophageal resection volume, pancreatic resection volume, pediatric heart surgery volume, and PTCA volume.
  • Utilization indicators for procedures whose use varies significantly across hospitals, and for which high (or low) rates of use are likely to represent inappropriate or inefficient delivery of care, leading to worse outcomes or higher costs or both. These indicators include cesarean section delivery rate, incidental appendectomy in the elderly rate, bilateral catheterization rate, successful vaginal birth after cesarean section rate, and laparoscopic cholecystectomy rate.
  • Mortality indicators for inpatient conditions for which mortality rates have been shown to vary substantially across institutions and for which evidence suggests that high mortality, at least in part, may be associated with deficiencies in the quality of care. These indicators include mortality for acute myocardial infarction, congestive heart failure, gastrointestinal hemorrhage, hip fracture, and stroke.
  • Mortality indicators for inpatient procedures for which mortality rates have also been shown to vary substantially across institutions, and for which evidence suggests that high mortality, at least in part, may be associated with deficiencies in the quality of care. These indicators include mortality after abdominal aortic aneurysm repair, CABG, craniotomy, esophageal resection, hip replacement, pancreatic resection, and pediatric heart surgery.

By level of evidence, Tables 9 (volume), 10 (utilization) and 13 (mortality) summarize our literature review and empirical evaluation of each of these indicators. While each provider-level indicator features distinctive issues, a number of common themes are apparent across many of these indicator types. We review these issues in the following subsections.

Volume Indicators: Reliably-Measured Quality "Proxies"

The volume indicators are somewhat different from the other provider-level indicators, in that they simply represent counts of admissions in which particular major procedures were performed, rather than more direct measures of performance. As such, they are not subject to many of the issues of noise and bias that interfere with the interpretation of other provider-level quality indicators, as discussed in the next subsection. The recommended volume indicators include those for which substantial research has demonstrated a significant relationship between hospital volume and outcomes, and for which a nontrivial number of procedures are performed by institutions that do not meet recommended volume thresholds. The weakest evidence linking volume and outcome exists for coronary artery bypass surgery, for which recent studies that included clinically detailed risk adjusters found a weak and statistically insignificant relationship. We retained this indicator in our recommended set because of its historical use, because these recent studies came from areas with few low-volume hospitals, and because surgeon volume still appears to be a significant predictor of mortality. For the other volume indicators, the empirical evidence of a relationship is considerably stronger.

Administrative data, like the HCUP data and the state hospital discharge data from which they are derived, may be the best available source for accurate and comprehensive counts of major inpatient procedures performed by hospitals. For all of the recommended volume indicators, there is little evidence that procedures are miscoded or not reported. For one of the recommended volume indicators (PTCA), a small fraction of procedures (10% or less) are performed on an outpatient basis, and hence are omitted from HCUP, but these missing count data should only rarely influence conclusions about whether providers exceed recommended volume thresholds. For two of the recommended volume indicators (esophageal resection and pancreatic resection in cancer patients), the procedures were so infrequent that counts from a single year may not provide reliable measures of hospital volume and experience. However, counts over several years can provide a quite reliable measure of hospital volume, even in these cases. Thus, the HCUP II volume indicators can provide valuable information for health policymakers, purchasers, consumers, and others on whether particular hospitals meet recommended volume thresholds.

Tables 14-17, based on results from the analysis of 1995-97 HCUP data that are presented in detail in Appendix 7, summarize the distribution of procedure volumes for the recommended indicators among HCUP hospitals. The tables show that the vast majority of adult patients undergoing cardiac procedures - PTCA and CABG - were treated by hospitals that meet at least a "lower" recommended volume threshold. However, patients undergoing carotid endarterectomy and abdominal aortic aneurysm repair were considerably less likely to be treated by hospitals that met recommended volume thresholds. The relatively few patients undergoing esophageal or pancreatic resection were unlikely to be treated by hospitals that met recommended volume thresholds. Finally, a significant fraction of pediatric heart surgery patients were treated by hospitals that did not meet the volume thresholds. However, as we note in our review of this indicator, pediatric heart surgery consists of a heterogeneous set of procedures; thus, it is possible that hospitals with low overall surgery volume had relatively high volumes for particular types of specialized surgery. None of these results showed substantial differences between 1995 and 1997.
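The share of patients treated at hospitals that meet a volume threshold can be illustrated with a short sketch. It assumes per-hospital procedure counts for several years and classifies each hospital by its mean annual volume, which reflects the point that multi-year counts are more reliable for infrequent procedures. The hospital names, counts, threshold value, and function names are hypothetical, and the report's own threshold definitions may differ.

```python
def mean_annual_volume(yearly_counts):
    """Average a hospital's procedure counts over the years supplied."""
    return sum(yearly_counts) / len(yearly_counts)


def share_of_patients_at_threshold(hospital_counts, threshold):
    """Fraction of all procedures performed at hospitals whose mean
    annual volume is at or above `threshold`."""
    total = 0
    at_threshold = 0
    for yearly_counts in hospital_counts.values():
        procedures = sum(yearly_counts)
        total += procedures
        if mean_annual_volume(yearly_counts) >= threshold:
            at_threshold += procedures
    return at_threshold / total if total else 0.0


# Hypothetical three-year counts per hospital (illustration only).
counts = {
    "hospital_1": [120, 130, 110],
    "hospital_2": [40, 35, 45],
    "hospital_3": [5, 8, 2],
}

# Share of procedures at hospitals averaging at least 100 per year.
share = share_of_patients_at_threshold(counts, threshold=100)
```

Weighting by procedures rather than by hospitals matches the framing in the text, which describes the fraction of patients, not the fraction of hospitals, treated at sites meeting the threshold.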

Table 18 summarizes the correlations among the hospital volume measures, and the correlations between volume measures and the associated hospital procedure mortality measures (described in the subsection, Mortality Indicators). Not surprisingly, hospital volumes for CABG, PTCA, carotid endarterectomy, and abdominal aortic aneurysm repair are strongly correlated. The weaker correlations between the volumes of these procedures and pediatric heart surgery volume reflect the fact that some pediatric surgery centers tend to specialize in pediatric care, and so have lower or no volume of adult surgical procedures. The hospitals specializing in cancer surgery also tend to be somewhat different from those performing substantial cardiovascular surgery. In general, higher-volume hospitals tend to have lower inpatient mortality rates for that procedure, as well as lower inpatient mortality for related procedures.
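A correlation between hospital volume and hospital mortality can be computed as in the sketch below. It assumes a plain Pearson correlation on unweighted hospital-level values; the report does not state in this section which correlation measure or weighting underlies Table 18, so this choice is an assumption. The example volumes and mortality rates are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical hospital-level values (illustration only).
annual_volume = [50, 120, 300, 450, 800]
mortality_rate = [0.060, 0.055, 0.040, 0.042, 0.030]

# A negative value means higher-volume hospitals tend to have lower mortality.
r = pearson(annual_volume, mortality_rate)
```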

Table 18. Correlations between volume and mortality indicators.

These empirical results confirm that hospital volume is an important correlate of quality of care. However, these results as well as the other analyses and prior studies summarized in our detailed review of each indicator also demonstrate that volume is at best a quite noisy reflection of quality. While hospital volume has significant explanatory power, the relationship is not precise; in practical terms, there appear to be many high-quality procedures performed by low-volume institutions, and conversely many low-quality procedures performed by high-volume institutions. Causes of the weak relationship between volume and quality include the possible importance of surgeon volume (not captured presently in HCUP data), differences in the severity and complexity of cases treated, and many other factors. Moreover, use of volume as a quality indicator may lead to undesirable hospital responses, such as the performance of more procedures on patients with mild disease or who are otherwise inappropriate candidates. Thus, while volume is a useful proxy for quality, it is important to consider more direct measures of hospital quality, to help determine whether a high-volume hospital actually provides high-quality care, and whether a low-volume hospital provides low-quality care.

Utilization Indicators (provider-level only)

Precision

All of the recommended indicators show a large amount of variation across hospitals, suggesting that important opportunities for improving quality of care exist. In addition, with the exception of incidental appendectomy, all of these indicators involve common procedures (i.e., large numerators and denominators), and so they are all, at least, relatively precisely measured. "Smoothing" and related methods to account for random noise are helpful, but except for incidental appendectomy, such methods have a relatively modest impact on the measured performance of medium-to-high volume hospitals. Methods to account for random noise are therefore less critical for the utilization measures than for the hospital mortality measures.

Minimum bias

While risk adjustment (where feasible) has some impact on measured performance for some of the measures, none of the measures appear to be highly sensitive to risk adjustment based on age, sex, and APR-DRGs (where applicable). Nevertheless, because differences in patient characteristics that are not captured through such risk-adjustment may influence whether or not the procedure is appropriate, the utilization measures may still be biased. In addition, all cesarean deliveries are classified to APR-DRG 540, and all cholecystectomies are classified as either laparoscopic (APR-DRG 263) or non-laparoscopic (APR-DRG 262). APR-DRGs cannot be used to adjust for patients' underlying risk of cesarean delivery, vaginal birth after cesarean delivery, or laparoscopic cholecystectomy, because they are assigned based on utilization of the procedure. More careful risk-adjustment, even based on the limited clinical information available from HCUP data, could identify and remove additional bias.

Construct validity

Though the ideal rate for each of the indicators has not been established, substantial evidence suggests that the rates observed at many hospitals are very likely to be inappropriate. This is particularly true for bilateral catheterization and incidental appendectomy, which have very low optimal rates at virtually all hospitals, although both of these procedures are still commonly performed. Multiple studies suggest that high-quality centers can safely reduce the utilization of these procedures to relatively low rates. In addition, empirical analysis suggests that several of these measures may be correlated with each other. For example, hospitals with higher cesarean delivery rates tend to have lower rates of vaginal birth after cesarean and higher rates of incidental appendectomy. However, it is not clear whether utilization indicators are useful "proxy" indicators for other aspects of hospital quality.

Fosters true quality improvement

In summary, compared to the mortality indicators, the utilization indicators require somewhat less sophisticated statistical methods and provide relatively clear evidence of likely quality differences across hospitals. Although there are very few indications for incidental appendectomy and bilateral catheterization, the "right" utilization rate for these indicators is generally not known. Thus, it may be more useful to identify very high or low outliers than to emphasize numerical rate differences. In addition, for the elective non-obstetrical procedures, use of these quality indicators may create a perverse incentive to increase the denominator for the utilization measures. For example, hospitals could increase their laparoscopic cholecystectomy rate by performing more cholecystectomies on low-risk, low-benefit patients, rather than by shifting patients from open to laparoscopic surgery. Simultaneous evaluation of the area rate of the denominator procedures (i.e., any type of cholecystectomy) can provide some evidence on this question. Finally, use of these indicators could induce under-reporting.

Further investigation

These indicators are likely to be most useful as a "screen" for further evaluations, using supplemental data to determine whether utilization is truly inappropriate. Incidental appendectomy is generally inappropriate, but review of a few of the cases performed might identify valid exceptions. For all of the remaining procedures, detailed clinical guidelines on appropriate use have been developed and could be applied to determine whether hospitals that appear to have high rates are in fact treating a significant number of inappropriate cases.

Mortality Indicators

HCUP data can be used to construct a number of indicators for inpatient mortality after major procedures and for common medical conditions leading to hospitalization. Because patient characteristics beyond the control of medical providers are the primary determinants of mortality, and because mortality is a relatively infrequent outcome for most conditions, the problems of noise and bias are more substantial concerns with the mortality indicators than with the other recommended hospital quality indicators. On the other hand, the recommended mortality indicators demonstrate generally large differences across hospitals that do not appear to be due to random chance or to differences in comorbid diseases reported in hospital discharge data. Because mortality is a very important outcome, these indicators can potentially be used as part of a careful and thorough quality improvement effort to reduce inpatient mortality.

Precision

For most mortality indicators, a substantial part of the apparent variation in mortality rates across hospitals can be attributed to unsystematic unobservable characteristics, or noise. Some mortality indicators (e.g., hip fracture) have relatively less noise than others (e.g., hip replacement and GI hemorrhage). Three indicators (AAA repair, esophageal resection, and pancreatic resection) appear particularly noisy. Nonetheless, for all of the mortality indicators except hip replacement, methods to account for unsystematic variation in developing a "best estimate" of hospital performance, such as the smoothing methods applied in this report, have a substantial impact on the indicators. Such methods are strongly recommended for all mortality indicators to avoid misidentifying "outlier" providers and to develop a reasonably good forecast of a provider's future mortality rate.
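The effect of smoothing on a noisy mortality rate can be illustrated with a simple shrinkage estimator. This is a simplified univariate sketch and not the smoothing method applied in the report, which is described elsewhere and may differ substantially. It assumes binomial sampling noise and takes the variance of true hospital rates (`signal_variance`) as a given input; the function name and all numbers are hypothetical.

```python
def shrunken_rate(deaths, cases, overall_rate, signal_variance):
    """Shrink a hospital's observed mortality rate toward the overall rate.

    The weight on the observed rate is signal / (signal + noise), so
    hospitals with few cases (large noise) are pulled closer to the mean.
    """
    if cases <= 0:
        raise ValueError("cases must be positive")
    if not 0 < overall_rate < 1:
        raise ValueError("overall_rate must be strictly between 0 and 1")
    if signal_variance < 0:
        raise ValueError("signal_variance must be non-negative")
    observed = deaths / cases
    # Assumed binomial sampling variance of the observed rate.
    noise_variance = overall_rate * (1 - overall_rate) / cases
    reliability = signal_variance / (signal_variance + noise_variance)
    return overall_rate + reliability * (observed - overall_rate)


# Two hypothetical hospitals with the same observed rate (30%) but
# very different case counts (illustration only).
small = shrunken_rate(deaths=3, cases=10, overall_rate=0.10, signal_variance=0.0004)
large = shrunken_rate(deaths=300, cases=1000, overall_rate=0.10, signal_variance=0.0004)
```

Under these assumptions the small hospital's estimate stays close to the overall rate while the large hospital's estimate stays close to its observed rate, which is why smoothing helps avoid labeling low-volume providers as outliers on the basis of chance alone.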

Minimum bias

Because patient characteristics are relatively important determinants of mortality, these measures also have a large potential for bias, due to unobserved differences in case mix. Biases may also arise in all of the measures because of differences in hospital discharge and transfer practices, because HCUP currently does not permit information on deaths after discharge or certain transfers to be included. As Table 13 and our associated literature reviews note, several studies have provided evidence on the importance of such biases for most of the recommended indicators. In most cases, risk adjustment using secondary diagnosis information in HCUP data can help address the problem of case-mix differences. Thus, such risk adjustment should be performed when feasible. For a few indicators, evidence suggests that the particular choice of risk adjustment method (e.g., APR-DRG or Comprehensive Severity Index (CSI), as described in more detail in section 2.D. Risk Adjustment Methods) may affect rankings. However, these previous studies of the effects of alternative risk adjustment systems generally did not account for random variation in hospital level performance. As a result, quite noisy measures of hospital mortality were compared, so it is perhaps not surprising that the results varied by risk adjustment methodology. HCUP users should look for future studies that compare the effects of different risk adjustment systems on smoothed hospital mortality indicators. In the meantime, however, users should realize that measured performance for some indicators might depend on the risk adjustment method that is applied.
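How case-mix adjustment can change measured performance is shown in the sketch below, which uses indirect standardization: observed deaths are compared with the deaths expected given each patient's risk stratum. This is a generic technique chosen for illustration, not the APR-DRG-based procedure used in the report. The two strata, their reference mortality rates, and the hospital data are hypothetical.

```python
def risk_adjusted_rate(patients, stratum_rates, reference_rate):
    """Indirectly standardized mortality rate.

    patients: list of (stratum, died) tuples for one hospital.
    stratum_rates: reference mortality rate for each risk stratum.
    reference_rate: overall mortality rate in the reference population.
    """
    observed = sum(1 for _, died in patients if died)
    expected = sum(stratum_rates[stratum] for stratum, _ in patients)
    if expected == 0:
        raise ValueError("expected deaths must be positive")
    return observed / expected * reference_rate


# Hypothetical reference rates by risk stratum (illustration only).
stratum_rates = {"low": 0.02, "high": 0.20}
reference_rate = 0.05

# Hospital A treats many high-risk patients: 10 deaths among 100 patients.
hospital_a = ([("high", True)] * 10 + [("high", False)] * 40
              + [("low", False)] * 50)
# Hospital B treats only low-risk patients: 4 deaths among 100 patients.
hospital_b = [("low", True)] * 4 + [("low", False)] * 96

adjusted_a = risk_adjusted_rate(hospital_a, stratum_rates, reference_rate)
adjusted_b = risk_adjusted_rate(hospital_b, stratum_rates, reference_rate)
```

In this made-up example hospital A has the higher crude mortality rate but the lower adjusted rate, because its expected deaths are much higher. The adjustment can only use risk factors recorded in the data, so unobserved differences in case mix would remain a source of bias.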

Table 13. Summary evidence table for mortality indicators.

Another potential source of bias for some of the mortality indicators is differences in hospital outpatient (emergency room) treatment quality for particular conditions. Better hospitals may be more capable of treating milder cases on an outpatient basis, so that the patients actually admitted with the condition have relatively high severity and comorbidity. Whereas all of the procedures included as mortality quality indicators are generally performed on an inpatient basis, many patients with relatively uncomplicated GI hemorrhage, congestive heart failure, and pneumonia, or possibly stroke may be managed effectively as outpatients in some centers.

Construct validity

All of these considerations suggest that hospital mortality indicators can be used most effectively as quality screens, rather than as fully validated, reliable measures of hospital performance. To the extent possible, they should be considered in conjunction with other sources of information on hospital quality, to identify opportunities for quality improvement. Some of the other sources of information can be derived from HCUP data: mortality indicators can be considered in conjunction with hospital volume indicators and patient safety indicators (e.g., long hospital stays) for the same condition or procedure, as well as mortality and other quality indicators for related procedures (e.g., after accounting for random noise, CABG and AAA mortality rates tend to be correlated with each other). A consistent pattern across many of these indicators could provide more evidence on a potential underlying quality problem. Although some hospitals excel in specific areas of care and not others, HCUP data can be used to identify relationships among quality indicators, when they exist.

Fosters true quality improvement

One potential concern about the use of mortality indicators is that they may provide incentives to avoid cases that are more difficult and to treat patients with milder illnesses. Although little direct evidence exists on this question, some surveys have suggested, following published reports on CABG mortality rates in Pennsylvania and New York, that it is a potential problem. However, few studies have investigated whether improvements in quality of care resulting from use of an indicator outweigh these potential adverse consequences.

An additional concern is that use of in-hospital mortality indicators without accompanying information on post-hospitalization mortality may lead to the premature discharge of patients to die elsewhere. While it has been shown for a few of the mortality indicators that 30-day mortality differs from in-hospital mortality, there is no evidence that use of in-hospital mortality indicators has actually created this perverse incentive.

Further investigation

Because of the important limitations of mortality measures based on discharge abstracts, these measures can benefit significantly from use in conjunction with other sources of data on hospital quality. One potential source of additional information is medical chart review or linkage with hospital-based clinical data systems (e.g., laboratory test results), which would allow better adjustment for severity and comorbidity. Even if a comprehensive clinical risk adjustment program is not feasible, such patient record data can be used in a limited way to determine whether the average case mix of patients treated by different hospitals differs in a manner that could explain performance on HCUP II measures. Record reviews may also be helpful for determining weaknesses in processes of care that are associated with lower mortality. Finally, because many of the indicators are significantly related to each other, information on more general aspects of hospital quality (e.g., staffing ratios, procedures to avoid medication errors) may be useful to examine in hospitals with unusual performance in many dimensions. Biases related to potentially incomplete patient follow-up can be addressed through record linkages. Better information on post-hospitalization complications can be obtained by linking hospital records longitudinally or by surveying patients, and better information on post-admission mortality can be obtained by linking death index (mortality) records. Finally, analyses of hospital outpatient data (particularly ambulatory surgery and emergency room data) in conjunction with inpatient data can help to determine whether variations in risk-adjusted mortality reflect differences in outpatient practices. The literature includes examples of all of these approaches.

Summary of Empirical Evaluation of Precision and Bias for Provider indicators

Tables 19 and 21 summarize the empirical evaluations of precision (including variation across hospitals) and bias for the hospital utilization and mortality indicators. Empirical findings on the performance of specific indicators are presented in detail in Appendix 7.

Table 19. Precision - Provider Indicators.

Table 21. Minimum Bias - Provider Indicators.

Area Indicators

Our evidence report includes a set of quality indicators constructed at the area level. Area level indicators are constructed with a population denominator. Two types of indicators are included:

  • Utilization rate indicators for procedures whose use has been shown to vary widely across relatively similar geographic areas, and which have also been shown to include substantial inappropriate and/or equivocal utilization. These recommended utilization indicators include hysterectomy and laminectomy. Two other utilization indicators (rates of coronary artery bypass surgery and PTCA) were also included so that users may identify differences and track changes in utilization that correlate with HCUP II volume and mortality indicators.
  • Ambulatory care sensitive condition (ACSC) indicators involving admissions for diagnoses that could have been prevented or ameliorated with currently recommended outpatient care, according to recent evidence from population-based studies. The recommended ACSC indicators include dehydration, bacterial pneumonia, urinary infection, perforated appendix, angina, adult asthma, chronic obstructive pulmonary disease, congestive heart failure, diabetes (short and long term complications, uncontrolled diabetes, and lower extremity amputation), hypertension, low birth weight, pediatric asthma and pediatric gastroenteritis.

Versions of some of these indicators were previously recommended as HCUP I QIs. However, their construction differs in HCUP II, in that the denominator for these indicators is constructed at the area level. For most of the indicators, the denominator is the age- and gender-adjusted population rate of hospitalization with the procedure or diagnosis. (There are two exceptions: for perforated appendix and low birth weight rate, we use as denominators all hospitalized cases of appendicitis and all births, respectively. In these cases, the indicators are constructed at the area level with denominators consisting of age- and gender-standardized rates for the population of appendectomies in the area or sex-standardized rates for the population of births in the area.) In the previous version of HCUP (HCUP I), the denominator was some set of discharges or all discharges at each hospital. The hospital-based indicators were criticized by many reviewers as misleading, since the hospital is not the best unit of analysis for measures that relate primarily to area health and health care. For example, a high-volume CABG hospital might have a very high CABG rate relative to other hospitals, but it could achieve very good outcomes (because of its specialization in CABG care) and actual area CABG rates might be low (because the hospital is a regional referral center). Thus, the indicator might suggest a quality problem when, in fact, none exists. By constructing these indicators at the area level, outliers for these measures will not simply be hospitals that specialize in procedures or that happen to care for a disproportionate share of patients receiving poor outpatient care.
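The CABG example can be made concrete by computing the same procedure counts against the two kinds of denominator. The sketch below is a simplified illustration that omits the age and gender adjustment described above and ignores patients who cross area boundaries. The hospital names, counts, population figure, and function names are hypothetical.

```python
def hospital_based_rate(procedure_discharges, all_discharges):
    """HCUP I style rate: procedures per discharge at a single hospital."""
    return procedure_discharges / all_discharges


def area_rate_per_100k(procedures_by_hospital, area_population):
    """Area-level rate: all procedures in the area per 100,000 residents."""
    total = sum(procedures_by_hospital.values())
    return total / area_population * 100_000


# Hypothetical area with one referral center and two general hospitals.
cabg_by_hospital = {"referral_center": 1000, "general_1": 0, "general_2": 0}

# The referral center looks like an extreme outlier on a hospital basis.
center_rate = hospital_based_rate(procedure_discharges=1000, all_discharges=5000)

# The same procedures spread over the area population give a modest rate.
area_rate = area_rate_per_100k(cabg_by_hospital, area_population=1_000_000)
```

With these made-up numbers, one in five discharges at the referral center involves CABG, while the area as a whole has 100 procedures per 100,000 residents, which shows how a hospital denominator can flag specialization rather than overuse.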

Because HCUP data do not include specific information on patient residence (e.g., zip code), it is not possible to construct meaningful measures of area rates for very small areas. Nor is it possible to construct measures for the hospital referral regions used in the Dartmouth Atlas of Medical Care for fee-for-service Medicare patients, which also depend on information on zip code of residence. Rather, the smallest feasible area for analysis is the level that provides relatively modest "leakage" into or out of hospitals within the area. Because a significant share of patients at many hospitals are referred from hospitals outside the county, our indicators are constructed at the level of metropolitan statistical areas (MSAs). At the MSA level, leakage still occurs (particularly patients from outlying rural areas receiving care at hospitals in the MSA for intensive procedures), but it is relatively modest. The vast majority of patients treated in an MSA come from that MSA, and the vast majority of residents in the MSA receive treatment there. With more detailed information on patient residence, richer and more accurate area indicators could be constructed using the definitions applied in this report. Areas outside of any MSA were examined separately and on a county level.

Although these quality indicators are area-based, an important role remains for hospital-level measures of procedures or ambulatory care-sensitive admissions. If an area is found to have unusually high procedure rates, then the hospitals that contribute substantially to those rates, and more specifically the population served by those hospitals, represent a natural focus for efforts to understand why rates are high and possibly to reduce them. Similarly, if an area is found to have unusually high rates of potentially avoidable admissions, then the patient populations treated by hospitals with a relatively large share of these admissions might be a good focus for interventions to understand and reduce hospitalization rates.

Organized by evidence level, Tables 11-12 summarize our literature review and empirical analysis for each of the recommended area indicators. All of these indicators have been evaluated in at least a limited number of previous studies, and all performed relatively well in our empirical evaluations.

Utilization Indicators (area-level only)

Precision

All of the recommended area-level utilization indicators vary substantially across MSAs, by rates that seem far larger than can be explained by plausible differences in area health characteristics. For each of these indicators, there is some problem with distinguishing true rate differences from random variations. However, the noise problems are much smaller than exist with the hospital level measures, because the numerators and denominators are generally much larger. Thus, for the most part, it is possible to obtain reliable estimates of area rates without the use of relatively complex statistical methods to account for random noise.

Minimum bias

Many factors other than differences in hospital practices may influence procedure use, and vary systematically across areas. Whereas some of these factors seem appropriate to exclude from risk adjustment (e.g., differences in rates resulting from differences in insurance coverage), others seem inappropriate to exclude (e.g., differences in clinical risk factors and health behaviors). In general, age and sex adjustment has little impact on the area measures, and because existing measures are incomplete, it is not feasible to adjust for differences in health status. (It is possible to adjust area rates using area characteristics, e.g., poverty rate. However, such analyses require careful attention to methodological issues such as the so-called "ecological fallacy.") Many of these influences on health care use are also associated in a complex way with differences in socioeconomic status. We note where previous studies permit some conclusions about whether these factors have relatively modest or substantial effects on area rates. In general, while a range of environmental, socioeconomic, and other factors have been shown to influence area rates, a substantial part of the variation in rates across areas is unexplained by all of these factors. One additional source of bias for the PTCA rate measure is the performance of some procedures on an outpatient basis. However, the share of such procedures is relatively small (less than 10%).

Construct validity

For most of the area indicators, previous studies have documented moderate or high utilization of the procedure for indications judged by experts to be inappropriate or of questionable value. One exception is CABG: rates vary substantially across US areas and are much higher than in many other countries, but (at least in the areas studied so far) they do not seem to be associated with substantial rates of inappropriate use (though rates of use for indications of uncertain value are high), and the procedure may be underused in some patient subgroups. Most studies have not linked high procedure rates to higher rates of inappropriate use. For this reason, the area utilization rates are proxy measures for inappropriate utilization only. Since the rate of inappropriate use appears not to vary with procedure rates, areas with higher rates have a higher raw volume of inappropriate procedures than areas with lower rates.

Fosters true quality improvement

The studies on appropriateness of these procedures suggest that lowering procedure rates would primarily reduce inappropriate and low-value uses, leading to important benefits for the efficiency of medical care and possibly for patient outcomes. However, little direct evidence exists to date on this hypothesis. To achieve procedure reductions, it also seems likely that the high area rate must be parsed into contributions by specific hospitals. This is straightforward with HCUP data: the numerator of the area rate measures can be divided up into shares attributable to specific hospitals.
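
The split of an area numerator into hospital shares can be sketched as follows. This is a minimal illustration with made-up hospital names and counts; the function name is hypothetical and is not part of any HCUP software.

```python
def hospital_shares(cases_by_hospital):
    """Return each hospital's share of the area-level numerator."""
    total = sum(cases_by_hospital.values())
    if total == 0:
        raise ValueError("area has no cases for this indicator")
    return {hospital: cases / total for hospital, cases in cases_by_hospital.items()}


# Made-up procedure counts for three hospitals in one MSA.
area_cases = {"Hospital A": 450, "Hospital B": 300, "Hospital C": 150}
shares = hospital_shares(area_cases)

# List hospitals from largest to smallest contribution.
for hospital, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{hospital}: {share:.1%} of area procedures")
```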

Further investigation

For most of the area utilization indicators, detailed clinical guidelines exist for judging the appropriateness of procedure use in specific cases. Such guidelines can be applied to sample cases from hospitals that make large contributions to high area rates to help identify specific opportunities for lowering rates, thereby providing more convincing evidence that no clinical harm would result. Information on patient residence could be used to identify and exclude patients from outside the area, and could also be used to provide a "proxy" (based on zip code) of patient income and other characteristics of the hospital area that may influence rates.

ACSC Indicators

To a large extent, the same issues described for the area utilization indicators also apply to the ACSC indicators.

Precision

All of the ACSC indicators are measured relatively precisely, and all involve serious complications that are at least somewhat common. However, methods to eliminate the effects of random noise on estimated rates are likely to be helpful, especially for the measures that are somewhat less common than the procedure utilization measures described above.

Minimum bias

All of the factors that may influence area utilization rates also influence area ACSC rates. In addition, some of the rates are substantially influenced by environmental conditions (e.g., COPD and pediatric asthma). For some indicators, differences in socioeconomic status have been shown to explain a substantial part - perhaps most - of the variation in ACSC rates across areas. However, this relationship is often used in the literature as proof of the validity of these conditions. The complexity of the relationship between SES and ACSC rates makes it difficult to delineate how much of the observed relationship is due to true access-to-care difficulties in potentially underserved populations and how much is due to other patient characteristics, unrelated to quality of care, that vary systematically by SES. Finally, for some of the indicators, patient preferences and hospital capabilities for inpatient or outpatient care might explain variations in hospitalization rates.

Construct validity

In general, studies have shown that better outpatient care (including, in some cases, adherence to specific evidence-based treatment guidelines) can reduce patient complication rates, including the complications leading to ACSC admissions. For ruptured appendix, hospital care in emergency departments (specifically, time to treatment) also appears to influence the rate of rupture; thus, this ACSC indicator has a component related to hospital rather than ambulatory care. Empirically, most of the ACSC rates are correlated with each other, suggesting that common underlying factors influence many of the rates.

Fosters true quality improvement

Despite the relationships demonstrated at the patient level between higher-quality ambulatory care and lower rates of admission with subsequent complication, there is generally little evidence on whether improvements in access to high-quality care can reduce ACSC hospitalization rates in an area. Such relationships are difficult to elucidate, because of the many intervening factors that also affect ACSC rates as noted above. On the other hand, there is also little evidence that use of these quality indicators would have any undesirable effects on hospital activities. Using HCUP data to identify the hospital patient populations making the largest contributions to area rates might also provide some insights into causes and potential responses to high ACSC rates, as those populations within those hospital service areas could be targeted for study or for intervention.

Further investigation

Unfortunately, for many of the ACSC indicators, the available literature on causes of area rate differences is limited. Nonetheless, some further investigations are likely to provide useful insights. The vast majority of patients hospitalized with a subset of the ACSCs are elderly (e.g., pneumonia, dehydration). For these conditions, complementary analyses of data from the Medicare program, which include longitudinal records and information on outpatient care, can provide further insights regarding high area rates and whether or not they are associated with less use of outpatient care. Even though HCUP data are less detailed in some respects, they are much more complete in terms of providing information on Medicare beneficiaries enrolled in managed care plans (historically, managed care plans in Medicare have not reported inpatient or outpatient encounter data). Thus, Medicare and HCUP data may be complementary, especially in areas with high rates of managed care enrollment among the elderly. As with the area utilization indicators, additional information on patient residence can support analyses of the importance of "leakage" in and out of MSAs on apparent rate differences, and on the effects of socioeconomic and other area characteristics on rates. In addition, information on outpatient care for ACSCs can provide evidence on whether some of the admissions might have been avoidable (e.g., ruptured appendix, if patients with appendicitis are treated and released and then return with persistent symptoms) and on whether hospitals and areas differ in their ability to manage some of the ACSCs on an outpatient basis.

Summary of Empirical Evaluation of Precision and Bias for Area Indicators

Tables 20 and 22 summarize the empirical evaluations of precision (including variation across areas) and bias (with respect to age and sex differences) for the recommended area utilization rate and ACSC indicators. Empirical findings on the performance of specific indicators are presented in detail in Appendix 7 and are summarized in Tables 11-12.

Table 20. Precision - Area Indicatorsa,b.


Table 22. Minimum Bias - Area Indicatorsa,b.


Grouping of Indicators

Any indicator in isolation provides a unidimensional and fairly limited picture of quality. As the results of this report indicate, many factors besides quality, including random variation, may contribute to the performance of a single quality indicator. However, consistently good or bad performance on several related indicators is more convincing evidence of a true underlying difference in performance, as such a pattern is less likely to arise from random or unsystematic events. Looking at groups of indicators together, therefore, is likely to provide a more complete picture of quality. While the HCUP indicators were not developed as "scales" of quality (meaning that one could calculate an overall quality score by plugging the indicators into an algorithm), they do group together both by aspect of care and by clinical domain. Generally, users would be most interested in either area performance or provider performance. However, in some cases, such as comprehensive medical groups or policy regarding both hospital- and area-level quality, use of area and provider indicators together may be helpful. Possible indicator groupings are discussed below. Both provider- and area-level measures may be included, but users should keep in mind that these two types of indicators were designed to measure slightly different aspects of care.

Indicators are already grouped according to indicator type. This grouping can be used for more than simple organization. Indicators in similar groupings are often designed to measure the same aspect of care, whether it be outpatient access to care for the ACSC indicators or healthcare utilization rates for the utilization indicators. Thus, examining indicators of like type may be useful in gaining a more complete picture of quality.

In the case of the ACSC indicators, many of the indicators were actually developed as part of a set, designed to comprehensively examine access to care. These indicators have been most often validated as a set, the evidence for which is set forth in the detailed write-up on ACSC indicators found in the last section of the results chapter of this report. In this case, use together as a set may be particularly ideal, since the evidence for some of these indicators alone is unclear.

In contrast, many of the mortality indicators have not been developed together, although they have been studied together. Using factor analysis, we found that medical mortality indicators in particular tend to be related to each other, meaning that providers with high rates for one condition also tended to have high rates for another condition. The pattern for post-procedural mortality is less clear: some procedures tend to be positively related to each other, but to a lesser extent than the medical mortality measures. Since different surgeons, and in some cases different surgical teams, perform different operations, these indicators may vary more independently. Nevertheless, examining post-surgical indicators together may aid in identifying problems in overall surgical quality that are not procedure dependent, if such problems exist.

The remaining utilization indicators all examine inappropriate use of procedures. Using factor analysis, we found that area utilization indicators were related to each other, meaning that areas with high utilization rates for one procedure tended to have high rates for the other three procedures. The provider-level indicators, however, tended to follow another pattern. Utilization of the technical procedures, bilateral catheterization and laparoscopic cholecystectomy, tended to be negatively correlated, as the quality relationship would suggest, but these two indicators were not as strongly related to more established procedures like cesarean section, VBAC or incidental appendectomy. The latter three indicators were predictably related to each other.

Further grouping of indicators can be based on clinical domain. Though we did not perform formal analyses of the construct validity of these groupings, they have adequate face validity in that similar physicians and health care teams tend to provide care for the conditions in each grouping. For instance, cardiologists or cardiovascular surgeons tend to provide care for all of the cardiovascular indicators. Example groupings are suggested below, although further research on the validity of these groupings is needed.

3.A.3. Summary Tables of Evidence by Indicator

These tables are organized by indicator type, and either provider or area-level designation. All tables summarize the empirical and literature based evidence regarding the indicator. In addition, we make recommendations for using the indicator. One set of recommendations (HCUP users) are those that may be followed using the HCUP II software and administrative data like the HCUP database (e.g. use with other HCUP II QIs). Another set of recommendations (Future Investigations) suggests additional examinations, such as chart review or additional data collection, which may aid in the interpretation of the indicator.

1) Table 9. Summary evidence table for provider-level volume indicators. This table includes the following indicators (indicators #1 - #7): AAA repair volume, carotid endarterectomy volume, CABG volume, esophageal resection volume, pancreatic resection volume, pediatric heart surgery volume, PTCA volume.

Table 9. Summary evidence table for volume indicators.


2) Table 10. Summary evidence table for provider-level utilization indicators. This table includes the following indicators (indicators #8 - #12): Cesarean section delivery rate, incidental appendectomy among elderly rate, bilateral catheterization rate, VBAC rate, laparoscopic cholecystectomy rate.

Table 10. Summary evidence table for provider level utilization indicators.


3) Table 11. Summary evidence table for area-level utilization indicators. This table includes the following indicators (indicators #13 - #16): CABG rate, Hysterectomy rate, laminectomy rate, PTCA rate.

Table 11. Summary evidence table for area-level utilization indicators.


4) Table 12. Summary evidence table for area-level ACSC indicators. This table includes the following indicators (indicators #17 - #31): Dehydration admission rate, bacterial pneumonia admission rate, urinary infection admission rate, perforated appendix rate, angina admission rate, adult asthma admission rate, COPD admission rate, CHF admission rate, diabetes short-term complication admission rate, uncontrolled diabetes admission rate, diabetes long-term complication admission rate, hypertension admission rate, low birth-weight rate, pediatric asthma admission rate, and pediatric gastroenteritis admission rate.

Table 12. Summary evidence table for area-level ACSC indicators.


5) Table 13. Summary evidence table for provider-level mortality indicators. This table includes the following mortality indicators (indicators #32 - #44): AMI, CHF, GI hemorrhage, hip fracture, pneumonia, acute stroke, AAA repair, CABG, craniotomy, esophageal resection, hip replacement, pancreatic resection and pediatric heart surgery.

3.A.4. Summary Tables of Evidence by Empirical Tests

Volume Indicator Results

Statistical tests for volume indicators differ from those for utilization, admission and mortality rates, as precision of the indicator is not a primary concern. The following section summarizes the results of each statistic derived for the volume indicators. Some of the statistics were not used in the "rating" of an indicator, and are simply provided for context; this is noted where applicable. This section is organized into three tables, each summarizing one of the three areas of empirical performance for volume indicators: distribution, share, and persistence (see section 2.E. "Empirical Methods" for a full explanation of these methods). A total of 906 hospitals were included in this evaluation. Two thresholds were used for each indicator, with the exception of pediatric heart surgery, where only one threshold has been recommended. The thresholds were derived from thresholds reported in the literature and represent less and more stringent criteria for high volume.

Table 15 outlines the volume distribution for each procedure (listed in the left hand column). It includes 6 statistics:

Table 15. Volume distribution by year.


Percent of hospitals performing one or more procedures annually

No procedure was performed by all hospitals. A volume indicator does not apply to hospitals that do not perform the procedure in question, so this statistic gives the percentage of hospitals that perform the procedure and are therefore affected by the indicator. If only a small number of providers perform a procedure, the face validity of that indicator, as it pertains to the importance of that procedure, is called into question. However, other aspects of face validity and construct validity are important as well, and were weighted more strongly when evaluating the indicator.

HIGH: More than 50%
MODERATE: 25% - 50%
LOW: Less than 25%
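
The statistic and its rating bands can be sketched as below. The cut-offs are the ones listed above; the function names and the sample counts are hypothetical, and treating exactly 25% and exactly 50% as MODERATE is an assumption about the band edges.

```python
def percent_performing(annual_counts):
    """Percentage of hospitals performing one or more procedures in the year."""
    performing = sum(1 for count in annual_counts if count >= 1)
    return 100.0 * performing / len(annual_counts)


def rate_percent_performing(pct):
    """Map the percentage onto the HIGH / MODERATE / LOW bands listed above."""
    if pct > 50:
        return "HIGH"
    if pct >= 25:  # assumption: the 25%-50% band includes both endpoints
        return "MODERATE"
    return "LOW"


# Made-up annual procedure counts, one entry per hospital.
counts = [0, 0, 3, 12, 0, 1, 0, 40]
pct = percent_performing(counts)
print(f"{pct:.1f}% of hospitals perform the procedure: {rate_percent_performing(pct)}")
```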

Mean number of procedures performed annually

This statistic is the simple mean of procedures performed annually by hospitals that perform at least one procedure. If the mean number of procedures is low, the precision for that indicator may be affected. This statistic was not weighted, but rather is reported for context.

Tests 3-6 provide additional information on the distribution of procedures across hospitals. Examining the distribution highlights cases where very few providers perform a majority of procedures, or other special cases that would result in a skewed distribution.

Standard deviation (SD). Standard deviation of the number of procedures performed annually. This statistic includes only hospitals performing the procedure.

50th percentile. Median of the number of procedures performed annually. This statistic includes only hospitals performing the procedure.

90th percentile of the number of procedures performed annually. This statistic includes only hospitals performing the procedure.

95th percentile of the number of procedures performed annually. This statistic includes only hospitals performing the procedure.
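
A minimal sketch of these distribution statistics follows. It restricts the calculation to hospitals performing at least one procedure, as described above. The nearest-rank percentile rule and the use of the sample standard deviation are assumptions, since the report does not say which conventions were applied; the counts are made up.

```python
import math
import statistics


def nearest_rank_percentile(sorted_values, pct):
    """Nearest-rank percentile of an ascending list (an assumed convention)."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def volume_distribution(annual_counts):
    """Summarize annual volume among hospitals that perform the procedure."""
    performing = sorted(count for count in annual_counts if count >= 1)
    return {
        "mean": statistics.mean(performing),
        "sd": statistics.stdev(performing),  # sample SD, an assumption
        "p50": nearest_rank_percentile(performing, 50),
        "p90": nearest_rank_percentile(performing, 90),
        "p95": nearest_rank_percentile(performing, 95),
    }


# Made-up annual counts; the two zero-volume hospitals are excluded.
counts = [0, 2, 4, 6, 8, 10, 0, 12, 14, 16, 18, 20]
stats = volume_distribution(counts)
print(stats)
```

A mean well above the median, or a 95th percentile far above the 90th, would point to the skewed case in which a few providers perform most of the procedures.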

Table 16 summarizes the percentage of procedures and volume in two statistics. Both statistics are reported for two thresholds. Threshold 1 is the less stringent threshold, and threshold 2 is the more stringent threshold. Thresholds are reported at the beginning of this section, in table 14.

Table 16. Percentage of procedures and volume.


Table 14. Thresholds for indicators.


Percentage of procedures at high volume hospitals

This statistic refers to the number of procedures performed at hospitals that qualify as high volume (according to the thresholds) as a percentage of all procedures. If most procedures are already performed at high volume centers, there is little room for improvement (few procedures could be shifted to high volume hospitals). On the other hand, if most procedures are performed at low volume hospitals, many procedures could be shifted to high volume hospitals, potentially resulting in better outcomes. Therefore, a designation of "low" implies the most opportunity for improvement.

Threshold 1:
HIGH: More than 90%
MODERATE: 50%-90%
LOW: Less than 50%

Percentage of hospitals at high volume

This statistic refers to the number of hospitals performing more than the threshold number of procedures as a percentage of all hospitals performing at least one procedure. This statistic was not weighted in the evaluation, but rather is provided for context in interpreting the first statistic.
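
The two share statistics can be sketched together as follows. The threshold value, the counts, and the function names are hypothetical; the actual thresholds are the ones in Table 14. The rating bands are those listed above for the first statistic, with the band edges at 50% and 90% assigned to MODERATE as an assumption.

```python
def high_volume_shares(annual_counts, threshold):
    """Return (% of procedures at high-volume hospitals, % of hospitals at high volume).

    A hospital counts as high volume when it performs more than `threshold`
    procedures; both percentages use only hospitals with at least one procedure.
    """
    performing = [count for count in annual_counts if count >= 1]
    high = [count for count in performing if count > threshold]
    pct_procedures = 100.0 * sum(high) / sum(performing)
    pct_hospitals = 100.0 * len(high) / len(performing)
    return pct_procedures, pct_hospitals


def rate_procedure_share(pct):
    """Bands listed above; LOW implies the most opportunity for improvement."""
    if pct > 90:
        return "HIGH"
    if pct >= 50:  # assumption: the 50%-90% band includes both endpoints
        return "MODERATE"
    return "LOW"


# Made-up counts and a hypothetical threshold of 30 procedures per year.
counts = [0, 5, 10, 35, 150]
pct_procedures, pct_hospitals = high_volume_shares(counts, threshold=30)
print(f"{pct_procedures:.1f}% of procedures, {pct_hospitals:.1f}% of hospitals")
print(rate_procedure_share(pct_procedures))
```

In this made-up case half of the performing hospitals already account for most procedures, which is the context the second statistic is meant to supply.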

Table 17 demonstrates the persistence of the high volume status of hospitals from year to year. Reported is the percentage of hospitals designated as high volume (according to the thresholds) in 1995 or 1996 that remain high volume the following year. If high volume status is not persistent, examining a single year of data may give an inaccurate picture. Indicators with low persistence are likely to have lower precision.

Table 17. Year to year persistence of high volume status.


HIGH: More than 90%
MODERATE: 75% - 90%
LOW: Less than 75%
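
Year-to-year persistence can be sketched as below. Hospital identifiers, volumes, and the threshold are made up, and the function names are hypothetical. A hospital missing from the second year is treated as having zero volume, and the band edges at 75% and 90% are assigned to MODERATE; both are assumptions.

```python
def persistence(volumes_year1, volumes_year2, threshold):
    """Percentage of year-1 high-volume hospitals still high volume in year 2."""
    high_year1 = {h for h, count in volumes_year1.items() if count > threshold}
    if not high_year1:
        raise ValueError("no high-volume hospitals in the first year")
    # Assumption: a hospital absent from year 2 has zero volume that year.
    still_high = {h for h in high_year1 if volumes_year2.get(h, 0) > threshold}
    return 100.0 * len(still_high) / len(high_year1)


def rate_persistence(pct):
    """Map persistence onto the HIGH / MODERATE / LOW bands listed above."""
    if pct > 90:
        return "HIGH"
    if pct >= 75:  # assumption: the 75%-90% band includes both endpoints
        return "MODERATE"
    return "LOW"


# Made-up volumes for 1995 and 1996 with a hypothetical threshold of 100.
volumes_1995 = {"A": 120, "B": 105, "C": 40, "D": 300}
volumes_1996 = {"A": 130, "B": 95, "C": 110, "D": 280}
pct = persistence(volumes_1995, volumes_1996, threshold=100)
print(f"{pct:.1f}% persistence: {rate_persistence(pct)}")
```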

Table 18 displays the Pearson r correlations between volume and mortality indicators that are significant at the p<.001 level. Where previous hypotheses suggest a potential relationship, correlations significant at the p<.05 level are reported. As the validity of volume indicators is based on the volume-outcome relationship, we tested the correlation between volume and mortality. As we did not use the same methods (i.e. clinical risk adjustment) as most of the studies cited in the literature reviews, results from this test are not considered conclusive. Rather, the literature review informs the construct validity for these indicators.
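
For readers unfamiliar with the statistic, the sketch below computes a Pearson r between hospital volume and a mortality rate. The data are made up and the function name is hypothetical. The significance tests reported in Table 18 are not reproduced here, since they would need a t distribution from a statistics package.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up hospital volumes and unadjusted mortality rates.
volumes = [10, 20, 30, 40, 50]
mortality = [0.09, 0.07, 0.08, 0.05, 0.04]
r = pearson_r(volumes, mortality)
print(f"Pearson r = {r:.2f}")
```

A negative r is the direction the volume-outcome hypothesis predicts, but without clinical risk adjustment it cannot be read as conclusive.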

Precision Results

Key to Empirical Evaluation of QI Precision

(1) Standard Deviation:

The standard deviation of the "signal" in each QI is a measure of the extent to which performance on the QI varies systematically across hospitals or areas. It is a best estimate of the variation in QI performance that appears to be systematic - truly associated with the hospital or area, and not the result of random variations in patients or environmental conditions. Because many patient characteristics and random events may influence performance on a QI, only a fraction of the apparent variation in performance at the hospital or area level is likely to be systematic. Systematic variation will be larger to the extent that: sample sizes are larger (allowing more precise estimation of the true effect); patient-associated variation is smaller (allowing more precise estimates for a given sample size); and other factors discussed in more detail in the text. Larger variation across hospitals and areas in the signal suggests that there is truly more variation across hospitals or areas to explain and, possibly, to improve. In contrast, if the standard deviation is small, the QI's performance suggests there is little to be gained by improving the performance of lower-ranked hospitals or areas to the levels achieved by the higher-ranked hospitals or areas. The standard deviations in the table are reported as absolute rates. For example, for VBAC, the mean rate across all providers and years is 32.3% - just under one-third of patients reported to have a previous C-section underwent vaginal delivery. The systematic variation across hospitals in VBAC performance is large - the standard deviation of the estimated performance of hospitals around the overall average is 17.5%. In other words, the precision of the VBAC QI is sufficient to conclude that many hospitals truly had VBAC rates close to 0, whereas many others truly had VBAC rates over 50%. 
QIs for hospitals and areas were grouped into three broad performance categories based on the absolute magnitude of the true or signal standard deviation (in percentage points):

Provider QI:
VERY HIGH - 8.0%+
HIGH - 3.0% to 7.9%
MODERATE - Less than 3.0%

Area QI:
VERY HIGH - 0.15%+
HIGH - 0.05% to 0.15%
MODERATE - Less than 0.05%

(2) Share:

The share reported in the Tables is the share of signal variance (that is, the standard deviation squared) in the total variance associated with the QI. In general, the variance in a quality indicator can be divided as follows:

Total Variance = Patient-Level Variance + Provider-Level Variance.

Typically, much variation in the numerators for the QIs (e.g., whether or not a vaginal birth occurred for a patient with a prior C-section) is associated with patient-level factors, and appears to have nothing whatsoever to do with providers or areas. In turn, the apparent Provider-Level Variance can be divided into two components:

Provider-Level Variance = Signal Variance + Noise Variance.

That is, some of the variance in a QI rate for a particular provider (adjusted or unadjusted) is a reflection of random chance: in a given year, some hospitals or areas will by chance have more or fewer occurrences of the numerator related to a particular QI. The remaining variance, and only the remaining variance, is attributable to systematic or true differences in the QI across providers. The share of signal variance will be larger, to the extent that: within-hospital patient variance is small; sample sizes are larger; and true differences in performance across hospitals are larger. Unlike the Standard Deviation reported in the previous column - see (1) above - the share of signal variance is a relative measure: relative to the total variation (patient plus provider) in the QI, what fraction appears to be associated with systematic provider differences? Other things equal, higher shares suggest differences across hospitals or areas that will be easier to sort out from all of the other "random" influences on QI performance. QIs for hospitals and areas were grouped into three broad performance categories in terms of the signal share of total variation:

Provider QI:
VERY HIGH - 6.0%+
HIGH - 1.0% to 5.9%
MODERATE - Less than 1.0%

Area QI:
VERY HIGH - 0.10%+
HIGH - 0.03% to 0.10%
MODERATE - Less than 0.03%

(3) Ratio:

The ratio reports the Signal-to-Noise ratio in the provider- or area- level variation in the QI measure (see Note 2 for a discussion of the sources of variation in a performance measure). This measure answers the question: of the apparent variation in QIs across providers, what fraction appears to be truly related to systematic differences across providers, and not random variations ("noise") from year to year? As such, the measure is somewhat misnamed. Its definition is:

Signal-to-Noise Ratio = (Provider-Level Signal Variation)/(Provider-Level Variation)

where (as noted above) the provider-level variation includes both signal variation and random variations in the QI measures across providers ("noise"). In other words, it is the "signal" to "signal plus noise" ratio. In general, if a QI's signal-to-noise ratio is high, then it is likely that apparent variations in performance across providers are not the result of random chance, and careful attention to distinguishing true from random variation across providers will have little impact on the measured performance of a provider or area. QIs for hospitals and areas were grouped into three broad performance categories in terms of the signal share of provider-level variation:

VERY HIGH - 90.0%+
HIGH - 70.0% to 89.9%
MODERATE - 40.0% to 70.0%
LOW - Less than 40%
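
The three precision measures defined in (1) to (3) can be computed from the variance components as sketched below. The decomposition follows the two identities given above; the variance values are made up, the function names are hypothetical, and assigning exactly 70.0% to HIGH is an assumption about the band edge.

```python
import math


def precision_measures(patient_var, signal_var, noise_var):
    """Signal SD, signal share of total variance, and signal-to-noise ratio."""
    provider_var = signal_var + noise_var    # Provider-Level Variance
    total_var = patient_var + provider_var   # Total Variance
    return {
        "signal_sd": math.sqrt(signal_var),
        "share": signal_var / total_var,
        "ratio": signal_var / provider_var,  # "signal" over "signal plus noise"
    }


def rate_ratio(ratio_pct):
    """Map the ratio (in percent) onto the bands listed above."""
    if ratio_pct >= 90.0:
        return "VERY HIGH"
    if ratio_pct >= 70.0:  # assumption: exactly 70.0% is rated HIGH
        return "HIGH"
    if ratio_pct >= 40.0:
        return "MODERATE"
    return "LOW"


# Made-up variance components for a rate measured as a proportion.
measures = precision_measures(patient_var=0.20, signal_var=0.0004, noise_var=0.0001)
print(f"signal SD = {measures['signal_sd']:.3f}")
print(f"share     = {measures['share']:.2%}")
print(f"ratio     = {measures['ratio']:.0%} ({rate_ratio(100 * measures['ratio'])})")
```

The example shows how a QI can have a small share of total variance yet a high ratio: most variation is at the patient level, but what remains at the provider level is mostly signal.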

(4) R-square:

The above ratio (3) reports the univariate smoothed ratio. However, multivariate techniques may improve the ability to recover signal. An R-square that is higher than the ratio demonstrates the ability of multivariate techniques to extract more signal than univariate techniques. The same rating criterion used for the ratio is used for the R-square measure.

How can these measures be used to guide QI initiatives?

1. Signal Standard Deviation: If there is little true variation across providers or areas, then there is little to be gained by comparing the performance of providers or areas as a basis for understanding differences in performance and possibly improving poor performance. Without further refinement, computing the QI is unlikely to have much policy value. Many potential QIs show very little true variation, and so were dropped from further analysis. Virtually all of the indicators included in our recommended list had a standard deviation of at least 1 percentage point in performance, that is, the top 2.5% of hospitals or areas performed at least 4 percentage points above the lowest 2.5% of hospitals or areas.
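
The link between a 1 percentage point standard deviation and a spread of roughly 4 percentage points can be checked with a short calculation. This sketch assumes true performance is approximately normally distributed, which the text implies but does not state; the function name is hypothetical.

```python
from statistics import NormalDist


def central_95_spread(signal_sd):
    """Gap between the 97.5th and 2.5th percentiles of a normal distribution."""
    dist = NormalDist(mu=0.0, sigma=signal_sd)
    return dist.inv_cdf(0.975) - dist.inv_cdf(0.025)


# Signal standard deviation of 1 percentage point.
spread = central_95_spread(1.0)
print(f"Spread between top and bottom 2.5%: {spread:.2f} percentage points")
```

Under that assumption the gap is about 3.92 standard deviations, which the text rounds to 4.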

2. Share of Signal in Total Variation: If the share of total variation is very small, it is likely to be more difficult as a practical matter to identify why hospitals or areas perform well or poorly. In other words, because so little of the variation in the QI at the patient level is explained, there will probably be many potential explanations for differences in performance that may be quite costly to pursue. As a cautionary note, it is important to recognize that the bulk (over 90%) of the variation in the numerators for virtually all of the QIs we reviewed is not systematic provider-level variance. Thus, identifying the reason(s) why a particular hospital or area performed well or poorly on the QI is generally likely to be difficult. Our review of the literature on the QIs provides some further guidance for promising specific explanations to pursue for particular QIs. If determining the cause of systematic differences appears to be difficult, the QI may not be worth pursuing.

3. Signal to Noise Ratio in Provider Variance: Even if a QI performs reasonably well in terms of signal standard deviation and share of total variation, it may still have a low share of true provider-level differences in the apparent differences across providers. In this case, careful statistical analysis is required before a provider or area that appears to perform well or poorly can be labeled as such. As our empirical evaluation describes, this further analysis might consist of evaluating measures over multiple years, evaluating measures that are related to each other, and applying "Bayesian" techniques to account for the fact that the apparent differences across providers are "noisy." In this table, we report the signal variation that all such methods can recover; the signal variation in a single, univariate analysis is likely to be smaller.
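The link in item 1 between a signal standard deviation of 1 percentage point and a gap of about 4 percentage points can be checked with a short calculation. The sketch assumes that true provider performance is approximately normally distributed, which the text implies but does not state; the variable names are illustrative.

```python
from statistics import NormalDist

signal_sd = 1.0  # signal standard deviation, in percentage points

# Under a normal distribution, the 97.5th percentile lies about 1.96 SD above
# the mean and the 2.5th percentile about 1.96 SD below it.
z = NormalDist().inv_cdf(0.975)
gap = 2 * z * signal_sd

print(round(gap, 2))  # roughly 4 percentage points
```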

Minimum Bias Results

Key to Empirical Evaluation of QI Bias

To provide empirical evidence on the sensitivity of our QIs to potential biases from differences in patient severity, we compared unadjusted performance measures for specific hospitals with performance measures that were adjusted for age, gender, and where possible, patient severity. We used the APR-DRG System (3M Version 12.0, severity and risk-of-mortality subclasses) for adjustment (See literature review for discussion). For a few measures, no APR-DRG categories were available, so that unadjusted measures were compared to age-sex adjusted measures. We used a range of bias performance measures, most of which have been applied in previous studies. We note that these comparisons are based entirely on discharge data. In general, we expect performance measures that are more sensitive to risk adjustment using discharge data also to be more sensitive to risk adjustment using more complete clinical data, though the differences between the adjusted and unadjusted measures may be larger in absolute magnitude than the discharge data analysis would suggest. However, it is possible and in some cases likely that there is not a correlation between discharge and clinical-record adjustment. Specific cases where previous studies suggest a greater need for clinical risk adjustment are discussed in our literature reviews of specific indicators. The table reports results for our multivariate signal extraction models, which generally yielded the most precise estimates of QI performance measures for areas and hospitals. Results for other statistical methods were largely similar. For all five performance measures, described below, we classified performance into three groups.

(1) Rank Correlation Coefficient: This is the correlation coefficient of the rank of the area/hospital without and with risk adjustment.

VERY GOOD - 95.0%+
GOOD - 75.0% to 94.9%
FAIR - Less than 75.0%
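As a rough illustration of measure (1), the rank correlation can be computed as the Pearson correlation of the two sets of ranks (Spearman's coefficient). The report does not say which rank correlation formula was used, so this choice is an assumption, and the rates below are invented.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def rank_correlation(unadjusted, adjusted):
    """Correlation between provider ranks without and with risk adjustment."""
    return pearson(ranks(unadjusted), ranks(adjusted))


# Hypothetical rates for five hospitals; adjustment swaps the middle two.
unadjusted = [4.2, 5.0, 6.1, 7.5, 9.3]
adjusted = [4.9, 6.4, 6.3, 7.0, 8.8]
print(round(rank_correlation(unadjusted, adjusted), 3))
```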

(2) Average Absolute Value of Change Relative to Mean: This is a measure of the average absolute change (in percentage points) in the performance measure of the hospital or area with and without risk adjustment, normalized by the average value of the QI. Thus, it is also a relative measure that theoretically can range from 0 (no change) to a much higher value.

VERY GOOD - Less than 10.0%
GOOD - 10.0% to 19.9%
FAIR - 20.0%+
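Measure (2) can be sketched as follows. The text does not specify whether the normalizing mean is taken from the unadjusted or the adjusted rates; this sketch assumes the unadjusted mean, and the rates are invented.

```python
def relative_abs_change(unadjusted, adjusted):
    """Mean absolute change from risk adjustment, divided by the mean QI value."""
    n = len(unadjusted)
    mean_abs_change = sum(abs(a - u) for u, a in zip(unadjusted, adjusted)) / n
    mean_rate = sum(unadjusted) / n  # assumption: normalize by the unadjusted mean
    return mean_abs_change / mean_rate


# Hypothetical rates (percent) for four hospitals; each moves by one point.
unadjusted = [10.0, 12.0, 8.0, 10.0]
adjusted = [11.0, 11.0, 9.0, 9.0]
print(relative_abs_change(unadjusted, adjusted))
```

Here the average change is 1 point on a mean rate of 10, a relative change of 10%.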

(3, 4) Percentage of High/Low Decile That Remains in High/Low Decile: These two measures report the percentage of hospitals or areas that are in the highest and lowest performance deciles without risk adjustment, which remain there with risk adjustment. Thus, a measure that is insensitive to risk adjustment should have rates of 100%, while measures where risk adjustment affects the top- and bottom-performers substantially will have much lower rates. (Note: "Top 10%" for all measures refers not to the best performers, but to the hospitals or areas with the top (highest) rates. For some measures, such as VBAC, this may indicate better performance; for others, such as the mortality measures, it generally indicates worse performance.) Given the distributions of the indicators, it is much "easier" to move out of the highest decile relative to the lowest decile. For this reason different rating criteria are used.

HIGH DECILE:
VERY GOOD - 85.0%+
GOOD - 55.0% to 84.9%
FAIR - Less than 55.0%

LOW DECILE:
VERY GOOD - 95.0%+
GOOD - 85.0% to 94.9%
FAIR - Less than 85.0%
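A minimal sketch of measures (3) and (4) follows. The decile is taken as the top or bottom tenth of units by count, and ties are broken by input order; both are simplifying assumptions, and the data are invented.

```python
def decile_members(values, highest=True):
    """Indices of the units in the highest (or lowest) tenth of `values`."""
    n_decile = max(1, len(values) // 10)
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=highest)
    return set(order[:n_decile])


def share_remaining(unadjusted, adjusted, highest=True):
    """Share of units in a decile before adjustment that stay in it afterward."""
    before = decile_members(unadjusted, highest)
    after = decile_members(adjusted, highest)
    return len(before & after) / len(before)


# Twenty hypothetical hospitals; adjustment pulls one high-rate hospital down.
unadjusted = [float(i) for i in range(20)]
adjusted = list(unadjusted)
adjusted[18] = 10.5

high_share = share_remaining(unadjusted, adjusted, highest=True)
low_share = share_remaining(unadjusted, adjusted, highest=False)
print(high_share, low_share)
```

In this example one of the two hospitals in the high decile drops out (50% remain), while both hospitals in the low decile stay put (100% remain).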

(5) Percentage That Change More Than Two Deciles: This measure reports the percentage of hospitals whose relative rank changes by a substantial distance - more than 20% - with and without risk adjustment. Ideally, risk adjustment would not have a substantial effect for very many hospitals.

VERY GOOD - Less than 5.0%
GOOD - 5.0% to 14.9%
FAIR - 15.0%+
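Measure (5) can be illustrated by comparing each unit's rank position before and after adjustment and counting shifts larger than 20% of the ranking. The handling of ties (input order) and the example data are assumptions made for this sketch.

```python
def rank_positions(values):
    """0-based rank of each unit, lowest value first (ties keep input order)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    positions = [0] * len(values)
    for position, index in enumerate(order):
        positions[index] = position
    return positions


def share_moving_over_two_deciles(unadjusted, adjusted):
    """Share of units whose relative rank shifts by more than 20%."""
    n = len(unadjusted)
    before = rank_positions(unadjusted)
    after = rank_positions(adjusted)
    moved = sum(1 for b, a in zip(before, after) if abs(a - b) / n > 0.20)
    return moved / n


# Ten hypothetical hospitals; adjustment moves the first one up three places.
unadjusted = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
adjusted = [4.5, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
print(share_moving_over_two_deciles(unadjusted, adjusted))
```

Only the first hospital shifts by more than two deciles, so the share is 10%.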

Construct Validity Results

While most evidence on the construct validity of our indicators is based on our review of the literature, our empirical analysis allows us to determine the extent to which hospital and area performance measures are related to each other. Consistency of hospital performance across measures can provide some empirical evidence on construct validity: one might expect that hospitals with better underlying processes of care will tend to perform better on a range of quality measures. It is not necessarily the case that hospitals performing better in some aspects of care will perform better on others, and there is evidence that good performance in some aspects of care is not related to good performance in others. However, many of these prior studies used performance measures that included substantial measurement error, and possibly other biases; other things equal, such "noise" will cause relationships among measures to appear attenuated. Thus, these studies may have underestimated the extent of relationships across measures and the potential for achieving more general insights about quality from the joint analysis of specific quality indicators.

To quantify the extent to which performance on specific quality indicators reflected general patterns, we conducted factor analyses using as inputs our best-guess estimates (those obtained through the MSX analysis) of hospital and area performance on the full set of quality indicators that had adequate precision. Though esophageal and pancreatic resection mortality both had adequate precision, very few hospitals actually perform these procedures regularly, resulting in a non-normal distribution. Thus, these indicators were excluded, as they violated the normality assumption of factor analysis.

Factor analysis seeks to summarize results for a large number of specific measures using a smaller number of summary indicators, or factors. It does so by obtaining estimates of the "loadings" of each specific quality measure on each summary factor, in order to explain the maximum variation in all of the quality measures possible with a limited number of factors. Quality measures closely related to each other will have high "loadings" on a particular factor. Quality measures with significantly positive loadings on a particular factor tend to occur together; conversely, a quality measure with a significant negative loading tends to be inversely related to quality measures that have positive loadings on a factor. To minimize attenuation bias caused by imprecision in the indicators, we conducted this analysis using the "best estimates" of the hospital quality indicators obtained through the multivariate signal extraction (MSX) method.

We report the results of three separate common factor analyses. The three factor analyses included 1) hospital analysis with all hospital-level quality indicators; 2) area factor analysis with all of the area-level indicators related to the quality of ambulatory care; 3) area analysis with all of the area-level utilization indicators. Because of the large number of hospitals and areas, all with an inherently large sample size, all factor loadings were significant, even if the associations are not very substantial. We therefore examined the factors with relation to strength of factor loadings, directions of loadings, and the explanatory value of a factor for any given quality indicator. For each quality indicator, we identified the factor that explained the largest proportion of the variance. In addition to the factor analyses, we examined correlation (Pearson's r) matrices for each group (provider level, ACSC, and utilization area indicators) to aid us in interpreting the factor analysis.

To highlight the most important findings from our factor analysis, we identified the most important explanatory factor for each quality indicator, that is, the factor that explained the largest share of the variation across areas or providers in the indicator. We report both factor loadings (using a varimax orthogonal rotation), which correspond to the correlation between that indicator and that factor, and standardized factor loadings, calculated by multiplying the factor loading by the standard deviation for that indicator and multiplying by 100. This results in a measure of percent change, meaning a 1 unit change in the factor would result in a 1 percentage point change in the indicator. We used a cutoff of a standardized loading of 0.7, which corresponds to an explanation of about half of the variance in the quality indicator, as our measure of "important." Tables 23-25 summarize the results.
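The arithmetic behind the 0.7 cutoff and the standardized loading can be sketched as below. The sketch reads the cutoff on the correlation scale, where the squared loading gives the share of an indicator's variance explained by an orthogonal factor; the indicator standard deviation is a made-up value used only to show the scaling.

```python
loading = 0.7        # factor loading, read here as a correlation
indicator_sd = 0.02  # hypothetical indicator SD (2 percentage points, as a proportion)

# Share of the indicator's variance explained by the factor.
variance_share = loading ** 2

# Standardized loading as described above: loading x SD x 100.
standardized_loading = loading * indicator_sd * 100

print(round(variance_share, 2))
print(round(standardized_loading, 2))
```

A loading of 0.7 corresponds to a variance share of about 0.49, which is the "about half" referred to above.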

Table 23. Summary of Factor Analysis Results, Provider-level indicators.


Table 24. Summary of Factor Analysis Results, Area-level ACSC indicators.


Table 25. Summary of Factor Analysis Results, Area-level utilization indicators.


Provider-level analysis

Indicators related to both mortality and utilization were included in our hospital analysis. We identified three factors, using a criterion of Eigenvalue greater than 1 to extract factors. Many indicators loaded on more than one factor with substantial strength, confirming that the determination of hospital performance on the quality measures is a complex process that is not easily summarized by any one factor. Nonetheless, the three factors together provide a relatively complete summary of hospital performance.

Factor 1 explained most of the variance for most of the medical mortality measures (stroke, pneumonia, hip fracture, CHF, GI hemorrhage), although this factor explains less than 50% of the variance for the latter. As expected, these indicators are all positively related to each other, meaning hospitals with high mortality rates for one condition tend to have high mortality rates for the other conditions. Factor 1 explains 39% of the total common variance.

Factor 2 explained 19% of the total common variance. It explains most of the variance for two cardiac care indicators, mortality after CABG, and bilateral catheterization, as well as laparoscopic cholecystectomy. As expected the two cardiac care indicators are positively related to each other, while negatively related to laparoscopic cholecystectomy.

The final factor, factor 3, explains most of the variance for the obstetric utilization indicators, VBAC and C-section rates, and for the other non-technology based utilization indicator, incidental appendectomy. As might be expected, VBAC and C-section have a significant negative relationship with each other: hospitals with high C-section rates tend to have low VBAC rates. Incidental appendectomy is positively related to cesarean section and negatively related to VBAC, as one would expect. Factor 3 explains 13% of the total common variance.

While the strongest relationships in each of the factors seem to follow a logical pattern based on quality relationships, there are also less important associations that do not follow this pattern. For instance, better performance in some quality measures on these factors was negatively correlated with other quality measures. There are many reasons for such imperfect relationships, such as negative correlations conditional on the other factors (e.g., the third factor should be considered in light of the first two and so forth), unexplained associations despite our best efforts to remove noise from the measures, and validity problems in the quality indicators (e.g., failure of the indicators to describe true quality differences accurately, due to possible residual biases as well as limited construct validity). However, overall, the most important relationships shown in the factor analyses are readily interpretable using clinical logic.

Several indicators did not load strongly on any of the three factors, including all of the procedural mortality measures. These indicators did tend to be positively related to each other, as observed from the correlation matrix and factors. However, in general, the correlations were less strong than those for the medical mortality measures.

Given our time and resource constraints, we were unable to perform a full set of evaluations of our factor analysis results. However, these results suggest some important conclusions. First, despite quite noisy data, most medical mortality indicators were positively related to each other, suggesting that some underlying "mortality factor" explains an important component of the variation in hospital performance on the quality measures.

Second, the technology based utilization indicators of laparoscopic cholecystectomy and bilateral catheterization were negatively related to each other, and CABG mortality was positively related to bilateral catheterization, suggesting a possible underlying "best practices" factor. Finally, the C-section and VBAC rates at a hospital are strongly negatively related; this would not be expected if variation in VBAC rates across hospitals primarily reflected reporting differences rather than true differences in how procedures were used. Incidental appendectomy is also related to VBAC and C-section in the way one would expect if there were an underlying high quality "factor" for non-technology based practices.

Ambulatory care sensitive condition (ACSC) analysis

This analysis of the relationship among the area-level ACSC measures showed that, for the most part, rates of these conditions are positively correlated with each other. Two factors were extracted from this analysis, using the criterion Eigenvalue greater than 1.

Factor one explains a large proportion of the common variance, over 70%. It also explains the most variance for each of the indicators except one, lower extremity amputation. All but two indicators, low birth weight and perforated appendectomy, load positively on this factor. Both of these indicators are defined with a specified denominator (percentage of births and of appendectomies, respectively), and appear to be different from the other ACSC indicators. These two indicators also load lower than the other indicators on this factor. Pediatric asthma also loads only slightly on this factor, using the raw varimax rotated factor loadings.

Factor two explains only 22% of the common variance. Lower extremity amputation loads the highest on this factor, with a factor loading of .90. Factor two also explains the most variance for this indicator. For the most part, all of the other indicators load much more weakly on factor two, especially compared to their loadings on factor one. Five exceptions are pediatric asthma, diabetes long term complications, diabetes short term complications, CHF, and low birth weight. These indicators are all positively related to each other. However, factor one explains more of the variance than does factor two for these indicators. Subsequent factors should be interpreted in light of previous factors, especially given the large difference in variance explained by the factors in this case.

Overall, this factor analysis shows that most ACSC indicators, particularly those included in nationally used sets (Billings and Weissman), are significantly and positively related to each other, as construct validity would suggest.

Area utilization indicator analysis

Again, one factor was sufficient to explain a large share of the total variation in our area utilization indicators (over 96%). All utilization indicators load on one factor in the same direction, as expected.

Overall, the factor analysis of area-level utilization rates shows that the rates are generally positively related to each other, suggesting that any of these indicators provides insights into the other utilization rates.

3.B. Overview of Semi-Structured Interviews

The semi-structured interviews were conducted to understand the implications of current quality measurement practices in the health care field for the HCUP QI refinement project. Those interviewed cited the use of several types of indicators as an alternative or in addition to the HCUP measures. Many organizations are using length of stay and cost proxies as indicators. Readmission rates, ambulatory care sensitive conditions and patient satisfaction surveys were also mentioned repeatedly. Several organizations utilize proprietary measures, such as the Maryland Quality Indicator Project. Most organizations utilized risk adjusted measures, often using 3M APR-DRGs as a risk adjustment method.

The HCUP I QIs are known by most of the organizations. They appear to be best known by state data groups and hospital associations, and least known by business coalitions. Several organizations are currently using the HCUP I QIs for a variety of purposes, including benchmarking, reference for decision-making, and community health reporting.

There were many suggestions for modifications and additions to the HCUP I QIs, detailed in Table 26. We briefly summarize those suggestions and issues raised that were most relevant to refinement of the current HCUP I QIs. Areas of concern regarding the current HCUP I QIs include the following:

  1. Indicator denominators: The indicator denominators sometimes consist of heterogeneous populations. Some organizations would like to see further modification of the denominators, while others wish to see population based denominators instead of admission based denominators.
  2. Low volume: Many of the events examined by the HCUP I QIs are rare events, such as the mortality measures. This is a particular problem for hospitals with low patient volume to begin with such as rural hospitals. Suggestions for this concern included creating optional aggregate or basket measures to increase volume, and including diagnoses with higher mortality rates such as hip pinnings or unruptured aneurysms.
  3. Validity: Some organizations expressed concern over the validity of the assumptions in the indicator definitions. For instance, some argue that one cannot assume that coded complications were not present on admission. Some organizations suggested that this concern be addressed through sixth digit coding or other additional data.
  4. Data integrity: There is some concern over coding bias in the discharge abstract. Some state data groups raised the issue of data audits as a way to ensure data integrity.

Table 26. Summary of Telephone Contacts August 1999 through March 2001.


One of the most common suggestions for refinement of the HCUP I QIs was risk adjustment. Organizations seem to have accepted risk adjustment as a standard in quality measurement. Many indicated that they preferred risk adjustment to the narrowing of populations to create a more homogeneous comparison group. Many organizations currently use the 3M APR-DRGs for risk adjustment. These organizations have indicated their preference to continue using risk adjustment.

In addition to refining the current HCUP I measures, organizations also indicated areas of potential novel indicators. Most of these mirrored the expansion areas defined by AHRQ, including chronic disease and new technologies. Other clinical domains included adverse selection, mental health, and volume sensitive procedures.

Some suggested new indicators for the HCUP II QI set that would require the use of additional data, including at least the possibility of linking data with records such as census records, vital statistics and outpatient records. Several states are currently collecting additional data on ambulatory surgery, such that it may be possible to define indicators in this domain. The use of these data, while beyond the scope of the current project, would increase the flexibility of the HCUP II QIs.

Those interviewed emphasized the need for the HCUP II QIs to remain flexible. There was great interest in the use of HCUP II QIs in conjunction with other indicators. Organizations said that the most useful new developments would interact with the measures they currently use, allowing the organizations to create a more complete picture of quality. Contacts' suggestions included designing HCUP II to interact with patient satisfaction measures that are currently widely used, such as the FACCT surveys or the Picker survey.

Further pointing to the need for flexibility is the wide range of audiences with possible interests in the HCUP II QIs. As mentioned earlier our contact list included a variety of organizations with a variety of goals. Some organizations, such as state data groups and hospital associations, want to benchmark hospitals, while others want to report on community health. Other organizations, such as business coalitions, seek to make purchasing decisions. These differing goals led to some differing opinions regarding indicators, such as the ambulatory care sensitive conditions. For instance, one organization valued the ambulatory care sensitive conditions for county-wide reporting of health. Other interviewees expressed that these indicators are not valid indicators of hospital quality since hospitals may not be responsible for these events. Thus, the range of uses and consequent opinions need to be considered when refining the HCUP QIs.

3.C. Review of Risk Adjustment Approach

This section presents the results of a literature review on potential risk adjustment systems and presents the evidence leading to the selection of APR-DRGs. All risk adjustment systems have potential drawbacks, and much has been written regarding risk adjustment in general. In the first section, this evidence is discussed to delineate the general caveats regarding risk adjustment. Further, any risk adjustment system used in conjunction with the HCUP QIs could only use the data available in the HCUP data set. This eliminated many risk adjustment systems that require additional clinical data. In the second section, we discuss the comparative evidence on systems that apply to ICD-9-CM coded administrative data. The final section discusses the potential of risk adjustment systems to fulfill the user defined criteria outlined in section 2.D. "Risk adjustment of HCUP Quality Indicators."

Literature Review of Risk Adjustment Systems

General Caveats About Risk-Adjustment Systems

For a risk factor to affect inter-hospital or -area comparisons of performance on a quality indicator, two conditions must be met. First, the likelihood that the patient will experience the outcome of interest must differ based on the presence or absence of the risk factor. Second, the distribution of the risk factor must differ systematically across the relevant patient populations treated by particular hospitals (if this condition does not hold, then better risk adjustment improves the precision of the quality indicator but will not change relative performance). 200 Analysis of the extent to which each of these conditions is true should ideally take place after accounting for random variations across hospitals or areas; otherwise, random variations in patient mix may obscure systematic differences across particular hospitals or areas. If both of these conditions hold, then differences across hospitals or areas in performance on an indicator will reflect true differences in provider quality plus these systematic patient differences. Thus, the systematic differences in patient disease severity lead to bias in the performance measure. Risk adjustment systems seek to account for measurable differences in patient risk factors in comparisons across areas and providers, to remove bias caused by these patient differences.
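The two conditions can be illustrated with a small worked example. All numbers are invented. Direct standardization to a common patient mix is used here as a simple stand-in for risk adjustment; it is not the APR-DRG method evaluated in this report.

```python
# Two hypothetical hospitals with identical mortality within each risk stratum
# (so true quality is the same) but different patient mixes.
stratum_rates = {
    "A": {"low": 0.02, "high": 0.10},
    "B": {"low": 0.02, "high": 0.10},
}
patient_mix = {
    "A": {"low": 0.8, "high": 0.2},
    "B": {"low": 0.2, "high": 0.8},
}
reference_mix = {"low": 0.5, "high": 0.5}  # common mix used for standardization


def crude_rate(hospital):
    """Observed mortality: stratum rates weighted by the hospital's own mix."""
    return sum(patient_mix[hospital][s] * stratum_rates[hospital][s]
               for s in reference_mix)


def standardized_rate(hospital):
    """Mortality the hospital would show if it treated the reference mix."""
    return sum(reference_mix[s] * stratum_rates[hospital][s]
               for s in reference_mix)


for h in ("A", "B"):
    print(h, round(crude_rate(h), 3), round(standardized_rate(h), 3))
```

Because the outcome differs by risk stratum (condition one) and the mix differs across hospitals (condition two), the crude rates diverge (3.6% versus 8.4%) even though quality is identical; once both hospitals are evaluated on the same mix, the rates coincide at 6%.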

Several general comments apply to all the risk adjustment systems that might be used to adjust quality indicators. First, all available risk adjustment systems explain a relatively small share of the variation in virtually all hospital performance measures, whether measures of health outcomes or resource use. In other words, the proportion of outcomes at the individual level predicted correctly and the variation in outcomes "explained" by the risk factors included in the risk adjustment systems is much less than 100 percent. This is especially true for risk adjustment systems based on administrative data only, but is also true for systems based on much more detailed clinical data. As a result, risk adjustment systems generally do not help to improve the precision of comparisons across providers very much (by reducing the unexplained variance in outcomes across providers). If comparisons across providers are imprecise, the value of risk adjustment will tend to be lower, because it is difficult to detect significant differences in provider performance. In addition, while it is possible that the remaining unexplained variation is truly random, it is also possible that additional differences in patient characteristics across providers or areas are not captured by the risk adjustment system. Such residual differences in patient mix remain a source of bias even after risk adjustment. Because adjustment systems based on more detailed clinical data generally have more discriminatory power, they will generally lead to less biased comparisons. Conversely, inaccurate or inconsistent recording of risk factors that are included in a risk adjustment system can increase noise and bias in comparisons across providers. Because such anomalies are more likely to arise in risk adjustment systems with complex and burdensome reporting requirements, relying on risk factors that are clearly and consistently measurable may lead to less biased comparisons in practice.

Do risk adjustment systems based on careful reviews of medical charts and other supplemental, costly data sources result in substantially different conclusions about performance than systems based on administrative data? For some measures, collecting additional clinical information on risk factors not measured in discharge data is clearly valuable: it substantially alters the relative and absolute conclusions about provider performance.201, 202 For many other measures, however, the impact on comparisons across providers appears to be minimal.203, 204 Again, this may be because the additional variation explained by the additional risk factors is small, or because the additional risk factors do not differ much across the providers or areas being compared (e.g., only a small share of patients may have the comorbidity or severity factors). Because this evidence is to a large extent measure-specific, we generally discuss it in the context of the reviews of evidence on particular quality indicators.

Four caveats to these conclusions should be noted. First, models that incorporate laboratory data, which are increasingly available through hospital information systems, may perform nearly as well as models based on more detailed clinical data, at substantially lower cost. 19 Indeed, the correlation of hospital-level predicted values between models based on administrative plus laboratory data, and models based on full clinical data, has been reported to be as high as 0.98. 205 Second, confusion between comorbidities and complications may compromise the validity of risk-adjustment based on administrative data, relative to risk-adjustment based on detailed clinical data. 19, 112 An indicator variable designating conditions that were present at admission (currently available only in California and New York) may greatly enhance the validity of risk-adjustment by allowing analysts to avoid misadjusting for conditions that actually represent adverse outcomes. 113 Third, the relative superiority of models based on clinical data depends somewhat on which specific clinical data elements are available and how they are used. The number of clinical data elements that actually contribute to explaining outcomes at the provider level may be relatively small; 201, 206 failing to collect these key confounders may lead one to the incorrect conclusion that clinical data are unnecessary. 207 Finally, because much variation in performance is unexplained by even the most detailed risk adjustment methods, it is possible that all available risk adjustment methods fail to account fully for important, systematic differences in risk across hospitals.

Evidence regarding DRG based systems

Several systems use the type of discharge data readily available in HCUP. These include PIP-DCG, Medstat's Disease Staging, and DRG based systems such as HCFA DRGs, AP-DRGs and APR-DRGs. Although a substantial literature exists for Medstat's Disease Staging, and although DxCG's PIP-DCG is being implemented to adjust payments to private plans in Medicare, neither system is used as universally by state data organizations as the DRG based systems. Such official recognition and the lower cost of implementing a system that is already in use proved important to surveyed users of the HCUP I QIs. Resource constraints prohibited us from evaluating HCUP II QIs under each of these systems individually; thus, our evaluation focused mainly on the DRG based systems. Future evaluations may consider other potential systems using the methods applied here.

Because other systems are also based on the DRG framework, it is worth considering what advantages the specific implementation of the APR-DRG system offers. We reviewed a recent comparison of DRG systems 208 in the context of potential HCUP II QIs. As mentioned earlier, there are two separate severity classifications in APR-DRG, one for severity of illness (resource use and complications) and one for risk of mortality. The other DRG systems considered were the Medicare DRGs, Refined DRGs (a HCFA-funded project at Yale University), All-Patient DRGs (jointly developed by the New York Department of Health and 3M), and Severity DRGs (another HCFA-sponsored refinement from the mid-1990s). The primary distinguishing characteristics of the APR-DRG system compared to these other DRG-based systems were: 1) the inclusion of severity categories that reflect not only presence of complications and co-morbidities, but also an assessment of the level of these conditions; 2) a severity classification for particular secondary diagnoses that varies by principal condition, age, and operative procedure; this approach recognizes a differential impact of secondary diagnoses by condition (i.e., primary diagnosis) and allows for some interactions among secondary diagnoses.

Literature review results

In empirical analyses (presumably as a result of the distinguishing characteristics discussed above), the APR-DRG system performed better than alternative DRG-based systems in explaining variation in both cost and mortality at the patient and provider levels. Specifically, in explaining patient level cost variation, the APR-DRG system produced a higher R-squared (explained more of the variance) than the alternative systems, primarily as a result of the additional explanatory power of the secondary diagnosis subgroups. The impact of the secondary diagnosis subgroups was greater for medical patients than surgical patients, especially for non-normal neonates, pediatric patients with chronic conditions, patients older than 65, and patients treated in children's hospitals and major teaching hospitals. The importance of hospital characteristics in explaining cost variation was less for the APR-DRG system than the other DRG systems, reflecting that the system captures the more complex case-mix at these providers.

Similarly, in explaining patient level mortality variation, the APR-DRG system produced a higher R-squared and c-statistic than the alternative systems, and resulted in better calibration for major diagnostic categories and age and sex subgroups. As in the cost analysis, the secondary diagnosis subgroups were primarily responsible for the increase. However, in contrast to the cost analysis, for mortality the impact of the secondary diagnosis subgroups was slightly greater for surgical patients than medical patients. The importance of hospital characteristics in explaining mortality variation was less for the APR-DRG system than the other systems, reflecting that the system captures the more complex case-mix at certain providers. The greatest impact of the DRG refinements occurred for large urban and teaching hospitals; the least impact occurred for small urban and rural hospitals. It is important to note that teaching hospitals may also have more severe case mixes or may systematically code more comorbidities than smaller hospitals.
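The two summary statistics cited above can be illustrated with a minimal sketch. It assumes that each patient has a binary outcome (1 = died in hospital, 0 = survived) and a predicted probability of death from some risk adjustment model. The function names and the four-patient example are hypothetical and are not taken from the studies reviewed here.

```python
def r_squared(outcomes, predictions):
    """Share of outcome variance explained by the predictions."""
    mean_outcome = sum(outcomes) / len(outcomes)
    ss_total = sum((y - mean_outcome) ** 2 for y in outcomes)
    ss_resid = sum((y - p) ** 2 for y, p in zip(outcomes, predictions))
    return 1.0 - ss_resid / ss_total


def c_statistic(outcomes, predictions):
    """Probability that a randomly chosen death has a higher predicted
    risk than a randomly chosen survivor (ties count as one half)."""
    deaths = [p for y, p in zip(outcomes, predictions) if y == 1]
    survivors = [p for y, p in zip(outcomes, predictions) if y == 0]
    concordant = 0.0
    for p_death in deaths:
        for p_survivor in survivors:
            if p_death > p_survivor:
                concordant += 1.0
            elif p_death == p_survivor:
                concordant += 0.5
    return concordant / (len(deaths) * len(survivors))


# Hypothetical data: two deaths and two survivors.
outcomes = [1, 0, 1, 0]
predictions = [0.9, 0.2, 0.4, 0.6]
print(c_statistic(outcomes, predictions))
print(r_squared(outcomes, predictions))
```

A c-statistic of 0.5 corresponds to no discrimination and 1.0 to perfect discrimination. Neither statistic measures calibration, which the comparison above assessed separately within diagnostic and demographic subgroups.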

These findings are largely consistent with our empirical analysis of the impact of the APR-DRG risk adjustment system on relative provider performance among the HCUP II QIs (see below). For example, the impact of APR-DRG risk adjustment was greater for surgical patients than for medical patients in provider level mortality measures, suggesting that the measures for procedures (which are largely elective) may be subject to more selection bias related to inpatient mortality than most of the "medical" measures. The impact tended to be greater for hospitals with measured performance in the highest and lowest decile. This was consistent with Averill's finding that "trimming" of outlier cases tended to mute the impact of the APR-DRG severity classification system relative to the other DRG-based systems. 208 In other words, compared to the alternative DRG systems, the severity levels in the APR-DRG system do a better job of reflecting the distribution of patient severity at the extremes, where accounting for case-mix differences may be most important for the application of QIs.

A series of studies by Iezzoni and colleagues has also compared the performance of APR-DRGs (in a prior version) and other severity-adjustment systems based on administrative data (see Table 27). They reported that APR-DRGs performed better than competing systems (Disease Staging mortality probability, Patient Management Categories severity scale) in predicting inpatient mortality after coronary artery bypass graft surgery and stroke, but worse than Disease Staging in predicting inpatient mortality after acute myocardial infarction and pneumonia. Even in comparison with Refined DRGs, APR-DRGs had superior discrimination for some conditions (e.g., acute myocardial infarction, stroke), but inferior discrimination for other conditions (e.g., pneumonia, coronary artery bypass graft surgery).9-13, 17, 18, 204 These findings suggest that: (1) No severity-adjustment system, based on either administrative or clinical data, is clearly superior for all conditions and procedures. The optimal system for one condition may not be optimal for another. It is impossible to predict in advance, without empirical analysis, which system would be superior for any specific condition. (2) APR-DRGs perform about as well as, and in some cases better than, competing severity-adjustment systems based on administrative data. (3) Unfortunately, different severity-adjustment systems frequently produce different impressions about severity-adjusted performance at the provider level, even when these systems have comparable discrimination.

Table 27. Risk adjustment statistics from the literature.


In addition, DRGs are one of the few risk adjustment systems based on administrative data that have been used in conjunction with "smoothing" methods - the statistical methods intended to remove the effects of random variations on measured differences in performance across providers and areas. Burgess et al. 209 illustrate the use of DRGs in this way. As in our analysis, the investigators applied age, sex, and DRG-based risk adjustment to hospital performance measures along with statistical methods for smoothing out random variations in measures (empirical Bayes methods). They found that hospitals with relatively imprecise performance measures (in particular, hospitals with small numbers of patients) were more likely to perform closer to the overall average performance level once the methods were applied. In other words, smoothing helped reinforce the effects of severity-adjustment in removing differences in patient mix that led smaller providers to appear to be outliers, when in fact they were not. We apply similar methods in our evaluation of the HCUP II QIs to obtain similar "smoothed" estimates of risk-adjusted provider performance.
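The shrinkage behavior described above can be sketched as follows. This is a deliberately simplified, univariate empirical Bayes estimator with a method-of-moments estimate of the between-hospital variance. It is an illustrative assumption, not the specific method of Burgess et al. or the one applied in this report, and the hospital counts are hypothetical.

```python
def shrink_rates(events, cases):
    """Pull each hospital's observed rate toward the overall rate.

    The weight on a hospital's own rate is signal / (signal + noise),
    so hospitals with few cases (high noise) are shrunk the most.
    """
    overall = sum(events) / sum(cases)
    rates = [e / n for e, n in zip(events, cases)]

    # Sampling (noise) variance of each observed rate, binomial approximation.
    noise = [overall * (1 - overall) / n for n in cases]

    # Method-of-moments estimate of true between-hospital (signal) variance.
    k = len(rates)
    mean_rate = sum(rates) / k
    observed_var = sum((r - mean_rate) ** 2 for r in rates) / (k - 1)
    signal = max(observed_var - sum(noise) / k, 0.0)

    smoothed = []
    for rate, noise_var in zip(rates, noise):
        weight = signal / (signal + noise_var) if signal > 0 else 0.0
        smoothed.append(weight * rate + (1 - weight) * overall)
    return smoothed


# Hypothetical hospitals: the first two share an observed rate of 20%,
# but the first has only 10 cases.
events = [2, 200, 50, 300]
cases = [10, 1000, 1000, 3000]
for n, s in zip(cases, shrink_rates(events, cases)):
    print(n, round(s, 4))
```

In this example the 10-case hospital is moved much closer to the overall rate than the 1,000-case hospital with the same observed rate, which is the pattern reported for hospitals with imprecise measures.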

Comparative performance of severity-adjustment systems based on administrative data, from Iezzoni et al.'s studies of patients with acute myocardial infarction (AMI), coronary artery bypass graft (CABG) surgery, pneumonia, and stroke in the 1992 MedisGroups Comparative Database.

Results of Evaluation of Risk Adjustment Systems for HCUP Analysis Pertaining to User Criteria

A limited number of risk adjustment systems met the criteria outlined above for our literature review, though the literature on the performance of many of these systems was rather limited. The additional criteria developed from interviews with potential HCUP users led us to adopt the 3M All-Patient Refined Diagnosis-Related Group (APR-DRG) system with severity and risk-of-mortality classifications for our evaluation of the potential HCUP II QIs and the construction of provider and area estimates. We discuss each criterion in turn.

"Open" system

The general DRG methodology and medical terminology are widely understood by hospital administrators and physicians, which facilitates acceptance. The APR-DRG logic is open code, which permits the manual coding and checking of individual medical charts if necessary, and does not depend on specific, "black box" computer software code for implementation. 3M provides extensive user documentation and support, and has working relationships in place with most current and potential HCUP II users.

Low additional cost

Data collection for the APR-DRG system is based on standard, widely used abstract systems for individual hospital discharges (specifically, the UB-92). Reliance on currently collected data and information technology systems minimizes additional IT resources and data collection costs. In addition, the system has already been applied to a relatively extensive variety of discharge data, allowing more complete construction of comparative provider and area norms. For many hospitals and states, the incremental data collection costs would be substantial for other established and well-validated severity-adjustment systems that require outpatient data, such as Ambulatory Care Groups (ACG), or medical chart abstraction, such as Atlas™ severity groups. There are many reasons to use these richer adjustment systems when suitable data are available. If the data content of HCUP improves, for example to include additional clinical detail, longitudinal records, and/or ambulatory care information, then more extensive risk adjustment would be feasible.

One risk adjustment system

The HCUP II QIs include mortality measures (e.g., AMI mortality rates), complication measures (e.g., diabetes long-term complication rates) and measures of appropriateness (e.g., VBAC rates). Ideally, one would want a risk-adjustment system that includes factors that are suitable for each of these multiple types of performance measures. The current APR-DRG system (along with some competing systems, such as Disease Staging) permits evaluation of both resource use (with a severity-of-illness classification) and outcomes (with a risk-of-mortality classification). In addition, the refinement incorporates a classification system for neonates developed by the National Association of Children's Hospitals and Related Institutions. Therefore the system is theoretically appropriate for the distinctive disease characteristics of patients from all age groups.

Official recognition

The APR-DRG system is used by almost all of the state data organizations and health agencies currently reporting comparative hospital data on resource use and mortality. In addition, the Medicare Payment Advisory Commission recently issued a report that advised the adoption of a refined DRG system to improve the accuracy of DRG payments in Medicare, using the APR-DRG system as an example (MEDPAC, 2000). Any such refinement would very likely be modeled after the APR-DRG system, given the close relationship with the existing DRG system for Medicare reimbursement and the systems currently used by states and hospitals. HCFA contracts with 3M to maintain the "Grouper" software for the DRG system on an ongoing basis, which would facilitate continuing refinements to the APR-DRG system.

Conclusion

As a result of these findings, APR-DRGs were selected for two purposes. The first was to test the impact of risk adjustment on indicators. The second was to serve as an optional risk adjustment system in the Version 2 HCUP II QI software. It should be emphasized that other risk-adjustment systems based on ICD-9-CM diagnosis and procedure codes, such as Disease Staging, may work as well as or better than APR-DRGs for specific Quality Indicators. Our incorporation of APR-DRGs into the Version 2 software should not be construed as an unequivocal endorsement of this product. Indeed, customized risk-adjustment systems might be more effective than APR-DRGs or any off-the-shelf product, for the reasons outlined on page 148. However, it was beyond the scope of this contract to develop customized risk-adjustment systems for each Quality Indicator. Users may implement other severity stratification systems instead of APR-DRGs if they prefer.

Some indicators could not be risk adjusted using APR-DRGs. Several provider level indicators had denominators that corresponded with an APR-DRG category without severity classifications. Further, all the area indicators only include age-sex risk adjustment, as we did not have information on the APR-DRG distribution of the population (only those discharged from hospitals). It is unclear how risk adjustment with APR-DRGs in these cases (using the discharges in that area) should be interpreted.
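As an illustration of what age-sex adjustment of an area rate can look like, the sketch below uses indirect standardization: the area's observed admissions are compared with the number expected if reference (e.g., national) age-sex-specific rates applied to the area's population. The passage above does not specify the adjustment method, so the choice of indirect standardization, the function name, and all figures are assumptions for illustration only.

```python
def indirectly_standardized_rate(area_events, area_population,
                                 reference_rates, reference_overall_rate):
    """Age-sex adjusted area rate via indirect standardization.

    area_events and area_population map (age_group, sex) strata to counts;
    reference_rates maps the same strata to reference admission rates.
    """
    observed = sum(area_events.values())
    expected = sum(reference_rates[stratum] * population
                   for stratum, population in area_population.items())
    # Observed/expected ratio rescaled to the overall reference rate.
    return observed / expected * reference_overall_rate


# Hypothetical area with two strata.
area_population = {("65+", "F"): 1000, ("<65", "F"): 1000}
area_events = {("65+", "F"): 22, ("<65", "F"): 8}
reference_rates = {("65+", "F"): 0.02, ("<65", "F"): 0.005}

print(indirectly_standardized_rate(
    area_events, area_population, reference_rates,
    reference_overall_rate=0.01))
```

Here 30 admissions are observed against 25 expected, so the area's adjusted rate is 1.2 times the overall reference rate. The denominator is the resident population, which is why a discharge-based classification such as APR-DRG has no natural role in this calculation.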

3.D. General Evidence from the Literature by Indicator Type

The following section contains general evidence reported in the literature regarding indicator types or subgroups. In this report we organize the indicators according to the Donabedian paradigm (Structural, Process, and Outcome measures). Within each type are several subgroups (Volume-Outcome, Utilization, Avoidable hospitalizations or ACSC, and mortality). Indicators within a subgroup are closely related to each other and often have similar concerns, anticipated uses, etc. Often in the literature, indicator subtypes are discussed rather than specific indicators. For instance, an article may discuss mortality indicators, rather than mortality for a specific condition. For some subtypes, such as the ambulatory care sensitive conditions, measure sets have been developed and tested as a set, not as separate indicators. For these reasons, this section outlines the general evidence, concerns, and issues associated with each indicator subtype. It is intended that these summaries provide context for the detailed indicator write-ups.

Structural Measures

Structure describes the setting in which care occurs and the capacity of that setting to produce quality.22, 50, 210 A substantial literature describes the association with quality of various structural aspects of hospital care - including teaching status, hospital ownership, availability of sophisticated technologies, physician participation on hospital committees, and qualifications of hospital personnel.203, 211-219 More recent candidate structural measures of hospital quality include patient volumes, the adoption of certain organizational models for inpatient care (e.g., closed ICUs 220 and stroke units221-223), and the presence of sophisticated clinical information systems.224-226 The information contained in the NIS supports consideration of 1) patient volumes, 2) teaching status, and 3) hospital ownership as potential quality indicators for HCUP II. Only patient volume, however, is discussed below, as the latter two are not easily changed, and thus would not be good vehicles for quality improvement.

Volume-outcome relationships

An extensive literature indicates that hospitals and physicians with higher patient volumes achieve better outcomes across a broad range of conditions and procedures.51-53, 56, 58, 61, 64-66, 189, 190, 199, 227-238 The weight of evidence from this extensive literature suggests that, in the absence of accurate or reliable risk adjusted mortality rates, volume data may play important roles in informing patients and purchasers in provider selection.

HCUP I QIs already include in-hospital mortality for many conditions. Volume-based indicators may play a role for conditions for which in-hospital mortality differs significantly from long-term (out-of-hospital) mortality or where adequate risk adjustment requires more clinically detailed information than is available from discharge abstracts.

The literature suggests two basic models for volume-outcome relationships: the intuitive "practice makes perfect" explanation and the less intuitive "selective referral"52, 230, 239, 240 explanation, which means that high volume does not cause high quality, but that a provider is high volume because it is a high quality provider (i.e., patients are aware through informal knowledge that superior care is available from certain providers, and they go to those providers). The observation that patients often seek their care from hospitals of superior quality even before explicit data such as mortality rates become publicly available 241 supports this explanation. Studies that have specifically addressed this issue suggest that both explanations play a role, with "practice makes perfect" predominating in emergent conditions, such as acute myocardial infarction230, 239 and unscheduled cardiac bypass surgery 228, but selective referral accounting for the observed associations for elective bypass surgery230, 239 and hip replacement240, 242. The distinction is relevant to the anticipated response to using a volume-based indicator in quality improvement activities. If selective referral provides the dominant explanation for the relevant volume-outcome relationship, then a volume-based indicator is a 'signal' of a high or low quality provider, and the provider can only improve on the indicator by improving overall quality, communicating that improvement to potential patients, and waiting for additional patients to arrive. On the other hand, if higher volume causes better outcomes, then it may not always be the case that directing additional patients to high volume providers is the right policy choice. One study of cardiac surgery indicated improved outcomes as a result of increased patient volumes 233, but a study of pediatric trauma centers corroborated the concern that beyond a certain threshold, increases in volume may strain provider resources and worsen patient outcomes 243.
In reviewing the evidence supporting the volume-based indicators below, we consider the policy implications of diverting patients from local providers to high-volume centers 244 and the possibility of consequent decreases in patient satisfaction 245 in the section on fostering quality improvement (if evidence exists).

In general, one should consider indicators of volume, utilization, and outcomes together. It may be possible that high volume is required for technical skill on a procedure that overall is harmful to patients and prone to overuse. High volume providers may contribute to worse outcomes overall at the population level by exposing more patients to the adverse events and complications associated with the procedure 246 . Small-area research, which has the potential to distinguish between the outcome effects of high-volume and over utilization, sometimes suggests that the health impacts of variations in providers' propensity to subject patients to particular therapies (the utilization effect) dwarf effects of variations in the technical skill required to deliver these therapies (the volume effect), suggesting that utilization, not volume, is a more important determinant of quality.

Carotid endarterectomy (CEA), one of our selected indicators, illustrates the possibility. Publication of favorable clinical trials has led to a dramatic increase in the performance of CEA (see, for example, the North American Symptomatic Carotid Endarterectomy Trial (NASCET) 247 and the Asymptomatic Carotid Atherosclerosis Study (ACAS) 248 ).249-251 In this context, concern has arisen regarding the discrepancy between the efficacy (in a clinical trial) and effectiveness (in the real world) of CEA, especially for patients with asymptomatic carotid stenosis,252, 253 for which even the case for efficacy is not entirely convincing254, 255. Although community practice may replicate the efficacy results of randomized trials in some regions 256 (especially in the case of symptomatic stenosis 257 ), outcomes analyses for Medicare patients undergoing CEA indicate a substantial difference between efficacy and effectiveness.64, 250, 258 Mortality rates among Medicare patients were substantially higher than those reported in the ACAS, even at high volume centers that participated in the trial. 258 One explanation for the discrepancy between the clinical trial results and the observational studies might be the different characteristics of the patient population studied. For example, the ACAS trial excluded all patients over age 80, 248 while 15% of the Medicare patients undergoing CEA fell into this age range. 258 Patients in this age group are higher risk and experience 2-3 times the peri-operative mortality reported for younger patients.250, 258-260 In summary, high volume might be a quality indicator for only a specific population, and high utilization rates for a broader population may actually be an indicator of poor quality. 
Increasing volume by performing more procedures on patients with higher risks of poor outcomes may not be the desirable response; for example, a population analysis of outcomes for laminectomy documented worse outcomes for surgeons practicing in areas with a high population rate of procedure performance. 261

Process Measures

Utilization of Hospital Procedures

Utilization measures consist of both potentially overused and potentially underused medical procedures. Appendix 8 summarizes studies on appropriateness of procedure utilization. Appropriate use of medical procedures has drawn attention as the cost of health care has increased. Overuse of procedures may increase adverse outcomes by exposing patients to the risk of side effects or complications of treatment,262-264 and also may lead to unnecessary health care costs. 265 Underuse of procedures may lead to avoidable adverse outcomes.251, 266, 267 Many different data sources have been used to identify treatments with potential overuse or underuse problems; often, evidence comes from large variations in use across providers or geographic areas.263, 268 Most attention has been directed to overused surgical procedures,251, 262-264, 266, 269 because of the availability of reliable information on the performance of these procedures and their substantial costs and associated risks. In recent years, states, insurers, and health plans have all developed initiatives to understand and reduce inappropriate procedure use - including both steps to discourage use of costly "overused" procedures and encourage use of "underused" procedures. 265

Variations in procedure use across geographic areas and providers may have many explanations. Geographic variations may be due to differences in case-mix, access to care, or different medical "practice patterns" that reflect a combination of uncertainty about the effects of procedures in some patients and provider familiarity with particular methods of treatment.270, 271 Therefore geographic variation is an indication of possible overuse or underuse of a procedure, but may have other important explanations.

The utilization indicators identified in our review mostly consist of widely used surgical procedures, which at least at the area level are unlikely to vary substantially as a result of small sample sizes. (One exception is incidental appendectomy in the elderly; this procedure has not received much recent attention.)

Each of the procedures identified as potentially overused has an appropriate use in some patients. Appropriateness is influenced by a range of medical considerations, including a patient's severity of disease and comorbidities, as well as provider factors such as provider experience. All of these factors may influence the procedure rate that is "appropriate" for a particular hospital. For instance, a hospital may draw more high risk births, leading to a higher cesarean section rate. Appropriate risk-adjustment and other techniques to remove the effects of such hospital-level differences may increase the value of information on hospital procedure rates.

Rates of procedures may also be influenced by non-medical factors, such as a patient's ability to cope with symptoms or physician practice style 272. Patient preference may also play a large role in utilization rates. Some patients may be reluctant to undergo any surgical procedure. Other patients may prefer more intensive management, if there is some chance of eliminating disease symptoms.

For all of these reasons, the "ideal" rate for many procedures is not known with certainty. Although Healthy People 2010 developed target rates for two of the utilization measures - cesarean section and vaginal birth after cesarean section (VBAC) - debate surrounding the "correct rate" for these and other utilization measures still exists. 273 Much of the evidence on appropriate rates has been developed in a large series of studies by the RAND Corporation, using judgments by expert panels of physicians to rate the appropriateness of procedure use in an extensive set of possible indications, and then to apply these appropriateness criteria to detailed medical chart reviews for a sample of patients undergoing the procedure (to determine overuse) or potentially eligible for the procedure (to determine underuse).50, 269, 274 Studies assessing the validity of this method have noted moderate agreement about expert judgments between panel members, with kappa scores ranging from .51 for hysterectomy utilization to .83 for underuse of coronary revascularization. The authors concluded that the technique may be useful for evaluating the appropriateness of alternative observed rates of procedure use. 269 However, much of the clinical information included in such indications as those established by the RAND technique is not included in the hospital discharge abstract. In addition, in many cases, the overall rate of procedure use in an area or hospital is not strongly correlated with the share of patients judged to be undergoing the procedure for inappropriate reasons. 268
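For readers unfamiliar with the kappa statistic mentioned above, the following minimal sketch computes Cohen's kappa for two raters, which corrects raw agreement for the agreement expected by chance. The RAND panels involve more than two members, so the two-rater form is a simplifying assumption, and the ratings shown are hypothetical.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two raters on the same cases."""
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement from each rater's marginal category frequencies.
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    categories = set(ratings_a) | set(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / n ** 2

    if expected == 1.0:
        return 1.0  # both raters used a single, identical category
    return (observed - expected) / (1.0 - expected)


# Hypothetical appropriateness ratings for four indications.
rater_a = ["appropriate", "appropriate", "inappropriate", "inappropriate"]
rater_b = ["appropriate", "inappropriate", "inappropriate", "inappropriate"]
print(cohens_kappa(rater_a, rater_b))
```

In this example the raters agree on three of four cases (75%), but half of that agreement is expected by chance, giving a kappa of 0.5.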

These considerations suggest that excessive reliance on quality indicators based on utilization rates calculated from hospital discharge abstracts may produce some undesirable effects. Use of potentially underused procedures may increase substantially, but the expanded use might involve patients who are not appropriate candidates.251, 266 In contrast, use of potentially overused procedures could fall among patients that meet appropriate indications for procedure use. Little evidence exists on the extent to which appropriate and inappropriate use may respond to utilization-based quality indicators, especially indicators based on hospital data.

Potentially Avoidable Hospital Admissions (ACSC)

Ambulatory care sensitive conditions are conditions for which hospitalizations may have been avoided through adequate primary care. In general, conditions identified as ambulatory care sensitive conditions have been identified through consensus processes involving panels of expert physicians, using a range of methodologies and decision criteria. Two sets of indicators are widely used. One developed by John Billings in conjunction with the United Hospital Fund of New York includes 28 ambulatory care sensitive conditions, identified by a panel of 6 physicians. 275 The other set was developed by Joel Weissman, and includes 12 avoidable admissions identified through review of the literature and evaluation by a panel of physicians. 276 These measure sets have been further adapted for pediatric populations.277, 278 Many of the ambulatory care sensitive conditions have practice guidelines associated with them, including almost all of the chronic conditions and about half of acute medical or pediatric conditions. Most guidelines are consensus statements rather than evidence-based statements, though there are some notable exceptions such as angina. Conditions without explicit guidelines are associated, to varying extents, with ambulatory treatments that are widely viewed as important in preventing hospitalization.

Most of the ambulatory care sensitive conditions involve common types of hospital admissions. However, some are rare admissions, leading to heightened concern about the precision of these measures. For example, among adult medical conditions, admission for dental conditions is rare. Among pediatric conditions, admissions for iron deficiency anemia, pediatric mastoiditis, and a primary diagnosis of otitis media (eliminating cases with myringotomy or tube placement) are rare. Admissions for immunization preventable diseases and pediatric urinary tract infections are also somewhat rare.

The extent to which hospitalizations for these conditions are attributable to quality problems, and thus the "appropriate" rate of admission for these conditions, is not well known. One cause of variation in admission rates is better or worse quality of ambulatory care. In the ambulatory care setting, physicians may fail to prescribe appropriate treatments, follow up with patients, or offer advice on lifestyle changes. Physicians may fail to comply with guidelines, or may not even be aware of current treatment guidelines.

Another cause is differences in access to care. Several studies have shown that admissions for ambulatory care sensitive conditions are higher in areas with lower socioeconomic status.275, 279-283 Some investigators have developed evidence to support the view that patients in these areas may not have appropriate access to primary care, due to fewer physicians, lack of insurance, or other difficulties in obtaining care (lack of vacation time to attend doctors' appointments, lack of child care, etc.).284-286 Other causes for variation in admission rates include higher rates of air pollution and other environmental exposures, and higher rates of malnutrition and unhealthy patient behaviors like smoking. One study has indicated that, at least for infants, higher rates of admissions for ambulatory care sensitive conditions are likely to be caused by factors other than genetic predisposition. 287

Patient health characteristics obviously may influence the rate of hospitalizations. Severity of illness may increase the likelihood that some outpatient treatments may fail. Further, in chronic diseases such as COPD or congestive heart failure, severity of the disease may progress over time. Thus, although good ambulatory care may reduce admissions, as time progresses a patient with one of these chronic illnesses would become more likely to be admitted due to progression of the underlying disease. Similarly, age is often correlated with increased admission rates. 280 Significant comorbid conditions may also complicate outpatient treatment. Some comorbid conditions may increase the likelihood of dangerous complications requiring hospital treatment; other comorbid conditions may be contraindications to the administration of preventive treatments. 280

Even with appropriate treatment administration and access to care, patient compliance may affect admission rates for ambulatory care sensitive conditions. Several studies have noted that patients frequently do not comply with physician advice on managing their condition. Outpatient therapy for many conditions can be time consuming, costly, or difficult to perform properly, and can have unpleasant side effects. Compliance is a critical and difficult issue in medicine, and may affect admission rates for ambulatory care sensitive conditions.

Finally, the threshold for admission may vary between geographic areas, physicians, and even between patients. Some physicians may regularly admit patients with less severe complications, rather than continue to try to manage the condition on an outpatient basis. Even within a physician's own practice, a physician may decide that the patient or caregiver could not handle outpatient treatment due to factors other than medical condition and choose to admit the patient. Home environment, responsibility level of the caregiver, incentives, 288 and other such issues often play into these decisions.

For all of these reasons, the extent to which the reporting of admission rates for ambulatory care sensitive conditions may lead to changes in ambulatory practices and admission rates is unknown. In any case, except for patients who are readmitted soon after a discharge, it is unlikely that the quality of hospital care is a significant determinant of admission rates for ambulatory care sensitive conditions. Rather, ambulatory care sensitive conditions are likely to measure the quality of the health care system as a whole, and especially the quality of primary care, for preventing medical complications. As a result, these measures are likely to be of the greatest value when calculated at the area level, and when used by public health groups, state data organizations, and other governmental agencies.

Outcome Measures

In-hospital Mortality

Mortality measures are some of the most widely used quality indicators, and as such they have been subject to the most debate. Mortality measures are used to varying degrees by hospital associations and in proprietary measure sets (UHC, HCIA, US News and World Report), and have been used by federal agencies (HCFA) and other data projects (AMI mortality in California, CABG mortality in New York CCP).

Large variations in mortality rates have been noted, particularly for nonsurgical conditions.12, 289 This is likely the result of the "noise" in most mortality measures; mortality for most conditions is relatively rare. Similarly, it has been difficult to determine, with much certainty, whether hospitals with low mortality rates for one condition have low mortality rates for other conditions. One study documented a modest correlation for mortality rates for differing conditions. 37

The validity of mortality measures as indicators of quality is among the most widely examined topics in the literature. Literature reviews have concluded that many mortality rate measures have limited positive predictive value because of noise; that is, high adjusted mortality rates are likely to occur by chance, so that when they are re-measured in the next year, mortality may no longer appear high. In addition, factors other than hospital quality influence mortality, including patient comorbidities and disease severity. Many studies have concluded that high mortality rates for certain conditions often reflect greater patient disease severity or comorbidity rather than a true quality problem. On the other hand, for some measures, clinically detailed risk adjustment systems do not substantially alter the rates estimated from less detailed discharge data. For a more detailed discussion of risk adjustment issues, see Section 2.D.
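The role of chance described above can be illustrated with a short calculation. The sketch below assumes a simple binomial model of deaths and uses hypothetical values (a true mortality rate of 5% and a "high" flag at an observed rate of 10% or more); neither the model nor the numbers come from this report.

```python
from math import comb

def prob_at_least(deaths: int, cases: int, true_rate: float) -> float:
    """P(X >= deaths) for X ~ Binomial(cases, true_rate)."""
    return sum(
        comb(cases, k) * true_rate**k * (1 - true_rate) ** (cases - k)
        for k in range(deaths, cases + 1)
    )

# Hypothetical hospital whose true mortality is 5%. It is flagged as "high"
# whenever its observed rate in a given year is 10% or more.
for cases in (20, 100, 500):
    flag_at = -(-cases // 10)  # smallest death count giving an observed rate >= 10%
    print(cases, round(prob_at_least(flag_at, cases, 0.05), 4))
```

Under these assumptions, the chance of a spurious "high" flag shrinks quickly as the annual number of cases grows, which is one way to read the limited positive predictive value of mortality rates at low-volume providers.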

Even when mortality measures remain high after risk adjustment, suggesting that a quality problem exists, it is sometimes difficult to identify process-of-care differences that could explain the higher mortality. On the other hand, high outliers may have higher rates of deaths judged "preventable" by experts, though judgments about preventability often have limited inter-rater reliability. Another study noted some correlation between risk-adjusted mortality rates and quality problems identified by peer review organizations. 147

In general, these studies suggest that noise due to small numbers of deaths is a significant problem for interpreting many mortality measures. Even if estimates are precise enough that noise is unlikely to be a cause of a high apparent mortality rate, differences in disease severity that are difficult to measure may, at least partially, explain the higher rate. However, a number of studies have found that potentially preventable adverse events are at least weakly associated with measured mortality rates.290, 291

The impact of mortality reports has also been debated. Consumers may be most interested in mortality measures, and many consumer-aimed quality reports, such as that developed by US News and World Report, utilize mortality rates. Physicians appear to be the most skeptical of mortality rates. One study noted a decrease in risk-adjusted mortality rates for all conditions studied after the release of mortality reports in Ohio. 292 However, some have questioned whether this effect represented a true improvement in quality of care, or simply a shifting of deaths to other settings.

Conclusion

Each of the indicators evaluated in this report belongs to one of the subgroups above. Unless otherwise noted, the considerations and evidence discussed in the corresponding write-up apply to that indicator. For instance, the considerations discussed in the mortality section apply to mortality after CABG and to all other mortality indicators. When examining the detailed indicator evaluations in the next sections, readers should keep in mind the general evidence discussed above.

3.E. Detailed Evidence by Indicator

Structural Measures

3.E.1 Volume Measures

INDICATOR 1: ABDOMINAL AORTIC ANEURYSM (AAA) REPAIR VOLUME

Indicator: Abdominal aortic aneurysm (AAA) repair, raw volume
Relationship to Quality: Better outcomes have been associated with higher volumes. Higher volumes thus represent better quality.
Benchmark:
Threshold 1: 10 or more procedures per year 195
Threshold 2: 32 or more procedures per year 196, 197

Method:

Quality Measure: Provider level AAA repair raw volume
Outcome of Interest: Discharges with ICD-9 codes 38.34, 38.44, 38.64 in any procedure field and a diagnosis of AAA in any field.
Exclude MDC 14 (pregnancy, childbirth, and puerperium) and MDC 15 (newborns and other neonates).
Population at Risk: Not applicable
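The definition above can be sketched as a simple count over discharge records. This is an illustrative sketch only: the record layout (`hospital_id`, `mdc`, `procedures`, `diagnoses`) is hypothetical, and the AAA diagnosis codes (441.3 and 441.4) are an assumption, because the definition above says only "a diagnosis of AAA" without listing codes.

```python
from collections import Counter

AAA_PROCEDURES = {"38.34", "38.44", "38.64"}  # procedure codes from the definition above
AAA_DIAGNOSES = {"441.3", "441.4"}            # ASSUMPTION: codes are not listed in the text
EXCLUDED_MDC = {14, 15}                       # pregnancy/childbirth and newborns

def aaa_volume(discharges):
    """Count qualifying AAA repair discharges per hospital (raw volume)."""
    volume = Counter()
    for d in discharges:
        if d["mdc"] in EXCLUDED_MDC:
            continue
        has_procedure = bool(AAA_PROCEDURES & set(d["procedures"]))
        has_diagnosis = bool(AAA_DIAGNOSES & set(d["diagnoses"]))
        if has_procedure and has_diagnosis:
            volume[d["hospital_id"]] += 1
    return volume

# Hypothetical discharge records, for illustration only.
sample = [
    {"hospital_id": "A", "mdc": 5, "procedures": ["38.44"], "diagnoses": ["441.4"]},
    {"hospital_id": "A", "mdc": 5, "procedures": ["38.34", "39.25"], "diagnoses": ["441.3"]},
    {"hospital_id": "B", "mdc": 5, "procedures": ["38.44"], "diagnoses": ["440.21"]},  # no AAA diagnosis
    {"hospital_id": "B", "mdc": 14, "procedures": ["38.64"], "diagnoses": ["441.4"]},  # excluded MDC
    {"hospital_id": "C", "mdc": 5, "procedures": ["38.64"], "diagnoses": ["401.9", "441.4"]},
]
print(dict(aaa_volume(sample)))
```

Because the measure is a raw count, no denominator or risk adjustment appears in the sketch, consistent with "Population at Risk: Not applicable."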
Evidence from the literature
Face validity

Procedure volume is a surrogate measure of quality; its face validity depends on whether a strong association with outcomes of care is both plausible and widely accepted in the professional community.

Abdominal aortic aneurysm repair requires technical proficiency with the use of complex equipment. Technical errors may lead to clinically significant complications, such as arrhythmias, acute myocardial infarction, colonic ischemia, and death. However, we are not aware of any consensus guidelines or recommendations regarding minimum procedure volume.

Precision

Abdominal aortic aneurysmectomy is not as common as the other cardiovascular procedures described in this report; only about 48,600 were performed in the USA in 1997 (1.8 per 10,000 persons). 293 Based on state all-payer databases, the mean annual frequency of abdominal aortic aneurysmectomies was 16.4-18.3 per hospital in Florida (unruptured only) in 1992-1996, 294 8.4 per hospital in New York (unruptured only) in 1990-1995, 295 and 13.8 per hospital in Maryland in 1990-1995. 296

The number of abdominal aortic aneurysm resections is measured accurately with discharge data; in fact, discharge data are probably the best available source for hospital volume information. However, the relatively small number of procedures performed annually at most hospitals suggests that annual volume may be subject to considerable random variation.
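To give a rough sense of how much random variation small annual counts can carry, the sketch below assumes that a hospital's annual count follows a Poisson distribution around a fixed expected volume. The Poisson model and the expected volumes shown are illustrative assumptions, not results from this report; the threshold of 10 is Threshold 1 for this indicator.

```python
from math import exp, factorial

def poisson_cdf(k: int, mean: float) -> float:
    """P(X <= k) for X ~ Poisson(mean)."""
    return sum(exp(-mean) * mean**i / factorial(i) for i in range(k + 1))

def prob_below_threshold(expected_volume: float, threshold: int) -> float:
    """P(annual count < threshold) under the assumed Poisson model."""
    return poisson_cdf(threshold - 1, expected_volume)

# Hypothetical expected annual volumes, compared against Threshold 1 (10 per year).
for expected in (8, 10, 14, 20):
    print(expected, round(prob_below_threshold(expected, 10), 3))
```

Under this assumption, a hospital whose long-run volume sits near a threshold can easily fall on either side of it in any single year, which is why a single year's count may be an unreliable guide to ongoing volume.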

Minimum bias

Volume measures are not subject to bias due to disease severity and comorbidities. For this reason, risk-adjustment is not appropriate. Although volume measures are theoretically subject to bias due to variation across hospitals in the use of outpatient surgery facilities, less than 1% of AAA repairs in 1996 were performed in ambulatory settings. 297

Construct validity

Volume is not a direct measure of the quality or outcomes of care. Although higher volumes have been repeatedly associated with better outcomes after abdominal aortic aneurysm resection, these findings may be limited by inadequate risk adjustment.

All but one of 15 studies published since 1985 demonstrated a significant association between either hospital or surgeon volume and mortality after abdominal aortic aneurysm repair. However, three sets of these studies (e.g., Hannan 1989 and 199253, 195; Kazmers, 1996 and Khuri, 1999196, 298; Pronovost, 1999 and Dardik 1999 and 199862, 197, 296) appear to include overlapping cases, which may exaggerate the consistency of their results. Two studies of intact aneurysms found that hospital and surgeon volume were significant independent predictors of inpatient mortality, adjusting for other hospital and patient characteristics,294, 296 whereas two studies (one of which included ruptured aneurysms) found significant effects only for hospital volume.195, 197 The three studies that focused exclusively on ruptured aneurysms reported a significant effect only for surgeon volume.62, 195, 299 Six studies that considered only hospital volume found lower mortality at high-volume hospitals;188, 196, 295, 300 two of these studies found a significant association among intact aneurysms but not among ruptured aneurysms.301, 302

The only completely negative study was Khuri and colleagues' 298 evaluation of abdominal aortic aneurysmectomies performed in Veterans Affairs hospitals from 1991 through 1997. Their study was the only one that used clinical data, which allowed them to construct a relatively powerful risk-adjustment model. Yet they found only a very weak, nonsignificant association between procedure or specialty volume and risk-adjusted 30-day mortality.

Several studies of this topic have explored whether the volume of something other than the index procedure may be a more powerful predictor of mortality than the volume of the index procedure. The underlying hypothesis is that experience acquired on related, but not identical, cases may lead to improved outcomes. However, two studies failed to show any significant association between the total physician 299 or hospital 62 volume of abdominal aortic aneurysmectomies and mortality among ruptured aneurysms, whereas one study showed similar associations between total hospital volume and mortality for both ruptured and intact aneurysms. 188 Of two studies that evaluated the impact of total vascular surgery volume, one found a significant effect on mortality for both ruptured and intact aneurysms, 302 whereas the other found no effect for intact aneurysms (adjusting for procedure-specific volume). 298 One study found that the hospital volume of surgery for ruptured aneurysms was not associated with postoperative inpatient mortality, but it was associated with fewer inpatient deaths for ruptured aneurysms, suggesting that high-volume hospitals may manage ruptured aneurysms more aggressively. 303

It is difficult to determine an appropriate hospital volume threshold from the published literature, because numerous studies reported the volume effect only in linear terms294, 299 or used volume categories simply to display unadjusted mortality.196, 295, 300-302 Among the studies that either reported indirectly standardized mortality rates by volume strata, or used volume categories in multi-level regression models, the recommended hospital volume thresholds were 10, 195 20, 188 or 36 197 cases per year of either intact or ruptured aneurysms, or 50 elective repairs of intact aneurysms per 6 years. 296 Among the studies that analyzed volume as a linear effect, but displayed crude mortality by stratum, the stratum definitions were 10 or 21 intact aneurysms per year,300, 301 32 intact aneurysms per 3 years, 196 or 100 cases of any vascular surgery per year. 302

Although volume-outcome associations have been demonstrated for abdominal aortic aneurysmectomy, volume seems likely to be both insensitive and nonspecific as a measure of quality. Nonetheless, it has been estimated that shifting patients in California from low-volume to high-volume hospitals would avert 40 deaths per year, given that 64% of all operations are performed in low-volume hospitals. 190

Fosters true quality improvement

One possible adverse effect of volume-based measures is to encourage low-volume providers (who may also provide poorer quality of care) to increase their volume, simply to reach an artificial threshold. Such responses would probably not improve patient outcomes to the same extent as moving patients from low-volume to high-volume hospitals. For example, Hannan and colleagues 195 found that the 22 surgeons in New York who increased their volume from 1-6 aneurysm repairs in 1982-84 to 14 or more aneurysm repairs in 1985-87 achieved a minimal decrease in standardized mortality, from 6.8% to 6.2%. The subset of these surgeons who increased their volume to at least 22 cases achieved an important but nonsignificant decrease in standardized mortality, from 5.8% to 2.5%. At the extreme, hospitals may loosen eligibility criteria and perform procedures on patients who are marginal or inappropriate candidates. These arguments would not apply to the subset of ruptured aneurysms, which are not subject to volume manipulation. However, shutting down low-volume hospitals and transferring procedures to high-volume hospitals may worsen outcomes for ruptured aneurysms by delaying surgical intervention.

Prior use

Abdominal aortic aneurysmectomy volume has not been widely used as an indicator of quality. On its Web site, the Pacific Business Group on Health 304 states that "one marker of how well a hospital is likely to perform is the experience of the hospital and its surgical team...in the absence of data to compare hospitals on their complications and survival rates, you can begin evaluating experience by looking at the number of (abdominal aortic aneurysm) surgeries a hospital performs each year." The Center for Medical Consumers posts hospital-specific and operator-specific volumes of "resection of aorta with replacement" for New York hospitals. 305

Empirical Evidence
Test | Statistic | Rating
Procedure volume | |
   Raw mean volume (standard deviation) | 14 (16) |
   Median / 90th / 95th percentile | 8 / 36 / 46 |
   Stability over time, mean in 1995 / mean in 1997 | 14 / 14 | Stable
Percentage of procedures at high volume hospitals | | Mod / Low
   Percentage threshold 1 (% hosp at threshold) | 84% (44%) |
   Percentage threshold 2 (% hosp at threshold) | 43% (12%) |
Persistence of high volume | | Moderate
   High volume remaining high, 95/96 (Threshold 1, threshold 2) | 86% / 81% |
   High volume remaining high, 96/97 (Threshold 1, threshold 2) | 87% / 76% |
Procedure Volume

In 1996, 727 hospitals (54.1% of providers) performed at least one procedure. Among these hospitals, the mean (standard deviation) number of procedures was 14 (16). The median was 8, and the 90th and 95th percentiles were 36 and 46, respectively. In general, there are many hospitals with lower volumes and a few hospitals with much higher volumes. Overall procedure volume was stable over the 1995-1997 time period: the mean number of procedures performed was 14 in both 1995 and 1997.

Percentage of Procedures at High Volume Hospitals

A moderate to low percentage of procedures were performed at high volume hospitals, depending on the threshold used. At threshold 1, 83.9% of AAA repair procedures were performed at 'high volume' providers (and 44.3% of providers are 'high volume'). At threshold 2, 43.0% were performed at 'high volume' providers (and 12.2% of providers are 'high volume').
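The two percentages reported for each threshold (share of procedures and share of providers) can be computed from a list of hospital volumes as sketched below. The function name and the sample volumes are hypothetical; only the thresholds (10 and 32) come from the indicator definition.

```python
def high_volume_shares(volumes, threshold):
    """Return (share of procedures, share of hospitals) at or above threshold.

    Only hospitals performing at least one procedure are counted,
    mirroring the 'performed at least one procedure' base used above.
    """
    active = [v for v in volumes if v > 0]
    if not active:
        raise ValueError("no hospital performed the procedure")
    high = [v for v in active if v >= threshold]
    return sum(high) / sum(active), len(high) / len(active)

# Hypothetical annual volumes for eight hospitals (illustration only).
volumes = [2, 4, 8, 8, 12, 15, 36, 46]
for threshold in (10, 32):
    procedures_share, hospitals_share = high_volume_shares(volumes, threshold)
    print(threshold, round(procedures_share, 3), round(hospitals_share, 3))
```

Because a few hospitals account for a large part of total volume, the share of procedures at high volume hospitals is typically much larger than the share of hospitals that are high volume, as in the reported figures.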

Persistence of High Volume

High volume status was moderately persistent over time, depending on the volume threshold used. At threshold 1, 86.2% of high volume providers in 1995 were also high volume in 1996. Similarly, 81.1% of high volume providers in 1996 were also high volume in 1997. At threshold 2, 87.0% of high volume providers in 1995 were also high volume in 1996. Similarly, 75.9% of high volume providers in 1996 were also high volume in 1997.
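Persistence as described here is the share of providers that were high volume in one year and remained high volume in the next. A minimal sketch follows, assuming volumes are held in per-year dictionaries keyed by a hypothetical hospital identifier; the sample data are invented for illustration.

```python
def persistence(prev_year, next_year, threshold):
    """Share of hospitals that are high volume in prev_year and also in next_year.

    Hospitals missing from next_year are treated as having zero volume.
    Returns None when no hospital is high volume in prev_year.
    """
    high_prev = {h for h, v in prev_year.items() if v >= threshold}
    if not high_prev:
        return None
    still_high = {h for h in high_prev if next_year.get(h, 0) >= threshold}
    return len(still_high) / len(high_prev)

# Hypothetical volumes for two consecutive years (illustration only).
year_1 = {"A": 12, "B": 35, "C": 9, "D": 40, "E": 10}
year_2 = {"A": 8, "B": 33, "C": 11, "D": 31}
print(persistence(year_1, year_2, 10))
print(persistence(year_1, year_2, 32))
```

Note that the denominator is the set of providers that were high volume in the earlier year, so providers that newly cross the threshold in the later year do not affect the result.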

Construct validity

We estimated the correlation between AAA repair volume and mortality, adjusting for patient characteristics such as age, sex, and APR-DRG. Volume for AAA repair is independently and negatively correlated with mortality for AAA repair (r=-.35, p<.001).

Discussion

AAA repair is a relatively rare procedure. Our empirical analysis found a mean of 14 procedures per year. While a large number of hospitals perform at least one procedure, only 44% (threshold 1) or 12% (threshold 2) of hospitals are actually high volume. The relationship between volume and outcome has been established in the literature, specifically that higher volume hospitals have lower mortality than lower volume hospitals; differences in patient case-mix do not account fully for these relationships. Our empirical analysis noted that AAA repair volume was negatively correlated with AAA repair mortality. However, our results do not include the complex risk adjustment contained in the studies reported in the literature.

This indicator is measured with great precision, as is expected with all volume indicators. It is expected that volume for AAA repair would be measured precisely using discharge abstract data. Most procedures are performed in an inpatient setting.

The volume-outcome relationship on which this indicator is based may not hold over time, as providers become more experienced or as technology changes. It is important then to revisit the volume-outcome relationship to ensure the validity of this indicator. Overall, volume measures are not direct measures of quality, and are relatively insensitive. For this reason they should be used with caution and in conjunction with measures of mortality, to ensure that increasing volumes truly improve patient outcomes.

Our empirical analyses found that, at threshold 2, most AAA procedures are not yet performed at high volume hospitals. This leaves ample room for improvement. Relatively few hospitals are high volume. It is unclear whether simply increasing volume at low volume hospitals would actually improve outcomes. It is possible that hospitals could increase volume simply by increasing the number of borderline or inappropriate procedures performed. For this reason, it may be prudent to examine this indicator alongside area rates for this procedure, and examinations of the appropriateness of the procedures.

Performance for this indicator is highly dependent on the volume threshold used.

Overall, this indicator is recommended for inclusion in the HCUP II QI set. Specific caveats should be kept in mind when using this indicator. As a volume indicator, this indicator is a proxy measure for quality, and it is recommended that it be used with other indicators. Theoretically, providers could increase the number of procedures on patients with questionable indications as well, though this is more difficult than for other indicators.

INDICATOR 2: CAROTID ENDARTERECTOMY (CE) VOLUME

Indicator: Carotid endarterectomy raw volume
Relationship to Quality: Better outcomes have been associated with higher volumes. Higher volumes thus represent better quality.
Benchmark:
Threshold 1: 50 or more procedures per year 188
Threshold 2: 101 or more procedures per year 189, 190

Method:

Quality Measure: Provider level CE raw volume
Outcome of Interest: Discharges with ICD-9 Code 38.12 in any procedure field.
Exclude MDC 14 (pregnancy, childbirth, and puerperium) and MDC 15 (newborns and other neonates).
Population at Risk: Not applicable
Evidence from the literature
Face validity

Procedure volume is a surrogate measure of quality; its face validity depends on whether a strong association with outcomes of care is both plausible and widely accepted in the professional community.

Carotid endarterectomy (CEA) is a procedure that requires technical proficiency with the use of complex equipment. Technical errors may lead to clinically significant complications, such as abrupt carotid occlusion with or without stroke, myocardial infarction, and death. In two major randomized trials of CEA, researchers pre-selected centers with low rates of perioperative stroke and death, in the belief that these measures reflect surgical skill and quality of care.306, 307 As a result, recent professional guidelines emphasize monitoring surgical outcomes and avoid promoting volume standards. 308

Precision

Publication of the North American Symptomatic Carotid Endarterectomy Trial (NASCET) 247 and the Asymptomatic Carotid Atherosclerosis Study (ACAS) 248 has led to a dramatic increase in the performance of CEA.249-251 Approximately 144,000 CEAs were performed in the USA in 1997 (5.3 per 10,000 persons). 309 However, many hospitals perform relatively few procedures, suggesting that the actual annual count of procedures may not be a reliable guide to the number of procedures performed on an ongoing basis. For example, 60% of institutions performed fewer than 17 procedures per year in one study of Medicare beneficiaries. 258 Approximately 50% of CEAs performed on Medicare beneficiaries in one state occurred in hospitals performing 21 or fewer operations annually. 64 Although these numbers involve Medicare patients only, Medicare beneficiaries comprise approximately 70% of the patients who undergo CEA, so total patient volumes are unlikely to differ substantially.

The number of CEA procedures is measured accurately with discharge data; in fact, discharge data are probably the best available source for hospital volume information.

Minimum bias

Volume measures are not subject to bias due to disease severity and comorbidities. For this reason, risk-adjustment is not appropriate. Although volume measures are theoretically subject to bias due to variation across hospitals in the use of outpatient surgery facilities, less than 1% of CEA surgeries in 1996 were performed in ambulatory settings. 297

Construct validity

Volume is not a direct measure of the quality or outcomes of care. Although higher volumes have been repeatedly associated with better outcomes after CEA, these findings may be limited by inadequate risk adjustment. Only two studies outside the Veterans Affairs system were based on clinical data sets that included "indication for surgery"; both used hospital-specific Medicare volume, not total volume, as the key independent variable.64, 310

Surgeons and hospitals with higher patient volumes tend to have fewer adverse outcomes, including new strokes and deaths.64, 258, 310-312 Only two major studies, one limited to Veterans Affairs medical centers 298 and the other from Finland, 313 failed to show a significant hospital volume-outcome relationship for CEA. The magnitude of this effect was impressive in the two studies with the best severity adjustment; for example, Cebul et al. found that undergoing surgery in a high-volume hospital was associated with a 71% reduction in the risk of stroke or death at 30 days, after adjustment for age, gender, indication for surgery, renal insufficiency, and two cardiovascular comorbidities.64, 310 In the study by Karp et al., the risk of severe stroke or death was 2.6 times higher at the lowest-volume hospitals than at the highest-volume hospitals.64, 310 Other studies with more limited risk adjustment have reported adjusted odds ratios at low-volume hospitals of 1.28 for death 189 and 2.5-3.1 for "death or definite stroke" (based on ICD-9-CM codes).64, 310

Optimal volume thresholds are difficult to determine, because most studies have only counted Medicare cases. Using a statewide database in New York, Hannan et al. found that hospitals with fewer than 101 cases per year had elevated risk-adjusted mortality. Researchers using Medicare data have reported volume threshold effects at 40 64, 310 and 62 64, 310 cases per year, or continuous effects over a volume range from <7 or <11 up to >21 or >50 cases per year.64, 310

Fosters true quality improvement

One possible adverse effect of volume-based measures is to encourage low-volume providers (who may also provide poorer quality of care) to increase their volume, perhaps to reach a threshold of 101 cases per year. Such responses would probably not improve patient outcomes to the same extent as moving patients from low-volume to high-volume hospitals. Indeed, hospitals may loosen eligibility criteria and perform procedures in patients who are marginal or inappropriate candidates. This possibility is worrisome, because the benefits of CEA in asymptomatic patients are modest and easily outweighed by high postoperative complication rates.253, 254, 258, 314 Outcomes analyses among Medicare patients undergoing CEA indicate a substantial difference between efficacy and effectiveness.64, 250, 258 Mortality rates among Medicare patients were substantially higher than those reported in the ACAS, even at high-volume centers that participated in the trial, 258 because of more liberal patient selection. For example, the ACAS excluded all patients over 80 years of age, 248 but 15% of the Medicare patients undergoing CEA outside ACAS were in this age range. 258 Patients over 80 years of age experience 2-3 times the perioperative mortality reported for younger patients.250, 258-260

Despite this caveat, previous studies have shown either no relationship between provider volume and patient selection258, 298 or a tendency for high volume providers to operate on sicker patients. 64 To address this issue, one would ideally consider provider volume in conjunction with the distribution of indications for surgery or major comorbidities. Unfortunately, the indication for surgery is not obtainable from the HCUP database.

The alternative of shutting down low-volume hospitals and transferring procedures to high-volume hospitals may overload these providers and impair access to care.

Prior use

CEA volume has not been widely used as an indicator of quality, although it has been advocated as such. On its Web site, the Pacific Business Group on Health 304 states that "one marker of how well a hospital is likely to perform is the experience of the hospital and its surgical team...in the absence of data to compare hospitals on their complications and survival rates, you can begin evaluating experience by looking at the number of (CEA) surgeries a hospital performs each year." The Center for Medical Consumers posts hospital-specific and operator-specific CEA volumes for New York hospitals. 305

Empirical Evidence
Test | Statistic | Rating
Procedure volume | |
   Raw mean volume (standard deviation) | 52 (60) |
   Median / 90th / 95th percentile | 31.5 / 129 / 169 |
   Stability over time, mean in 1995 / mean in 1997 | 52 / 54 | Stable
Percentage of procedures at high volume hospitals | | Moderate
   Percentage threshold 1 (% hosp at threshold) | 78% (37%) |
   Percentage threshold 2 (% hosp at threshold) | 51% (17%) |
Persistence of high volume | | High / Mod
   High volume remaining high, 95/96 (Threshold 1, threshold 2) | 94% / 88% |
   High volume remaining high, 96/97 (Threshold 1, threshold 2) | 90% / 88% |
Procedure Volume

In 1996, 904 hospitals (67.2% of providers) performed at least one procedure. Among these hospitals, the mean (standard deviation) number of procedures was 52 (60). The median was 31.5, and the 90th and 95th percentiles were 129 and 169, respectively. In general, there are many hospitals with lower volumes and a few hospitals with much higher volumes. Overall procedure volume was stable over the 1995-1997 time period. The mean number of procedures performed in 1995 and 1997 was 52 and 54, respectively.

Percentage of Procedures at High Volume Hospitals

A moderate percentage of procedures were performed at high volume hospitals. At threshold 1, 77.8% of carotid endarterectomy procedures were performed at 'high volume' providers (and 37% of providers are 'high volume'). At threshold 2, 51.0% were performed at 'high volume' providers (and 17% of providers are 'high volume').

Persistence of High Volume

High volume status was highly persistent over time. At threshold 1, 93.5% of high volume providers in 1995 were also high volume in 1996. Similarly, 89.7% of high volume providers in 1996 were also high volume in 1997. At threshold 2, 87.5% of high volume providers in 1995 were also high volume in 1996. Similarly, 87.8% of high volume providers in 1996 were also high volume in 1997.

Construct validity

Because we did not retain the carotid endarterectomy mortality indicator, owing to inadequate precision, we were not able to test the construct validity of this indicator. However, CE volume is negatively correlated with several other mortality indicators: CABG (r=-.26, p<.0001), AAA repair (r=-.38, p<.0001) and craniotomy (r=-.18, p<.0001).

Discussion

Carotid endarterectomy is a fairly common procedure. Our empirical analysis found a mean of 52 procedures per year. While a large number of hospitals perform at least one procedure, only 37% (threshold 1) or 17% (threshold 2) of hospitals are actually high volume. The relationship between volume and outcome has been established in the literature, specifically that higher volume hospitals have lower mortality and post-operative stroke rates than lower volume hospitals; differences in patient case-mix do not account fully for these relationships. The relationship may be stronger for hospital volume than for operator volume. Nonetheless, providers may want to examine operator volume as well as hospital volume.

This indicator is measured with great precision, as is expected with all volume indicators. It is expected that volume for carotid endarterectomy would be measured precisely using discharge abstract data. Most procedures are performed in an inpatient setting.

The volume-outcome relationship on which this indicator is based may not hold over time, as providers become more experienced or as technology changes. It is important then to revisit the volume-outcome relationship to ensure the validity of this indicator. Overall, volume measures are not direct measures of quality, and are relatively insensitive. For this reason they should be used with caution and in conjunction with measures of mortality, to ensure that increasing volumes truly improve patient outcomes.

Our empirical analyses found that many CE procedures are actually performed at high volume hospitals already. This leaves some room for improvement, but not as much as for other indicators. However, relatively few hospitals are high volume. It is unclear whether simply increasing volume at low volume hospitals would actually improve outcomes. It is possible that hospitals could increase volume simply by increasing the number of borderline or inappropriate procedures performed. For this reason, it may be prudent to examine this indicator alongside area rates for this procedure, and examinations of the appropriateness of the procedures.

Overall, this indicator is recommended for inclusion in the HCUP II QI set. Specific caveats should be kept in mind when using this indicator. As a volume indicator, this indicator is a proxy measure for quality, and it is recommended that it be used with other indicators. Providers could increase the number of procedures on patients with questionable indications as well.

INDICATOR 3: CORONARY ARTERY BYPASS GRAFT (CABG) VOLUME

Indicator: Coronary artery bypass graft (CABG) raw volume
Relationship to Quality: Better outcomes have been associated with higher volumes. Higher volumes thus represent better quality.
Benchmark:
Threshold 1: 100 or more procedures per year 193
Threshold 2: 200 or more procedures per year 54

Method:

Quality Measure: Provider level CABG raw volume
Outcome of Interest: Discharges with ICD-9 Codes 36.10 - 36.19 in any procedure field.
Age 40 years and older.
Exclude MDC 14 (pregnancy, childbirth, and puerperium) and MDC 15 (newborns and other neonates).
Population at Risk: Not applicable
Evidence from the literature
Face validity

Procedure volume is a surrogate measure of quality; its face validity depends on whether a strong association with outcomes of care is both plausible and widely accepted in the professional community.

CABG is a procedure that requires technical proficiency with the use of complex equipment. Technical errors may lead to clinically significant complications, such as myocardial infarction, stroke, and death. On the basis of this knowledge and empirical literature (summarized below), the American Heart Association (AHA) and the American College of Cardiology (ACC) have argued for "careful outcome tracking" and supported "monitoring institutions or individuals who annually perform <100 cases." Noting that "some institutions and practitioners maintain excellent outcomes despite relatively low volumes," this panel concluded that "credentialing policies based on conclusions drawn from these data must be made with caution." 193 A committee of the Society of Thoracic Surgeons reaffirmed that "until conclusive data become available that link volume to outcome, volume should not be used as a criterion for credentialing of cardiac surgeons...each surgeon should be evaluated on his or her individual results." 315

Precision

The frequency of CABG has been relatively stable over the past 15 years, as percutaneous coronary interventions have become more popular. Approximately 366,000 CABG were performed in the USA in 1997 (13.5 per 10,000 persons). 316 Several states, including New York, New Jersey, Pennsylvania, and California, have started statewide programs to monitor the frequency and clinical outcomes of CABG surgery. These states differ markedly in mean CABG volume: 627 (range 94-1,814) in New York (1996); 636 (range 111-1,119) in New Jersey (1996-97, annualized); 449 (range 121-1,165) in Pennsylvania (1994-95, annualized); and 266 (range 3-1,643) in California (1996).317, 318

The number of CABG procedures is measured accurately with discharge data; in fact, discharge data are probably the best available source for hospital volume information. The large number of procedures performed annually at most hospitals suggests that annual volume is not subject to considerable random variation (except perhaps in California and similar states). Indeed, Hannan et al. reported year to year hospital volume correlations of 0.96-0.97 in New York. 118

Minimum bias

Volume measures are not subject to bias due to disease severity and comorbidities. For this reason, risk-adjustment is not appropriate. Although volume measures are theoretically subject to bias due to variation across hospitals in the use of outpatient surgery facilities, less than 1% of CABG surgeries in 1996 were performed in ambulatory settings. 297

Construct validity

Volume is not a direct measure of the quality or outcomes of care. Although higher volumes have been repeatedly associated with better outcomes after CABG, these findings may be limited by inadequate risk adjustment. Sowden et al. 319 systematically reviewed 15 studies of the volume-outcome relationship for CABG; six used non-overlapping data and reported effect estimates for fixed volume categories. Among these six studies, the apparent benefit of high CABG volume (>200 cases per year) diminished as casemix adjustment improved. Because casemix adjustment was generally more complete in more recent studies, the authors could not exclude the possibility that the benefit of high volume actually decreased between 1972 and 1991.

Using a comprehensive clinical database to adjust for age, gender, unstable angina, ejection fraction, functional class, shock, preoperative intra-aortic balloon pump, recent myocardial infarction, and several comorbidities, Hannan found that the adjusted relative risk of inpatient death at high-volume hospitals (>200 cases per year) in 1989-1992 was 0.84, compared with low-volume hospitals. 118 However, only 3.3% of patients in that study underwent CABG at a low-volume hospital. Another recent study, based on a clinical dataset from the Department of Veterans Affairs, reported a very similar adjusted relative risk of 1.33 at hospitals with less than 101 CABG per year, compared with higher-volume centers. 320

Older studies using hospital discharge data found larger effects of hospital volume. Differences in risk-adjusted mortality rates across volume quartiles (20-100, 101-200, 201-350, >350 cases per year) were larger for non-scheduled operations (7.7%, 5.5%, 5.9%, and 4.6%, respectively) than for scheduled operations (3.0%, 2.7%, 2.9%, and 2.2%, respectively) in one study. 228 Analyses using instrumental variables suggested that much, if not all, of the volume effect may be due to "selective referral" of patients to high-quality centers.230, 239 Of course, the direction of causation (e.g., higher volume leads to better outcomes, or vice versa) may not affect the validity of using hospital volume as a marker of quality.

Studies of surgeon volume are less directly relevant to the HCUP Quality Indicator project, but one recent study (from New York) demonstrated a statistically significant association between surgeon volume and mortality, which appears to be decreasing over time. 233 Specifically, surgeons who performed 50 or fewer CABG in 1989 had a risk-adjusted mortality rate 2.2 times greater than that of surgeons who performed 150 or more CABG. This ratio decreased to 1.89 in 1990, 1.39 in 1991, and 1.36 in 1992. Only in 1989 and 1990 were the mortality differences across surgeon volume categories statistically significant. Two earlier studies of surgeon volume generated counter-intuitive and difficult-to-interpret results.227, 321

Although volume-outcome associations have been demonstrated for CABG, volume seems likely to be both insensitive and nonspecific as a measure of quality. For example, Hannan found that some low-volume surgeons achieved outstanding risk-adjusted mortality rates of 2.1% or less in 1992; these surgeons were either transiently low-volume or new to New York State. Nonetheless, it has been estimated that shifting patients in California from low-volume to high-volume hospitals would avert 258 deaths per year. 190

Fosters true quality improvement

One possible adverse effect of volume-based measures is to encourage low-volume providers (who may also provide poorer quality of care) to increase their volume, simply to reach a threshold of 200 cases per year. Such responses would probably not improve patient outcomes to the same extent as moving patients from low-volume to high-volume hospitals. At the extreme, hospitals may loosen eligibility criteria and perform procedures on patients who are marginal or inappropriate candidates. The alternative of shutting down low-volume hospitals and transferring procedures to high-volume hospitals may overload these providers and impair access to care.

Prior use

CABG volume has not been widely used as an indicator of quality, although specific volume thresholds have been suggested as "standards" for the profession. On its Web site, the Pacific Business Group on Health 304 states that "one marker of how well a hospital is likely to perform is the experience of the hospital and its surgical team...in the absence of data to compare hospitals on their complications and survival rates, you can begin evaluating experience by looking at the number of (CABG) surgeries a hospital performs each year."

Empirical Evidence
Test | Statistic | Rating
Procedure volume | |
   Raw mean volume/standard deviation | 399 (338) |
   Median/90th/95th percentile | 293 / 830 / 1095 |
   Stability over time, mean in 1995 / mean in 1997 | 375 / 401 | Increasing
Percentage of procedures at high volume hospitals | | High
   Percentage threshold 1 (% hosp at threshold) | 98% (88%) |
   Percentage threshold 2 (% hosp at threshold) | 91% (68%) |
Persistence of high volume | | High
   High volume remaining high, 95/96 (Threshold 1, threshold 2) | 99% / 97% |
   High volume remaining high, 96/97 (Threshold 1, threshold 2) | 98% / 97% |

Procedure Volume

In 1996, 307 hospitals (26.4% of providers) performed at least one procedure. Among these hospitals, the mean (standard deviation) number of procedures was 399 (338). The median was 293, and the 90th and 95th percentiles were 830 and 1095, respectively. In general, there are a moderate number of hospitals with lower volumes and a few hospitals with much higher volumes. Overall procedure volume grew over the 1995-1997 time period. The mean number of procedures performed in 1995 and 1997 was 375 and 401, respectively.
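
The summary statistics reported here could be reproduced from a table of hospital volumes along the following lines. This is only a sketch: the hospital identifiers and volumes are hypothetical, and the nearest-rank percentile rule is an assumption, since the report does not state which percentile convention was used.

```python
import math
import statistics


def nearest_rank_percentile(sorted_values, pct):
    """Nearest-rank percentile (an assumed convention, not taken from the report)."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize_volumes(volume_by_hospital):
    """Summarize annual volume among hospitals performing at least one procedure."""
    volumes = sorted(v for v in volume_by_hospital.values() if v >= 1)
    return {
        "hospitals": len(volumes),
        "mean": statistics.mean(volumes),
        "sd": statistics.stdev(volumes),  # sample standard deviation
        "median": statistics.median(volumes),
        "p90": nearest_rank_percentile(volumes, 90),
        "p95": nearest_rank_percentile(volumes, 95),
    }


# Hypothetical volumes, not data from the report.
example = {"A": 0, "B": 10, "C": 20, "D": 30, "E": 40, "F": 400}
summary = summarize_volumes(example)
```

In this toy example the mean exceeds the median, which is the same right-skewed pattern described above: many hospitals with lower volumes and a few with much higher volumes.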

Percentage of Procedures at High Volume Hospitals

A high percentage of procedures were performed at high volume hospitals. At threshold 1, 98.3% of CABG procedures were performed at 'high volume' providers (and 88% of providers are 'high volume'). At threshold 2, 90.7% were performed at 'high volume' providers (and 68% of providers are 'high volume').
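
The two percentages reported for each threshold answer different questions: the share of procedures performed at high volume providers, and the share of providers that are high volume. A minimal sketch of both calculations follows. The function name, hospital identifiers, and volumes are hypothetical, and the threshold of 200 is used purely for illustration.

```python
def high_volume_shares(volume_by_hospital, threshold):
    """Return the percentage of procedures and of hospitals at or above threshold.

    Only hospitals performing at least one procedure are counted in the
    denominators (an assumption that mirrors the wording of the text).
    """
    performing = {h: v for h, v in volume_by_hospital.items() if v >= 1}
    high = {h: v for h, v in performing.items() if v >= threshold}
    total_procedures = sum(performing.values())
    return {
        "pct_procedures": 100 * sum(high.values()) / total_procedures,
        "pct_hospitals": 100 * len(high) / len(performing),
    }


# Hypothetical volumes, not data from the report.
example = {"A": 50, "B": 150, "C": 300, "D": 500}
shares = high_volume_shares(example, threshold=200)
```

Because high volume hospitals contribute more procedures each, the procedure share is generally at least as large as the hospital share, as in the figures above.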

Persistence of High Volume

High volume status was highly persistent over time. At threshold 1, 99.0% of high volume providers in 1995 were also high volume in 1996. Similarly, 98.1% of high volume providers in 1996 were also high volume in 1997. At threshold 2, 96.9% of high volume providers in 1995 were also high volume in 1996. Similarly, 97.5% of high volume providers in 1996 were also high volume in 1997.
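
Persistence, as described here, is the percentage of providers that were high volume in one year and remained high volume in the next. The sketch below illustrates that calculation under stated assumptions: the function name and data are hypothetical, and a hospital missing from the second year is treated as performing zero procedures.

```python
def persistence(volumes_year1, volumes_year2, threshold):
    """Percentage of year-1 high volume hospitals still high volume in year 2."""
    high_year1 = {h for h, v in volumes_year1.items() if v >= threshold}
    if not high_year1:
        return None  # undefined when no hospital is high volume in year 1
    # Assumption: a hospital absent from year 2 counts as volume 0.
    still_high = {h for h in high_year1 if volumes_year2.get(h, 0) >= threshold}
    return 100 * len(still_high) / len(high_year1)


# Hypothetical volumes, not data from the report.
year1 = {"A": 250, "B": 300, "C": 210, "D": 400, "E": 90}
year2 = {"A": 260, "B": 190, "C": 205, "D": 410, "E": 220}
pct_persistent = persistence(year1, year2, threshold=200)
```

Note that hospital E, which becomes high volume only in the second year, does not enter the calculation; the measure conditions on high volume status in the first year.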

Construct validity

We estimated the correlation between CABG volume and mortality, adjusting for patient characteristics such as age, sex, and APR-DRG. Volume for CABG is independently and negatively correlated with mortality for CABG (r=-.29, p<.001).

Discussion

CABG is a technically difficult, yet relatively common procedure. Our empirical analysis found a mean of 399 procedures per year. A substantial number of hospitals perform at least one procedure, and 88% (threshold 1) or 68% (threshold 2) of hospitals are actually high volume. Higher volumes of coronary artery bypass graft (CABG) have been associated with better outcomes, namely fewer deaths. While several studies have demonstrated this relationship, these studies also have some flaws. Differences in case mix and the extremely low proportion of procedures taking place in low volume hospitals may account for some of the differences between high volume and low volume hospitals. The AHA/ACC has recommended that, because some low volume hospitals have very good outcomes, outcome measures other than volume should be used to evaluate individual surgeons' performance. However, the AHA/ACC does not make a recommendation based on volume as a quality indicator. Providers may want to examine operator volume as well as hospital volume. Our empirical analysis noted that CABG volume was very slightly negatively correlated with CABG mortality. However, our results do not include the complex risk adjustment contained in the studies reported in the literature; this could explain the lack of strong association seen in our results.

This indicator is measured with great precision, as is expected with all volume indicators. It is expected that volume for CABG would be measured precisely using discharge abstract data. Most procedures are performed in an inpatient setting.

The volume-outcome relationship on which this indicator is based may not hold over time, as providers become more experienced or as technology changes. It is important then to revisit the volume-outcome relationship to ensure the validity of this indicator. Overall, volume measures are not direct measures of quality, and are relatively insensitive. For this reason they should be used with caution and in conjunction with measures of mortality and of quality of care within that field (in this case cardiac care), to ensure that increasing volumes truly improve patient outcomes.

Our empirical analyses found that most CABG procedures are actually performed at high volume hospitals already. This leaves little room for improvement. Further, most hospitals are high volume. It is unclear whether simply increasing volume at the few remaining low volume hospitals would actually improve outcomes. It is possible that hospitals could increase volume simply by increasing the number of borderline or inappropriate procedures performed. For this reason, it may be prudent to examine this indicator alongside area rates for this procedure, and examinations of the appropriateness of the procedures.

Overall, this indicator is recommended for inclusion in the HCUP II QI set. Specific caveats should be kept in mind when using this indicator. As a volume indicator, this indicator is a proxy measure for quality, and it is recommended that it be used with other indicators. Theoretically, providers could increase the number of procedures on patients with questionable indications as well, though this is more difficult than for other indicators.

INDICATOR 4: ESOPHAGEAL RESECTION VOLUME

Indicator: Esophageal resection raw volume
Relationship to Quality: Better outcomes have been associated with higher volumes. Higher volumes thus represent better quality.
Benchmark:
   Threshold 1: 6 or more procedures per year 198
   Threshold 2: 7 or more procedures per year 190, 198

Method:

Quality Measure: Provider level esophageal resection raw volume
Outcome of Interest: Discharges with ICD-9 Codes 42.40 - 42.42 in any procedure field and diagnosis code of esophageal cancer in any field. Exclude MDC 14 (pregnancy, childbirth, and puerperium) and MDC 15 (newborns and other neonates).
Population at Risk: Not applicable
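
As an illustration only, the case definition above could be applied to discharge records roughly as follows. The record layout (`hospital_id`, `mdc`, `procedures`, `diagnoses`) is hypothetical, and the use of ICD-9-CM category 150 to identify esophageal cancer is an assumption, since the definition above does not list the diagnosis codes.

```python
from collections import Counter

# Procedure codes 42.40 - 42.42, as given in the indicator definition.
RESECTION_PROCEDURES = {"42.40", "42.41", "42.42"}
# MDC 14 (pregnancy, childbirth, puerperium) and MDC 15 (newborns, neonates).
EXCLUDED_MDCS = {14, 15}


def is_esophageal_cancer(dx_code):
    # Assumption: ICD-9-CM category 150, with codes stored in dotted form.
    return dx_code.startswith("150")


def qualifies(discharge):
    """True if the discharge counts toward esophageal resection volume."""
    if discharge["mdc"] in EXCLUDED_MDCS:
        return False
    has_procedure = any(p in RESECTION_PROCEDURES for p in discharge["procedures"])
    has_cancer = any(is_esophageal_cancer(d) for d in discharge["diagnoses"])
    return has_procedure and has_cancer


def provider_volumes(discharges):
    """Count qualifying discharges per hospital (raw volume)."""
    return Counter(d["hospital_id"] for d in discharges if qualifies(d))


# Hypothetical discharge records, not data from the report.
sample = [
    {"hospital_id": "H1", "mdc": 6, "procedures": ["42.41"], "diagnoses": ["150.4"]},
    {"hospital_id": "H1", "mdc": 6, "procedures": ["96.6", "42.42"], "diagnoses": ["401.9", "150.9"]},
    {"hospital_id": "H2", "mdc": 6, "procedures": ["42.41"], "diagnoses": ["530.3"]},  # no cancer diagnosis
    {"hospital_id": "H3", "mdc": 15, "procedures": ["42.40"], "diagnoses": ["150.1"]},  # excluded MDC
]
volumes = provider_volumes(sample)
```

Checking every procedure and diagnosis field, rather than only the principal one, follows the "in any field" wording of the definition.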
Evidence from the literature
Face validity

Procedure volume is a surrogate measure of quality; its face validity depends on whether a strong association with outcomes of care is both plausible and widely accepted in the professional community.

Esophageal cancer surgery requires technical proficiency; errors in surgical technique or management may lead to clinically significant complications, such as sepsis, pneumonia, anastomotic breakdown, and death. However, we are not aware of any consensus guidelines or recommendations regarding minimum procedure volume. The National Cancer Policy Board of the Institute of Medicine and the National Research Council recommends that cancer "patients undergoing procedures that are technically difficult to perform and have been associated with higher mortality in lower-volume settings (including esophagectomy) receive care at facilities with extensive experience (e.g., high-volume facilities)."

Precision

The number of esophagectomies is measured accurately with discharge data; in fact, discharge data are probably the best available source for hospital volume information. Although a few facilities have relatively high volumes, most (e.g., 239 of 273 California hospitals) 198 perform 10 or fewer esophagectomies for cancer during a 5-year period. As a result, this measure is expected to have poor precision.

Minimum bias

Volume measures are not subject to bias due to disease severity and comorbidities. For this reason, risk-adjustment is not appropriate. Although volume measures are theoretically subject to bias due to variation across hospitals in the use of outpatient surgery facilities, less than 1% of resections in 1996 were performed in ambulatory settings. 297

Construct validity

Volume is not a direct measure of the quality or outcomes of care. Although higher volumes have been repeatedly associated with better outcomes after esophageal surgery, these findings may be limited by inadequate risk adjustment.

Only one study used clinical data to estimate the association between hospital volume and mortality following esophageal cancer surgery. Begg et al. 65 analyzed retrospective cohort data from the Surveillance, Epidemiology, and End Results (SEER)-Medicare linked database from 1984 through 1993. The crude 30-day mortality rate was 17.3% at hospitals that performed 1-5 esophagectomies on Medicare patients during the study period, versus 3.9% and 3.4% at hospitals that performed 6-10 and 11 or more esophagectomies, respectively. The association between volume and mortality remained highly significant (p<.001) in a multivariate model, adjusting for the number of comorbidities, cancer stage and volume, and age.

Two other studies using hospital discharge data found similar effects of hospital volume. Using 1990-94 data from California, Patti et al. 198 estimated risk-adjusted mortality rates of 17%, 19%, 10%, 16%, and 6% across five hospital volume categories (e.g., 1-5, 6-10, 11-20, 21-30, and >30 procedures during the 5-year study period). Their risk adjustment was quite limited; only the year of surgery, age, sex, race, payer source, tumor location, and the total number of secondary diagnoses were included. Using 1990-97 data from Maryland (adjusting only for age and payer source), Gordon et al. 322 estimated that the adjusted odds of death at minimal-volume (<11 "complex gastrointestinal procedures" per year) and low-volume (11-20 procedures/year) hospitals were 3.8 and 4.0 times that at a high-volume hospital (214 procedures/year). However, the generalizability of these results is limited by the fact that the last category included only one hospital. An older British study found a surgeon volume effect, but did not consider hospital volume. 323

Although volume-outcome associations have been demonstrated for esophageal cancer surgery, volume seems likely to be both insensitive and nonspecific as a measure of quality. It has been estimated that shifting patients in California from low-volume to high-volume hospitals would avert only 7 deaths per year, although 77% of all operations are performed in low-volume hospitals. 190

Fosters true quality improvement

One possible adverse effect of volume-based measures is to encourage low-volume providers (who may also provide poorer quality of care) to increase their volume, simply to reach a threshold of 6 cases per year. Such responses would probably not improve patient outcomes to the same extent as moving patients from low-volume to high-volume hospitals. At the extreme, hospitals may loosen eligibility criteria and perform procedures on patients who are marginal or inappropriate candidates. The alternative of shutting down low-volume hospitals and transferring procedures to high-volume hospitals may overload these providers and impair access to care.

Prior use

Esophageal cancer surgical volume has not been widely used as an indicator of quality.

Empirical Evidence
Test | Statistic | Rating
Procedure volume | |
   Raw mean volume/standard deviation | 2 (3) |
   Median/90th/95th percentile | 1 / 4 / 6 |
   Stability over time, mean in 1995 / mean in 1997 | 2 / 2 | Stable
Percentage of procedures at high volume hospitals | | Low
   Percentage threshold 1 (% hosp at threshold) | 40% (9%) |
   Percentage threshold 2 (% hosp at threshold) | 34% (6%) |
Persistence of high volume | | Low
   High volume remaining high, 95/96 (Threshold 1, threshold 2) | 50% / 57% |
   High volume remaining high, 96/97 (Threshold 1, threshold 2) | 87% / 76% |

Procedure Volume

In 1996, 265 hospitals (19.7% of providers) performed at least one procedure. Among these hospitals, the mean (standard deviation) number of procedures was 2 (3). The median was 1, and the 90th and 95th percentiles were 4 and 6, respectively. In general, there are a moderate number of hospitals with lower volumes and a few hospitals with much higher volumes. Overall procedure volume was stable over the 1995-1997 time period. The mean number of procedures performed in both 1995 and 1997 was 2.

Percentage of Procedures at High Volume Hospitals

A low percentage of procedures were performed at high volume hospitals. At threshold 1, 39.5% of esophageal resection procedures were performed at 'high volume' providers (and 8.6% of providers are 'high volume'). At threshold 2, 34.3% were performed at 'high volume' providers (and 6.4% of providers are 'high volume').

Persistence of High Volume

High volume status was not persistent over time. At threshold 1, 50.0% of high volume providers in 1995 were also high volume in 1996. Similarly, 58.3% of high volume providers in 1996 were also high volume in 1997. At threshold 2, 57.1% of high volume providers in 1995 were also high volume in 1996. Similarly, 60.0% of high volume providers in 1996 were also high volume in 1997.

Construct validity

We estimated the correlation between esophageal resection volume and mortality, adjusting for patient characteristics such as age, sex, and APR-DRG. Volume for esophageal resection is moderately and negatively correlated with mortality for esophageal resection (r=-.29, p<.05), as well as mortality after other cancer resection procedures.

Discussion

Esophageal resection is a complex cancer surgery, requiring great technical skill. However, this procedure is rare, with most hospitals performing fewer than 10 over a 5-year period. Our empirical analyses found the mean number of procedures per year to be 2. Relatively few hospitals actually perform this procedure, and over half perform only one per year. Despite the rarity of this procedure, relatively strong relationships between volume and outcome, specifically post-operative mortality, have been noted in the literature. Nonetheless, no clear threshold has been identified. Our empirical results found volume to be moderately negatively correlated with resection mortality. However, our results do not include the complex risk adjustment used in the studies reported in the literature.

While most volume indicators are measured with high precision, the relative rarity of this procedure results in a less precise indicator, though still highly adequate for use as a quality indicator. From year to year, volumes may change, as may high volume status, as noted in our empirical analysis. Thus, if possible, hospitals should examine more than one year of data, averaging volumes for a more precise estimate. Hospitals may also consider using this indicator together with the pancreatic resection indicator, another complex cancer surgery.
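
The suggestion to average volumes over more than one year can be sketched as follows. The function name and data are hypothetical, and counting a hospital with no recorded procedures in a given year as zero for that year is an assumption rather than a rule stated in this report.

```python
def average_annual_volume(volumes_by_year):
    """Average each hospital's volume over all years supplied.

    volumes_by_year maps year -> {hospital_id: volume}. A hospital absent
    from a year is assumed to have performed zero procedures that year.
    """
    hospitals = set()
    for yearly in volumes_by_year.values():
        hospitals.update(yearly)
    n_years = len(volumes_by_year)
    return {
        h: sum(yearly.get(h, 0) for yearly in volumes_by_year.values()) / n_years
        for h in hospitals
    }


# Hypothetical volumes, not data from the report.
three_years = {
    1995: {"A": 4, "B": 9},
    1996: {"A": 8, "B": 5},
    1997: {"A": 6, "B": 7},
}
averages = average_annual_volume(three_years)
```

In this toy example, hospital A falls below a threshold of 6 in 1995 and hospital B falls below it in 1996, yet both meet it on the three-year average, which illustrates how single-year classification can fluctuate for a rare procedure.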

The volume-outcome relationship on which this indicator is based may not hold over time, as providers become more experienced or as technology changes. It is important then to revisit the volume-outcome relationship to ensure the validity of this indicator. Overall, volume measures are not direct measures of quality, and are relatively insensitive. For this reason they should be used with caution and in conjunction with measures of mortality and of quality of care within that field (in this case complex cancer surgery), to ensure that increasing volumes truly improve patient outcomes.

Our empirical analyses found that relatively few resection procedures are currently performed at high volume hospitals. However, it is unlikely that shifting the small number of procedures from low to high volume hospitals would actually increase quality of care, as frail patients would need to travel farther. Further, only a handful of hospitals are high volume. It is unclear whether simply increasing volume at low volume hospitals would actually improve outcomes.

Overall, this indicator is recommended for inclusion in the HCUP II QI set. Specific caveats should be kept in mind when using this indicator. As a volume indicator, this indicator is a proxy measure for quality, and it is recommended that it be used with other indicators. Theoretically, providers could increase the number of procedures on patients with questionable indications as well, though this is more difficult than for other indicators.

INDICATOR 5: PANCREATIC RESECTION VOLUME

Indicator: Pancreatic resection raw volume
Relationship to Quality: Better outcomes have been associated with higher volumes. Higher volumes thus represent better quality.
Benchmark:
   Threshold 1: 10 or more procedures per year 199
   Threshold 2: 11 or more procedures per year 199

Method:

Quality Measure: Provider level pancreatic resection raw volume
Outcome of Interest: Discharges with ICD-9 Codes 56.2 or 52.7 in any procedure field and diagnosis code of pancreatic cancer in any field. Exclude MDC 14 (pregnancy, childbirth, and puerperium) and MDC 15 (newborns and other neonates).
Population at Risk: Not applicable
Evidence from the literature
Face validity

Procedure volume is a surrogate measure of quality; its face validity depends on whether a strong association with outcomes of care is both plausible and widely accepted in the professional community.

Pancreatic cancer surgery requires technical proficiency; errors in surgical technique or management may lead to clinically significant complications, such as sepsis, anastomotic breakdown, and death. However, we are not aware of any consensus guidelines or recommendations regarding minimum procedure volume. The National Cancer Policy Board of the Institute of Medicine and the National Research Council recommends that cancer "patients undergoing procedures that are technically difficult to perform and have been associated with higher mortality in lower-volume settings (including pancreatic resection) receive care at facilities with extensive experience (e.g., high-volume facilities)."

Precision

The number of pancreatectomies is measured accurately with discharge data; in fact, discharge data are probably the best available source for hospital volume information. Although a few facilities have relatively high volumes, most (e.g., 263 of 298 California hospitals) 199 perform 10 or fewer pancreatectomies for cancer during a 5-year period. As a result, this measure is expected to have poor precision.

Minimum bias

Volume measures are not subject to bias due to disease severity and comorbidities. For this reason, risk-adjustment is not appropriate. Although volume measures are theoretically subject to bias due to variation across hospitals in the use of outpatient surgery facilities, less than 1% of resections in 1996 were performed in ambulatory settings. 297

Construct validity

Volume is not a direct measure of the quality or outcomes of care. Although higher volumes have been repeatedly associated with better outcomes after pancreatic surgery, these findings may be limited by inadequate risk adjustment.

Only one study used clinical data to estimate the association between hospital volume and mortality following pancreatic cancer surgery. Begg et al. 65 analyzed retrospective cohort data from the Surveillance, Epidemiology, and End Results (SEER)-Medicare linked database from 1984 through 1993. The crude 30-day mortality rate was 12.9% at hospitals that performed 1-5 pancreatic resections on Medicare patients during the study period, versus 7.7% and 5.8% at hospitals that performed 6-10 and 11 or more pancreatic resections, respectively. The association between volume and mortality remained significant (p=.01) in a multivariate model, adjusting for the number of comorbidities, cancer stage and volume, and age.

Eight of the ten studies using hospital discharge data found similar effects of hospital volume. Using 1990-94 data from California, Glasgow and Mulvihill 199 estimated risk-adjusted mortality rates of 14%, 10%, 9%, 7%, 8%, and 4% across six hospital volume categories (e.g., 1-5, 6-10, 11-20, 21-30, 31-50, and >50 procedures during the 5-year study period). Their risk adjustment was quite limited; only the year of surgery, age, sex, race, payer source, extent of resection, and the total number of secondary diagnoses were included. Using 1990-97 data on radical pancreaticoduodenectomies from Maryland (adjusting only for age and payer source), Gordon et al. 322 estimated that the adjusted odds of death at minimal-volume (<11 "complex gastrointestinal procedures" per year) and low-volume (11-20 procedures/year) hospitals were 12.5 and 10.4 times that at a high-volume hospital (214 procedures/year). However, the generalizability of these results is limited by the fact that the last category included only one hospital.

Lieberman et al. 324 used 1984-91 hospital discharge data from New York State to analyze the association between mortality after pancreatic cancer resection and both physician and hospital volumes. Adjusting for the year of surgery, age, sex, race, payer source, transfer status, and the total number of secondary diagnoses, the standardized mortality rate was 19%, 12%, 13%, and 6% at minimal (<10 patients during the 8-year study period), low (10-50 patients), medium (51-80 patients), and high-volume (>80 patients) hospitals, respectively. Surgeon volume was less significantly associated with mortality (6-13% risk-adjusted mortality across 3 volume categories); this effect disappeared in a model that included both physician and hospital volume. The dominance of hospital volume over surgeon volume was confirmed by Sosa et al. 325 using Maryland data.

Studies using administrative data from Ontario, 326 the United Kingdom, 327 and Medicare 328 have generated results similar to those from California and New York. The only studies that failed to show a significant hospital volume-outcome association were based on relatively small, nonrepresentative samples from Department of Defense 329 or major university 330 hospitals.

Although volume-outcome associations have been demonstrated for pancreatic cancer surgery, volume seems likely to be both insensitive and nonspecific as a measure of quality. It has been estimated that shifting patients in California from low-volume to high-volume hospitals would avert only 20 deaths per year, although 57% of all operations are performed in low-volume hospitals. 190 However, Gordon et al. 331 estimated that 61% of the observed reduction in statewide deaths among patients undergoing the Whipple procedure was attributable to the increasing market percentage of one facility, from 20.7% to 58.5% between 1984 and 1995.

Fosters true quality improvement

One possible adverse effect of volume-based measures is to encourage low-volume providers (who may also provide poorer quality of care) to increase their volume, simply to reach a threshold of 10 cases per year. Such responses would probably not improve patient outcomes to the same extent as moving patients from low-volume to high-volume hospitals. At the extreme, hospitals may loosen eligibility criteria and perform procedures on patients who are marginal or inappropriate candidates. The alternative of shutting down low-volume hospitals and transferring procedures to high-volume hospitals may overload these providers and impair access to care.

Prior use

Pancreatic cancer surgical volume has not been widely used as an indicator of quality.

Empirical Evidence
Test | Statistic | Rating
Procedure volume | |
   Raw mean volume/standard deviation | 3 (8) |
   Median/90th/95th percentile | 2 / 5 / 10 |
   Stability over time, mean in 1995 / mean in 1997 | 3 / 3 | Stable
Percentage of procedures at high volume hospitals | | Low
   Percentage threshold 1 (% hosp at threshold) | 30% (5%) |
   Percentage threshold 2 (% hosp at threshold) | 27% (4%) |
Persistence of high volume | | Low/ Mod
   High volume remaining high, 95/96 (Threshold 1, threshold 2) | 73% / 74% |
   High volume remaining high, 96/97 (Threshold 1, threshold 2) | 71% / 82% |

Procedure Volume

In 1996, 429 hospitals (31.9% of providers) performed at least one procedure. Among these hospitals, the mean (standard deviation) number of procedures was 3 (8). The median was 2, and the 90th and 95th percentiles were 5 and 10, respectively. In general, there are a moderate number of hospitals with lower volumes and a few hospitals with much higher volumes. Overall procedure volume was stable over the 1995-1997 time period. The mean number of procedures performed in both 1995 and 1997 was 3.

Percentage of Procedures at High Volume Hospitals

A low percentage of procedures were performed at high volume hospitals. At threshold 1, 30.3% of pancreatic resection procedures were performed at 'high volume' providers (and 5.1% of providers are 'high volume'). At threshold 2, 27.0% were performed at 'high volume' providers (and 4.2% of providers are 'high volume').

Persistence of High Volume

High volume status was not persistent over time. At threshold 1, 72.7% of high volume providers in 1995 were also high volume in 1996. Similarly, 71.4% of high volume providers in 1996 were also high volume in 1997. At threshold 2, 73.7% of high volume providers in 1995 were also high volume in 1996. Similarly, 82.4% of high volume providers in 1996 were also high volume in 1997.

Construct validity

We estimated the correlation between pancreatic resection volume and mortality, adjusting for patient characteristics such as age, sex, and APR-DRG. Volume for pancreatic resection is moderately and negatively correlated with mortality for pancreatic resection (r=-.41, p<.001), as well as mortality after other cancer resection procedures.

Discussion

Pancreatic resection is a complex cancer surgery, requiring great technical skill. However, this procedure is rare, with most hospitals performing fewer than 10 over a 5-year period. Our empirical analyses found the mean number of procedures per year to be 3. Relatively few hospitals actually perform this procedure, and over half perform no more than 2 per year. Despite the rarity of this procedure, relatively strong relationships between volume and outcome, specifically post-operative mortality, have been noted in the literature. However, no clear threshold has been identified. Our empirical analyses found pancreatic resection volume to be modestly negatively correlated with resection mortality. However, our results do not include the complex risk adjustment contained in the studies reported in the literature.

While most volume indicators are measured with high precision, the relative rarity of this procedure results in a less precise indicator, though still adequate for use as a quality indicator. From year to year, volumes may change, as may high volume status, as noted in our empirical analysis. Thus, if possible, hospitals should examine more than one year of data, averaging volumes for a more precise estimate. Hospitals may also consider using this indicator together with the esophageal resection indicator, another complex cancer surgery.
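The multi-year averaging suggested above can be sketched as follows. This is a minimal illustration only: the function name and the example counts are hypothetical and are not drawn from the HCUP data.

```python
from statistics import mean


def average_annual_volume(volumes_by_year):
    """Return the mean annual procedure count across the supplied years.

    volumes_by_year maps a calendar year to the number of procedures
    a single hospital performed in that year.
    """
    if not volumes_by_year:
        raise ValueError("need at least one year of data")
    return mean(volumes_by_year.values())


# Hypothetical hospital with an unstable annual count (illustrative values).
example = {1995: 1, 1996: 6, 1997: 2}
three_year_average = average_annual_volume(example)
```

For a rare procedure, a single unusual year (here, 1996) can move a hospital across a volume threshold, whereas the three-year average changes more slowly.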

The volume-outcome relationship on which this indicator is based may not hold over time, as providers become more experienced or as technology changes. It is important then to revisit the volume-outcome relationship to ensure the validity of this indicator. Overall, volume measures are not direct measures of quality, and are relatively insensitive. For this reason they should be used with caution and in conjunction with measures of mortality and of quality of care within that field (in this case complex cancer surgery), to ensure that increasing volumes truly improve patient outcomes.

Our empirical analyses found that relatively few resection procedures are currently performed at high volume hospitals. However, it is unlikely that shifting the small number of procedures from low to high volume hospitals would actually increase quality of care, as frail patients would need to travel farther. Further, only a handful of hospitals are high volume. It is unclear whether simply increasing volume at low volume hospitals would actually improve outcomes.

Overall, this indicator is recommended for inclusion in the HCUP II QI set. Specific caveats should be kept in mind when using this indicator. As a volume indicator, this indicator is a proxy measure for quality, and it is recommended that it be used with other indicators. Theoretically, providers could increase the number of procedures on patients with questionable indications as well, though this is more difficult than for other indicators.

INDICATOR 6: PEDIATRIC HEART SURGERY VOLUME

Indicator: Pediatric heart surgery raw volume
Relationship to Quality: Better outcomes have been associated with higher volumes. Higher volumes thus represent better quality.
Benchmark: Threshold: 100 or more procedures per year 194

Method:

Quality Measure: Provider level pediatric heart surgery raw volume
Outcome of Interest: Discharges with ICD-9 procedure codes for 1) specified heart surgery (see Appendix 6) in any field or 2) procedure code for any heart surgery and diagnosis of hypoplastic left heart syndrome in any field.
Age less than 18.
See appendix for additional exclusions.
Exclude MDC 14 (pregnancy, childbirth, and puerperium) and MDC 15 (newborns and other neonates).
Population at Risk: Not applicable
Evidence from the literature
Face validity

Procedure volume is a surrogate measure of quality; its face validity depends on whether a strong association with outcomes of care is both plausible and widely accepted in the professional community.

Pediatric cardiac surgery requires technical proficiency with the use of complex equipment. Technical errors may lead to clinically significant complications, such as arrhythmias, congestive heart failure, and death. However, we are not aware of any consensus guidelines or recommendations regarding minimum procedure volume.

Precision

The number of pediatric cardiac procedures is measured accurately with discharge data; in fact, discharge data are probably the best available source for hospital volume information. Previous studies suggest that pediatric cardiac surgery is already highly concentrated at a relatively small number of facilities (e.g., 16 hospitals in New York, 37 in California and Massachusetts together). Although some of these facilities have very high volumes, a significant number (e.g., 16 hospitals in California and Massachusetts) perform fewer than 10 cases per year. The highly skewed volume distribution may have an adverse effect on the precision of this measure.

Minimum bias

Volume measures are not subject to bias due to disease severity and comorbidities. For this reason, risk-adjustment is not appropriate. Less than 1% of pediatric heart surgeries are performed on an outpatient basis. 297

Construct validity

Volume is not a direct measure of the quality or outcomes of care. Although higher volumes have been repeatedly associated with better outcomes after pediatric cardiac surgery, these findings may be limited by inadequate risk adjustment.

Only one study used prospectively collected clinical data to estimate the association between hospital volume and mortality following pediatric cardiac surgery. 194 Hannan et al. ordered all cardiac surgical procedures by their actual mortality rates in the 1992 - 95 Cardiac Surgery Reporting System database. Expert clinicians then grouped the procedures into four clinically sensible subgroups, designed to achieve maximal separation of crude mortality rates (from 1.4% for Category I to 20.1% for Category IV). A multivariate model that included age, complexity category, and four comorbidities (preoperative cyanosis or hypoxia, acidemia, pulmonary hypertension, major extracardiac anomalies) achieved excellent calibration and discrimination (c=0.818). Using this model to estimate risk-adjusted mortality, Hannan et al. found a statistically significant hospital effect (8.26% risk-adjusted mortality at hospitals with fewer than 100 cases per year, versus 5.95% at higher volume hospitals), which was limited to surgeons who performed at least 75 cases per year. Lower volume surgeons experienced relatively high mortality, regardless of total hospital volume. Risk-adjusted mortality differed between low and high-volume hospitals for all 4 complexity categories, although the smallest difference occurred for the highest risk procedures.

Two other studies using hospital discharge data found similar effects of hospital volume. Using aggregated data from California (1988) and Massachusetts (1989), Jenkins et al. 332 estimated risk-adjusted mortality rates of 8.35% and 5.95% at low-volume (100 or fewer cases) and high-volume (more than 100 cases) hospitals, respectively. However, they also demonstrated especially high risk-adjusted mortality (18.5%) at very low-volume hospitals with fewer than 10 annual cases, and especially low mortality (3.0%) at very high-volume hospitals with more than 300 annual cases. Jenkins et al. could not evaluate the impact of surgeon volume, but they did report stronger volume effects for higher-risk procedures (e.g., OR=12.1 and 3.2 for Category III-IV procedures at hospitals with <10 and 10-100 annual cases, versus OR=2.4 for Category I-II procedures at hospitals with 10-100 annual cases). Finally, Sollano et al. 295 applied the same 4-category risk adjustment procedure developed by Jenkins to hospital discharge data from New York State in 1990-95. They reported a modest but statistically significant effect (OR=0.944 for each additional 100 annual cases), which was limited to neonates (OR=0.636) and post-neonatal infants (OR=0.720) in stratified analyses.
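To give a sense of scale for an odds ratio reported "for each additional 100 annual cases," the sketch below rescales it to other volume differences. It assumes the effect is log-linear in volume, as it would be for a continuous term in a logistic regression; the cited study's exact model form is not given here, so this is an assumption, and the function name is hypothetical.

```python
def odds_ratio_for_increment(or_per_100, additional_cases):
    """Rescale an odds ratio expressed per 100 additional annual cases.

    Assumes a log-linear volume effect, so odds ratios multiply:
    OR(k cases) = OR(100 cases) ** (k / 100).
    """
    return or_per_100 ** (additional_cases / 100)


# Overall estimate quoted above: OR = 0.944 per additional 100 annual cases.
or_for_200_more_cases = odds_ratio_for_increment(0.944, 200)
```

Under this assumption, a difference of 200 annual cases corresponds to an odds ratio of 0.944 squared, or roughly 0.89.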

Although volume-outcome associations have been demonstrated for pediatric cardiac surgery, volume seems likely to be both insensitive and nonspecific as a measure of quality. In addition, pediatric cardiac care is already regionalized, so most procedures are performed in medium-to-high volume hospitals. It has been estimated that shifting patients in California from low-volume to high-volume hospitals would avert only 7 deaths per year. 190

Fosters true quality improvement

One possible adverse effect of volume-based measures is to encourage low-volume providers (who may also provide poorer quality of care) to increase their volume, simply to reach a threshold of 100 cases per year. Such responses would probably not improve patient outcomes to the same extent as moving patients from low-volume to high-volume hospitals. At the extreme, hospitals may loosen eligibility criteria and perform procedures on patients who are marginal or inappropriate candidates. The alternative of shutting down low-volume hospitals and transferring procedures to high-volume hospitals may overload these providers and impair access to care.

Prior use

Pediatric cardiac surgical volume has not been widely used as an indicator of quality.

Empirical Evidence
Test | Statistic | Rating
Procedure volume | |
   Raw mean volume/standard deviation | 53 (90) |
   Median/90th/95th percentile | 2.5 / 149 / 245 |
   Stability over time, mean in 1995 / mean in 1997 | 52 / 52 | Stable
Percentage of procedures at high volume hospitals | | Moderate
   Percentage threshold 1 (% hosp at threshold) | 76% (21%) |
   Percentage threshold 2 (% hosp at threshold) | N/A |
Persistence of high volume | | Moderate
   High volume remaining high, 95/96 (Threshold 1, threshold 2) | 85% |
   High volume remaining high, 96/97 (Threshold 1, threshold 2) | 84% |
Procedure Volume

In 1996, 126 hospitals (9.3% of providers) performed at least one procedure. Among these hospitals, the mean (standard deviation) number of procedures was 53 (90). The median was 2.5, and the 90th and 95th percentiles were 149 and 245, respectively. In general, there are many hospitals with lower volumes and a few hospitals with much higher volumes. Overall procedure volume was stable over the 1995-1997 time period. The mean number of procedures performed in 1995 and 1997 was 52.
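The descriptive statistics reported for each volume indicator can be computed along the following lines. This is a sketch using Python's standard `statistics` module; the percentile interpolation method used in the original analysis is not stated, so the "inclusive" method chosen here is an assumption, and the function name is hypothetical.

```python
import statistics


def summarize_volumes(volumes):
    """Summarize annual procedure counts for hospitals with at least one case."""
    # quantiles(n=20) returns 19 cut points at 5%, 10%, ..., 95%.
    cuts = statistics.quantiles(volumes, n=20, method="inclusive")
    return {
        "mean": statistics.mean(volumes),
        "sd": statistics.stdev(volumes),  # sample standard deviation
        "median": statistics.median(volumes),
        "p90": cuts[17],  # 18th cut point = 90th percentile
        "p95": cuts[18],  # 19th cut point = 95th percentile
    }
```

In a strongly right-skewed distribution such as the one described above, the mean (53) sits far above the median (2.5), which is why both are reported alongside the upper percentiles.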

Percentage of Procedures at High Volume Hospitals

A moderate percentage of procedures were performed at high volume hospitals. At threshold 1, 75.5% of pediatric heart surgeries were performed at 'high volume' providers (and 21% of providers are 'high volume'). There is no threshold 2 for this procedure.
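The two percentages reported for each threshold can be sketched as below. The denominator for the provider percentage is assumed here to be hospitals performing at least one procedure; the report does not spell this out in this passage, and the function name and example counts are hypothetical.

```python
def high_volume_shares(volumes, threshold):
    """Return (% of procedures at high-volume hospitals, % of hospitals that are high volume).

    volumes: annual procedure counts, one per hospital.
    Hospitals with zero procedures are excluded (an assumption).
    """
    performing = [v for v in volumes if v > 0]
    if not performing:
        raise ValueError("no hospital performed the procedure")
    high = [v for v in performing if v >= threshold]
    pct_procedures = 100 * sum(high) / sum(performing)
    pct_providers = 100 * len(high) / len(performing)
    return pct_procedures, pct_providers


# Illustrative counts only: two of five performing hospitals meet the threshold.
pct_procedures, pct_providers = high_volume_shares([150, 100, 30, 15, 5, 0], 100)
```

Because high-volume hospitals contribute more cases each, the procedure percentage always equals or exceeds the provider percentage, as in the 75.5% versus 21% figures above.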

Persistence of High Volume

High volume status was moderately persistent over time. At threshold 1, 84.6% of high volume providers in 1995 were also high volume in 1996. Similarly, 84.0% of high volume providers in 1996 were also high volume in 1997.
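Persistence of high volume status between two consecutive years can be sketched as follows. The treatment of hospitals missing from the second year (counted as not high volume) is an assumption, and the function name and hospital identifiers are hypothetical.

```python
def persistence_of_high_volume(volumes_year1, volumes_year2, threshold):
    """Percent of hospitals that are high volume in year 1 and remain so in year 2.

    Each argument maps a hospital identifier to its annual procedure count.
    Hospitals absent from year 2 are treated as zero volume (an assumption).
    """
    high_year1 = {h for h, v in volumes_year1.items() if v >= threshold}
    if not high_year1:
        raise ValueError("no high-volume hospitals in the first year")
    still_high = {h for h in high_year1 if volumes_year2.get(h, 0) >= threshold}
    return 100 * len(still_high) / len(high_year1)


# Illustrative data: A and D stay above 100 cases, B drops below.
y1 = {"A": 120, "B": 105, "C": 40, "D": 300}
y2 = {"A": 130, "B": 90, "C": 110, "D": 280}
pct_persistent = persistence_of_high_volume(y1, y2, 100)
```

Note that hospital C, which becomes high volume only in the second year, does not enter the calculation; the measure conditions on first-year status.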

Construct validity

We estimated the correlation between pediatric heart surgery volume and mortality, adjusting for patient characteristics such as age, sex, and APR-DRG. Pediatric heart surgery volume is independently and negatively correlated with mortality (r=−.27, p<.05). However, this analysis does not include the intensive risk adjustment included in the volume studies described in the literature review.
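A correlation of this kind can be illustrated with a plain Pearson coefficient. The sketch below does not reproduce the adjustment for age, sex, and APR-DRG described above; it assumes the mortality values passed in have already been adjusted, and the function name is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```

A negative coefficient, such as the one reported above, means that hospitals with higher volume tend to have lower mortality; it does not by itself establish how large the difference in mortality is.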

Discussion

Pediatric heart surgery includes a number of procedures, varying in difficulty. In general, pediatric heart surgery is technically complex and differs from adult heart surgery. A large number of hospitals perform at least one procedure, but only 21% of hospitals are actually high volume. Further, half of these providers perform fewer than 3 procedures a year (median 2.5). Higher volumes of pediatric heart surgery have been associated with better outcomes, namely fewer deaths. Providers may want to examine operator volume as well as hospital volume. Our empirical results noted that pediatric heart surgery volume was slightly negatively correlated with mortality. Note that the correlation, significant due to the large number of observations, is small and not considered important. Our results do not include the complex risk adjustment contained in the studies reported in the literature, and required for this comparison.

This indicator is measured with great precision, as is expected with all volume indicators. It is expected that volume for pediatric heart surgery would be measured precisely using discharge abstract data. Most procedures are performed in an inpatient setting.

The volume-outcome relationship on which this indicator is based may not hold over time, as providers become more experienced or as technology changes. It is important then to revisit the volume-outcome relationship to ensure the validity of this indicator. Overall, volume measures are not direct measures of quality, and are relatively insensitive. For this reason they should be used with caution and in conjunction with measures of mortality and of quality of care within that field (in this case pediatric surgery), to ensure that increasing volumes truly improve patient outcome.

Our empirical analyses found that about ¾ of pediatric heart surgeries are actually performed at high volume hospitals already, suggesting regionalization. This leaves little room for improvement. It is unclear whether simply increasing volume at low volume hospitals would actually improve outcomes. It is possible that hospitals could increase volume simply by increasing the number of borderline or inappropriate procedures performed. For this reason, it may be prudent to examine this indicator alongside area rates for this procedure, and examinations of the appropriateness of the procedures.

Overall, this indicator is recommended for inclusion in the HCUP II QI set. Specific caveats should be kept in mind when using this indicator. As a volume indicator, this indicator is a proxy measure for quality, and it is recommended that it be used with other indicators. Theoretically, providers could increase the number of procedures on patients with questionable indications as well, though this is more difficult than for other indicators.

INDICATOR 7: PERCUTANEOUS TRANSLUMINAL CORONARY ANGIOPLASTY (PTCA) VOLUME

Indicator: Percutaneous transluminal coronary angioplasty (PTCA) raw volume
Relationship to Quality: Better outcomes have been associated with higher volumes. Higher volumes thus represent better quality.
Benchmark: Threshold 1: 200 or more procedures per year 191
Threshold 2: 400 or more procedures per year 61, 192

Method:

Quality Measure: Provider level PTCA raw volume
Outcome of Interest: Discharges with ICD-9 Codes 36.01, 36.02, 36.05 or 36.06 in any procedure field.
Age 40 years and older.
Exclude MDC 14 (pregnancy, childbirth, and puerperium) and MDC 15 (newborns and other neonates).
Population at Risk: Not applicable
Evidence from the literature
Face validity

Procedure volume is a surrogate measure of quality; its face validity depends on whether a strong association with outcomes of care is both plausible and widely accepted in the professional community.

PTCA is a procedure that requires technical proficiency with the use of complex equipment. Technical errors may lead to clinically significant complications, such as abrupt coronary occlusion with or without myocardial infarction, emergency coronary bypass surgery, and death. On the basis of this knowledge and empirical literature (summarized below), the American Heart Association (AHA) and the American College of Cardiology (ACC) have stated that "a significant number of cases per institution - at least 200 PTCA procedures annually - is essential for the maintenance of quality and safe care." 191 More recent literature (summarized below) led a subsequent expert panel to recommend that "an institution should have an activity level of at least 400 coronary procedures/year...an institution with a volume of <200 procedures/year, unless in a region that is underserved because of geography, should carefully consider whether it should continue to offer the service." 333

The same task force expressed concern that "a majority of operators fail to meet the requirements for maintenance of competence, which is a minimum of 75 PTCA procedures performed per year as the primary operator." This standard has been endorsed by the American College of Physicians; 334 the Society for Cardiac Angiography proposed a lower minimum of 50 cases per year. 334

Precision

PTCA is an increasingly common procedure; approximately 452,000 were performed in the USA in 1997 (16.7 per 10,000 persons). 316 In the Nationwide Inpatient Sample from the Healthcare Cost and Utilization Project, 214 hospitals reported a mean of 382 PTCAs per year in 1993-94. The 27% of hospitals that were classified as low-volume (<200 per year) performed 5% of the procedures, whereas the 31% of hospitals classified as medium-volume (201-400 per year) performed 21% of the procedures and the 42% of hospitals classified as high-volume (>400 per year) performed 74% of the procedures. 172 Based on state all-payer databases, the mean annual frequency of angioplasties was 226 per hospital in California in 1989 335 and 505 per hospital in New York in 1991-1994. 61

The number of PTCA procedures is measured accurately with discharge data; in fact, discharge data are probably the best available source for hospital volume information. The large number of procedures performed annually at most hospitals suggests that annual volume is not subject to considerable random variation.

Minimum bias

Volume measures are not subject to bias due to disease severity and comorbidities. For this reason, risk-adjustment is not appropriate. Although volume measures are theoretically subject to bias due to variation across hospitals in the use of outpatient surgery facilities, only 7.6% of PTCAs in 1996 were performed in ambulatory settings. 297

Construct validity

Volume is not a direct measure of the quality or outcomes of care. Although higher volumes have been repeatedly associated with better outcomes after PTCA, these findings may be limited by inadequate risk adjustment. Using hospital discharge data to adjust for age, gender, multivessel angioplasty, unstable angina, and 6 comorbidities, high-volume hospitals had significantly lower rates of same-stay coronary bypass surgery (CABG) and inpatient mortality than low-volume hospitals. 172 Although the magnitudes of the adjusted differences were not reported, the unadjusted differences were modest (e.g., 3.8% versus 4.6% mortality and 4.3% versus 4.6% CABG rates after myocardial infarction, 0.8% versus 1.0% mortality and 2.8% versus 4.0% CABG rates without myocardial infarction). An earlier study using similar data and volume thresholds reported more adverse outcomes (e.g., CABG or death) than expected at low-volume hospitals (e.g., 12.4% observed versus 10.0% expected after myocardial infarction, 6.3% observed versus 5.2% expected without myocardial infarction) and fewer adverse outcomes than expected at high-volume hospitals (e.g., 8.3% observed versus 10.5% expected after myocardial infarction, 4.4% observed versus 5.0% expected without myocardial infarction). 235 A study based on Medicare data also reported a significant association between hospital volume and mortality, after adjustment for age, sex, race, and year, although no adjusted measures of effect were reported. 112, 172, 192, 237, 336, 337 Better studies based on clinical data systems (adjusting for left ventricular function) have confirmed higher risk-adjusted mortality and CABG rates at low and medium (<400 cases per year) volume hospitals, relative to high-volume hospitals (e.g., 1.1% versus 0.80-0.95% mortality, 4.2% versus 2.8-3.7% CABG). 61 A similar study based on clinical data from the Society for Cardiac Angiography and Interventions confirmed the validity of a higher volume threshold than the 200 cases per year originally recommended by the AHA and ACC. Adjusted odds ratios for post-PTCA complications (e.g., death, emergency CABG, or myocardial infarction) were 1.14 at hospitals with 200-399 cases per year, 0.66 at hospitals with 400-599 cases, and 0.54 at hospitals with 600 or more cases. 192
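The observed-versus-expected comparisons quoted above can be expressed as a simple ratio. The cited study reported the observed and expected percentages; the ratios below are derived here for illustration only and were not reported in the source, and the function name is hypothetical.

```python
def observed_to_expected(observed_pct, expected_pct):
    """Ratio of the observed to the expected adverse outcome rate.

    Values above 1 indicate more adverse outcomes than expected given case mix;
    values below 1 indicate fewer.
    """
    if expected_pct <= 0:
        raise ValueError("expected rate must be positive")
    return observed_pct / expected_pct


# Percentages quoted above for patients treated after myocardial infarction.
low_volume_ratio = observed_to_expected(12.4, 10.0)   # low-volume hospitals
high_volume_ratio = observed_to_expected(8.3, 10.5)   # high-volume hospitals
```

With these inputs, low-volume hospitals show about 24% more adverse outcomes than expected, and high-volume hospitals about 21% fewer.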

Studies of operator volume are less directly relevant to the HCUP Quality Indicator project, but three studies (from New York, Northern New England, and a community hospital in Los Angeles) have supported associations between operator volume and angiographic and clinical success rates, 336 as well as risk-adjusted same-stay CABG rates.61, 64, 258, 310-312 These studies did not demonstrate any association between operator volume and inpatient mortality. Finally, the most recent study showed that the associations between operator volume and both clinical success rates and CABG rates apparently disappeared between 1990-1993 and 1994-1996 in northern New England. 338

Although volume-outcome associations have been demonstrated for PTCA, volume seems likely to be both insensitive and nonspecific as a measure of quality. For example, Hannan demonstrated that low-volume cardiologists can achieve excellent outcomes at high-volume hospitals (e.g., 0.7% risk-adjusted mortality, 2.9% same-stay CABG). 61 Nonetheless, it has been estimated that shifting patients in California from low-volume to high-volume hospitals would avert 80 deaths per year. This number is consistent with a national estimate of 137 averted deaths and 404 averted same-stay CABG. 172 It is possible that a low-volume provider may be unavoidable for urgent procedures in less populated areas. But it is unclear whether such urgent cases would do better with (low-volume) PTCA than with alternative, non-PTCA treatments that are not as volume-sensitive.

Fosters true quality improvement

One possible adverse effect of volume-based measures is to encourage low-volume providers (who may also provide poorer quality of care) to increase their volume, simply to reach a threshold of 200 or 400 cases per year. Such responses would probably not improve patient outcomes to the same extent as moving patients from low-volume to high-volume hospitals. At the extreme, hospitals may loosen eligibility criteria and perform procedures in patients who are marginal or inappropriate candidates. The alternative of shutting down low-volume hospitals and transferring procedures to high-volume hospitals may overload these providers and impair access to care.

Prior use

PTCA volume has not been widely used as an indicator of quality, although specific volume thresholds have been suggested as "standards" for the profession. 333 On its Web site, the Pacific Business Group on Health 304 (http://www.healthscope.org) states that "one marker of how well a hospital is likely to perform is the experience of the hospital and its surgical team...in the absence of data to compare hospitals on their complications and survival rates, you can begin evaluating experience by looking at the number of (PTCA) surgeries a hospital performs each year." The Center for Medical Consumers posts hospital-specific and operator-specific PTCA volumes for New York hospitals (http://www.medicalconsumers.org).

Empirical Evidence
Test | Statistic | Rating
Procedure volume | |
   Raw mean volume/standard deviation | 418 (400) |
   Median/90th/95th percentile | 330 / 869 / 1157 |
   Stability over time, mean in 1995 / mean in 1997 | 379 / 447 | Growing
Percentage of procedures at high volume hospitals | | High/Mod
   Percentage threshold 1 (% hosp at threshold) | 96% (69%) |
   Percentage threshold 2 (% hosp at threshold) | 77% (42%) |
Persistence of high volume | | High
   High volume remaining high, 95/96 (Threshold 1, threshold 2) | 98% / 98% |
   High volume remaining high, 96/97 (Threshold 1, threshold 2) | 98% / 97% |
Procedure Volume

In 1996, 365 hospitals (26.4% of providers) performed at least one procedure. Among these hospitals, the mean (standard deviation) number of procedures was 418 (400). The median was 330, and the 90th and 95th percentiles were 869 and 1157, respectively. In general, there are a moderate number of hospitals with lower volumes and a few hospitals with much higher volumes. Overall procedure volume grew over the 1995-1997 time period. The mean number of procedures performed in 1995 and 1997 was 379 and 447, respectively.

Percentage of Procedures at High Volume Hospitals

A moderate to high percentage of procedures were performed at high volume hospitals, depending on the threshold. At threshold 1, 95.7% of PTCA procedures were performed at 'high volume' providers (and 69% of providers are 'high volume'). At threshold 2, 77.0% were performed at 'high volume' providers (and 42% of providers are 'high volume').

Persistence of High Volume

High volume status was highly persistent over time. At threshold 1, 97.8% of high volume providers in 1995 were also high volume in 1996. Similarly, 97.9% of high volume providers in 1996 were also high volume in 1997. At threshold 2, 98.5% of high volume providers in 1995 were also high volume in 1996. Similarly, 96.5% of high volume providers in 1996 were also high volume in 1997.

Construct validity

As we did not retain the PTCA mortality indicator due to inadequate precision, we were unable to test the construct validity of this indicator. However, PTCA volume is negatively related to several other post-procedural mortality rates: CABG (r=−.21, p<.001), craniotomy (r=−.200, p<.0001), and AAA repair (r=−.45, p<.0001).

Discussion

Percutaneous transluminal coronary angioplasty (PTCA) is a relatively common procedure. Our empirical analysis found a mean of 418 procedures per year. A substantial number of hospitals perform at least one procedure, but only 69% (threshold 1) or 42% (threshold 2) of hospitals are actually high volume. Higher volumes of PTCA have been associated with better outcomes, namely fewer deaths and post-procedural coronary artery bypass grafts (CABG). The AHA/ACC have suggested that hospitals perform at least 200 PTCA procedures per year to maintain proficiency. Though many hospitals meet and exceed this rate, there are still hospitals that do not meet this guideline. Operator volume is also important, since many operators do not meet the 75 procedure minimum suggested. Providers may wish to examine operator rates alongside this indicator.

This indicator is measured with great precision, as is expected with all volume indicators. It is expected that volume for PTCA would be measured precisely using discharge abstract data. Though most procedures are performed on an inpatient basis, about 7% of procedures are performed on an outpatient basis. Providers may wish to examine outpatient and inpatient rates together.

The volume-outcome relationship on which this indicator is based may not hold over time, as providers become more experienced or as technology changes. It is important then to revisit the volume-outcome relationship to ensure the validity of this indicator. Overall, volume measures are not direct measures of quality, and are relatively insensitive. For this reason they should be used with caution and in conjunction with measures of mortality and of quality of care within that field (in this case cardiac care), to ensure that increasing volumes truly improve patient outcomes.

Our empirical analyses found that most PTCA procedures are actually performed at high volume hospitals already. This leaves little room for improvement. Further, many hospitals are high volume. It is unclear whether simply increasing volume at low volume hospitals would actually improve outcomes. It is possible that hospitals could increase volume simply by increasing the number of borderline or inappropriate procedures performed. For this reason, it may be prudent to examine this indicator alongside area rates for this procedure, and examinations of the appropriateness of the procedures.

Overall, this indicator is recommended for inclusion in the HCUP II QI set. Specific caveats should be kept in mind when using this indicator. As a volume indicator, this indicator is a proxy measure for quality, and it is recommended that it be used with other indicators. Providers could increase the number of procedures on patients with questionable indications as well, without improving quality of care.

Process Measures

3.E.2 Provider-Level Utilization Measures

INDICATOR 8: CESAREAN SECTION DELIVERY RATE

Indicator: Cesarean section delivery rate
Relationship to Quality: C-section has been identified as an overused procedure. As such, lower rates of cesarean section represent better quality.
Benchmark: State, regional or peer group average
HP 2010 goal: 15 c-sections per 100 births.

Method:

Quality Measure: Provider-level number of C-sections per 100 deliveries (see Appendix 6).
Outcome of Interest: Number of C-sections.
Population at Risk: All deliveries (see Appendix 6).
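The quality measure defined above is a simple rate per 100 deliveries. The sketch below shows the arithmetic only; the code lists in Appendix 6 that identify C-sections and deliveries are not reproduced, and the function and constant names are hypothetical.

```python
HP2010_GOAL_PER_100 = 15.0  # HP 2010 goal quoted in the benchmark above


def cesarean_rate_per_100(c_sections, deliveries):
    """Provider-level number of C-sections per 100 deliveries."""
    if deliveries <= 0:
        raise ValueError("deliveries must be positive")
    if not 0 <= c_sections <= deliveries:
        raise ValueError("c_sections must be between 0 and deliveries")
    return 100 * c_sections / deliveries


# Illustrative hospital: 207 C-sections among 1,000 deliveries.
rate = cesarean_rate_per_100(207, 1000)
exceeds_goal = rate > HP2010_GOAL_PER_100
```

A rate computed this way is unadjusted; comparison against a state, regional, or peer group average is subject to the case-mix concerns discussed under minimum bias.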
Evidence from the literature
Face validity

The rate of cesarean delivery in the United States increased from 5.5% in 1970 to a high of 24.7% in 1988, with a subsequent decrease to 20.7% in 1996. 339 Demographic changes in the childbearing population likely account for a relatively small part of this increase, as suggested by Parrish et al. 340 in a study of primary cesarean delivery rates in Washington State from 1987 to 1990. Previous data in population studies have failed to document commensurate improvements in outcomes associated with this increased utilization, 341 which has raised questions regarding the appropriateness of current practices. Moreover, cesarean delivery is the most common operative procedure performed in the United States 342 and is associated with higher costs than vaginal delivery. The U.S. Department of Health and Human Services 5 and private groups 343 are encouraging reductions in cesarean rate, the former having set a goal of reducing the cesarean delivery rate to 15% by the year 2000.

While appropriateness of the procedure depends largely on patients' clinical characteristics (see Minimum bias below), studies have shown that individual physician practice patterns account for a significant portion of the variation in cesarean delivery rates.344-349 Non-clinical factors such as patient insurance status, hospital characteristics, and geographic region have also been related to rates.350-356

Precision

Burns et al. 347 have shown cesarean delivery is common enough to make good statistical comparisons of hospital and even physician style feasible. Furthermore, the eligible population (pregnant women) is well defined and hospital level reporting reduces the small n problem that may occur with individual providers.

Minimum bias

The overall cesarean section (CS) rate cannot determine appropriate use, but variation in rates across institutions or regions may, provided the variation does not merely reflect differences in patient disease severity and comorbidities. A fair comparison of measures of utilization or outcomes requires adequate adjustment for case mix. 8 Studies that have risk-adjusted cesarean delivery rates differ in the risk factors included (the indication for cesarean delivery is controversial for many risk factors) and in the data sources used.

Keller et al. 357 examined singleton births greater than 2500 grams in hospitals in Washington State in 1989 and 1990 using a combination of administrative and birth certificate data. The authors developed separate multivariate models for each of 4 groups (prior cesarean, breech, first birth, other). Risk adjustment including minimal clinical detail (parity, prior cesarean, breech/malposition, placenta/cord problems, active herpes, mother's age, amnionitis, birth weight, sex of child) could explain most of the variance among hospitals. The authors concluded that "adjustment of rates did not greatly alter hospital rankings, but the adjustments are fair, improve face validity, and work surprisingly well in explaining which mothers get cesareans. So they should improve the acceptance of monitoring of rates."

In contrast, Aron et al. 358 used data from standardized reviews of medical records to adjust for clinical risk factors in women without prior cesarean section who delivered in the Cleveland area from 1993 to mid 1995. With respect to unadjusted rates, 7 hospitals were statistical outliers. After risk-adjustment, outlier status changed for 5 (24%) of the 21 hospitals. When hospitals were rank-ordered on the basis of cesarean delivery rates, the correlation between unadjusted and adjusted ranking was only moderate. Hospital rankings often changed and, in 12 of the 21 hospitals (57%), the relative difference in unadjusted and adjusted rates was greater than 10%. The authors note that some of the variation in prevalence of risk factors across hospitals may reflect differences in documentation.
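The "relative difference greater than 10%" comparison described above can be sketched as follows. The cited study's exact definition is not given here, so the sketch assumes the difference is taken relative to the unadjusted rate; the function names and example rates are hypothetical.

```python
def relative_difference(unadjusted, adjusted):
    """Absolute difference between rates, relative to the unadjusted rate (an assumption)."""
    if unadjusted <= 0:
        raise ValueError("unadjusted rate must be positive")
    return abs(adjusted - unadjusted) / unadjusted


def share_changed(rate_pairs, cutoff=0.10):
    """Count hospitals whose rate moves by more than the cutoff after risk adjustment.

    rate_pairs: list of (unadjusted_rate, adjusted_rate) tuples, one per hospital.
    Returns (number of hospitals, percent of hospitals).
    """
    if not rate_pairs:
        raise ValueError("need at least one hospital")
    changed = sum(1 for u, a in rate_pairs if relative_difference(u, a) > cutoff)
    return changed, 100 * changed / len(rate_pairs)


# Illustrative rates only: the first and third hospitals move by more than 10%.
count, percent = share_changed([(20.0, 23.0), (20.0, 21.0), (25.0, 20.0)])
```

The figure quoted above follows the same arithmetic: 12 of 21 hospitals is 57%.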

Similarly, Bailit et al. 359 showed that risk-adjusting primary cesarean delivery rates using a state birth certificate database substantially changes how hospital performance is judged. Specifically, they found 27% of hospitals with unadjusted rates in the top quartile had adjusted rates that were "risk-appropriate" and that 23% of hospitals with unadjusted rates that were not in the top quartile had adjusted rates that were. In another study using birth certificate data, Glantz 360 also found that crude rates do not accurately reflect the differences in cesarean delivery rates among hospitals and commented "to make judgements regarding clinical practices on the basis of unadjusted rates entails the risk of unwarranted emulation of some hospitals that only appear to have low rates and unfair criticism of some hospitals with seemingly high rates."

Gregory et al 361 used discharge data to adjust for clinical risk factors for cesarean delivery in 92,798 singleton Medicaid deliveries in Los Angeles County in 1991. The authors categorized patients according to a revised hierarchical set of indications for cesarean section, 362 revised and validated by 363 and adjusted for clinical risk factors based on the presence of maternal ICD-9-CM diagnostic codes. The aim of the LA County study was to describe the difference in risk-adjusted cesarean delivery rates according to hospital type; the effect of risk-adjustment on individual hospital rates or on hospital rankings was not evaluated.

Construct validity

We found no studies explicitly evaluating the construct validity of this indicator. In other words, there is no evidence that hospitals with lower cesarean rates more frequently provide better quality of care according to other measures.

As the cesarean rate for "optimal" quality care is unknown, many studies are careful to note that lower cesarean rates do not necessarily reflect better quality care. In some instances, a higher cesarean rate could reflect more appropriate use of the procedure. For example, a meta-analysis by Gifford et al. 364 suggests that elective cesarean delivery for breech presentation may be associated with better neonatal outcomes. Cesarean delivery rates substantially less than the national trend of 90% for infants in a breech presentation may indicate underutilization; however, correlation with maternal and neonatal outcomes would help clarify this issue.

Fosters true quality improvement

The cesarean delivery rate can be decreased by decreasing the primary cesarean delivery rate and/or increasing the vaginal birth after cesarean (VBAC) rate. In some hospitals, one or both of these might result in more maternal and/or infant complications. Sachs et al. 365 note that when a trial of labor after cesarean delivery fails, the rate of maternal morbidity, including infection and operative injuries, increases substantially. The authors cite increasing incidence of the major risk of a trial of labor, uterine rupture, in several states in recent years. However, they caution that, in the absence of chart reviews, one cannot be sure that all these cases involved rupture of a uterine scar from a previous cesarean delivery. Sachs and colleagues also express concern that attempts to decrease the primary cesarean delivery rate may lead to complications associated with higher rates of instrumented (forceps or vacuum-assisted) vaginal delivery. Studies that have compared rates of instrumented vaginal delivery by physicians with low cesarean rates to physicians with higher cesarean rates have found either no significant difference,346, 366 or that physicians with low cesarean rates actually use instrumented delivery less. 367 Depending on how rates are risk adjusted, there is the possibility that providers will respond by upcoding diagnoses once measurement begins, though we found no studies where this has been documented. Errors leading to bias are also possible at the data abstraction level.

Prior use

Cesarean section rates were among the first measures used to judge hospital and health plan performance, 368 and have become one of the most commonly used indicators. Public Citizen's Health Research Group reports cesarean delivery rates for 3159 hospitals in 41 states. Cesarean delivery rate is included among the 16 core performance measures in the Maryland Hospital Association's Quality Indicator (QI) Project. 369 Repeat and all cesarean section rates are used by the University Hospital Consortium. 370 Total cesarean delivery rates are used by the Florida Agency for Health Care Administration, 371 Greater New York Hospital Association, 372 Michigan Hospital Association, 373 Pacific Business Group on Health, 304 United Health Care, Cleveland Health Quality Choice, 374 JCAHO's IMSystem, Virginia Health Information, 375 Washington State Community Health Information Partnership, 376 and HealthGrades.com. 377 In addition, cesarean section was included in the previous version of the HCUP I QIs, and reduction of the cesarean section rate is a goal of Healthy People 2010. 5

Empirical Evidence
Test | Statistic | Rating
Precision
   Raw provider level rate/standard deviation | 21.4%, 8.7% |
   Systematic provider-level standard deviation* | 4.5% | High
   Provider variation as a percentage of total variation* | 1.2% | High
   Signal ratio | 88.2% | High
   R-Square* | 92.5% | Very High
   * age adjusted
Minimum Bias - age risk adjustment
   Signal variance change with risk adjustment | No change | Good
   Absolute impact:
     Average absolute change (in %) | 5.4% | Very Good
   Relative impact:
     Rank correlation | 0.956 | Very Good
     Percent remaining in high decile/low decile | 72.4%/79.3% | Good/Fair
     Percent changing more than 2 deciles | 2.6% | Very Good

Precision

This indicator is precise, with a raw provider level mean of 21.4% and a substantial standard deviation of 8.7%. The systematic provider level standard deviation is high, at 4.5%. The provider level variation accounts for a high percentage of total variation, at 1.2%. This means that relative to other indicators, a higher percentage of the variation occurs at the provider level, rather than the discharge level, although more of the variation occurs at the discharge level than for some indicators. Finally, the signal ratio is high, at 88.2%. This means that it is likely that the observed differences in provider performance represent true differences in provider performance, although some of the observed differences are due to unobserved differences in patient characteristics. The R-Square for this indicator is very high at 92.5%, meaning that some additional signal can be extracted using multivariate techniques.
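As a rough illustration of what a signal ratio expresses, the sketch below splits the observed variance in provider rates into a noise part and a signal part. It is not the estimation method used in this report. It assumes a simple method-of-moments approach in which each provider's sampling (noise) variance is approximated by the binomial formula p(1 - p)/n, and the function name `signal_ratio` and the example counts are hypothetical.

```python
from typing import Sequence


def signal_ratio(events: Sequence[int], cases: Sequence[int]) -> float:
    """Approximate share of between-provider variance that is not sampling noise.

    Assumption (not taken from the report): noise variance for each provider
    is the binomial variance of its observed rate, p * (1 - p) / n.
    """
    rates = [e / n for e, n in zip(events, cases)]
    k = len(rates)
    mean_rate = sum(rates) / k
    # Total observed variance of provider rates (sample variance).
    observed_var = sum((r - mean_rate) ** 2 for r in rates) / (k - 1)
    if observed_var == 0:
        return 0.0
    # Average sampling variance across providers.
    noise_var = sum(r * (1 - r) / n for r, n in zip(rates, cases)) / k
    # Whatever variance is left over is treated as systematic signal.
    signal_var = max(0.0, observed_var - noise_var)
    return signal_var / observed_var


if __name__ == "__main__":
    # Hypothetical counts for three providers with 100 deliveries each.
    print(signal_ratio(events=[10, 50, 90], cases=[100, 100, 100]))
```

Under this simplified reading, providers with few cases contribute more noise, so an indicator with low volumes or rare events will tend to show a lower ratio than one such as cesarean delivery.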

Bias

We did not perform APR-DRG risk adjustment because the categories correspond to the outcome of interest. Using risk adjustment for age of the mother only, the indicator performs well on the multiple measures of minimum bias. The rank correlation is very good at 0.956. Age risk adjustment does seem to impact the lowest and highest deciles, with 72.4% of providers remaining after risk adjustment in the highest decile and 79.3% remaining in the lowest decile. There does not seem to be disproportionate impact at either extreme, though the performance in the lowest decile was poorer than that of other indicators. The absolute impact was minimal.
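The relative-impact measures reported here (rank correlation and percent remaining in an extreme decile) can be sketched as follows. This is a minimal illustration under stated assumptions, not the report's own code: the function names are hypothetical, the rank correlation is taken to be Spearman's, and a decile is taken to be the top tenth of providers by rate.

```python
from typing import Sequence


def ranks(values: Sequence[float]) -> list:
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def share_remaining_in_top_decile(unadjusted: Sequence[float],
                                  adjusted: Sequence[float]) -> float:
    """Fraction of top-decile providers (by unadjusted rate) still in the
    top decile after adjustment."""
    k = max(1, len(unadjusted) // 10)
    by_unadjusted = sorted(range(len(unadjusted)),
                           key=lambda i: unadjusted[i], reverse=True)
    by_adjusted = sorted(range(len(adjusted)),
                         key=lambda i: adjusted[i], reverse=True)
    return len(set(by_unadjusted[:k]) & set(by_adjusted[:k])) / k


if __name__ == "__main__":
    # Hypothetical rates for 20 providers; adjustment swaps two neighbours.
    raw = [10.0 + i for i in range(20)]
    adj = list(raw)
    adj[17], adj[18] = adj[18], adj[17]
    print(spearman(raw, adj), share_remaining_in_top_decile(raw, adj))
```

The example shows why both measures are reported: a single swap near the top leaves the rank correlation close to 1 while still moving a provider out of the highest decile.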

Construct validity

C-section rate loads very highly on factor three, and is inversely related to vaginal delivery after cesarean section and positively related, but to a lesser extent, to incidental appendectomy.

Discussion

Cesarean section has been targeted as a potentially overused procedure, as the rate of c-sections has increased over the past few decades. Despite a recent decrease, many organizations have aimed to monitor and reduce the c-section rate.

Our empirical analyses demonstrated that c-section rate is measured with good precision, as would be expected from the relatively high rates of this procedure. Given the high signal ratio, it is likely that the observed differences in the provider performance represent true differences in provider performance, rather than random variation. This cannot fully account for systematic bias in the indicator, however.

While cesarean section performed well on our tests of minimum bias, with risk adjustment having only a modest impact, the only adjustment performed was for age of the mother (since there is no variation in severity within the cesarean section APR-DRG). We were unable to link maternal and infant records, which may add beneficial risk adjustment factors. Other clinical measures not available in our data set may vary systematically by hospital and introduce some bias. Our literature review located several studies of risk adjustment for this indicator, finding that risk adjustment did affect the outlier status and rankings of as many as 25% of the hospitals. Risk adjustment was also advocated as a means to make the indicator more palatable to providers. Given these results, providers may want to examine the clinical characteristics of their populations when interpreting the results of this indicator. Clinical characteristics such as prior cesarean, parity, breech presentation, placental or cord complications, STDs, infections, and birth weight have been shown to explain substantial amounts of variation in cesarean section rates. Information regarding some of these factors may be available by linking maternal discharge records to birth records.

We located no additional studies that examined the construct validity of c-section. However, we found that c-section and VBAC are strongly negatively correlated, as one would expect. Increasing the VBAC rate was proposed as one means of reducing c-section rate. However, a simple correlation does not imply causation, nor guarantee that this relationship is evidence of an underlying quality relationship.

While Healthy People 2010 has established a goal of 15 cesarean sections per 100 births and 63 repeat cesarean sections per 100 births with previous c-section, the ideal rate of cesarean section has not been established. Providers should compare rates to other standards such as regional or national averages. High rates may be explainable by a more complex case mix, as there are some cases where a trial of labor would be contraindicated. Thus, providers with high c-section rates should examine the patient characteristics available in more clinically detailed records, such as birth records, to establish the appropriateness of the procedures. Providers may also wish to further break down the cesarean section indicator into primary and repeat cesarean section rates.

Overall, this indicator is recommended for inclusion in the HCUP II QI set. It received an empirical rating of 17 out of 26. This indicator is recommended with several caveats of use. Potential additional bias may result from clinical differences not identifiable in administrative data, so supplemental risk adjustment with linked birth records or other clinical data may be desirable. As a utilization indicator, the construct validity relies on the actual inappropriate use of procedures in hospitals with high rates, and this should be investigated further. Finally, caution should be maintained for cesarean rates that are drastically below or above the average or recommended rates.

INDICATOR 9: INCIDENTAL APPENDECTOMY AMONG THE ELDERLY RATE

Indicator: Incidental appendectomy among the elderly rate
Relationship to Quality: Incidental appendectomy among the elderly is contraindicated. As such, lower rates represent better quality care.
Benchmark: State, regional, or peer group average.

Method:

Quality Measure: Number of incidental appendectomies per 100 elderly with intra-abdominal procedure.
Outcome of Interest: Number of incidental appendectomies (see Appendix 6).
Population at Risk: All non-maternal/non-neonatal discharges age 65 years or older with intra-abdominal procedure in any procedure field (see Appendix 6).

Exclude MDC 14 (pregnancy, childbirth, and puerperium) and MDC 15 (newborns and other neonates).
Evidence from the literature
Face validity

The removal of the appendix incidental to other abdominal surgery, such as urological, gynecological, or gastrointestinal surgeries, is intended to eliminate the risk of future appendicitis, and to simplify any future differential diagnoses of abdominal pain. Controversy about the procedure has abounded since the early 20th century. 378 For the population as a whole, it remains unclear whether removal of the appendix significantly increases the risk of morbidity and mortality, or whether, given the low risk of future appendicitis and its ease of treatment, the procedure is worth any amount of extra risk. Traditionally, it has been noted that the removal of the appendix may potentially contaminate the otherwise clean operating field.

Unfortunately, the only three published randomized controlled trials did not have sufficient power to exclude a clinically meaningful adverse effect of incidental appendectomy.379-381 In a prospective, randomized placebo-controlled trial of prophylactic antibiotics, patients who underwent cholecystectomy with incidental appendectomy had a substantially higher wound infection rate than patients who underwent cholecystectomy alone if they did not receive antibiotics (40% versus 16%), but not if they did (9% versus 10%). 382 Two retrospective studies based on large administrative data sets demonstrated significant risk associated with incidental appendectomy among cholecystectomy patients, after adjusting for age, sex, primary diagnosis, and comorbidities. The risk of wound infection was 83% higher among elderly Medicare beneficiaries, 383 and the risk of any postoperative complication was 53% higher in Ontario general hospitals, 384 when incidental appendectomy was performed. These studies demonstrated substantial selection bias, in that patients selected for incidental appendectomy were younger and had less comorbidity than other cholecystectomy patients. The difference in baseline characteristics was so profound that incidental appendectomy was associated with a 63% decrease in the unadjusted risk of death in Ontario; this effect disappeared after risk-adjustment and reversed after stratification. These results raise serious questions about the validity of other nonrandomized studies of incidental appendectomy, which used smaller samples and inadequate risk adjustment.

Many of the studies discussed above group all patients together regardless of age. However, Andrew and Roty 385 showed that incidental appendectomy was associated with a higher risk of wound infection (5.9% versus 0.9%) among cholecystectomy patients who were at least 50 years of age, but not among younger patients. Based on this finding, and the findings of Warren and colleagues, most commentators believe that incidental appendectomy is inappropriate for elderly patients.386-388 Although elderly individuals have a higher risk of serious complications due to a perforated appendix, the probability of developing appendicitis is very low, with a lifetime risk of less than 1%. In this age group, it would require at least 115 incidental appendectomies to prevent one hospitalization for appendicitis, and 4,472 incidental appendectomies to avoid a single death.383, 389 Given this logic, the risk of incidental appendectomy is believed to outweigh the benefits in the elderly population.
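The "number of procedures needed to prevent one event" figures quoted above can be related to risk using the standard number-needed-to-treat identity, in which the number needed is the reciprocal of the absolute risk reduction. The sketch below assumes the cited figures follow that convention; the cited studies' actual derivations are not reproduced here, and the function names are illustrative.

```python
def number_needed_to_treat(absolute_risk_reduction: float) -> float:
    """Procedures needed to prevent one event, given the absolute risk reduction."""
    if not 0 < absolute_risk_reduction <= 1:
        raise ValueError("absolute risk reduction must be in (0, 1]")
    return 1 / absolute_risk_reduction


def implied_risk_reduction(number_needed: float) -> float:
    """Absolute risk reduction implied by a quoted number needed to treat."""
    if number_needed <= 0:
        raise ValueError("number needed must be positive")
    return 1 / number_needed


if __name__ == "__main__":
    # Figures quoted in the text for elderly patients.
    for label, nnt in [("appendicitis hospitalization", 115), ("death", 4472)]:
        print(f"{label}: implied absolute risk reduction "
              f"{implied_risk_reduction(nnt):.4%} per procedure")
```

Read this way, 115 procedures per hospitalization prevented corresponds to an absolute risk reduction of just under 1% per patient, which is consistent with the stated lifetime risk of less than 1%.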

Precision

Rates of incidental appendectomy in the elderly have not been widely studied recently. One 1993 study of Medicare beneficiaries found that about 4% of cholecystectomy cases had a secondary procedure code of incidental appendectomy. 383 Another study of leading urological surgery departments (non-random sample) found that over 50% of departments did not routinely perform incidental appendectomy during radical cystectomy, regardless of the age of the patient. Fewer than one-third of departments perform it routinely. 390 These findings suggest that incidental appendectomy rates may be difficult to estimate with precision at the majority of hospitals where it is not a routine practice.

Minimum bias

Since incidental appendectomy appears to be contraindicated in an elderly population, very few (if any) cases would be justified by patients' preoperative characteristics. There are documented cases of discovery of diseased appendices during other abdominal surgery, which would justify incidental appendectomy. However, it is unlikely that the number of diseased appendices found incidental to other surgeries would vary systematically across hospitals. There are no identified risk factors that put an individual at higher risk of having an asymptomatic diseased appendix; therefore, it is impossible to estimate the magnitude of bias.

Construct validity

We located no articles explicitly addressing the construct validity of this indicator. Though most of the available evidence appears to contraindicate incidental appendectomy in the elderly, performance of the procedure is subject to patient and surgeon preference. Therefore, incidental appendectomy rates may correlate poorly with other measures of hospital performance. Recent surveys 390 and reviews 387 suggest that incidental appendectomy is still a common practice at some academic centers that enjoy a "high quality" reputation.

Fosters true quality improvement

We found no evidence regarding gaming for this indicator. However, since incidental appendectomy does not generally affect hospital payment, widespread use of this indicator may lead to less frequent coding of the procedure, when it is performed. Since removal of an inflamed appendix is clearly appropriate, it seems unlikely that patients would be denied a necessary appendectomy. Of course, a reduction in the rate of incidental appendectomy may lead to a subsequent increase in the incidence of acute appendicitis.

Prior use

Incidental appendectomy in the elderly is a provider-level utilization indicator in the current HCUP I indicator set. Otherwise, it has not been widely used as an indicator of quality.

Empirical Evidence
Test | Statistic | Rating
Precision
   Raw provider level rate/standard deviation | 2.7%, 3.5% |
   Systematic provider-level standard deviation** | 1.9% | Moderate
   Provider variation as a percentage of total variation** | 1.4% | High
   Signal ratio | 55.4% | Moderate
   R-Square** | 67.3% | Moderate
   ** APR-DRG, age-, gender-adjusted
Minimum Bias - APR-DRG risk adjustment
   Signal variance change with risk adjustment | No change | Good
   Absolute impact:
     Average absolute change (in %) | 5.0% | Very Good
   Relative impact:
     Rank correlation | 0.988 | Very Good
     Percent remaining in high decile/low decile | 82.9%/94.6% | Good
     Percent changing more than 2 deciles | 0.3% | Very Good

Precision

This indicator is precise, with a raw provider level mean of 2.7% and a standard deviation of 3.5%. The systematic provider level standard deviation is moderate, at 1.9%. The provider level variation accounts for a high percentage of total variation, at 1.4%. This means that relative to other indicators, a higher percentage of the variation occurs at the provider level, rather than the discharge level, although more of the variation occurs at the discharge level than for some indicators. Finally, the signal ratio is moderate, at 55.4%. This means that it is likely that some of the observed differences do not represent true differences in provider performance. The moderate R-Square for this indicator reflects the relatively modest amount of signal that can be extracted using multivariate techniques, though these techniques do help somewhat relative to univariate techniques.

Bias

The indicator performs well to very well on the multiple measures of minimum bias. The rank correlation is high at 0.988. Risk adjustment does not appear to impact the extremes of the distribution substantially. Of providers in the low decile without risk adjustment, 94.6% remain after risk adjustment; 82.9% remain in the highest decile. The absolute magnitude of the impact of risk adjustment is modest.

Construct validity

Incidental appendectomy loads highly on factor 3. It is positively related to cesarean section delivery and negatively related to VBAC.

Discussion

Incidental appendectomy is contraindicated in the elderly population, as this population has both a lower risk for developing appendicitis and a higher risk of postoperative complications when incidental appendectomy is performed. The procedure is not currently performed widely in the elderly, though it is still performed. The contraindications against this procedure in the elderly, as noted in the literature, are compelling and thus this indicator is recommended.

Given the low rate of incidental appendectomies, the precision for this indicator may be lower than other indicators. Our empirical analyses found that this indicator is moderately precisely measured. The moderate signal ratio suggests that some of the observed differences do not reflect true systematic differences in performance.

Empirically, we found little evidence of bias in this indicator. The relative effect of risk adjustment is minimal, and thus the bias with respect to provider differences is not likely to be high. As it is unlikely that the only indication for this procedure, the incidental discovery of an asymptomatic diseased appendix, will vary systematically between providers, it is unlikely that this indicator will be substantially biased.

Overall, this indicator is recommended for inclusion in the HCUP II QI set. It received an empirical rating of 13 out of 26, and smoothing is recommended. This indicator is recommended with one caveat of use. As a utilization indicator, the construct validity relies on the actual inappropriate use of procedures in hospitals with high rates, and this should be investigated further.

INDICATOR 10: BI-LATERAL CARDIAC CATHETERIZATION RATE

Indicator: Bi-lateral cardiac catheterization rate
Relationship to Quality: Bi-lateral catheterization is contraindicated in most patients without proper indications. As such, lower rates of bi-lateral catheterization represent better quality care.
Benchmark: State, regional, or peer group average.

Method:

Quality Measure: Provider level bi-lateral cardiac catheterizations per 100 discharges with procedure code of heart catheterization.
Outcome of Interest: All simultaneous right and left heart catheterizations (see Appendix 6).

Exclude valid indications for right sided catheterization (see Appendix 6) in any diagnosis field.
Population at Risk: All heart catheterizations in any procedure field (see Appendix 6).

Include only coronary artery disease (see Appendix 6).
Exclude MDC 14 (pregnancy, childbirth, and puerperium) and MDC 15 (newborns and other neonates).
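The measure definition above can be sketched as a per-provider rate calculation. This is an illustrative sketch only: the record layout and field names are hypothetical, the boolean flags stand in for the procedure and diagnosis code lists in Appendix 6 (not reproduced here), and applying the "valid indications" exclusion to the numerator only is one reading of the definition rather than a confirmed specification.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Discharge:
    """Hypothetical discharge record; flags stand in for Appendix 6 code lists."""
    provider_id: str
    mdc: int                          # Major Diagnostic Category
    has_heart_cath: bool              # any heart catheterization procedure code
    has_bilateral_cath: bool          # simultaneous right and left catheterization
    has_cad_diagnosis: bool           # coronary artery disease diagnosis
    has_right_cath_indication: bool   # valid indication for right-sided cath


def bilateral_cath_rate(discharges: Iterable[Discharge]) -> Dict[str, float]:
    """Bi-lateral catheterizations per 100 eligible discharges, by provider."""
    at_risk: Dict[str, int] = {}
    events: Dict[str, int] = {}
    for d in discharges:
        # Population at risk: heart catheterization with coronary artery
        # disease, excluding MDC 14 (pregnancy) and MDC 15 (newborns).
        if d.mdc in (14, 15):
            continue
        if not (d.has_heart_cath and d.has_cad_diagnosis):
            continue
        at_risk[d.provider_id] = at_risk.get(d.provider_id, 0) + 1
        # Outcome: bilateral catheterization without a valid indication
        # (assumption: the exclusion applies to the numerator only).
        if d.has_bilateral_cath and not d.has_right_cath_indication:
            events[d.provider_id] = events.get(d.provider_id, 0) + 1
    return {p: 100.0 * events.get(p, 0) / n for p, n in at_risk.items()}


if __name__ == "__main__":
    sample = [
        Discharge("A", 5, True, True, True, False),
        Discharge("A", 5, True, True, True, True),
        Discharge("A", 5, True, False, True, False),
        Discharge("A", 5, True, False, True, False),
        Discharge("A", 14, True, True, True, False),
    ]
    print(bilateral_cath_rate(sample))
```

The same pattern, with different flags for the outcome and the population at risk, would apply to the other provider-level utilization indicators in this chapter.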
Evidence from the literature
Face validity

The diagnostic evaluation of patients with presumptive coronary artery disease often involves cardiac catheterization with coronary angiography. Left-sided catheterization provides very useful information about coronary anatomy, as well as left ventricular function and valvular anatomy. Right-sided catheterization is often performed at the same time, but this practice raises two appropriateness issues. First, without a specific indication for right heart catheterization, the clinical yield is extremely low. In the most rigorous prospective study of this phenomenon, case management was changed for only 1.5% of patients who received an incidental right heart catheterization without a listed indication. 391 Similar results have been reported from two retrospective studies,392, 393 while other studies have failed to distinguish unsuspected right-sided abnormalities that affected management from those that did not. 394 Second, the marginal cost of right heart catheterization has been estimated to exceed $650 per case and $120 million for the nation.

In response to these research findings, the American College of Cardiology and the American Heart Association published guidelines for cardiac catheterization laboratories stating that "without specific indications, routine right heart catheterizations...are unnecessary." 395 Similar guidelines have been published by other medical and public health organizations, such as the Cardiac Advisory Committee of the New York State Department of Health and the Texas Medical Association's Committee on Cardiovascular Diseases.

Precision

In 1996, about 23% of all Medicare beneficiaries who underwent left heart catheterization also underwent right heart catheterization. At the state level, this percentage varied from 11% in Oklahoma to 48% in Massachusetts and 53% in Washington, DC. 396 Given that more than 1.2 million inpatient cardiac catheterizations were performed in the US in 1998, this measure should be estimable with reasonable precision. 397

Minimum bias

Bilateral cardiac catheterization is considered appropriate in the presence of certain clinical indications: suspected pulmonary hypertension or significant right sided valvular abnormalities, congestive heart failure, cardiomyopathies, congenital heart disease, pericardial disease, and cardiac transplantation. The validity of this measure rests on the assumption that the prevalence of these clinical indications is low and/or relatively uniform across the country. Unfortunately, the true prevalence of these indications cannot be reliably derived from administrative data. However, Malone et al 398 found that substantial variation in the use of bilateral catheterization persisted among 37 cardiologists at two large community hospitals, even after adjusting for clinical indications. Bias is likely to account for an even smaller share of variation at the hospital level.

Another source of potential bias is the large number of catheterizations performed on an outpatient basis. In 1996, 472,000 of 1,633,000 catheterizations were performed on an outpatient basis. 297

Construct validity

We located no articles explicitly addressing the construct validity of this indicator.

Fosters true quality improvement

We found no evidence regarding gaming for this indicator. When bilateral cardiac catheterization does not affect hospital payment (as in the DRG system), widespread use of this indicator may lead to less frequent coding of the procedure, when it is performed. It seems unlikely that patients would be denied a bilateral catheterization when the clinical situation clearly warrants it. However, a reduction in the rate of routine bilateral catheterization may lead to rare, but potentially serious, missed diagnoses (e.g., pulmonary hypertension). The long-term significance of missing these rare diagnoses is unclear.

Prior use

Bilateral cardiac catheterization has been widely used as an indicator of quality in the Medicare program. It is one of five quality indicators included in the Medicare Quality of Care Report of Surveillance Measures. 399 From 1993 to 1999, Peer Review Organizations in 20 states developed programs to reduce excessive rates of bilateral cardiac catheterization through education and outreach. Ten of these projects have released results; all documented dramatic utilization changes at the targeted hospitals. It has been estimated that these programs averted at least 6,126 unnecessary bilateral catheterizations. 400 Four of these state-based quality improvement projects have been described in the peer-reviewed literature,401-404 and one documented a spillover effect in the ambulatory setting. 405 The results of these studies suggest that right heart catheterization rates represent an actionable opportunity for quality improvement.

Empirical Evidence
Test | Statistic | Rating
Precision
   Raw provider level rate/standard deviation | 19.3%, 20.0% |
   Systematic provider-level standard deviation** | 16.1% | Very High
   Provider variation as a percentage of total variation** | 14.4% | Very High
   Signal ratio** | 94.3% | Very High
   R-Square** | 96.2% | Very High
   ** APR-DRG, age-, gender-adjusted
Minimum Bias - APR-DRG risk adjustment
   Signal variance change with risk adjustment | No change | Good
   Absolute impact:
     Average absolute change (in %) | 6.1% | Very Good
   Relative impact:
     Rank correlation | 0.988 | Very Good
     Percent remaining in high decile/low decile | 70.7%/96.6% | Good/V.G.
     Percent changing more than 2 deciles | 0.2% | Very Good

Precision

This indicator is very precise, with a raw provider level mean of 19.3% and a substantial standard deviation of 20.0%. The systematic provider level standard deviation is very high, at 16.1%. The provider level variation also accounts for a very high percentage of total variation, at 14.4%. This means that relative to other indicators, a higher percentage of the variation occurs at the provider level, rather than the discharge level. Finally, the signal ratio is very high, at 94.3%. This means that it is likely that the observed differences in provider performance represent true differences in provider performance. The very high R-Square (96.2%) demonstrates that a very large proportion of the signal can be extracted using multivariate techniques. However, since the signal ratio is very high to begin with, MSX smoothing adds less additional impact relative to other indicators.

Bias

Signal variance does not change with APR-DRG risk adjustment. The indicator performs well on the multiple measures of minimum bias. The rank correlation is very high at 0.988, and risk adjustment does seem to disproportionately impact the extreme high end relative to the extreme low decile, though the impact is still relatively modest. The absolute impact is also minimal.

Construct validity

Bilateral catheterization rate loads very highly on factor two. It is positively related to CABG mortality and negatively related to laparoscopic cholecystectomy.

Discussion

Right side coronary catheterization incidental to left side catheterization has little additional benefit for patients without indications of right side catheterization. Despite guidelines that have been set forth discouraging such practice, the practice continues in some hospitals.

This indicator received one of the highest precision ratings. Provider level variation accounts for a relatively large portion of the total variation compared to other indicators, meaning that this indicator is influenced less by discharge level (patient level) variation than other indicators are. Given the very high signal ratio, it is likely that the observed differences in provider performance represent true differences in provider performance, rather than random variation. Multivariate smoothing techniques do give some additional benefit, though the benefit is modest, so MSX techniques may not be required for this indicator. Univariate smoothing is always recommended, however.

In our analyses of minimum bias, we identified very little bias in this indicator when adjusting for APR-DRGs. One study of cardiologists found that clinical characteristics did not account for the observed variance between providers. While there are appropriate uses for this procedure, it is unlikely that such indications would vary systematically between providers to the degree necessary to explain the observed variance. Thus, we would expect this indicator to be only minimally biased.

Overall, this indicator is recommended for inclusion in the HCUP II QI set. It received an empirical rating of 25 out of 26. This indicator is recommended with two caveats of use. First, outpatient procedures may result in selection bias for this indicator, and should be examined. Second, as a utilization indicator, the construct validity relies on the actual inappropriate use of procedures in hospitals with high rates, and this should be investigated further.

INDICATOR 11: SUCCESSFUL VAGINAL BIRTH AFTER CESAREAN SECTION (VBAC) RATE

Indicator: VBAC rate
Relationship to Quality: VBAC has been identified as a potentially underused procedure. As such, higher appropriate VBAC rates represent better quality care.
Benchmark: State, regional, or peer group average.

Method:

Quality Measure: Provider-level vaginal births per 100 discharges with diagnosis of previous C-section.
Outcome of Interest: Number of vaginal births in women with diagnosis of previous C-section (see Appendix 6).
Population at Risk: All deliveries with previous C-section diagnosis in any diagnosis field (see Appendix 6).
Evidence from the literature
Face validity

The rate of cesarean section (CS) in the United States increased from 5.5% in 1970 to a high of 24.7% in 1988, and subsequently decreased to 20.7% in 1996. 339 In the 1980s, the Department of Health and Human Services concluded that encouraging vaginal birth after cesarean section (VBAC) represented a safe way to decrease the overall CS rate 406 and subsequently set the following targets for the Healthy People 2000 Objectives: a CS rate < 15 per 100 births, with a primary CS rate of 12 or fewer per 100 births and an increase in the VBAC rate to 35 or more per 100 births. This target, as well as a number of studies in the literature suggesting that increasing VBAC rates could be safely achieved,407-412 led to the common adoption of VBAC rates as a QI.

Despite the widespread use of VBAC rates as a QI (including in HCUP I), a randomized trial comparing a trial of labor vs. elective repeat cesarean section has yet to appear. Moreover, while physicians and policy makers have presumed that encouraging increased VBAC rates conforms to patient preferences, approximately one third of patients prefer to pursue elective repeat cesarean section.413-416 In fact, many physicians appear to consider cesarean delivery preferable to vaginal delivery, given the potential complications of the latter. 417 Lastly, a recent article indicates that, accounting for costs and not charges, VBAC is unlikely to achieve significant cost savings compared to repeat CS. 418

Recommendations for increasing the VBAC rate began to appear in the U.S. in the 1980s. A number of observational studies in the 1980s and early 1990s suggested that a trial of labor in patients with previous cesarean delivery (CD) represented a safe practice.407-412 More recently, however, evidence for increased risk of maternal and fetal complications associated with this practice has appeared.365, 419-424 These complications include uterine rupture, maternal infection and hypoxic injury to the fetus. 365 Uterine rupture represents the most serious of these complications, and a recent study by the CDC indicates that administrative data do not adequately capture this complication. 425

Interestingly, the authors of some studies suggesting an increased risk of uterine rupture associated with a trial of labor have nonetheless concluded that VBAC is overall a safe procedure.411, 412, 426, 427 Thus, the policy of recommending VBAC represents to some degree a matter of opinion on the relative risks and benefits of a trial of labor in patients with previous CS. Given the substantial, more recent evidence regarding maternal and fetal complications as a result of promoting VBAC and the potential for differences of opinion regarding the significance of these adverse outcomes, the existing HCUP I QI may not be regarded as a straightforward corrective for the previous (presumed) overuse of CS. Given the concern regarding maternal and fetal safety, some have advocated for further research to establish the "right rate" of VBAC. 428

Precision

We located no evidence on the precision of this indicator. However, VBAC is a relatively common procedure, and thus we would expect it to be measured with good precision.

Minimum bias

In one study, 429 only 42.0% of women with a CS-vaginal delivery sequence were correctly identified on the second birth certificate as a VBAC. Only 75% of 25,491 women from 1980 through 1988 with a previous cesarean were so designated on the birth certificate; 80% of women with a V-CS sequence were correctly designated as primary cesarean. Although this study employed birth certificates, these findings suggest that administrative data accurately distinguish the current mode of delivery (vaginal vs. CS), but less accurately identify VBAC and primary cesarean delivery.

The proportion of patients with certain sociodemographic profiles 430 and medical indications for CS or contraindications to a trial of labor358, 360, 431-434 exerts a significant impact on QIs related to rates of CS and VBAC. Although adjusting for case-mix does not eliminate wide variation in rates of CS 435 , unadjusted rates only modestly correlate with adjusted rates 358 . Administrative data sources, such as vital records and hospital discharge data, do not include the clinical factors required to identify appropriate candidates for trial of labor.358, 432, 436, 437 Thus, the denominator for VBAC rates calculated using administrative data will include women with an accepted medical indication for repeat CS delivery (e.g., women with a prior classic CS), not to mention patients who make an informed decision not to pursue a trial of labor. 416

Prompted by concern over an increase in uterine rupture (UR), the Massachusetts Department of Public Health and the CDC conducted an investigation of the validity and reliability of ICD-9-CM codes in hospital discharge data to identify UR cases. The study covered maternal discharges from Massachusetts hospitals from 1990 through 1997; women with and without a history of prior CS were included. Potential cases of UR were identified with an ICD-9-CM diagnostic code in any of the 10 diagnostic fields of 665.0 ("rupture of uterus before onset of labor"), 665.1 ("rupture of uterus during labor," including "rupture of uterus not otherwise specified"), or 674.1 ("disruption of cesarean wound," including "dehiscence or disruption of uterine wound"). Two clinicians then reviewed the medical records of suspected cases to confirm the diagnosis of UR (defined as any unintentional disruption of the uterine wall in a pregnant woman regardless of cause, size, degree of severity, or location).

The design of the study does not permit estimation of the sensitivity of the codes for UR. Positive predictive values (PPVs) were calculated as the number of confirmed cases divided by the number of reviewed suspected cases multiplied by 100. The average PPV during the 8-year period was 50.7% for ICD-9-CM codes 665.0 and 665.1 and 28.6% for code 674.1. The overall PPV of the three codes was 39.8%. Approximately half of the uterine ruptures that result from a trial of labor (among patients with a prior CS) are coded to 674.1 rather than 665.0/1. Furthermore, only about half of the patients coded as having UR (665.0/1) actually had this complication. As an editorial note accompanying the study points out, these coding problems may reflect the fact that the development of the ICD-9 system predates the interest in monitoring trends in the incidence of UR associated with increased VBAC rates. As ICD-10 codes were also published prior to increased concern over UR, administrative data will have limited use in monitoring the potential negative impacts of attempted VBAC.

Construct validity

The likelihood that a patient will undergo VBAC correlates with certain provider and institutional variables,354, 361, 433, 438, 439 suggesting that certain providers are more likely to adapt to changes in policy or technology.

Fosters true quality improvement

Promotion of VBAC as a QI has led to successful increases in the VBAC rate in some cases,440, 441 but not in others. 442 The major opportunity for gaming (i.e., spurious improvement) lies in focusing on patients more likely to successfully complete a trial of labor after previous CS.

Prior use

VBAC is one of the most commonly used indicators. In addition to being included in the previous version of the HCUP I indicator set, VBAC is included in and used by JCAHO's IM System, Maryland QI Project, 369 Michigan Hospital Association, 373 HealthGrades.com, 377 Cleveland Health Quality Choice, 374 and the University Hospital Consortium. 370 Further, JCAHO has selected VBAC as one of its core measures. 443

Empirical Evidence
Test | Statistic | Rating
Precision
   Raw provider level rate/standard deviation | 33.6%, 14.8%
   Systematic provider-level standard deviation* | 11.7% | Very High
   Provider variation as a percentage of total variation* | 5.7% | High
   Signal ratio* | 83.1% | High
   R-Square* | 89.5% | High
   * age-adjusted
Minimum Bias - age risk adjustment
   Signal variance change with risk adjustment | No change | Very Good
   Absolute impact:
     Average absolute change (in %) | 3.1% | Very Good
   Relative impact:
     Rank correlation | 0.995 | Very Good
     Percent remaining in high decile/low decile | 93.1%/94.3% | V.G./Good
     Percent changing more than 2 deciles | 0.0% | Very Good
Precision

This indicator is very precise, with a raw provider level mean of 33.6% and a substantial standard deviation of 14.8%. The systematic provider level standard deviation is very high, at 11.7%. The provider level variation accounts for a high percentage of total variation, at 5.7%. This means that relative to other indicators, a higher percentage of the variation occurs at the provider level, rather than the discharge level, though some variation remains at the discharge level. Finally, the signal ratio is high, at 83.1%. This means that it is likely that the observed differences in provider performance represent true differences in provider performance, although some of the observed difference is due to unobserved differences in patient characteristics. The high R-Square indicates that some additional signal can be extracted using multivariate techniques, though this additional impact is moderate.

Bias

Signal variance does not change with risk adjustment. We did not perform APR-DRG risk adjustment because the categories correspond to the outcome of interest. As a result, the indicator performs very well on the multiple measures of minimum bias, adjusting only for the age of the mother. The rank correlation is very high at 0.995. Age risk adjustment does not seem to impact disproportionately at the extreme high and low end, though there is some modest impact at the low extreme relative to other indicators. The absolute change is low, and no providers change more than two deciles with risk adjustment.
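
The minimum bias measures discussed above compare provider rates before and after risk adjustment. The sketch below shows one way such comparisons could be computed. It is an illustration under stated assumptions, not the report's actual method: the rank correlation is taken to be a Spearman correlation, the average absolute change is expressed relative to the mean unadjusted rate, deciles are assigned by rank order, and at least 10 providers are assumed. All function names and data are hypothetical.

```python
def ranks(values):
    """Return 1-based ranks, averaging ranks across ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; applied to ranks it gives the Spearman correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def deciles(values):
    """Assign each value a decile from 1 (lowest) to 10 (highest) by rank order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for pos, idx in enumerate(order):
        out[idx] = pos * 10 // len(values) + 1
    return out


def bias_summary(unadjusted, adjusted):
    """Summarize how much risk adjustment changes provider rates and rankings."""
    n = len(unadjusted)
    mean_rate = sum(unadjusted) / n
    d_un, d_adj = deciles(unadjusted), deciles(adjusted)
    high = [i for i in range(n) if d_un[i] == 10]
    low = [i for i in range(n) if d_un[i] == 1]
    abs_change = sum(abs(a - u) for u, a in zip(unadjusted, adjusted)) / n
    return {
        "avg_abs_change_pct": 100.0 * abs_change / mean_rate,
        "rank_correlation": pearson(ranks(unadjusted), ranks(adjusted)),
        "pct_remaining_high": 100.0 * sum(d_adj[i] == 10 for i in high) / len(high),
        "pct_remaining_low": 100.0 * sum(d_adj[i] == 1 for i in low) / len(low),
        "pct_moving_over_2_deciles": 100.0
        * sum(abs(a - b) > 2 for a, b in zip(d_un, d_adj)) / n,
    }


# Twenty hypothetical providers; adjustment swaps the two lowest rates.
unadjusted = [10.0 + i for i in range(20)]
adjusted = list(unadjusted)
adjusted[0], adjusted[1] = adjusted[1], adjusted[0]
summary = bias_summary(unadjusted, adjusted)
print(summary)
```

In this toy example the adjustment reorders only two neighboring providers, so the rank correlation stays close to 1 and no provider leaves its decile, which is the pattern a minimally biased indicator would be expected to show.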

Construct validity

VBAC rate loads very highly on factor three, and is inversely related to cesarean section delivery and to a lesser extent incidental appendectomy.

Discussion

Vaginal birth after cesarean section (VBAC) has been implicated as an underused procedure. Healthy People 2010 indirectly established a goal of increasing VBAC rates by setting a goal of decreasing cesarean sections in women with previous cesarean section to 63%.

Our empirical tests demonstrated that this indicator is measured with very good precision. Given the high signal ratio, it is likely that the observed differences represent true differences in provider performance, rather than random variation. This cannot, however, fully account for systematic bias in the indicator. Multivariate smoothing techniques appear to increase modestly the amount of signal that can be extracted. While this amount is high, it remains slightly lower than other indicators.

While we found no change in provider ranking with risk adjustment, this analysis only accounted for the age of the mother, as there are no severity classifications within the APR-DRG for VBAC. The literature review revealed that some clinical factors may contraindicate this procedure, and thus should be risk adjusted for. It is unlikely that these clinical diagnoses, such as previous classic cesarean section, would be available in administrative data. Some clinical information may be available on birth records, and thus linkage to such vital records may provide for better risk adjustment. Further, administrative data do not capture such information as informed preference of the patient or physician in electing repeated cesarean section. Hospitals may vary systematically in the presence of these factors, though this has not been widely studied. Providers may wish to examine case mix to illuminate any potential biases.

We located no other studies that examined the construct validity of VBAC. However, we found that c-section and VBAC are strongly negatively correlated, as one would expect, as increasing the VBAC rate was proposed as one means of reducing c-section rate. However, a simple correlation does not imply causation, nor guarantee that this relationship is evidence of an underlying quality relationship.

Excessive rates of VBAC may increase rates of uterine rupture. Yet, the best rate for VBAC has not been established. We suggest that this measure be used in conjunction with area rates and national rates, and complication rates (maternal uterine rupture and length of stay, neonatal length of stay) to assess whether one's rate is truly too high or too low.

This indicator may indeed encourage better documentation of previous cesarean sections. However, as current evidence indicates that previous cesarean section is undercoded, providers with low rates of cesarean section should investigate whether there are substantial undercoding problems.

Overall, this indicator is recommended for inclusion in the HCUP II QI set. It received an empirical rating of 19 out of 26. This indicator is recommended with several caveats of use. First, selection bias due to preferences of patients and other factors may impact performance on this indicator. Second, potential additional bias may result from clinical differences not identifiable in administrative data, so supplemental risk adjustment with linked birth records or other clinical data may be desirable. Third, as a utilization indicator, the construct validity relies on the actual appropriate use of procedures in hospitals with high rates, and this should be investigated further. Finally, caution should be maintained for VBAC rates that are drastically below or above the average or recommended rates.

INDICATOR 12: LAPAROSCOPIC CHOLECYSTECTOMY RATE

Indicator: Laparoscopic cholecystectomy rate
Relationship to Quality: LC is a new technology with lower risks than open cholecystectomy (removal of the gall bladder). As such, a higher rate of LC represents better quality care.
Benchmark: State, regional, peer group average.

Method:

Quality Measure: Provider level, number of laparoscopic cholecystectomies per 100 cholecystectomies.
Outcome of Interest: Number of laparoscopic cholecystectomies (see Appendix 6).
Population at Risk: All non-maternal/non-neonatal discharges age 18 years or older with any procedure code for cholecystectomy (see Appendix 6) in any field.

Include only discharges with uncomplicated cases: cholecystitis and/or cholelithiasis (see Appendix 6) in any diagnosis field.

Exclude MDC 14 (pregnancy, childbirth, and puerperium) and MDC 15 (newborns and other neonates)
Evidence from the literature
Face validity

Cholecystectomy, surgical removal of the gallbladder, is now generally performed with a laparoscope (in about 75% of uncomplicated cases in the NIS). 444 One of the largest randomized controlled trials (RCTs) comparing cholecystectomy by laparoscopy versus minilaparotomy found that the laparoscopic procedure was associated with less postoperative pain, lower patient-controlled morphine consumption, better postoperative pulmonary function and oxygen saturation, and quicker return to leisure activities, work in the home, and social activities (but no difference in return to employment).445-451 This study also confirmed that the laparoscopic approach is associated with less postoperative narcotic use, less sick leave, and shorter convalescence (even with conversion rates as high as 13%). Observational studies suggest that laparoscopic surgery also has a lower mortality rate452-456 and a lower readmission rate.452-454 Only one large RCT reported no difference in hospital stay, time off work, or return to full activities. 457 The authors attribute this difference to their small-incision (7 cm) minilaparotomy and their unusual blinding method, in which patients were told to start eating, get out of bed, and go home whenever they felt ready, without knowing which procedure they had received.

As a result of these studies, laparoscopic cholecystectomy is now widely accepted as the "standard of care for patients requiring cholecystectomy," 458 in the absence of specific contraindications (e.g., coagulopathy, late pregnancy, morbid obesity, cirrhosis). However, this procedure requires more technical skill than the open approach. Thus, a higher rate of laparoscopic cholecystectomy (as a proportion of all cholecystectomies) suggests that a hospital can rapidly achieve proficiency in up-to-date treatment methods.

Precision

Cholecystectomies are relatively common, so moderately precise estimates of differences in laparoscopic utilization across hospitals can be obtained. In the NIS, the average number of cholecystectomies per hospital is approximately 70, with nearly two-thirds of all hospitals performing at least 3 procedures. Still, random variation in laparoscopic utilization across hospitals in a particular year may be considerable, particularly for hospitals that perform few procedures. Restricting the denominator definition to uncomplicated cases reduces the size of the denominator by about 20%, which further reduces precision. 459

Minimum bias

The current HCUP I definition limits the denominator to uncomplicated cases: non-acute cholecystitis (inflammation of the gallbladder) and/or cholelithiasis (formation of bile stones in the gallbladder). For these patients, cholecystectomy is an elective procedure. Cholecystectomies on patients with acute cholecystitis are performed as emergency procedures, which increases the risk of iatrogenic injury. 460 Higher risks of complications with LC are also associated with older age and the presence of common bile duct stones. 459 Only a limited number of surgeons are comfortable with laparoscopic common bile duct exploration, which is described by the Society of American Gastrointestinal Endoscopic Surgeons as "a complex biliary procedure that demands a well-trained operating room team and facilities and equipment beyond that required for routine LC." In addition, as surgeons become more experienced in LC, they are more likely to perform LC on more difficult patients, such as those with acute cholecystitis 461 and those at older and younger ages.462, 463 These examples illustrate that patient referral patterns and other selection factors may lead to substantial differences in laparoscopy rates (as a proportion of all cholecystectomies) across hospitals. While many of these patient characteristics can be measured and thus adjusted out in comparisons across hospitals, controlling for other aspects of patient severity may be more difficult.

In 1996, there were an estimated 770,000 cholecystectomies, both open and laparoscopic, in the US - 322,000 ambulatory (42%) and 448,000 inpatient (58%). 297 Thus, use of only inpatient data could be substantially biasing, in that it eliminates those cholecystectomies performed on an outpatient basis, most of which are likely to be laparoscopic.
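
The direction of this bias can be illustrated with a short calculation. The ambulatory and inpatient counts below are the 1996 estimates cited above; the laparoscopic shares within each setting are hypothetical inputs, since this section does not report them.

```python
AMBULATORY = 322_000  # 1996 estimate of ambulatory cholecystectomies cited above
INPATIENT = 448_000   # 1996 estimate of inpatient cholecystectomies cited above


def share(part, total):
    """Percentage that `part` represents of `total`."""
    return 100.0 * part / total


def lc_rate(lc_inpatient_share, lc_ambulatory_share, include_ambulatory):
    """Laparoscopic cholecystectomies per 100 cholecystectomies.

    The two share arguments are fractions in [0, 1] and are HYPOTHETICAL:
    they are not taken from the source.
    """
    lc = INPATIENT * lc_inpatient_share
    total = INPATIENT
    if include_ambulatory:
        lc += AMBULATORY * lc_ambulatory_share
        total += AMBULATORY
    return 100.0 * lc / total


total = AMBULATORY + INPATIENT
print(round(share(AMBULATORY, total)), round(share(INPATIENT, total)))  # 42 58

# Hypothetical scenario: 60% of inpatient and 95% of ambulatory cases are laparoscopic.
inpatient_only = lc_rate(0.60, 0.95, include_ambulatory=False)
all_settings = lc_rate(0.60, 0.95, include_ambulatory=True)
print(inpatient_only, all_settings)
```

Whenever the ambulatory laparoscopic share exceeds the inpatient share, a rate computed from inpatient data alone understates the true proportion, and the size of the gap depends on how much of a hospital's volume has shifted to the outpatient setting.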

Construct validity

We found no studies explicitly evaluating the construct validity of this indicator. In other words, there is no evidence that hospitals that use the laparoscopic approach more frequently provide better quality of care, according to other measures. One study examined factors associated with the rate of hospital adoption of laparoscopic cholecystectomy in Pennsylvania; the authors reported that "nearly universal adoption occurred at a speed not previously reported" (e.g., within 2 years). Participation in residency training was the only independent predictor of earlier LC adoption.

Fosters true quality improvement

This indicator raises several concerns related to perverse incentives. Within 3 years after its introduction, 76% of cholecystectomies in Maryland462, 463 and 72% of those in Pennsylvania 464 were performed laparoscopically. In both states, and in New York, 465 the advent of laparoscopic surgery led to a substantial (28-34%) increase in the overall cholecystectomy rate, especially involving uncomplicated and elective patients.466-468 Similar, but less dramatic, increases have been reported from the Veterans Affairs system (10%), 456 Canada (17%), Australia (26%), 469 and Scotland (19%). 470 These trends eliminated most of the expected decline in cholecystectomy-associated deaths in Maryland and in the Veterans Affairs system. The NIH Consensus Development Panel tried to slow this phenomenon by declaring that "the availability of laparoscopic cholecystectomy should not expand the indications for gallbladder removal." 466 It is not clear whether this declaration has had any moderating effect on community practice. However, providers could readily boost LC rates by recruiting more persons with marginal clinical indications to undergo cholecystectomy.

The other concern in this domain is that the "optimal" LC rate has not been defined. Provider experience may be an important, and desirable, determinant of how the procedure is used across a wide range of clinical situations. At some hospitals, increasing proportionate LC utilization above 75% might lead to more biliary tract or intestinal complications. In addition, previous studies have clearly demonstrated a learning curve, whereby surgeons' outcomes and operating times improve as they gain experience with LC.471, 472 Technical complication rates appear to stabilize at a low rate after about 75 procedures. 473 Incentives to increase LC utilization may have negative consequences if local physicians lack appropriate training and expertise.

These findings suggest that proportionate LC utilization must be interpreted together with overall cholecystectomy rates (to rule out over-referral of patients who are inappropriate or marginally appropriate) and LC complication or conversion rates (to rule out inappropriate use of LC among patients who would benefit from open surgery).

Prior use

The rate of laparoscopic cholecystectomy is a current indicator in the HCUP I QI set. We were unable to find evidence that this measure has been used as a quality indicator in other settings. Indeed, the rapid and nearly universal adoption of LC that occurred between 1990 and 1993 makes future implementation of this measure seem unlikely.

Empirical Evidence
Test | Statistic | Rating
Precision
   Raw provider level rate/standard deviation | 66.2%, 19.2%
   Systematic provider-level standard deviation* | 13.3% | Very High
   Provider variation as a percentage of total variation* | 7.9% | Very High
   Signal ratio* | 83.1% | High
   R-Square* | 89.1% | High
   * age- and gender-adjusted
Minimum Bias - age-sex risk adjustment
   Signal variance change with risk adjustment | No change | Good
   Absolute impact:
     Average absolute change (in %) | 3.9% | Very Good
   Relative impact:
     Rank correlation | 0.966 | Very Good
     Percent remaining in high decile/low decile | 93.8%/78.8% | V.G./Fair
     Percent changing more than 2 deciles | 2.5% | Very Good
Precision

This indicator is very precise, with a raw provider level mean of 66.2% and a substantial standard deviation of 19.2%. The systematic provider level standard deviation is very high, at 13.3%. The provider level variation also accounts for a very high percentage of total variation, at 7.9%. This means that relative to other indicators, a higher percentage of the variation occurs at the provider level, rather than the discharge level. Finally, the signal ratio is high, at 89.1%. This means that it is likely that the observed differences in provider performance represent true differences in provider performance, although some of the observed differences are due to unobserved differences in patient characteristics. The high R-square demonstrates that a large proportion of signal can be extracted using multivariate techniques, though not as much as some other indicators. The additional impact of multivariate techniques is modest.

Bias

Signal variance does not change with risk adjustment. We did not perform APR-DRG risk adjustment because the categories correspond to the outcome of interest. As a result, the indicator performs very well on the multiple measures of minimum bias, using age-sex adjustment. The rank correlation is high at 0.966. Age and sex risk adjustment does seem to disproportionately impact the low extreme relative to the high extreme of the distribution; 93.8% of providers in the high decile remain after risk adjustment, while only 78.8% of providers in the low decile remain. The absolute change is low, and relatively few providers change more than two deciles with risk adjustment.

Construct validity

Laparoscopic cholecystectomy loads very highly on factor two. It is inversely related to bilateral catheterization and CABG mortality rate.

Discussion

Use of laparoscopic cholecystectomy as opposed to open cholecystectomy is associated with less morbidity in less severe cases. Thus, laparoscopic cholecystectomy has been identified as a potentially underused procedure, when measured as a ratio to total cholecystectomies performed.

The literature review suggests that due to the rapid adoption of LC, there would be very little variation between providers. In contrast, this indicator has a high percentage of variation attributable to providers. This indicates that while many hospitals do use LC, the proportion of LC to all cholecystectomies does in actuality vary between providers.

This indicator performed very well on the empirical test for minimum bias, showing very minimal bias when adjusting for age and sex. The exception is the impact on providers with low rates; this bias could lead to the misidentification of providers as outliers when in fact they are not. Additional bias was not examined, as there are no severity classifications within the cholecystectomy APR-DRG. However, the literature review does indicate that there may be need to adjust for clinical severity, age, and other factors, since LC may be contraindicated for some patients, and others may not be clinically severe enough to qualify for cholecystectomy at all. The level of clinical detail required may not be available using HCUP data. There is concern that encouraging too many cholecystectomies to be performed laparoscopically could lead to higher rates of complications and of conversion to open cholecystectomy. Further, too many inappropriate procedures in patients without appropriate clinical indications would artificially inflate the laparoscopic cholecystectomy rate without improving quality.

The most troubling source of bias is that up to half of all cholecystectomies are performed on an outpatient basis. This bias decreases the strength of this indicator substantially. Providers should incorporate outpatient data if possible when interpreting this indicator.

While this indicator performed well empirically, the literature review demonstrates some concerns regarding the interpretation and use of this measure. Several steps could be taken when using this measure to ensure appropriate interpretations. Providers should compare rates with area rates. Providers with substantially higher rates than those of providers in the same area may want to assess the appropriateness of LC procedures. Providers with substantially lower rates may want to examine the clinical severity of their patients to assess whether the low rate is due to a more severe case mix, requiring more open cholecystectomies.

Overall, this indicator is recommended for inclusion in the HCUP II QI set. It received an empirical rating of 20 out of 26. This indicator is recommended with several major caveats of use. First, many laparoscopic cholecystectomies are performed on an outpatient basis, and thus may bias this indicator. Second, additional bias may result from clinical differences not identifiable in administrative data, so supplemental risk adjustment using other clinical data may be desirable. Third, as a utilization indicator, the construct validity relies on the actual appropriate use of procedures in hospitals with high rates, and this should be investigated further. Fourth, providers may inflate the laparoscopic cholecystectomy rate by increasing the procedure rate for patients with questionable indications. Finally, caution should be maintained for laparoscopic rates that are drastically below or above the average rates.

3.E.3. Area-level utilization measures

INDICATOR 13: CORONARY ARTERY BYPASS GRAFT (CABG) RATE

Indicator: Area level coronary artery bypass graft (CABG) rate
Relationship to Quality: CABG is an elective procedure that may be overused. As such, more average rates would represent better quality.
Benchmark: State, regional or peer group average.

Method:

Quality Measure: Number of CABGs per 100,000 population.
Outcome of Interest: Number of CABGs (any procedure field) per 100,000 population (see Appendix 6).

Age 40 years or older.

Exclude MDC 14 (pregnancy, childbirth, and puerperium) and MDC 15 (newborns and other neonates).
Population at Risk: Population in MSA or county, age 40 years and over.
Evidence from the literature
Face validity

Coronary artery bypass graft (CABG) surgery is performed on patients with coronary artery disease (CAD). Most previous studies of small area variation have found relatively high variation in CABG rates, as noted by the systematic component of variation (.758). 474 This systematic component of variation (SCV) "compares geographic variability between DRGs after removing random effects." 474 This variation has not been explained by population characteristics such as age and race.

The clinical indications for CABG in patients with symptoms less major than three-vessel disease, previous myocardial infarction, or less than strongly positive exercise ECG tests are unclear.262, 475 No randomized controlled trials have demonstrated that CABG improves clinical outcomes in patients with some combination of these indications.

Precision

Because adult admissions for CABG are relatively common, it should be possible to generate precise estimates of utilization at the area level. However, random variation in utilization rates may become more problematic for relatively small areas (e.g., zip codes) or underpopulated areas (e.g., rural counties).
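
The concern about small areas can be made concrete with a rough calculation. The sketch below computes a rate per 100,000 population and an approximate relative standard error, assuming that procedure counts behave like Poisson counts. That distributional assumption, the function names, and the event and population figures are all illustrative and are not drawn from the report.

```python
import math


def rate_per_100k(events, population):
    """Procedures per 100,000 population."""
    return 100_000.0 * events / population


def approx_relative_se(events):
    """Approximate relative standard error of a rate.

    Assumes Poisson-distributed counts, so the relative error is 1/sqrt(events).
    """
    if events <= 0:
        raise ValueError("events must be positive")
    return 1.0 / math.sqrt(events)


# Two hypothetical areas with the same underlying rate but different populations.
areas = [("small area", 9, 5_000), ("large area", 900, 500_000)]
for label, events, population in areas:
    rate = rate_per_100k(events, population)
    rel_se_pct = 100.0 * approx_relative_se(events)
    print(label, rate, round(rel_se_pct, 1))
```

Both hypothetical areas have a rate of 180 per 100,000, but under the Poisson assumption the small area's estimate carries roughly ten times the relative error, which is why small or sparsely populated areas produce less stable rates.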

Minimum bias

Utilization rates standardized at the area level (e.g., adult population of the county or SMSA) may be biased by differences in the prevalence of CAD. The prevalence of CAD may, in turn, be related to the age structure of the population and the prevalence of behavioral or physiologic risk factors such as smoking and hyperlipidemia. Even though race and demographic factors have significant effects on the likelihood of CABG, previous studies have shown that sociodemographic differences account for very little of the observed variation in CABG rates. 262 While one study reports no significant difference between age categories and appropriateness, 262 another reports that "patients 75 years of age and older were more likely to have surgery classified as uncertain than were younger patients." 475 Although there is some report of variation based on age, it is not likely that this factor explains all the variation observed in CABG rates. Some differences in CABG rates across areas may be attributable to the referral of rural and other patients from outside the area for surgery; however, such referrals are unlikely to explain a large part of the substantial differences in rates across small geographic areas.

Construct validity

For this indicator to perform well in identifying true quality of care problems, there must be evidence of significant inappropriate or questionable utilization in population-based studies. In addition, there must be substantial variation in the rate of inappropriate utilization across providers or small areas. Previous studies have classified possible indications for CABG as appropriate, uncertain, or inappropriate, as a method for assessing quality of care.

In a follow-up of a NY appropriateness study, a panel of cardiologists from Duke University reviewed 308 cases. In their results, the rate of inappropriate procedures increased to 6% and the rate of uncertain procedures increased to 12%. 476 In another study of 12 Academic Medical Center Consortium hospitals, the rate of CABG for inappropriate indications was as much as 1.9% overall, ranging from 0 to 5% across hospitals (P=0.02). The rate of CABG for uncertain indications was 7%, ranging from 5 to 8% across hospitals. 475 In 15 randomly selected non-federal hospitals in New York State (which has relatively low CABG rates by US standards), 2.4% of CABG was inappropriate and 7% uncertain. These rates also varied among hospitals. 262

In 319 randomly selected patients with CABG between July 1987 and June 1988, 16% overall received CABG for inappropriate reasons and 26% received the procedure for uncertain reasons. 317

In a comparison of Canadian and New York State samples, appropriateness of CABG was measured according to criteria from each nation. Based on both Canadian and US criteria, the Canadian sample held more cases of CABG for uncertain indications. Based on Canadian criteria, the Canadian sample had fewer cases of CABG performed for inappropriate indications. Based on US criteria, however, the rate of inappropriate CABG in each sample was lower than the rate assessed under Canadian criteria, and the rates in the two samples were approximately the same. 477

In a study of seven of eight public Swedish heart centers, which perform 92% of all bypass surgeries in Sweden, it was found that 9.7% of all CABG procedures were performed for inappropriate indications and 12.3% of procedures were performed for uncertain indications. 478 Finally, among 153 catheterization patients referred to either a university cardiac laboratory or a VA cardiac lab in Maryland, the rate of CABG performed for inappropriate indications ranged from 17 to 46%, based on three different appropriateness criteria: RAND, ACC/AHA, or RAS. The rate of CABG for uncertain indications was 17%, based on RAND criteria, the only set of criteria that accounts for this rating.

Thus, though most studies have found relatively low rates of inappropriate use, there is some evidence of variation in inappropriate rates across geographic areas. Moreover, a larger proportion of bypass surgery procedures are performed for indications in which benefits are uncertain; procedure rates for uncertain indications may also vary substantially across hospitals and areas.

Fosters true quality improvement

Little evidence exists on whether the use of CABG as a quality indicator might differentially reduce procedures which are inappropriate or of unclear benefit, rather than appropriate procedures.

Prior use

The hospital-based rate of CABG is a current indicator in the HCUP I QI set. The area-based rate of CABG is a current indicator in the Dartmouth Atlas. 479

Empirical Evidence
| Test | Statistic | Rating |
|---|---|---|
| Precision | | |
| Raw area level rate/standard deviation | 180.4, 571.6 | |
| Systematic area-level standard deviation* | 0.25% | Very high |
| Area variation as a percentage of total variation* | 0.21% | Very high |
| Signal ratio* | 97.3% | Very high |
| R-Square* | 97.3% | Very high |
| Minimum Bias - age and sex risk adjustment | | |
| Signal variance change with risk adjustment | No change | Good |
| Absolute impact: Average absolute change (in %) | 12.6% | Good |
| Relative impact: Rank correlation | 0.654 | Fair |
| Relative impact: Percent remaining in high decile/low decile | 36.4% / 95.5% | Fair / V.G. |
| Relative impact: Percent changing more than 2 deciles | 35.0% | Fair |

* age- and gender-adjusted

Precision

This indicator is moderately precise, with a raw area level mean of 180.4 per 100,000 population and a standard deviation of 571.6. The systematic area level standard deviation is very high, at 0.25%. The area level variation accounts for a very high percentage of total variation, at 0.21%. This means that relative to other indicators, a larger percentage of the variation occurs at the area level, rather than the discharge level. The signal ratio is very high, at 97.3%. This means that it is very likely that the observed differences in area performance represent true differences in area performance. The R-Square is also very high, demonstrating that a large proportion of signal can be extracted with either univariate or multivariate techniques.
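
The signal ratio can be illustrated with a short sketch. It assumes the ratio is defined as the signal variance (true between-area variance) divided by the sum of signal and noise variance; that definition is an assumption here, and the function name and variance figures are hypothetical, chosen only so that the result matches the 97.3% reported above.

```python
def signal_ratio(signal_variance: float, noise_variance: float) -> float:
    """Share of observed between-area variance attributed to true differences.

    Assumed definition (not confirmed by this section): signal / (signal + noise).
    """
    total = signal_variance + noise_variance
    if total <= 0:
        raise ValueError("total variance must be positive")
    return signal_variance / total


# Hypothetical variance components, not taken from the report.
ratio = signal_ratio(signal_variance=97.3, noise_variance=2.7)
print(f"signal ratio = {ratio:.1%}")
```

Under this reading, a ratio close to 100% means that very little of the observed spread in area rates would be expected from sampling noise alone.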

Bias

Signal variance does not change substantially with risk adjustment. The indicator performed relatively poorly on most measures of minimum bias, using age and sex adjustment. Both the relative and absolute impacts of risk adjustment are substantial. The exception is that risk adjustment did not affect the lowest decile substantially, with over 95% of areas in the lowest decile remaining after risk adjustment.
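
The relative-impact measures named above (rank correlation, percent remaining in the high and low deciles, and percent changing more than 2 deciles) can be sketched as follows. This is a minimal illustration, not the report's own procedure: the decile assignment rule, the function names, and the 20 example area rates are all hypothetical, and the rank correlation assumes no tied rates.

```python
def ranks(values):
    """Return 1-based ranks (1 = smallest); assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman rank correlation for untied data."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


def deciles(values):
    """Assign deciles 1 (lowest) to 10 (highest); needs at least 10 values."""
    n = len(values)
    return [(r - 1) * 10 // n + 1 for r in ranks(values)]


def relative_impact(unadjusted, adjusted):
    """Compare area rankings before and after risk adjustment."""
    before, after = deciles(unadjusted), deciles(adjusted)
    n = len(before)

    def share_remaining(decile):
        members = [i for i in range(n) if before[i] == decile]
        return sum(1 for i in members if after[i] == decile) / len(members)

    moved = sum(1 for b, a in zip(before, after) if abs(b - a) > 2)
    return {
        "rank_correlation": spearman(unadjusted, adjusted),
        "remaining_high": share_remaining(10),
        "remaining_low": share_remaining(1),
        "changed_more_than_2_deciles": moved / n,
    }


# Hypothetical rates for 20 areas; adjustment swaps two areas' positions.
unadjusted = [float(v) for v in range(10, 210, 10)]
adjusted = list(unadjusted)
adjusted[19], adjusted[11] = adjusted[11], adjusted[19]

result = relative_impact(unadjusted, adjusted)
print(result)
```

In this toy example one of the two highest-decile areas drops out of the top decile after adjustment while the lowest decile is unchanged, which mirrors the asymmetric pattern described for this indicator.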

Construct validity

CABG rate loads on factor 1, and is related to all other area utilization indicators.

Discussion

CABG was included in the HCUP I QIs as a provider level indicator and a potentially overused procedure. In this report this indicator has been redefined as an area-level indicator. Substantial and systematic small area variation has been noted in the literature. This variation is not explained by sociodemographic characteristics.

This indicator is measured with very high precision. The systematic area-level variation is very high, and accounts for a very high percentage of the total variation. The very high signal ratio suggests that observed variation between areas is likely to reflect true variation in performance. Multivariate techniques do not extract additional signal, but this does not appear to be necessary. Univariate smoothing, as always, is recommended.
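
As a rough illustration of what univariate smoothing does, the sketch below assumes a simple shrinkage estimator in which each area's observed rate is pulled toward the overall mean in proportion to one minus its signal ratio. This is an assumption made for illustration, not the report's actual estimator; the function name and the observed rate of 400 are hypothetical, while 180.4 and 0.973 are the mean and signal ratio reported above.

```python
def smooth_rate(observed: float, overall_mean: float, signal_ratio: float) -> float:
    """Shrink an observed area rate toward the overall mean.

    Assumed form: mean + signal_ratio * (observed - mean).
    A signal ratio of 1 leaves the rate unchanged; 0 returns the mean.
    """
    if not 0.0 <= signal_ratio <= 1.0:
        raise ValueError("signal_ratio must be between 0 and 1")
    return overall_mean + signal_ratio * (observed - overall_mean)


# Hypothetical area with an observed rate of 400 per 100,000.
smoothed = smooth_rate(observed=400.0, overall_mean=180.4, signal_ratio=0.973)
print(f"smoothed rate = {smoothed:.1f} per 100,000")
```

With a signal ratio this high the smoothed rate stays close to the observed rate, which is consistent with the statement that further signal extraction is not needed for this indicator.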

In our empirical analysis of minimum bias, risk adjustment by age and sex affected performance of areas substantially. Some clinical factors noted in our literature review, such as significant coronary artery disease, are appropriate indications for CABG. These factors may be more prevalent in an area with more risk factors, such as smoking, hyperlipidemia, or older age. As such, risk adjustment with demographic data, at minimum, is recommended.

The ideal rate for CABG has not been established and indeed there are cases where CABG is an appropriate and necessary procedure. Several studies have noted that CABG is not often performed for inappropriate indications (under 15%). As such, this indicator is not recommended as a stand alone quality indicator. Rather it is designed for use with volume and mortality indicators. Methods of evaluating the appropriateness of procedures have been established in the literature cited in this report and by other sources. These methods could be used to evaluate the appropriateness of procedures within an area, if this remains a concern.

Area rates are based on the rates for hospitals within an area, and as such do not take into account that some patients are referred into area hospitals from a different area. Examination of data containing patient residence may aid in identifying the extent to which patients are referred into an area. HCUP data may also be used to examine which hospitals contribute the most to the overall area rate.

Overall, this indicator is recommended for inclusion in the HCUP II QI set, inasmuch as it is used in conjunction with other indicators. It received an empirical rating of 19 out of 26. This indicator is recommended with several major caveats of use. As an area utilization indicator, this indicator is a proxy for actual quality problems. This indicator in particular has unclear construct validity, as CABG does not appear to be performed inappropriately often. Finally, caution should be maintained for CABG rates that are drastically below or above the average or recommended rates.

INDICATOR 14: HYSTERECTOMY RATE

Indicator: Area level hysterectomy rate
Relationship to Quality: Hysterectomy has been identified as a potentially overused procedure. As such, more average rates represent better quality care.
Benchmark: State, regional, or peer group average.

Method:

Quality Measure: Number of hysterectomies per 100,000 population.
Outcome of Interest: Number of hysterectomies (any procedure field) per 100,000 population (see Appendix 6).

Females age 18 years and older.

Exclude discharges with diagnosis for genital cancer, or pelvic or lower abdominal trauma in any diagnosis field.
Exclude MDC 14 (pregnancy, childbirth, and puerperium) and MDC 15 (newborns and other neonates).
Population at Risk: Female population in MSA or county, age 18 years and older.
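
The definition above can be sketched in code. This is a minimal illustration under stated assumptions: the record fields are hypothetical stand-ins (they are not HCUP variable names), and the procedure and diagnosis code lists from Appendix 6 are replaced by boolean flags rather than reproduced.

```python
from dataclasses import dataclass


@dataclass
class Discharge:
    # Hypothetical fields; real data would derive the flags from coded fields.
    age: int
    sex: str                      # "F" or "M"
    has_hysterectomy: bool        # hysterectomy in any procedure field
    has_excluded_diagnosis: bool  # genital cancer, or pelvic/lower abdominal trauma
    mdc: int                      # Major Diagnostic Category


EXCLUDED_MDCS = {14, 15}  # pregnancy/childbirth/puerperium; newborns/neonates


def counts_toward_numerator(d: Discharge) -> bool:
    """Apply the inclusion and exclusion rules listed in the definition."""
    return (
        d.has_hysterectomy
        and d.sex == "F"
        and d.age >= 18
        and not d.has_excluded_diagnosis
        and d.mdc not in EXCLUDED_MDCS
    )


def area_rate_per_100k(discharges, female_population_18_plus: int) -> float:
    """Hysterectomies per 100,000 adult female population in the area."""
    if female_population_18_plus <= 0:
        raise ValueError("population at risk must be positive")
    numerator = sum(1 for d in discharges if counts_toward_numerator(d))
    return numerator * 100_000 / female_population_18_plus


# Hypothetical area: five discharges, two of which qualify.
sample = [
    Discharge(45, "F", True, False, 13),
    Discharge(52, "F", True, False, 13),
    Discharge(60, "F", True, True, 13),    # excluded diagnosis
    Discharge(30, "F", True, False, 14),   # excluded MDC
    Discharge(17, "F", True, False, 13),   # under 18
]
rate = area_rate_per_100k(sample, female_population_18_plus=50_000)
print(rate)
```

Keeping the exclusions in one predicate makes it easy to check each rule against the written definition.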
Evidence from the literature
Face validity

Hysterectomy is performed on patients with a number of indications, such as recurrent uterine bleeding, chronic pelvic pain, or menopause, usually in some combination. Small area variation has been noted. One study of variation within the state of Maryland found relatively moderate variation, as noted by the systematic component of variation (.083). 474 This systematic component of variation (SCV) "compares geographic variability between DRGs (diagnosis-related groups) after removing random effects." 474

The clinical indications for hysterectomy that include persistent or recurrent abnormal bleeding, pain, an adnexal mass, limited hormonal therapy, and premenopausal age are unclear. No randomized controlled trials have demonstrated that hysterectomy improves clinical outcomes in patients with uncertain clinical indications.

Precision

Because adult admissions for hysterectomy are relatively common, it should be possible to generate precise estimates of utilization at the area level. However, random variation in utilization rates may become more problematic for relatively small areas (e.g., zip codes) or underpopulated areas (e.g., rural counties).

Minimum bias

Utilization rates standardized at the hospital level (e.g., all adult discharges) were included in earlier versions of the HCUP I QIs, but are likely to be biased by local market characteristics and referral patterns. In other words, hospitals that specialize in specific services, such as elective or gynecological surgeries, may appear to have higher utilization rates than hospitals that refer such patients elsewhere. Theoretically, utilization rates standardized at the area level (e.g., adult population of the county or SMSA) may be biased by differences in the prevalence of those indications that warrant the procedure. The prevalence of these indications may, in turn, be related to the age structure of the population and the prevalence of behavioral or physiologic risk factors. Previous studies have shown that observed variation in hysterectomy rates may be accounted for by sociodemographic differences in the patient population. In a study of seven managed care organizations, "older women were more likely than younger women to have received a hysterectomy for appropriate reasons." 263 In a study of women in the UK, "the risk of hysterectomy was significantly related to parity." 480 "Only 5% of nulliparous women had a hysterectomy; the risk for women with one to four children varied between 8% and 11%, and the risk for those with five or more children was 31% (P=0.002)." 480 Even after adjusting for parity in a regression model, "the risk of hysterectomy was still significantly associated with educational qualifications." 480

Construct validity

For this indicator to perform well in identifying true quality of care problems, there must be evidence of significant inappropriate utilization in population-based studies. In addition, there must be substantial variation in the rate of inappropriate utilization across providers or small areas. Previous studies have classified possible indications for hysterectomy as appropriate, uncertain, or inappropriate, as a method for assessing quality of care.

In a random sample of 642 hysterectomies (non-emergency and non-oncological), 16% of procedures were deemed inappropriate, based on patient indications, while 25% were deemed uncertain. The rate of inappropriate indications for hysterectomy varied across plans from 10% to 27%. 263

Another study focused on women receiving hysterectomies in Southern California. The rate of overall inappropriate indications was 70%, varying from 45% to 100% across diagnoses indicative of hysterectomy. Uncertain indications were not evaluated. 481

Fosters true quality improvement

In theory, use of this quality indicator might reduce appropriate as well as inappropriate hysterectomies. Little evidence exists on whether this is likely to occur, or on the extent to which overall hysterectomy rates are correlated with inappropriate hysterectomy rates.

Prior use

The hospital-based rate of hysterectomy is a current indicator in the HCUP I QI set. The area-based rate of hysterectomy is a current indicator in the Dartmouth Atlas. 479

Empirical Evidence
| Test | Statistic | Rating |
|---|---|---|
| Precision | | |
| Raw area level rate/standard deviation | 419.4, 323.3 | |
| Systematic area-level standard deviation* | 0.19% | Very High |
| Area variation as a percentage of total variation* | 0.10% | High |
| Signal ratio* | 93.6% | Very High |
| R-Square* | 93.7% | Very High |
| Minimum Bias - Age only risk adjustment | | |
| Signal variance change with risk adjustment | No change | Good |
| Absolute impact: Average absolute change (in %) | 3.0% | Very Good |
| Relative impact: Rank correlation | 0.996 | Very Good |
| Relative impact: Percent remaining in high decile/low decile | 81.8% / 90.9% | Good |
| Relative impact: Percent changing more than 2 deciles | 0.00% | Very Good |

* age adjusted

Precision

This indicator is precise, with a raw area level rate of 419.4 per 100,000 population and a substantial standard deviation of 323.3. The systematic area level standard deviation is very high, at 0.19%. The area level variation accounts for a high percentage of total variation, at 0.10%. This means that relative to other area indicators, a higher percentage of the variation occurs at the area level, rather than the discharge level, though some remains at the discharge level. The signal ratio is very high, at 93.6%. This means that it is likely that the observed differences in area performance represent true differences in area performance. The very high R-square represents the high proportion of signal that can be extracted using multivariate techniques. However, multivariate techniques have little additional impact on the amount of variance that can be extracted from this indicator, since the signal ratio is already very high.

Bias

Signal variance does not change substantially with risk adjustment. The indicator performed well on most measures of minimum bias, though the only adjustment was for age. Age adjustment does not impact the absolute ranking substantially. No providers change more than 2 deciles in performance. Risk adjustment appears not to impact the relative rankings of areas substantially.

Construct validity

Hysterectomy loads on factor 1, and is related to all other area utilization indicators.

Discussion

Hysterectomy has been proposed as a potentially overused procedure. In the HCUP I QIs this indicator was defined with a provider based denominator. It has been redefined in this report as an area-level indicator. Population rates of hysterectomy have been shown to vary systematically by small geographic area, and this variance cannot be explained by systematic clinical or demographic factors. However, patient and physician preference may also play a role in the choice to have a hysterectomy, and thus may affect area rates.

This indicator is measured with good precision. The area-level systematic variation is very high, and accounts for a high percentage of the total variation. The very high signal ratio suggests that observed variation between areas is likely to reflect true variation in performance. Multivariate techniques do not appear to have substantial additional impact on the ability to extract signal from this indicator, as the signal ratio is already very high. Univariate smoothing, as always, is recommended.

In our empirical analysis of minimum bias, risk adjustment with age did not impact area performance substantially. Some clinical factors noted in our literature review are appropriate indications for hysterectomy. However, it is unlikely that these indications, with the exception of parity, would vary systematically by area, and thus we would not expect this indicator to be substantially biased. Risk adjustment with age is recommended.
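
One common way to adjust an area rate for age is direct standardization, sketched below. This section does not state which adjustment method the report used, so direct standardization is a swapped-in technique chosen for illustration; the age groups, counts, and standard population are all hypothetical.

```python
def crude_rate(cases_by_age: dict, population_by_age: dict) -> float:
    """Unadjusted rate per 100,000 across all age groups."""
    return sum(cases_by_age.values()) * 100_000 / sum(population_by_age.values())


def directly_standardized_rate(
    cases_by_age: dict, population_by_age: dict, standard_population: dict
) -> float:
    """Weight each age-specific rate by a shared standard population."""
    total_standard = sum(standard_population.values())
    expected_cases = 0.0
    for group, standard_count in standard_population.items():
        age_specific_rate = cases_by_age[group] / population_by_age[group]
        expected_cases += age_specific_rate * standard_count
    return expected_cases / total_standard * 100_000


# Hypothetical area data and standard population.
cases = {"18-44": 30, "45-64": 60, "65+": 10}
population = {"18-44": 10_000, "45-64": 10_000, "65+": 5_000}
standard = {"18-44": 50_000, "45-64": 30_000, "65+": 20_000}

crude = crude_rate(cases, population)
adjusted = directly_standardized_rate(cases, population, standard)
print(f"crude = {crude:.1f}, age-adjusted = {adjusted:.1f} per 100,000")
```

In this example the area has a relatively large 45-64 population, where the procedure rate is highest, so the age-adjusted rate comes out lower than the crude rate.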

The ideal rate for hysterectomy has not been established and indeed there are cases where hysterectomy is an appropriate and necessary procedure. However, several studies have noted relatively high rates of inappropriate indications for surgery (16%-70%). Methods of evaluating the appropriateness of procedures have been established in the literature cited in this report and by other sources. These methods could be used to evaluate the appropriateness of procedures within an area.

Area rates are based on the rates for hospitals within an area, and as such do not take into account that some patients are referred into area hospitals from a different area. Examination of data containing patient residence may aid in identifying the extent to which patients are referred into an area. HCUP data may also be used to examine which hospitals contribute the most to the overall area rate.

Overall, this indicator is recommended for inclusion in the HCUP II QI set. It received an empirical rating of 22 out of 26. This indicator is recommended with several major caveats of use. As an area utilization indicator, this indicator is a proxy for actual quality problems. This indicator has unclear construct validity, as high utilization of hysterectomy has not been shown to necessarily be associated with higher rates of inappropriate utilization. As only age adjustment is available using administrative data, additional clinical risk adjustment, such as for parity, may be desirable. Finally, caution should be maintained for hysterectomy rates that are drastically below or above the average or recommended rates.

INDICATOR 15: LAMINECTOMY AND/OR SPINAL FUSION RATE

Indicator: Area level laminectomy and/or spinal fusion rate.
Relationship to Quality: Laminectomy has been identified as a potentially overused procedure. As such, more average rates represent better quality care.
Benchmark: State, regional, or peer group average.

Method:

Quality Measure: Number of laminectomies and/or spinal fusions per 100,000 population.
Outcome of Interest: Number of laminectomies and/or spinal fusions (any procedure field) per 100,000 population (see Appendix 6).

Age 18 years and older.

Exclude MDC 14 (pregnancy, childbirth, and puerperium) and MDC 15 (newborns and other neonates).
Population at Risk: Population in MSA or county, age 18 years and over.
Evidence from the literature
Face validity

Laminectomy is performed on patients with a herniated disc or spinal stenosis. Most previous studies of small area variation have found relatively high variation in laminectomy rates, as noted by the systematic component of variation (.292). 474 This systematic component of variation (SCV) "compares geographic variability between DRGs [diagnosis-related groups] after removing random effects." 474 Another study in Maryland found a 0.50 variation rate [(sample standard deviation)/(sample mean)] in laminectomy for selected Maryland service areas. 482 A study of Boston-area hospitals found a 2.2-fold variation in laminectomy rates among districts. 483 Larequi-Lauber et al. report that, in the United States, "the use of back surgery varies from one area to another by as much as 15-fold." 484 This high variation has not been explained by population characteristics such as age and socioeconomic status.
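
Two of the variation measures cited above are simple to compute: the variation rate, given in the text as (sample standard deviation)/(sample mean), and the fold variation, taken here to be the highest area rate divided by the lowest (an assumed reading of "fold variation"). The sketch below uses hypothetical area rates and function names; it does not attempt the systematic component of variation, whose formula is not given in this section.

```python
from statistics import mean, stdev


def variation_rate(rates) -> float:
    """Sample standard deviation divided by sample mean."""
    return stdev(rates) / mean(rates)


def fold_variation(rates) -> float:
    """Highest rate divided by lowest rate; the lowest rate must be non-zero."""
    return max(rates) / min(rates)


# Hypothetical laminectomy rates per 100,000 for three service areas.
area_rates = [50.0, 100.0, 150.0]

cv = variation_rate(area_rates)
fold = fold_variation(area_rates)
print(f"variation rate = {cv:.2f}, fold variation = {fold:.1f}")
```

The two measures respond differently to outliers: fold variation depends only on the two extreme areas, while the variation rate uses every area.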

The clinical indications for laminectomy in patients with minor neurological findings, lengthy restricted activity, and equivocal imaging for discal hernia or spinal stenosis are unclear. No randomized controlled trials have demonstrated that laminectomy improves clinical outcomes in patients with these uncertain indications.

Precision

Because adult admissions for laminectomy are relatively common, it should be possible to generate precise estimates of utilization at the area level. Indeed, even after accounting for random variations, large differences in area laminectomy rates remain. 474 However, random variation in utilization rates may become more problematic for relatively small areas (e.g., zip codes) or sparsely populated areas (e.g., rural counties).

Minimum bias

Utilization rates standardized at the area level (e.g., adult population of the county or SMSA) may be biased by differences in the prevalence of herniated disc or spinal stenosis. The prevalence of herniated disc or spinal stenosis may, in turn, be related to the age structure of the population and the prevalence of behavioral or physiologic risk factors. However, previous studies have shown that sociodemographic differences and other measurable population characteristics account for very little or none of the observed variation in laminectomy rates. 482

Construct validity

For this indicator to perform well in identifying true quality of care problems, there must be evidence of significant inappropriate utilization in population-based studies. In addition, there must be substantial variation in the rate of inappropriate utilization across providers or small areas. Previous studies have classified possible indications for laminectomy as appropriate, uncertain, or inappropriate, as a method for assessing quality of care.

In an assessment of cases at one Swiss hospital, 23% of patients received surgical treatment for herniated discs for inappropriate reasons and 29% received surgical treatment for uncertain indications. 485 In another study of teaching hospital patients undergoing surgery for herniated disc or spinal stenosis (lumbar discectomy or spinal stenosis surgery), 38% of surgeries were performed for inappropriate indications. Uncertain indications were combined with the "appropriate" category.

Fosters true quality improvement

Little evidence exists on whether use of area laminectomy rates as a quality indicator would lead to less performance of laminectomies for inappropriate or uncertain indications, without reducing the use of laminectomy for appropriate indications.

Prior use

The hospital-based rate of laminectomy is a current indicator in the HCUP I QI set. The area-based rate of laminectomy is a current indicator in the Dartmouth Atlas. 479

Empirical Evidence
| Test | Statistic | Rating |
|---|---|---|
| Precision | | |
| Raw area level rate/standard deviation | 139.0, 347.5 | |
| Systematic area-level standard deviation* | 0.13% | High |
| Area variation as a percentage of total variation* | 0.10% | Very High |
| Signal ratio* | 96.7% | Very High |
| R-Square* | 96.7% | Very High |
| Minimum Bias - age-sex risk adjustment | | |
| Signal variance change with risk adjustment | No change | Good |
| Absolute impact: Average absolute change (in %) | 6.3% | Very Good |
| Relative impact: Rank correlation | 0.933 | Good |
| Relative impact: Percent remaining in high decile/low decile | 31.8% / 95.5% | Fair / Good |
| Relative impact: Percent changing more than 2 deciles | 7.4% | Good |

* age- and gender-adjusted

Precision

This indicator is moderately precise, with a raw area level mean of 139.0 per 100,000 population and a standard deviation of 347.5. The systematic area level standard deviation is high, at 0.13%. The area level variation accounts for a very high percentage of total variation, at 0.10%. The signal ratio is very high, at 96.7%. This means that it is very likely that the observed differences in area performance represent true differences in area performance. Multivariate smoothing does not have additional impact in extracting signal, mainly due to the already very high ratio.

Bias

The signal variance does not change with age-sex risk adjustment. The indicator performed fairly well on most measures of minimum bias. Risk adjustment does not impact the absolute performance of areas substantially (6.3%). Only 7.4% of areas changed more than 2 deciles. Risk adjustment appears to impact the highest decile disproportionately to the lowest decile, with 31.8% of areas in the highest decile remaining after risk adjustment, compared to 95.5% in the lowest decile. However, the overall relative impact of risk adjustment was moderate, with a high rank correlation of 0.933.

Construct validity

Laminectomy rate loads on factor 1, and is related to all other area utilization indicators.

Discussion

Laminectomy has been proposed as a potentially overused procedure. This indicator was defined with a provider level denominator in the HCUP I QIs. This report recommends its use as an area level indicator. Laminectomy has been shown to vary widely and systematically between areas. Sociodemographic or clinical factors do not explain this variation. However, patient and physician preference may also play a role in the choice to have a laminectomy, which may in turn affect area rates.

This indicator is measured with high precision. The area-level systematic variation is high, and accounts for a very high percentage of the total variation. The very high signal ratio suggests that observed variation between areas is likely to reflect true variation in performance.

In our empirical analysis of minimum bias, risk adjustment by age and sex only minimally affected the performance of providers, suggesting that performance is not highly influenced by the demographic breakdown of the population. However, risk adjustment appears to affect areas with the highest rates substantially. This means that without adequate risk adjustment areas may be mislabeled as outliers when in fact they are not. Some clinical factors noted in our literature review are appropriate indications for laminectomy. However, it is unlikely that these indications would vary systematically by area, and thus we would not expect this indicator to be substantially biased. Risk adjustment for age and sex is recommended.

The ideal rate for laminectomy has not been established and indeed there are cases where laminectomy is an appropriate and necessary procedure. However, several studies have noted relatively high rates of inappropriate procedures (23%-38%). Methods of evaluating the appropriateness of procedures have been established in the literature cited in this report and by other sources. These methods could be used to evaluate the appropriateness of procedures within an area.

Area rates are based on the rates for hospitals within an area, and as such do not take into account that some patients are referred into area hospitals from a different area. Examination of data with patient residence can aid in illuminating the extent to which patients are referred into an area. HCUP data may be used to examine which hospitals contribute the most to the overall area rate.

Overall, this indicator is recommended for inclusion in the HCUP II QI set. It received an empirical rating of 20 out of 26. This indicator is recommended with several major caveats of use. As an area utilization indicator, this indicator is a proxy for actual quality problems. This indicator has unclear construct validity, as high utilization of laminectomy has not been shown to necessarily be associated with higher rates of inappropriate utilization. Finally, caution should be maintained for laminectomy rates that are drastically below or above the average or recommended rates.

INDICATOR 16: PERCUTANEOUS TRANSLUMINAL CORONARY ANGIOPLASTY (PTCA) RATE

Indicator: Area level PTCA rate.
Relationship to Quality: PTCA has been identified as a potentially overused procedure. As such, more average rates represent better quality care.
Benchmark: State, regional, or peer group average.

Method:

Quality Measure: Number of PTCA procedures per 100,000 population.
Outcome of Interest: Number of PTCA procedures (any procedure field) per 100,000 population (see Appendix 6).

Age 40 years and over.

Exclude MDC 14 (pregnancy, childbirth, and puerperium) and MDC 15 (newborns and other neonates).
Population at Risk: Population in MSA or county, age 40 years and over.
Evidence from the literature
Face validity

Percutaneous transluminal coronary angioplasty (PTCA) is performed on patients with coronary artery disease (CAD). Previous studies of small area variation have found substantial variation in PTCA rates, though most analyses have been performed on Medicare data. 486

The clinical benefit of PTCA in many patients with unstable or chronic angina and recent myocardial infarction (MI) is unclear. No randomized controlled trials have demonstrated that PTCA improves clinical outcomes in many patients who commonly receive the procedure, and previous studies have documented large differences across hospitals in the likelihood of treatment with PTCA after MI and in other clinical settings.

Precision

Because adult admissions for PTCA are relatively common, it should be possible to generate precise estimates of utilization at the area level. However, random variation in utilization rates may become more problematic for relatively small areas (e.g., zip codes) or underpopulated areas (e.g., rural counties).

Minimum bias

Utilization rates standardized at the area level (e.g., adult population of a county or SMSA) may be biased by differences in the prevalence of coronary artery disease. The prevalence of CAD may, in turn, be related to the age structure of the population and the prevalence of behavioral or physiologic risk factors such as smoking and hyperlipidemia. Many previous studies have also shown that sociodemographic differences affect PTCA utilization, e.g., black patients are significantly less likely to undergo PTCA than non-black patients, particularly when indications were deemed equivocal. Little evidence exists on the extent to which area differences in socioeconomic and clinical characteristics may explain area differences in PTCA rates, though the large variations in rates across small geographic areas suggest that population characteristics are unlikely to explain most of the differences. 487 A second potential source of bias is the performance of procedures on an outpatient basis, which would not be captured in HCUP data and thus would not be reflected in HCUP-based area rate measures. However, less than 10% of PTCAs were performed on an outpatient basis in recent years, and so outpatient procedures do not appear to be quantitatively important enough to explain much of the observed variation. 297

Construct validity

For this indicator to perform well in identifying true quality of care problems, there must be evidence of significant inappropriate utilization in population-based studies. In addition, there must be substantial variation in the rate of inappropriate utilization across providers or small areas. Previous studies have classified possible indications for PTCA as appropriate, uncertain, or inappropriate, as a method for assessing quality of care.

In a study of seven of eight Swedish public heart centers, 38.3% of all PTCA procedures were performed for inappropriate indications and 30% for uncertain indications. 478 Another study of cardiac catheterization labs in Maryland used three different sets of criteria (RAND, ACC/AHA, and RAS) to assess the appropriateness of PTCA in these labs. Inappropriate indications ranged from 22 to 49%. The rate of uncertain indications was 29%, according to RAND criteria, the only set that includes this rating. 487 In a follow-up study of a coronary angiography study conducted in New York, a panel of Duke University cardiologists reviewed 308 records for appropriateness. Rates of PTCA performed for both inappropriate and uncertain reasons increased. The rate for inappropriate indications rose to 12%, while the rate of procedures performed for uncertain indications rose to 27%. 476

Fosters true quality improvement

In an effort to raise public perception of their quality without actually improving it, service providers might attempt to game results, or take action that would significantly decrease the incidence of an indicator or increase provider risk for these cases. This leads to an inaccurate assessment of quality by the public. Ideally, gaming is avoided in practice, present only in theory. However, providers might engage in practices, such as miscoding cases or recruiting patient groups that are known to have increased risk of CAD, in order to achieve more favorable quality assessment results with regard to PTCA. Perhaps instead of serving as quality assessments, results of these appropriateness studies can serve as guidelines for difficult clinical decisions. Patients and their physicians might use them to spark questions and discussion about CAD, the patient's specific indications, and the most appropriate treatment, one that poses the least risk to the patient. 488 Implementing the results of these studies on an individual level might relieve undue stress on providers and reduce the temptation to inaccurately depict provider quality.

Prior use

The area-based rate of PTCA is a current indicator in the Dartmouth Atlas. 479

Empirical Evidence
| Test | Statistic | Rating |
|---|---|---|
| Precision | | |
| Raw area level rate/standard deviation | 190.8, 455.6 | |
| Systematic area-level standard deviation* | 0.28% | Very High |
| Area variation as a percentage of total variation* | 0.21% | Very High |
| Signal ratio* | 97.3% | Very High |
| R-Square* | 97.3% | Very High |
| Minimum Bias - age-sex risk adjustment | | |
| Signal variance change with risk adjustment | No change | Good |
| Absolute impact: Average absolute change (in %) | 10.4% | Good |
| Relative impact: Rank correlation | 0.671 | Fair |
| Relative impact: Percent remaining in high decile/low decile | 36.4% / 95.5% | Fair / V.G. |
| Relative impact: Percent changing more than 2 deciles | 35.5% | Fair |

* age- and gender-adjusted

Precision

This indicator is precise, with a raw area level mean of 190.8 per 100,000 population and a standard deviation of 455.6. The systematic area level standard deviation is very high, at 0.28%. The area level variation also accounts for a very high percentage of total variation, at 0.21%. This means that relative to other indicators, a higher percentage of the variation occurs at the area level, rather than the discharge level. The signal ratio is also very high, at 97.3%. This means that it is very likely that the observed differences in area performance represent true differences in area performance. The very high R-square represents the high proportion of signal that can be extracted using multivariate techniques. However, multivariate methods have no additional impact, due to the already very high signal ratio.
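
As a rough illustration of what a signal ratio of this kind measures, the sketch below splits the observed between-area variance into a noise part and a signal part using a simple method-of-moments estimate. This is an assumption made for illustration and is not necessarily the estimation procedure used in this report; the function name `signal_ratio` and its inputs are hypothetical.

```python
from statistics import mean, pvariance


def signal_ratio(area_rates, sampling_variances):
    """Share of observed between-area variance attributed to signal.

    area_rates: observed rate for each area.
    sampling_variances: estimated sampling (noise) variance of each area's rate.
    Assumption (not from the report): signal variance is the observed
    variance minus the average noise variance, floored at zero.
    """
    total_var = pvariance(area_rates)
    noise_var = mean(sampling_variances)
    signal_var = max(total_var - noise_var, 0.0)
    return signal_var / total_var if total_var > 0 else 0.0
```

Under this reading, a ratio near 1 means that almost all of the observed variation between areas exceeds what sampling noise alone would produce, which is how the text interprets the 97.3% figure.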

Bias

Signal variance does not change substantially with risk adjustment. The indicator performs poorly on most measures of minimum bias. Risk adjustment impacts the absolute performance of areas moderately. In addition, 35.5% of areas changed more than 2 deciles. Risk adjustment appears to impact the highest decile disproportionately to the lowest decile, with 36.4% of areas in the highest decile remaining after risk adjustment, compared to 95.5% in the lowest decile. The overall relative impact of risk adjustment was also substantial, with a low rank correlation of 0.671.
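
The relative-impact measures named in this paragraph (rank correlation, percent remaining in the extreme deciles, and percent changing more than 2 deciles) can be sketched as follows. This is a minimal illustration under stated assumptions: deciles are assigned by rank, "more than 2 deciles" is read as an absolute decile difference greater than 2, and all function names are hypothetical rather than taken from the report.

```python
def ranks(values):
    """Rank values from 1 (lowest) upward; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def deciles(values):
    """Assign each value a decile from 1 (lowest) to 10 (highest) by rank."""
    n = len(values)
    return [min(int((r - 1) * 10 / n) + 1, 10) for r in ranks(values)]


def relative_impact(unadjusted, adjusted):
    """Compare area rankings before and after risk adjustment."""
    d0, d1 = deciles(unadjusted), deciles(adjusted)

    def pct_remaining(decile):
        start = [i for i, d in enumerate(d0) if d == decile]
        if not start:  # fewer areas than deciles
            return float("nan")
        return 100.0 * sum(d1[i] == decile for i in start) / len(start)

    moved = sum(abs(a - b) > 2 for a, b in zip(d0, d1))
    return {
        # Spearman rank correlation is the Pearson correlation of the ranks.
        "rank_correlation": pearson(ranks(unadjusted), ranks(adjusted)),
        "pct_remaining_high": pct_remaining(10),
        "pct_remaining_low": pct_remaining(1),
        "pct_changing_more_than_2_deciles": 100.0 * moved / len(d0),
    }
```

Calling `relative_impact` with the unadjusted and age-sex adjusted rates for the same areas, in the same order, returns all four measures at once.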

Construct validity

PTCA rate loads on factor 1, and is related to all other area utilization indicators.

Discussion

The appropriateness and potential overuse of PTCA have been discussed in the literature. PTCA rates have been shown to vary widely and systematically between areas. Sociodemographic differences do not explain all of this variation, though inappropriate indications do not account fully for the variance either. Patient and physician preference may also play a role in the variation of PTCA rates by area.

Given how frequently this procedure is performed, the indicator is expected to be precise. However, some hospitals may perform very few procedures; precision may be a particular problem for these hospitals. Our empirical analysis found the precision to be very high, with very high systematic variation. The very high signal ratio suggests that any observed differences are likely to reflect true differences in performance. Multivariate smoothing techniques do not appear to have additional impact on the ability to extract signal for this indicator, primarily due to the already high signal ratio. Univariate smoothing, as always, is recommended.

In our empirical analysis of minimum bias, risk adjustment by age and sex does affect the performance of providers. In addition, risk adjustment appears to affect areas with the highest rates substantially. This means that without adequate risk adjustment, areas may be mislabeled as outliers when in fact they are not. Some clinical factors noted in our literature review are appropriate indications for PTCA. These factors may be related to the prevalence of coronary artery disease and as such may be more prevalent in areas with an older age structure or higher rates of smoking or hyperlipidemia. It is unlikely that the area variance resulting from these factors would be substantial enough to account for all the observed variance, and thus we would not expect this indicator to be substantially biased. Risk adjustment by age and sex is recommended. A significant but small proportion of PTCA procedures (<10%) is performed on an outpatient basis, potentially biasing this indicator. If available, areas may wish to examine both inpatient and outpatient rates together.

The ideal rate for PTCA has not been established and indeed there are cases where PTCA is an appropriate and necessary procedure. However, several studies have noted relatively high rates of inappropriate procedures (up to 48%). Other studies report very low rates of inappropriate use, leaving the question of the validity of this indicator open for interpretation. Methods of evaluating the appropriateness of procedures have been established in the literature cited in this report and by other sources. These methods could be used to evaluate the appropriateness of procedures within an area.

Area rates are based on the rates for hospitals within an area, and as such do not take into account that some patients are referred into area hospitals from a different area. Examination of data containing patient residence may aid in identifying the extent to which patients are referred into an area. HCUP data may also be used to examine which hospitals contribute the most to the overall area rate.

Overall, this indicator is recommended for inclusion in the HCUP II QI set, though it is recommended only for use with measures of mortality and volume. It received an empirical rating of 19 out of 26. This indicator is recommended with several major caveats of use. As an area utilization indicator, this indicator is a proxy for actual quality problems. This indicator has unclear construct validity, as high utilization of PTCA has not been shown to necessarily be associated with higher rates of inappropriate utilization. A minor source of bias may be the small number of procedures performed on an outpatient basis. Finally, caution should be maintained for PTCA rates that are drastically below or above the average or recommended rates.

3.E.4. Ambulatory Care Sensitive Condition Measures

The literature review of the evidence related to potentially avoidable hospital admissions is limited for each indicator because many of the indicators have been developed as parts of sets. Therefore, prior to relating evidence on specific indicators, we introduce the strategy for applying the evaluation framework to potentially avoidable hospital admissions, and provide general information applicable to all indicators in this category.

Only five studies 276,284,489-491 have attempted to validate individual indicators rather than whole measure sets. Hence, a major limitation of this literature is that we know relatively little about which components represent the strongest measures of access and quality. Most of the 5 papers that did report on individual indicators also used a single variable, such as median area-specific income or rural residence, for construct validation. All but one of these papers (Bindman) included adjustment only for demographic factors (e.g., age, sex, race).

Bindman et al. 284 provide the best evidence of validity for individual indicators, by demonstrating strong independent associations between self-rated access to care and hospitalization rates for asthma, CHF, and diabetes (after adjusting for sociodemographic factors, severity of illness, and propensity to admit). In addition, associations for COPD and hypertension were weaker but also statistically significant.

The other four studies have validated a similar, wider set of indicators. Billings et al. 489 validated their version of all 5 of these indicators, except hypertension, as well as epilepsy and convulsions, severe ENT infections, bacterial pneumonia, angina, cellulitis, gastroenteritis, kidney/UTI, dehydration, iron deficiency anemia, and PID (with the weakest associations involving COPD and gastroenteritis). Millman and the Institute of Medicine 491 validated the same set of indicators as Billings et al., with the exception of iron deficiency anemia and PID, but with the addition of hypertension and hypoglycemia. Of the validated indicators, the weakest association again involved gastroenteritis. Weissman et al. 276 found the strongest evidence of validity for cellulitis, diabetes, gangrene, malignant hypertension, and pneumonia, with mixed evidence for asthma, CHF, hypokalemia, immunizable conditions, pyelonephritis, and bleeding ulcer. Only one of Weissman's indicators, ruptured appendix, was not validated, though a modified version of this indicator was validated in two of the other studies mentioned above.

Twenty-nine other studies have examined sets of ACSC conditions, without examining indicators independently. The indicators examined by these studies, and details regarding study design are summarized in Appendix 8. Since a majority of the evidence for ACSC indicators comes from studies of sets of ACSC indicators together, this report considers this evidence in addition to the limited evidence for each indicator. The evidence regarding ACSC indicators in general is reported below.

Face validity.

The following questions are addressed separately for each indicator:

  1. Have clinical trials demonstrated that specific outpatient therapies can reduce the risk of hospitalization?
  2. Have observational studies shown associations between specific outpatient therapies and the risk of hospitalization?
  3. Is there general consensus that hospitalizations for this condition are often avoidable or preventable, if the patient has timely access to high-quality outpatient care?

Precision. The precision of avoidable hospitalization rates is likely to depend on the size of the denominator. For example, Bindman et al. compared the correlation of preventable hospitalization rates for 5 adult conditions (asthma, COPD, CHF, diabetes, hypertension) between consecutive years, at the level of contiguous zip code clusters, or medical service study areas. The correlation across 250 urban clusters, with a median population of 52,000, was 0.96. By contrast, the correlation across 144 rural clusters, with a median population of 16,000, was only 0.81. 492 Across about 160 zip codes in New York City, aggregate ambulatory-care sensitive (ACS) admission rates in 1993 and 1982 were highly correlated (r=0.92). 283 Finally, a recent abstract from the United Kingdom reported (without supporting data) that "Ambulatory Sensitive Condition" admission rates for Health Authorities, which have a mean population of about 250,000, are "quite stable between subsequent years, as expressed in terms of the Spearman's rank correlation". 493
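
The dependence of precision on denominator size can be illustrated with a simple standard-error calculation. The sketch below assumes a binomial model in which each resident has the same independent probability of admission; this model, the function name `rate_standard_error`, and the example rate of 500 per 100,000 are assumptions for illustration and do not come from the studies cited above. Only the two median cluster populations (52,000 and 16,000) are taken from the text.

```python
from math import sqrt


def rate_standard_error(rate_per_100k, population):
    """Binomial standard error of an area rate, expressed per 100,000 population."""
    p = rate_per_100k / 100_000
    return sqrt(p * (1 - p) / population) * 100_000


# Hypothetical admission rate, applied to the median cluster sizes quoted above.
urban_se = rate_standard_error(500, 52_000)
rural_se = rate_standard_error(500, 16_000)
print(f"urban SE: {urban_se:.1f}, rural SE: {rural_se:.1f} per 100,000")
```

At the same underlying rate, the standard error for the smaller rural cluster is larger by a factor of the square root of 52,000/16,000, or about 1.8, which is consistent with the lower year-to-year correlation reported for rural clusters.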

Minimum bias. Previous studies have documented several characteristics that are associated with either the risk of an avoidable hospitalization (at the individual level) or the avoidable hospitalization rate (at the area level). These factors are potential confounders, or sources of bias, when avoidable hospitalization rates are used as a measure of access to care. Bindman et al. found that condition prevalence was an especially important correlate of hospitalization rates for 5 chronic conditions at the zip code cluster level. 284 Race and measures of socioeconomic status (percent with no college, percent with income <$15,000) were also independently associated with "preventable hospitalization rates," but these factors might be measuring subtle aspects of access to care. Interestingly, propensity to seek care and physician practice style explained very little of the variability in area hospitalization rates. At the individual level, self-reported health status, functional limitations, several chronic diseases (e.g., coronary artery disease, diabetes), and a chronic disease risk score are associated with preventable hospitalizations among Medicare beneficiaries. 280 Adding these factors to a predictive model substantially reduces the effect of income (e.g., adjusted Odds Ratio for income <$9,517 decreases from 1.96 to 1.38). 282

Propensity to seek care and physician practice style have been evaluated indirectly; Weissman et al. found that mean comorbidity (DRGSCALE) scores based on hospital discharge abstracts differed by insurance status. Lower mean comorbidity scores among uninsured patients for about half of the avoidable hospital conditions might indicate that physicians treating this population apply a lower admission threshold. 276 Clearer evidence of this possible practice pattern comes from a study that demonstrated striking differences in the severity-of-illness of patients hospitalized with diabetes across counties with low, medium, or high hospitalization rates. 494 Hospitals in low-rate counties had a higher proportion of severe admissions than hospitals in high-rate counties (38-46% versus 20%). A survey of patients hospitalized in Nebraska market areas with high (versus low) ACS admission rates suggests that patient delays in seeking outpatient care, rather than poor access, may be the primary cause of this difference. 495

Construct validity. Most previous studies have assessed the validity of an entire set of avoidable hospital conditions, rather than each condition alone, and have used socioeconomic status as a marker of access to care. These studies have repeatedly shown strong correlations between household income and avoidable hospitalizations, both at the individual level and the area level. At the zip code level, income alone explains 51-84% of the variability in ACS admission rates across 15 metropolitan areas in the US. 283 This association is substantially weaker among persons 65 or more years of age, 275,281 as one would expect if it is driven by access to care rather than underlying social factors. Avoidable hospitalization rates are higher among uninsured or Medicaid-enrolled persons than among privately insured persons, even after adjustment for race and income. 276 Maternal education was the dominant socioeconomic correlate of "discretionary" hospitalization rates among infants (<2 years), accounting for 89% of total variability across 37 zip code areas. 287

Fewer studies have tested true measures of access to care. In the best of these studies, Bindman and colleagues 284 showed that a 5-point scale of self-reported "difficulty in receiving medical care when needed" explained 50% of the variability in hospitalization rates for 5 chronic medical conditions (asthma, CHF, COPD, diabetes, and hypertension) across 41 urban zip code clusters in California. Adjustment for condition prevalence, propensity to seek care, physician admitting style, and ecological measures of income, education, insurance, race, and gender, had little effect on the association. By condition, the univariate R-squared was 0.47 for asthma, 0.50 for CHF, 0.27 for COPD, 0.46 for diabetes, and 0.22 for hypertension. Having a regular source of care, and primary care physician/population ratios, were also independently associated with avoidable hospitalization rates, when substituted for self-reported access. 496 These relationships did not hold in two separate studies of rural zip codes, suggesting that avoidable hospitalization rates are invalid indicators of access in rural areas. 279,492

In other studies, the physician/population ratio for family and general physicians has been more strongly associated with avoidable hospitalization rates than measures that include internists, pediatricians, or all physicians. 286,497 In studies of Medicaid populations, provider continuity in ambulatory care 498 and usual care received from a community health center 499 were associated with lower avoidable hospitalization rates. However, having a regular source of care (for more than 50% of physician office visits) was not associated with lower avoidable hospitalization rates. 500 Not having a primary care physician was significantly associated with ACS hospitalization (versus non-ACS hospitalization) in South Carolina, after adjustment for socioeconomic factors. 501

Several studies of Medicare beneficiaries have shown weak and inconsistent associations between access indicators and avoidable hospitalization rates. For example, persons in the Medicare Current Beneficiary Survey who reported problems obtaining health care, or lived in a health professional shortage area, were not at increased risk of preventable hospitalization. 280 Instead, their risk was heavily influenced by "need characteristics" (clinical factors). However, beneficiaries in fair or poor health reportedly were at increased risk if they lived in a primary care shortage area. 285 An area-level analysis based on Medicare claims suggests that the association between ACS admission rates and physician/population ratios is limited to the 10% of health care service areas with the most severe shortage of physicians (e.g., <0.628 physicians/1,000 population). 502

Fosters real quality improvement. There is limited evidence regarding whether avoidable hospitalization rates can be decreased through interventions to improve access to or quality of care. Three studies have reported on the impact of recent changes in Medicaid eligibility criteria and program design. Kaestner et al. found no narrowing of the differences in "discretionary" infant (<2 year) hospitalization rates between low, middle, and high-income zip codes, during a period of substantial Medicaid eligibility expansion (1988-1992). 503 Implementation of a Medicaid managed care gatekeeper system in Maryland (Ref. 81), with fee-for-service reimbursement of designated primary medical providers and 24-hour access, modestly reduced the risk of any hospitalization (OR=0.89) but did not decrease the risk of an "ambulatory care sensitive" hospitalization (OR=0.96). Similar programs in Florida and New Mexico may have led to "significant, but small, reductions" in ACS hospitalization rates for children, although no specific data have been reported. 277 In a cross-sectional study on the impact of physician economic incentives, a capitated group practice achieved far lower ACS hospitalization rates than physicians who participated in 3 independent practice associations or treated patients with indemnity insurance (0.8/1,000 versus 2.7/1,000 and 2.9/1,000, respectively). 288 Establishing new community-based outpatient clinics did not decrease preventable hospitalization rates across the Department of Veterans Affairs' primary service areas. 504

Because the optimal hospitalization rate for ambulatory care sensitive conditions has not been defined, providers may decrease their rates by failing to hospitalize patients who would truly benefit from inpatient care, by eliminating or discouraging product lines important to the community, or by hospitalizing marginally appropriate patients with other conditions (to inflate the denominator). Although these concerns cannot be dismissed, there is no published evidence of worse health outcomes in association with reduced hospitalization rates for these conditions. However, such evidence has been presented for external hernia, appendicitis, and uterine fibroids. 505

INDICATOR 17: ACSC: DEHYDRATION ADMISSION RATE

Indicator: Area level admission rate for dehydration.
Relationship to Quality: Proper outpatient treatment may reduce admissions for dehydration. As such, lower rates represent better quality care.
Benchmark: State, regional, or peer group average.

* Rate can also be calculated for age 65 and older.

Method:

Quality Measure: Admissions for dehydration per 100,000 population.
Outcome of Interest: Discharges with ICD-9 principal diagnosis code for hypovolemia (276.5) per 100,000 population (see Appendix 6).
    Age less than 65 years.*
    Exclude transfer from other institution.
    Exclude MDC 14 (pregnancy, childbirth, and puerperium) and MDC 15 (newborns and other neonates).
Population at Risk: Population in MSA or county, age less than 65.*

* Rate can also be calculated for age 65 and older.
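
The definition above can be sketched as a short calculation. This is a minimal illustration, not the HCUP software: the record layout (keys `principal_dx`, `age`, `transferred_in`, `mdc`) and the function name are hypothetical, and Appendix 6 remains the authoritative source for the diagnosis codes.

```python
def dehydration_admission_rate(discharges, population_under_65):
    """Admissions for dehydration per 100,000 population under age 65.

    Each discharge is a dict with hypothetical keys:
      principal_dx   ICD-9 principal diagnosis code, with or without the decimal point
      age            age in years
      transferred_in True if the patient was transferred from another institution
      mdc            Major Diagnostic Category number
    """
    def qualifies(d):
        return (
            d["principal_dx"].replace(".", "") == "2765"  # hypovolemia (276.5)
            and d["age"] < 65
            and not d["transferred_in"]
            and d["mdc"] not in (14, 15)  # pregnancy/childbirth, newborns
        )

    numerator = sum(1 for d in discharges if qualifies(d))
    return numerator / population_under_65 * 100_000
```

The same function gives the rate for age 65 and older if the age test and the population denominator are changed together.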

Evidence from the literature
Face validity

Dehydration is a serious acute condition that occurs in frail patients and patients with other underlying illnesses, following insufficient attention and support for fluid intake. It is treatable with oral rehydration therapy and/or IV fluids. If dehydration is left untreated in older adults, the risk of serious complications is very high, including mortality of over 50%. 506 Thus, prevention and effective management of dehydration may lead to fewer complications associated with severe dehydration, including hospitalization.

Precision

Dehydration is a somewhat common cause of hospital admission. We found little evidence on the precision of this indicator. One study did note that dehydration accounted for 7.3% of total admissions for ACSCs. 282

Minimum bias

We found no literature on the potential bias of this indicator. It is possible that the age structure of the population may affect admission rates for this condition, as the elderly and very young are more susceptible to dehydration. Socioeconomic factors may also affect admission rates, though we found no specific evidence confirming this hypothesis. We found no evidence on how comorbidities or other risk factors that may vary systematically by area influence hospitalization rates for dehydration. Finally, different thresholds for admission of patients with dehydration, rather than differences in quality of care, may also lead to area rate differences.

Construct validity

We found little literature on admission for dehydration as an ambulatory care sensitive condition indicator. Dehydration was originally included in John Billings' 489 set of indicators developed for the United Hospital Fund of New York. This set was developed by a physician panel. Evidence on sets of ambulatory care sensitive condition indicators is summarized at the beginning of this section, and should be referred to for this indicator.

Two studies of ACSC indicators reported validation work for dehydration independent of measure sets. Millman et al. 491 reported that low-income zip codes had 2.1 times more dehydration hospitalizations per capita than high-income zip codes in 11 states in 1988. Billings et al. 489 found that low-income zip codes in New York City (where at least 60% of households earned less than $15,000 in 1988, based on adjusted 1980 Census data) had 2.0 times more dehydration hospitalizations per capita than high-income zip codes (where less than 17.5% of households earned less than $15,000). Household income explained 42% of the variation in dehydration hospitalization rates at the zip code level.

Fosters true quality improvement

We found no evidence of the impact such a measure would have on quality. As some dehydration can be managed on an outpatient basis, it is possible that a shift to outpatient care may occur. It is unknown whether hospitalizations for dehydration are appropriate, and thus whether the shift to outpatient care would be appropriate.

Prior use

This indicator was included in Billings' set of Ambulatory Care Sensitive Conditions, developed in conjunction with the United Hospital Fund of New York. 489

Empirical Evidence
Test | Statistic | Rating
Precision
    Raw area level rate/standard deviation | 139.9, 103.2 |
    Systematic area-level standard deviation* | 0.04% | Moderate
    Area variation as a percentage of total variation* | 0.02% | Moderate
    Signal ratio* | 88.5% | High
    R-Square* | 88.9% | High
    * age- and gender- adjusted
Minimum Bias - age-sex risk adjustment
    Signal variance change with risk adjustment | No change | Good
    Absolute impact:
      Average absolute change (in %) | 9.2% | Very Good
    Relative impact:
      Rank correlation | 0.957 | Very Good
      Percent remaining in high decile/low decile | 81.8% / 90.9% | Good
      Percent changing more than 2 deciles | 4.6% | Very Good

Precision

This indicator is precise, with a raw area level rate of 139.9 per 100,000 population and a standard deviation of 103.2. The systematic area level standard deviation is moderate, at 0.04%. The area level variation also accounts for a moderate percentage of total variation, at 0.02%. The signal ratio is high, at 88.5%. This means that it is likely that the observed differences in area performance represent true differences in area performance. The high R-square represents the relatively high proportion of signal that can be extracted using multivariate techniques, though this is less than for other indicators. Multivariate techniques appear to have little additional impact.

Bias

Signal variance does not change substantially with risk adjustment. The indicator performs well on multiple measures of minimum bias, using age-sex adjustment. The rank correlation is very high at 0.957. Risk adjustment does not affect the lowest decile disproportionately to the highest decile. In addition, few providers change more than 2 deciles with risk adjustment (4.6%). The absolute magnitude of the impact is minimal, with provider performance changing an average 9.2% relative to the mean with risk adjustment.
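
One way to read the absolute-impact statistic ("changing an average 9.2% relative to the mean") is as the mean absolute difference between adjusted and unadjusted area rates, divided by the mean unadjusted rate. The sketch below implements that reading only; the report's methods section defines the statistic, and both this interpretation and the function name are assumptions.

```python
def average_absolute_change_pct(unadjusted, adjusted):
    """Mean absolute change in area rates as a percent of the mean unadjusted rate."""
    if len(unadjusted) != len(adjusted) or not unadjusted:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(unadjusted)
    mean_rate = sum(unadjusted) / n
    mean_abs_change = sum(abs(a - u) for u, a in zip(unadjusted, adjusted)) / n
    return 100.0 * mean_abs_change / mean_rate
```

For example, two areas with unadjusted rates of 100 and 200 and adjusted rates of 110 and 180 change by an average of 15 against a mean of 150, which is 10%.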

Construct validity

Dehydration is related to most other ACSC indicators.

Discussion

Dehydration can for the most part be treated in an outpatient setting, but is potentially fatal for the elderly, very young children, frail patients, or patients with serious comorbid conditions. Little evidence exists regarding the validity of this indicator, and as such there is little guidance as to potential causes of admissions. Areas may wish to examine the outpatient care for dehydration, to identify potential processes of care that may reduce admission rates.

Admission for dehydration is somewhat common, suggesting that the indicator will be measured with adequate precision. Our empirical results confirmed that this indicator is measured with adequate precision for use as a quality indicator. The high signal ratio suggests that the observed variation is likely to reflect true differences in performance.

This indicator is subject to some minimal bias. Risk adjustment appears to affect the areas with the highest and lowest rates modestly. Age may be a particularly important factor, and should be risk adjusted for. It is unknown how other clinical factors would impact this measure. Some dehydration care takes place in an emergency room setting. As such, considering inpatient and emergency room data together may give a more accurate picture of this indicator.

Dehydration is an avoidable hospitalization/ambulatory care sensitive condition indicator. These indicators are not measures of hospital quality, but rather measures of outpatient and other healthcare. These measures would be of most interest to comprehensive health care delivery systems, such as some health maintenance organizations, or to public health officials. ACSC indicators are correlated with each other and may be used in conjunction as an overall examination of outpatient care.

Areas may wish to identify hospitals that contribute the most to the overall area rate for this indicator. The patient populations served by these hospitals may be a starting point for interventions.

Overall, this indicator is recommended for inclusion in the HCUP II QI set, though it is recommended that it be used in conjunction with other ACSC indicators. It received an empirical rating of 14 out of 26. This indicator is recommended with several caveats of use. As an ACSC indicator, this indicator may be viewed as a proxy for actual quality problems. This indicator has unclear construct validity, as this indicator has not been validated except as part of a set of indicators. Further, it is possible that providers may reduce admission rates without actually improving quality, by shifting care to an outpatient setting. Caution should be maintained for admission rates that are drastically below or above the average or recommended rates.

INDICATOR 18: ACSC: BACTERIAL PNEUMONIA ADMISSION RATE

Indicator: Area level admission rate for bacterial pneumonia.
Relationship to Quality: Proper outpatient treatment may reduce admissions for bacterial pneumonia in non-susceptible individuals. As such, lower rates represent better quality care.
Benchmark: State, regional, or peer group average.

* Rate can also be calculated for age 65 and older.

Method:

Quality Measure: Admissions for bacterial pneumonia per 100,000 population.
Outcome of Interest: Discharges with ICD-9 principal diagnosis codes for bacterial pneumonia.
    Age less than 65 years.*
    Exclude patients with sickle cell anemia or HB-S disease (see Appendix 6) in any field.
    Exclude patients <2 months (8 weeks) of age.
    Exclude transfer from other institution.
    Exclude MDC 14 (pregnancy, childbirth, and puerperium) and MDC 15 (newborns and other neonates).
Population at Risk: Population in MSA or county, age less than 65 years.*

* Rate can also be calculated for age 65 and older.

Evidence from the literature
Face validity

Bacterial pneumonia is a relatively common acute condition, treatable for the most part with antibiotics. If left untreated in a susceptible individual, pneumonia can lead to death (see pneumonia in-hospital mortality indicator). The elderly are particularly susceptible to pneumonia. A vaccine has been developed, which is used primarily in this population. This vaccine has been shown to be 45% effective in preventing hospitalizations in the elderly during peak seasons. 507 A 1995 survey of Americans older than 65 years found that only 35.6% of respondents reported ever receiving the vaccine. The same study found that minority populations reported a lower rate of vaccination (Hispanic, 24%; Black, 20%) as compared to Whites (37%). Overall state rates were not associated with minority rates of vaccination. This study is limited by the use of self-report, and therefore vaccination may be underreported. 508

Appropriateness of admissions for bacterial pneumonia may account for some variation in admission rates. One study of emergency department triage strategies found a relatively modest impact on hospitalization rates for low risk cases - cases which may have been treatable on an outpatient basis. 509

Precision

Pneumonia is a very common cause of hospital admission, particularly in the elderly, suggesting that relatively precise estimates of area rates should be feasible. However, little evidence exists on the precision or variation in pneumonia admission rates.

Minimum bias

We found no literature on the potential bias of this indicator. As the elderly are more susceptible to pneumonia, rates may be associated with the age structure of the population. Rates may also differ with comorbid diseases and socioeconomic status. Immunosuppressed patients are more likely both to develop pneumonia and to require hospitalization as a result. Some causes of immunosuppression may vary systematically by area. However, we found no evidence that comorbidities or other risk factors that may vary systematically by area significantly affect the incidence of hospitalization for pneumonia. Physician thresholds for admitting patients with pneumonia also differ, which may contribute to observed differences in admission rates.

Construct validity

We found little literature on admission for pneumonia as an ambulatory care sensitive condition indicator. Pneumonia was originally included in Weissman's 276 set of avoidable hospitalization indicators. This set was developed by physician panels. Evidence on sets of ambulatory care sensitive condition indicators is summarized at the beginning of this section, and should be referred to for this indicator.

Two studies of ACSC conditions have examined pneumonia independently. Millman et al. 491 reported that low-income zip codes had 5.4 times more pneumonia hospitalizations per capita than high-income zip codes in 11 states in 1988. Billings et al. 489 found that low-income zip codes in New York City (where at least 60% of households earned less than $15,000 in 1988, based on adjusted 1980 Census data) had 5.4 times more pneumonia hospitalizations per capita than high-income zip codes (where less than 17.5% of households earned less than $15,000). Household income explained 53% of the variation in pneumonia hospitalization rates at the zip code level.

Fosters true quality improvement

We found no evidence of the impact such a measure would have on quality. As some cases of pneumonia can be managed on an outpatient basis, it is possible that use of this indicator may encourage a shift to outpatient care without actually affecting patient outcomes. Such a shift might be inappropriate for more severely ill patients.

Prior use

This indicator was included in Weissman's set of avoidable hospitalizations. 276 Immunization preventable pneumonia was included in the HCUP I indicator set.

Empirical Evidence
Test | Statistic | Rating
Precision
    Raw area level rate/standard deviation | 395.6, 208.5
    Systematic area-level standard deviation* | 0.09% | High
    Area variation as a percentage of total variation* | 0.03% | High
    Signal ratio* | 92.9% | Very High
    R-Square* | 93.1% | Very High
    * age- and gender-adjusted
Minimum Bias - age-sex risk adjustment
    Signal variance change with risk adjustment | No change | Good
    Absolute impact:
      Average absolute change (in %) | 11.1% | Good
    Relative impact:
      Rank correlation | 0.923 | Good
      Percent remaining in high decile/low decile | 68.2% / 90.9% | Good
      Percent changing more than 2 deciles | 9.7% | Good
Precision

This indicator is precise, with a raw area level rate of 395.6 per 100,000 population and a standard deviation of 208.5. The systematic area level standard deviation is high, at 0.09%. The area level variation also accounts for a high percentage of total variation, at 0.03%. The signal ratio is very high, at 92.9%. This means that it is likely that the observed differences in area performance represent true differences in area performance. The very high R-square reflects the large proportion of signal that is extractable using multivariate methods. However, multivariate techniques have little additional impact, due primarily to the already large signal ratio.
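The signal ratio discussed above can be illustrated with a short sketch. It assumes a simple method-of-moments reading, in which the signal variance is the observed between-area variance minus the average sampling (noise) variance; this is an illustrative assumption, not necessarily the estimator used for the statistics reported here, and the function and argument names are hypothetical.

```python
from statistics import mean, pvariance


def signal_ratio(observed_rates, sampling_variances):
    """Share of observed between-area variance attributed to signal.

    Assumption (illustrative only): signal variance is the observed
    variance across areas minus the mean sampling variance, floored at 0.
    """
    total_var = pvariance(observed_rates)
    if total_var == 0:
        return 0.0
    noise_var = mean(sampling_variances)
    signal_var = max(total_var - noise_var, 0.0)
    return signal_var / total_var


# Made-up area rates (per 100,000) and made-up sampling variances.
rates = [100, 200, 300, 400]
noise = [1250, 1250, 1250, 1250]
ratio = signal_ratio(rates, noise)
```

Under this reading, a ratio near 1 suggests that most of the observed variation between areas is systematic rather than random noise.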

Bias

Signal variance does not change significantly with risk adjustment. The indicator performs well on multiple measures of minimum bias. The rank correlation is good at 0.923. Risk adjustment appears to affect the highest decile disproportionately to the lowest decile, as 68.2% of areas in the highest decile and 90.9% of the lowest decile remain after risk adjustment. However, this is consistent with other indicators. The indicator still performs well relative to other indicators on this measure. The absolute magnitude of the impact is moderate, as is the relative impact.
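The relative-impact measures above (rank correlation and the share of areas remaining in an extreme decile) can be sketched as follows. This is a minimal illustration that assumes no tied rates and defines a decile as the top or bottom tenth of areas by rank; the function names are hypothetical and the data are made up.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(raw, adjusted):
    """Spearman rank correlation between raw and adjusted rates (no ties)."""
    n = len(raw)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(raw), ranks(adjusted)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def share_remaining(raw, adjusted, top=True):
    """Fraction of areas in the raw extreme decile that stay there after adjustment."""
    k = max(1, len(raw) // 10)

    def extreme(values):
        order = sorted(range(len(values)), key=lambda i: values[i], reverse=top)
        return set(order[:k])

    return len(extreme(raw) & extreme(adjusted)) / k


# Ten made-up areas; adjustment swaps the two highest-rate areas.
raw = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
adj = [10, 20, 30, 40, 50, 60, 70, 80, 100, 90]
rho = spearman(raw, adj)
high = share_remaining(raw, adj, top=True)
low = share_remaining(raw, adj, top=False)
```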

Construct validity

Bacterial pneumonia is related to most other ACSC indicators.

Discussion

Bacterial pneumonia can for the most part be treated using available antibiotics; however, low-quality treatment may increase the admission rate for this condition. The elderly population is particularly susceptible to pneumonia, and in this population a vaccine is suggested to prevent pneumonia. Little evidence exists regarding the validity of this indicator, and as such there is little guidance as to potential causes of admissions. Areas may wish to examine the outpatient care for pneumonia and pneumococcal vaccination rates, to identify potential processes of care that may reduce admission rates.

Appropriateness of admissions appears to be a particular problem for this indicator. High rates may reflect a large number of inappropriate admissions, and/or poor quality outpatient care, among other things. While some view inappropriate admissions as a quality concern, others do not, as it concerns mainly resource overutilization and may not pose a significant additional risk to the patient.

Admission for bacterial pneumonia is relatively common, suggesting that the indicator will be measured with good precision. Our empirical results showed that this indicator is measured with adequate precision for use as a quality indicator. The very high signal ratio suggests that the observed variation is likely to reflect true differences in performance. Multivariate techniques do not appear to have substantial additional impact. Thus, either univariate or multivariate smoothing is recommended.
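As a rough illustration of what univariate smoothing does, the sketch below shrinks an observed area rate toward the overall mean in proportion to its reliability. It assumes a generic empirical-Bayes-style shrinkage formula, which may differ in detail from the smoothing method applied in this report; all names and numbers are hypothetical.

```python
def smooth_rate(observed, overall_mean, signal_variance, noise_variance):
    """Shrink an observed area rate toward the overall mean.

    Assumption (illustrative only): reliability is the ratio of signal
    variance to signal-plus-noise variance for the area.
    """
    reliability = signal_variance / (signal_variance + noise_variance)
    return overall_mean + reliability * (observed - overall_mean)


# Made-up example: an area 100 above the mean, with reliability 0.9.
smoothed = smooth_rate(observed=500, overall_mean=400,
                       signal_variance=900, noise_variance=100)
```

When the signal ratio is already very high, as for this indicator, the reliability is close to 1 and the smoothed rate stays close to the observed rate.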

This indicator is subject to some moderate bias. Relative and absolute performance change somewhat with age-sex risk adjustment. In particular, risk adjustment appears to affect the areas with the highest rates the most. Age may be a particularly important factor, and rates should be risk adjusted for it. It is unknown how other clinical factors would impact this measure. In addition, some pneumonia care takes place in an emergency room setting. As such, considering inpatient and emergency room data together may give a more accurate picture of this indicator.

Bacterial pneumonia is an avoidable hospitalization/ambulatory care sensitive condition indicator. These indicators are not measures of hospital quality, but rather measures of outpatient and other healthcare. These measures would be of most interest to comprehensive health care delivery systems, such as some health maintenance organizations, or public health officials. ACSC indicators are correlated with each other and may be used in conjunction as an overall examination of outpatient care.

Areas may wish to identify hospitals that contribute the most to the overall area rate for this indicator. The patient populations served by these hospitals may be a starting point for interventions.

Overall, this indicator is recommended for inclusion in the HCUP II QI set, though it is recommended that it be used in conjunction with other ACSC indicators. It received an empirical rating of 17 out of 26. This indicator is recommended with several caveats of use. As an ACSC indicator, this indicator may be viewed as a proxy for actual quality problems. This indicator has unclear construct validity, as this indicator has not been validated except as part of a set of indicators. Further, it is possible that providers may reduce admission rates without actually improving quality, by shifting care to an outpatient setting. Caution should be maintained for admission rates that are drastically below or above the average or recommended rates.

INDICATOR 19: ACSC: URINARY INFECTION ADMISSION RATE

Indicator: Area level admission rate for urinary infection.
Relationship to Quality: Proper outpatient treatment may reduce admissions for urinary infection. As such, lower rates represent better quality care.
Benchmark: State, regional, or peer group average.

* Rate can also be calculated for age 65 and older.

Method:

Quality Measure: Admissions for urinary infection per 100,000 population.
Outcome of Interest: Discharges with ICD-9 principal diagnosis code of urinary tract infection per 100,000 population (see Appendix 6).

Age less than 65 years.*

Exclude transfer from other institution.
Exclude MDC 14 (pregnancy, childbirth, and puerperium) and MDC 15 (newborns and other neonates).
Population at Risk: Population in MSA or county, age less than 65.*

* Rate can also be calculated for age 65 and older.
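The definition above can be sketched as a small rate calculation. The record layout is hypothetical: the field names `principal_dx_is_uti`, `age`, `transfer_in`, and `mdc` are assumptions made for illustration, and the actual ICD-9 code list is the one given in Appendix 6, which is not reproduced here.

```python
def uti_admission_rate(discharges, population_under_65):
    """Admissions for urinary infection per 100,000 population under 65.

    Each discharge is a dict with hypothetical keys:
      principal_dx_is_uti (bool), age (int), transfer_in (bool), mdc (int).
    """
    def qualifies(d):
        return (d["principal_dx_is_uti"]
                and d["age"] < 65
                and not d["transfer_in"]          # exclude transfers in
                and d["mdc"] not in (14, 15))     # exclude MDC 14 and 15

    count = sum(1 for d in discharges if qualifies(d))
    return 100_000 * count / population_under_65


# Made-up discharges: only the first two meet every criterion.
sample = [
    {"principal_dx_is_uti": True, "age": 40, "transfer_in": False, "mdc": 11},
    {"principal_dx_is_uti": True, "age": 64, "transfer_in": False, "mdc": 11},
    {"principal_dx_is_uti": True, "age": 70, "transfer_in": False, "mdc": 11},
    {"principal_dx_is_uti": True, "age": 30, "transfer_in": True, "mdc": 11},
    {"principal_dx_is_uti": True, "age": 25, "transfer_in": False, "mdc": 14},
    {"principal_dx_is_uti": False, "age": 50, "transfer_in": False, "mdc": 5},
]
rate = uti_admission_rate(sample, population_under_65=200_000)
```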

Evidence from the literature
Face validity

Urinary infection (UTI) is a common acute condition. Uncomplicated urinary tract infections are treatable with antibiotics on an outpatient basis. If left untreated or incompletely treated in a susceptible individual, urinary tract infections can spread into the kidneys (pyelonephritis) or develop into septicemia. Among children, admission for UTI is associated with physiological abnormalities and is rare.

Precision

We found little evidence on the precision of this indicator. One study did note that UTIs and kidney infections accounted for 10.6% of total admissions for ACSCs. 282

Minimum bias

Urinary tract infections are a somewhat common cause of hospitalization. We found no literature on the potential bias of this indicator. We found no evidence that comorbidities or other risk factors that may vary systematically by area may increase the incidence of hospitalization for UTI. Thresholds for admission of patients with urinary tract infections may differ across areas, potentially contributing to observed differences in area hospitalization rates.

Construct validity

We found little literature on admission for urinary infection as an ambulatory care sensitive condition indicator. UTI was originally included in both John Billings' 489 set of indicators developed for the United Hospital Fund of New York, and in Weissman's 276 set of indicators. These sets were developed by physician panels. Evidence on sets of ambulatory care sensitive condition indicators is summarized at the beginning of this section, and should be referred to for this indicator.

Two studies of ACSC indicators reported validation work for UTI independent of measure sets. Millman et al. 491 reported that low-income zip codes had 2.8 times more UTI hospitalizations per capita than high-income zip codes in 11 states in 1988. Billings et al. 489 found that low-income zip codes in New York City (where at least 60% of households earned less than $15,000 in 1988, based on adjusted 1980 Census data) had 2.2 times more UTI hospitalizations per capita than high-income zip codes (where less than 17.5% of households earned less than $15,000). Household income explained 28% of the variation in UTI hospitalization rates at the zip code level.

Fosters true quality improvement

We found no evidence of the impact such a measure would have on quality. As most UTIs can be managed on an outpatient basis, it is possible that a shift to outpatient care may occur. The "appropriate" rate of inpatient treatment for UTI is unknown, and thus there is little evidence on whether the shift to outpatient care would reduce quality of care.

Prior use

This indicator was included in Billings' 489 set of Ambulatory Care Sensitive Conditions, developed in conjunction with the United Hospital Fund of New York, and in Weissman's set of avoidable hospitalizations. 276

Empirical Evidence
Test | Statistic | Rating
Precision
    Raw area level rate/standard deviation | 145.1, 89.5
    Systematic area-level standard deviation* | 0.04% | Moderate
    Area variation as a percentage of total variation* | 0.01% | Moderate
    Signal ratio* | 84.9% | High
    R-Square* | 85.4% | High
    * age- and gender-adjusted
Minimum Bias - age-sex risk adjustment
    Signal variance change with risk adjustment | No change | Good
    Absolute impact:
      Average absolute change (in %) | 11.5% | Good
    Relative impact:
      Rank correlation | 0.914 | Good
      Percent remaining in high decile/low decile | 68.2% / 90.9% | Good
      Percent changing more than 2 deciles | 8.8% | Good
Precision

This indicator is precise, with a raw area level rate of 145.1 per 100,000 population and a standard deviation of 89.5. The systematic area level standard deviation is moderate, at 0.04%. The area level variation also accounts for a moderate percentage of total variation, at 0.01%. This means that relative to other indicators a higher percentage of the total variation reflects unobserved patient differences. The signal ratio is high, at 84.9%. This means that it is likely that the observed differences in area performance represent true differences in area performance, though this is lower than other indicators. The high R-square reflects the large proportion of signal variance that can be extracted using multivariate techniques, though lower than some other indicators. Multivariate techniques have little additional impact.

Bias

Signal variance does not change with risk adjustment. The indicator performs well on multiple measures of minimum bias. The rank correlation is good at 0.914. Risk adjustment appears to impact the highest decile disproportionately to the lowest decile, as 68.2% of areas in the highest decile and 90.9% of the lowest decile remain after risk adjustment. The absolute magnitude of the impact is moderate, as is the relative impact.

Construct validity

Urinary infection is related to most other ACSC indicators.

Discussion

Urinary tract infection can for the most part be treated in an outpatient setting, but may progress to more clinically significant infections, such as pyelonephritis, in vulnerable individuals with inadequate treatment. Little evidence exists regarding the validity of this indicator, and as such there is little guidance as to potential causes of admissions. Areas may wish to examine the outpatient care for urinary tract infection, to identify potential processes of care that may reduce admission rates.

Admission for urinary tract infection is uncommon, suggesting that there may be potential problems with precision. Our empirical results demonstrate that this indicator is measured with adequate precision for use as a quality indicator. The high signal ratio suggests that observed variation is likely to reflect true differences in performance. Multivariate techniques do not appear to have substantial additional impact. Therefore, univariate or multivariate smoothing is recommended for this indicator.

This indicator is subject to some moderate bias. Relative and absolute performance change somewhat with age-sex risk adjustment, and so we recommend adjusting for age and sex. Risk adjustment appears to affect the areas with the highest rates the most, though this impact is moderate in comparison to other indicators. Using this indicator without risk-adjustment may result in the misidentification of some areas as outliers. It is unknown how other clinical factors would impact this measure. In addition, some urinary tract infection care takes place in an emergency room setting. As such, considering inpatient and emergency room data together may give a more accurate picture of this indicator.

Urinary tract infection is an avoidable hospitalization/ambulatory care sensitive condition indicator. These indicators are not measures of hospital quality, but rather measures of outpatient and other healthcare. These measures would be of most interest to comprehensive health care delivery systems, such as some health maintenance organizations, or public health officials. ACSC indicators are correlated with each other and may be used in conjunction as an overall examination of outpatient care.

Areas may wish to identify hospitals that contribute the most to the overall area rate for this indicator. The patient populations served by these hospitals may be a starting point for interventions.

Overall, this indicator is recommended for inclusion in the HCUP II QI set, though it is recommended that it be used in conjunction with other ACSC indicators. It received an empirical rating of 11 out of 26. This indicator is recommended with several caveats of use. As an ACSC indicator, this indicator may be viewed as a proxy for actual quality problems. This indicator has unclear construct validity, as this indicator has not been validated except as part of a set of indicators. Further, it is possible that providers may reduce admission rates without actually improving quality, by shifting care to an outpatient setting. Caution should be maintained for admission rates that are drastically below or above the average or recommended rates.

INDICATOR 20: ACSC: PERFORATED APPENDIX ADMISSION RATE

Indicator: Area level admission rate for perforated appendix.
Relationship to Quality: Timely diagnosis and treatment may reduce the incidence of perforated appendix. As such, lower rates represent better quality care.
Benchmark: State, regional, or peer group average.

Method:

Quality Measure: Admissions for perforated appendix per 100 admissions for appendicitis within MSA or county.
Outcome of Interest: Discharges with ICD-9 diagnosis code for perforations or abscesses of appendix in any field per 100 discharges with diagnosis code for appendicitis within area (see Appendix 6).

Exclude transfer from other institution.
Exclude MDC 14 (pregnancy, childbirth, and puerperium) and MDC 15 (newborns and other neonates).
Population at Risk: Discharges with diagnosis code for appendicitis within MSA or county.
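Unlike the population-based ACSC rates, this measure uses appendicitis discharges as its denominator. A minimal sketch follows, assuming a hypothetical record layout (`dx_codes`, `transfer_in`, `mdc`) and caller-supplied code sets; the placeholder labels used in the example are not real ICD-9 codes, and the actual code lists are those in Appendix 6. The sketch also assumes that every perforation code belongs to the appendicitis code set, so that the numerator is a subset of the denominator.

```python
def perforated_appendix_rate(discharges, perforation_codes, appendicitis_codes):
    """Perforated appendix discharges per 100 appendicitis discharges.

    Returns None when an area has no eligible appendicitis discharges.
    Diagnosis codes are checked in any field, per the definition above.
    """
    def eligible(d):
        return (not d["transfer_in"]
                and d["mdc"] not in (14, 15)
                and any(c in appendicitis_codes for c in d["dx_codes"]))

    denominator = [d for d in discharges if eligible(d)]
    if not denominator:
        return None
    numerator = [d for d in denominator
                 if any(c in perforation_codes for c in d["dx_codes"])]
    return 100 * len(numerator) / len(denominator)


# Placeholder labels, not real ICD-9 codes.
APPENDICITIS = {"APP", "APP_PERF"}
PERFORATION = {"APP_PERF"}
sample = [
    {"dx_codes": ["APP_PERF"], "transfer_in": False, "mdc": 6},
    {"dx_codes": ["APP"], "transfer_in": False, "mdc": 6},
    {"dx_codes": ["OTHER", "APP"], "transfer_in": False, "mdc": 6},
    {"dx_codes": ["APP_PERF"], "transfer_in": True, "mdc": 6},  # excluded
]
rate = perforated_appendix_rate(sample, PERFORATION, APPENDICITIS)
```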
Evidence from the literature
Face validity

Perforated appendix may occur when appropriate treatment for acute appendicitis is delayed for any variety of reasons. Such delay could result from access to care problems, patients failing to interpret symptoms as potentially important, misdiagnosis and other delays in obtaining surgery.

Precision

Perforated appendix is a relatively common condition, occurring in one-quarter to one-third of hospitalized acute appendicitis patients. 510 Thus, it is likely to be measured with good precision.

Minimum bias

Observational studies utilizing large administrative databases have noted higher rates of perforated appendix in males, 510 patients with mental illness or substance abuse disorders, 510 diabetics, 510 blacks, 510 and children under the age of four (though appendicitis is rare in this age group). 511 Areas with an unusually high proportion of patients with diabetic, substance abuse, or psychiatric comorbidity may have higher rates of perforated appendix that are not due to actual differences in quality of care. However, since appendicitis is relatively rare as compared with these comorbidities, it is unlikely that an area rate would change significantly due to perforated appendix in these populations.

Construct validity

Three recent studies have examined perforated appendix as a measure of access to care. The first examined all California adult admissions (18-64 years of age) with acute appendicitis (96,587 cases). 510 They found that patients with either no insurance or Medicaid had just under 50% greater risk of perforated appendix than HMO covered patients. They interpreted this result to be evidence of potential access to care problems. They also found a 20% increased risk in patients with private fee-for-service insurance. In a follow-up to this study, Blumberg et al. examined the high rate of perforated appendix in the black population at Kaiser, where patients have almost identically comprehensive coverage, and noted that black patients did not have higher rates of delayed admission, which would have been a possible indication of poor quality care. They postulated that delay in seeking care may instead explain some of the differences observed. 512 Thus, it is unclear whether the higher rate of perforated appendix is due to quality of care problems, perceived access to care difficulties, or other reasons for the delay in seeking care.

The second study took a similar approach in analyzing pediatric admissions for acute appendicitis in Washington state. 511 The rate of perforated appendix was again increased in Medicaid patients [Adj. OR 1.3, 95% CI (1.2-1.4)]. Another study in a pediatric population examined reasons for delay to surgery and insurance status in a New York pediatric population through retrospective chart review. They noted that Medicaid or uninsured children had both a higher perforation rate and a longer duration of symptoms before presenting to a health care professional as compared to HMO or private fee for service insured children. There were no differences between the types of insurance in the time to surgery after presentation. 513 Unfortunately the authors did not analyze how much of the variance in perforated appendix could be explained by delays in seeking care.

Weissman et al., in their analysis of avoidable hospitalizations, found that the uninsured had a relative risk of 1.14-1.20 of admission for ruptured appendix after adjusting for age and sex. Medicaid patients had a relative risk of 0.45-0.58, suggesting that in at least this case, Medicaid patients are not at increased risk for ruptured appendix. 276

Fosters true quality improvement

Rates of perforated appendix could be reduced by inflating the denominator (total patients admitted for appendicitis), that is, by increasing the number of patients with borderline or questionable cases undergoing appendectomy. 510
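Because the measure is a ratio, its value depends on the denominator as well as the numerator. A minimal sketch, using made-up counts, of how admitting additional borderline appendicitis cases changes the measured rate while the number of perforations stays the same:

```python
def rate_per_100(perforated, appendicitis_admissions):
    """Perforated appendix cases per 100 appendicitis admissions."""
    return 100 * perforated / appendicitis_admissions


# Made-up counts: 30 perforations among 100 appendicitis admissions.
baseline = rate_per_100(30, 100)

# Same 30 perforations, but 20 additional borderline cases are admitted.
with_borderline = rate_per_100(30, 120)
```

In this example the measured rate falls from 30 to 25 per 100 even though no perforation was prevented.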

Prior use

Perforated appendix was included in both the previous HCUP I indicator set, as well as in Weissman's set of avoidable hospitalizations. 276

Empirical Evidence
Test | Statistic | Rating
Precision
    Raw area level rate/standard deviation | 33.3%, 14.4%
    Systematic area-level standard deviation* | 2.75% | Very High
    Area variation as a percentage of total variation* | 0.34% | Very High
    Signal ratio* | 26.5% | Low
    R-Square* | 39.4% | Low
    * age- and gender-adjusted
Minimum Bias - age-sex risk adjustment
    Signal variance change with risk adjustment | Decreases | Fair
    Absolute impact:
      Average absolute change (in %) | 2.0% | Very Good
    Relative impact:
      Rank correlation | 0.969 | Very Good
      Percent remaining in high decile/low decile | 90.0% / 85.0% | Very Good / Good
      Percent changing more than 2 deciles | 2.5% | Very Good
Precision

This indicator is precise, with a raw area level rate of 33.3% and a substantial standard deviation of 14.4%. The systematic area level standard deviation is very high, at 2.75%. The area level variation also accounts for a very high percentage of total variation, at 0.34%. This means that relative to other indicators, a higher percentage of the variation occurs at the area level, rather than the discharge level. However, the signal ratio is low, at 26.5%. This means that it is likely that much of the observed differences in area performance do not reflect true differences in performance. The R-Square does demonstrate the improvement in the amount of signal that can be extracted using multivariate techniques. Nonetheless, the R-square is still low at 39.4%.

Bias

Signal variance decreases by over 15% with risk adjustment by age and sex. This indicates that some of the apparent signal is due to systematic differences in patient demographics. The demographic (age-sex) adjustment had minimal impact on most measures of minimum bias. The rank correlation was high at 0.969. The number of areas changing more than two deciles was relatively low, at 2.5%. Relative to other indicators, this risk adjustment had a moderate impact on the lowest end of the distribution, with 85% of areas in the lowest decile remaining after risk adjustment.

Construct validity

Perforated appendix is negatively related to the other ACSC conditions. It is positively related to low birth weight.

Discussion

With prompt and appropriate care, acute appendicitis should not progress to perforation or rupture. However, rates of perforated appendix are higher in the uninsured or underinsured in both the adult and pediatric population. It is unclear whether this arises from patients failing to seek appropriate care, access to care difficulties, or misdiagnoses and poor quality care. Areas with high rates may want to investigate the reasons for delay in receiving surgery in order to target points of intervention. This may be accomplished through mechanisms such as chart reviews and other supplemental data.

In our empirical analyses we observed very high systematic variation in this condition, though the signal ratio was low, suggesting that some of the variance observed does not reflect true differences in provider performance. This indicator is measured with sufficient precision for use as a quality indicator. However, as multivariate smoothing appeared to increase the extractable signal for this indicator, it is highly recommended.

This indicator performed well on our measures of minimum bias, adjusting for age and sex. We found no evidence in our literature review that clinical characteristics that would vary systematically increase the likelihood of perforated appendix. Thus, this is unlikely to be a clinically biased indicator. Perforated appendix rates vary systematically by race, but whether this is due to poor quality care or other factors is not known.

Perforated appendix is an avoidable hospitalization/ambulatory care sensitive condition indicator. These indicators are not measures of hospital quality, but rather measures of access to care, outpatient and other health care, and as such are defined with area level denominators. These measures would be of most interest to comprehensive health care delivery systems, such as some health maintenance organizations, or public health officials. ACSC indicators are correlated with each other and may be used in conjunction as an overall examination of outpatient care.

Areas may wish to identify hospitals that contribute the most to the overall area rate. The patient populations served by these hospitals may be a starting point for interventions. Hospital contributions to the overall area rate may be of particular utility for this indicator. As perforated appendix can arise from delay in receiving surgery, misdiagnoses and other causes of delay in emergency rooms may contribute substantially to the rate.

Overall, this indicator is recommended for inclusion in the HCUP II QI set. It received an empirical rating of 17 out of 26. Smoothing is recommended for this indicator. This indicator is recommended with one caveat of use. As an ACSC indicator, this indicator may be viewed as a proxy for actual quality problems.

INDICATOR 21: ACSC: ANGINA WITHOUT PROCEDURE ADMISSION RATE

Indicator: Area level admission rate for angina (without procedure).
Relationship to Quality: Proper outpatient treatment may reduce admissions for angina (excluding admission for procedures). As such, lower rates represent better quality care.
Benchmark: State, regional, or peer group average.

* Rate can also be calculated for age 65 and older.

Method:

Quality Measure: Admissions for angina (without procedures) per 100,000 population.
Outcome of Interest: Discharges with ICD-9 principal diagnosis codes for angina (see Appendix 6).

Age 18 years to 64 years old.*

Exclude discharges with a surgical procedure in any field (01.0-86.99).
Exclude transfer from other institution.
Exclude MDC 14 (pregnancy, childbirth, and puerperium) and MDC 15 (newborns and other neonates).
Population at Risk: Population in MSA or county, age 18-64 years.*

* Rate can also be calculated for age 65 and older.
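The "without procedure" restriction can be sketched as a range check on procedure codes. The example assumes procedure codes are stored as decimal strings and uses a hypothetical record layout (`principal_dx_is_angina`, `age`, `procedure_codes`, `transfer_in`, `mdc`); the angina diagnosis codes themselves are those listed in Appendix 6 and are not reproduced here.

```python
def in_surgical_range(code):
    """True if a procedure code string falls within 01.0-86.99."""
    return 1.0 <= float(code) <= 86.99


def angina_rate(discharges, population_18_64):
    """Angina admissions without a surgical procedure per 100,000 population."""
    def qualifies(d):
        return (d["principal_dx_is_angina"]
                and 18 <= d["age"] <= 64
                and not any(in_surgical_range(c) for c in d["procedure_codes"])
                and not d["transfer_in"]
                and d["mdc"] not in (14, 15))

    count = sum(1 for d in discharges if qualifies(d))
    return 100_000 * count / population_18_64


# Made-up discharges: the second has a procedure code inside the range.
sample = [
    {"principal_dx_is_angina": True, "age": 55, "procedure_codes": ["88.72"],
     "transfer_in": False, "mdc": 5},
    {"principal_dx_is_angina": True, "age": 60, "procedure_codes": ["36.01"],
     "transfer_in": False, "mdc": 5},
    {"principal_dx_is_angina": True, "age": 45, "procedure_codes": [],
     "transfer_in": False, "mdc": 5},
]
rate = angina_rate(sample, population_18_64=100_000)
```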

Evidence from the literature
Face validity

Both stable and unstable angina are symptoms of potential coronary artery disease. Stable angina can be managed in an outpatient setting, using drugs such as aspirin and beta blockers, and advice to change diet and exercise habits. 514 Unstable angina may be managed on an outpatient basis, but admission is required for more severe cases, such as patients with recurrent or progressive symptoms. 515 Effective management of coronary disease reduces the occurrence of major cardiac events such as heart attacks, and may also reduce admission rates for angina.

Precision

Unstable angina is a very common reason for hospital admission. One study noted that angina accounted for 16.3% of total admissions for ACSCs. 282 Thus, reasonably precise estimates of area angina rates should be feasible.

Minimum bias

We found no literature on the potential bias of this indicator. The incidence of angina is related to the incidence of coronary artery disease (CAD), which is in turn related to the age structure and risk factors (smoking, hyperlipidemia, hypertension, diabetes) in a population. Some areas may systematically vary in the incidence of angina, as a result of these factors and potentially correlated differences in socioeconomic status. Elderly age (over 70), diabetes and hypertension have also been associated with higher-risk angina. 515 Finally, physicians may differ in their thresholds for admitting patients with less stable angina. Little evidence exists on the extent to which these factors may account for area differences in angina admission rates.

Construct validity

We found little literature on admission for angina as an ambulatory care sensitive condition indicator. Angina was originally included in John Billings' 275 set of indicators developed for the United Hospital Fund of New York. This set was developed by a physician panel. Evidence on sets of ambulatory care sensitive condition indicators is summarized at the beginning of this section, and should be referred to for this indicator.

Two studies of ACSC indicators reported validation work for angina independent of measure sets. Millman et al. 491 reported that low-income zip codes had 2.7 times more angina hospitalizations per capita than high-income zip codes in 11 states in 1988. Billings et al. 489 found that low-income zip codes in New York City (where at least 60% of households earned less than $15,000 in 1988, based on adjusted 1980 Census data) had 2.3 times more angina hospitalizations per capita than high-income zip codes (where less than 17.5% of households earned less than $15,000). Household income explained 13% of the variation in angina hospitalization rates at the zip code level.

Fosters true quality improvement

We found no evidence of the impact such a measure would have on quality. As some angina can be managed on an outpatient basis, it is possible that a shift to outpatient care may occur without true changes in the occurrence of angina complications. However, it seems unlikely that severe angina, requiring observation, would be shifted to outpatient settings.

Prior use

This indicator was included in Billings' set of Ambulatory Care Sensitive Conditions, developed in conjunction with the United Hospital Fund of New York. 275

Empirical Evidence
Test | Statistic | Rating
Precision
    Raw area level rate/standard deviation | 166.0, 135.7
    Systematic area-level standard deviation* | 0.06% | High
    Area variation as a percentage of total variation* | 0.04% | High
    Signal ratio* | 91.6% | Very High
    R-Square* | 91.9% | Very High
    * age- and gender-adjusted
Minimum Bias - age-sex risk adjustment
    Signal variance change with risk adjustment | No change | Good
    Absolute impact:
      Average absolute change (in %) | 10.6% | Good
    Relative impact:
      Rank correlation | 0.968 | Very Good
      Percent remaining in high decile/low decile | 63.6% / 90.9% | Good
      Percent changing more than 2 deciles | 3.2% | Very Good
Precision

This indicator is precise, with a raw area level rate of 166 per 100,000 population and a standard deviation of 135.7. The systematic area level standard deviation is high, at 0.06%. The area level variation also accounts for a high percentage of total variation, at 0.04%. The signal ratio is very high, at 91.6%. This means that it is likely that the observed differences in area performance represent true differences in area performance. The very high R-square reflects the large proportion of signal that can be extracted using multivariate techniques. Such techniques do not have substantial additional impact, primarily due to the already very high signal ratio.

Bias

Signal variance does not change with risk adjustment. The indicator performs well on multiple measures of minimum bias. The rank correlation is very good at 0.968. Risk adjustment appears to disproportionately impact the high decile compared to the low decile, as 63.6% of areas in the highest decile and 90.9% of the lowest decile remain after risk adjustment. Few areas change more than two deciles. The absolute magnitude of the impact is moderate, with an average change with risk adjustment (relative to the mean) of 10.6%.
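The absolute-impact statistic can be read as the mean absolute difference between raw and risk-adjusted area rates, expressed as a percentage of the mean raw rate. The sketch below follows that reading, which is an assumption about how the statistic is defined rather than a confirmed specification; the function name and data are hypothetical.

```python
from statistics import mean


def average_absolute_change_pct(raw, adjusted):
    """Mean absolute change with risk adjustment, as a percent of the mean raw rate."""
    mean_abs_diff = mean(abs(a - r) for r, a in zip(raw, adjusted))
    return 100 * mean_abs_diff / mean(raw)


# Made-up raw and adjusted rates for three areas.
raw_rates = [100, 200, 300]
adjusted_rates = [110, 180, 300]
change = average_absolute_change_pct(raw_rates, adjusted_rates)
```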

Construct validity

Angina is related to most other ACSC indicators.

Discussion

Angina without procedure is a common reason for admission. Some angina admissions may be avoidable through proper outpatient care, aimed at treating the effects of coronary artery disease. Little evidence exists regarding the validity of this indicator, and as such there is little guidance as to potential causes of admissions. Areas may wish to examine the outpatient care for angina, as well as emergency room care, to identify potential processes of care that may reduce admission rates.

Admission for angina is relatively common, suggesting that the indicator will be measured with good precision. Our empirical results showed that this indicator is measured with adequate precision for use as a quality indicator. The very high signal ratio suggests that the observed variation is likely to reflect true differences in performance. Multivariate techniques do not have substantial additional impact, and as such either univariate or multivariate smoothing is recommended.

Risk adjustment does not impact this measure substantially, though there is some moderate impact. Age and sex may be particularly important factors, and should be risk adjusted for. It is unknown how other clinical factors would impact this measure. Some angina care takes place in an emergency room setting. As such, considering inpatient and emergency room data together may give a more accurate picture of this indicator.

Angina is an avoidable hospitalization/ambulatory care sensitive condition indicator. These indicators are not measures of hospital quality, but rather measures of outpatient and other healthcare. These measures would be of most interest to comprehensive health care delivery systems, such as some health maintenance organizations, or public health officials. ACSC indicators are correlated with each other and may be used in conjunction as an overall examination of outpatient care.

Areas may wish to identify hospitals that contribute the most to the overall area rate for this indicator. The patient populations served by these hospitals may be a starting point for interventions.

Overall, this indicator is recommended for inclusion in the HCUP II QI set, though it is recommended that it be used in conjunction with other ACSC indicators. It received an empirical rating of 19 out of 26. This indicator is recommended with several caveats of use. As an ACSC indicator, this indicator may be viewed as a proxy for actual quality problems. This indicator has unclear construct validity, as this indicator has not been validated except as part of a set of indicators. Further, it is possible that providers may reduce admission rates without actually improving quality, by shifting care to an outpatient setting. Caution should be maintained for admission rates that are drastically below or above the average or recommended rates.

INDICATOR 22: ACSC: ADULT ASTHMA ADMISSION RATE

Indicator: Area level admission rate for adult asthma.
Relationship to Quality: Appropriate and continued treatment may reduce the incidence of exacerbation of asthma requiring hospitalization. As such, lower rates represent better quality care.
Benchmark: State, regional, or peer group average.

* Rate can also be calculated for age 65 and older.

Method:

Quality Measure: Admissions for adult asthma per 100,000 population.
Outcome of Interest: Discharges with ICD-9 principal diagnosis codes for asthma (see Appendix 6).

Age 18-64 years.*

Exclude transfer from other institution.
Exclude MDC 14 (pregnancy, childbirth, and puerperium) and MDC 15 (newborns and other neonates).
Population at Risk: Population in MSA or county, age 18-64 years.*

* Rate can also be calculated for age 65 and older.
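The definition above can be read as a simple counting rule. The following is a minimal sketch of that rule, not the report's software: the `Discharge` record, its field names, and the `principal_dx_is_asthma` flag (standing in for a lookup against the Appendix 6 code list) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Discharge:
    # All field names are hypothetical; real discharge abstracts differ.
    age: int
    principal_dx_is_asthma: bool  # assumed result of an Appendix 6 code lookup
    transferred_in: bool          # transfer from other institution
    mdc: int                      # Major Diagnostic Category


def counts_toward_numerator(d: Discharge, min_age: int = 18,
                            max_age: Optional[int] = 64) -> bool:
    """Apply the inclusion and exclusion rules listed in the Method table."""
    in_age_range = d.age >= min_age and (max_age is None or d.age <= max_age)
    return (
        d.principal_dx_is_asthma
        and in_age_range
        and not d.transferred_in
        and d.mdc not in (14, 15)  # pregnancy/childbirth and newborns
    )


def admission_rate_per_100k(discharges: Iterable[Discharge],
                            population_at_risk: int,
                            min_age: int = 18,
                            max_age: Optional[int] = 64) -> float:
    """Qualifying admissions per 100,000 population in the MSA or county."""
    if population_at_risk <= 0:
        raise ValueError("population_at_risk must be positive")
    admissions = sum(
        1 for d in discharges if counts_toward_numerator(d, min_age, max_age)
    )
    return admissions / population_at_risk * 100_000
```

For the age 65 and older variant noted in the footnote, the same function could be called with `min_age=65, max_age=None` and the matching population denominator.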

Evidence from the literature
Face validity

Asthma is one of the most common chronic diseases in the United States, affecting nearly 15 million individuals. 516 In 1997, there were about 433,000 517 to 484,000 hospitalizations for asthma in the US, 518 of which at least 252,000 involved persons 18 or more years of age. In 1996, asthma was the 10th most common principal diagnosis in emergency department (ED) visits and the ninth most common diagnosis in hospital outpatient departments. 519 In this discussion, we will consider only admission for asthma in adults. A separate discussion of the evidence for the pediatric asthma indicator (indicator 31) is included elsewhere in this report.

There is widespread consensus, embodied in the National Asthma Education Program, 520 that asthma is a readily treatable chronic disease that can be managed effectively in the outpatient setting. As Healthy People 2010 indicates, "effective management of asthma includes four components: avoiding or controlling the factors that may make asthma worse..., taking appropriate medications tailored to the severity of the disease, objective monitoring of the disease by the patient and the health care professional, and actively involving the patient in managing the disease."

Many asthma exacerbations are preventable using inhaled anti-inflammatories such as corticosteroids and mast cell stabilizers, or treatable in the outpatient setting using beta agonists and systemic corticosteroids. Observational studies offer limited evidence that inhaled steroids may decrease risk of admission by up to 50%.521, 522 Potential confounding factors such as asthma severity, however, limit the conclusiveness of these results. In addition, factors that explain variation in the risk of hospitalization at the patient level may not explain variation in area-level hospitalization rates.

Precision

Asthma is a common cause of admission for adults, and as such this measure is likely to have adequate precision. Blustein et al. noted that asthma accounted for 5% of ACSC admissions for Medicare beneficiaries. 282 One United Kingdom study noted that 51% of health authorities moved 10 or more places when ranked by asthma admission rates over a 1 year period (1993-1994). 522 The generalizability of this study to the United States is unknown.

Minimum bias

Numerous environmental risk factors for asthma have been identified, and some of these factors are more prevalent in certain communities than in others. Indoor allergens such as cockroaches and dust mites may be more common in lower-income areas, and are probably associated with increased frequency and severity of asthma symptoms. 523 Tobacco smoke is the most important indoor irritant and is a major precipitant of asthma symptoms in both children and adults.524-527 Jindal and colleagues 526 found that exposure of adults to environmental tobacco smoke is associated with decreased pulmonary function, increased medication requirements, and more frequent absences from work. Outdoor air pollution, especially respirable particulates, may also play a role.520, 528-532 In addition, ozone and SO2 have been associated with increased emergency department visits and hospitalization rates.531-538 Increasing air pollution has been specifically correlated with higher admission rates in London (ozone, NO2, SO2, and black smoke) 539 and Seattle (ambient air pollution). 540

Occupation is a potentially important source of bias in comparing asthma hospitalization rates among adults. A recent review of 43 studies from 19 different countries found that at least 9%, and perhaps as much as 15% (based on the highest quality studies), of the population burden of asthma is attributable to occupational factors. 541 About 250 agents capable of causing occupational asthma have been identified. 542 These agents affect individuals in specific occupations, such as carpenters and woodworkers, 543 who may cluster in certain geographic areas.

Race represents one of the most complex potentially biasing factors for this indicator. Black patients have consistently been shown to have higher asthma admission rates,544-546 even when stratifying for income and age. 547 One study examining differences in asthma health care utilization noted that African Americans made fewer asthma-related primary care and specialist visits than Caucasian patients (47.6% vs. 70.2% and 27% vs. 38.8%). There were no differences in hospitalization rates by race, but African-American patients had lower household incomes, and made more emergency department visits (proxy for either access to care or asthma severity). 548 Similarly, Hispanics have been shown to have higher admission rates than non-Hispanic whites (or areas with higher percentages of Hispanics have been shown to have higher admission rates), although none of these studies controls for SES.545, 549, 550 To the extent that true differences in disease prevalence or severity are responsible for racial variation in hospitalization rates, race should be adjusted for in comparing asthma hospitalization rates across areas. On the other hand, to the extent that minority patients have less access to care or poorer quality of outpatient care, race should not be adjusted for.

Construct validity

Little evidence has been reported conclusively attaching poor quality of care to higher area admission rates. However, numerous studies have shown that asthma hospitalization rates are associated with socioeconomic factors, including median household income (at the area level) and lack of insurance (at the individual level). A study of asthma hospitalization rates in California in 1993 (ages 0-64) found that areas with median household incomes under $35,000 had hospitalization rates that were 1.5 times higher than areas with higher median incomes. 547 In Boston, in 1992, age and gender standardized hospitalization rates (all ages) were correlated with percentage poverty in an area (r=0.68), percentage holding a bachelor's degree (r=-0.61), and income (r=-0.51). 550 Within New York City in 1994, asthma hospitalization rates were negatively correlated with a zip code area's median household income (r=-0.67), and positively correlated with the percentage of minorities in the population (r=0.82). 549 These findings confirm an earlier study by Billings et al., 489 who reported 6.4-fold variation in asthma hospitalization rates at the zip code level in New York City in 1988, with 70% of this variation explainable by the percentage of households with annual income below $15,000. Millman et al. 491 reported that low-income zip codes had 5.8 times more asthma hospitalizations per capita than high-income zip codes in 11 states in 1988. Using New York State data, Lin et al. showed that hospitalization rates were higher in areas with higher poverty, unemployment, minority populations, and lower education levels. 545 Even in England, 45% of the variation in asthma hospitalization rates across 90 family health services authorities in 1990-95 was attributable to socioeconomic factors, plus the availability of secondary care. 551 To our knowledge, only one study has reported partial correlations; 552 it found that in New York City, the percentage of African-American residents was the strongest predictor, and median household income was the next strongest predictor, of asthma hospitalization rates.

Similar findings have been demonstrated at the individual level. For example, Weissman et al. 276 noted that adjusted relative rate of adult asthma hospitalization for uninsured persons, compared with privately insured persons, was 1.42 (95% CI, 1.20-1.63) in Massachusetts and 1.19 in Maryland (95% CI, 0.92-1.45). The comparable rates for Medicaid beneficiaries were 1.84 (95% CI, 1.63-2.05) and 1.61 (95% CI, 1.33-1.90), respectively. Bierman et al. 553 noted that the asthma hospitalization rate among persons with private insurance nationwide was 33% lower, and that among Medicaid beneficiaries was significantly higher, than that in the general population.
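One way to read the intervals quoted above is to check whether each 95% confidence interval excludes a relative rate of 1.0 (no difference from privately insured persons). The sketch below applies that check to the four estimates attributed to Weissman et al.; the function name and data layout are illustrative only, and the check is a reading aid rather than a formal test.

```python
def ci_excludes_null(lower: float, upper: float, null_value: float = 1.0) -> bool:
    """True when the interval [lower, upper] does not contain the null value."""
    return not (lower <= null_value <= upper)


# (state, payer group) -> (relative rate, CI lower, CI upper), as quoted in the text
weissman_estimates = {
    ("Massachusetts", "uninsured"): (1.42, 1.20, 1.63),
    ("Maryland", "uninsured"): (1.19, 0.92, 1.45),
    ("Massachusetts", "Medicaid"): (1.84, 1.63, 2.05),
    ("Maryland", "Medicaid"): (1.61, 1.33, 1.90),
}

distinguishable_from_private = {
    key: ci_excludes_null(lower, upper)
    for key, (_, lower, upper) in weissman_estimates.items()
}
```

Under this reading, the Maryland estimate for uninsured persons is the only one of the four whose interval includes 1.0.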

The observation that asthma admission rates are higher in areas with low SES has led some researchers to hypothesize that lack of access to care, or poor quality outpatient care, may lead to higher admission rates. Bindman et al. 284 showed that asthma hospitalization rates across 41 sampled areas in California were significantly correlated (r=0.47) with self-rated access to needed medical care, according to community telephone surveys. Although analyses of the National Health and Nutrition Examination Survey found that Medicaid enrollment and Spanish language preference were associated with inadequate asthma therapy, these deficiencies in care were not directly linked to hospitalizations. 554 Studies from other settings have shown that African-American asthmatics tend to have fewer scheduled primary care visits, and more hospitalizations and emergency room visits, than White asthmatics.555, 556 African-Americans' use of asthma medications may also be less consistent with current practice guidelines. 557

Some weak evidence indicates that patients admitted to the hospital often receive suboptimal outpatient care, as measured by guideline adherence. The National Asthma Education Program (NAEP) Guidelines suggest that patients receive proper asthma education, including use of an MDI and written action plans in case of exacerbation. 520 One study of patients hospitalized in an inner-city teaching hospital noted that only 28% had received an action plan, and 11% could not demonstrate proper use of an MDI, despite reporting having been shown previously by health care personnel. In addition, 69 of the 101 patients reported were prescribed theophylline without first being prescribed anti-inflammatory inhalers. Finally, 60% of the patients who contacted physicians during the current exacerbation reported that the health care provider made no changes in treatment. 558 Similar results, with 95% of patients failing to use an action plan and 51% having inadequate knowledge, were reported from an Australian hospital. 559 Neither of these studies had a control group with non-hospitalized asthmatics, and both relied on self-reported treatment data.

Few studies have directly linked high-quality processes of outpatient care with lower hospitalization rates at either the area or the individual level. An in-depth study of asthma treatment practices in New Haven, Boston, and Rochester found that the community with the highest asthma hospitalization rate (Boston) also had lower use of inhaled anti-inflammatory agents and oral steroids. The threshold for admission also appeared to be lower in Boston, as fewer of the admitted patients were hypoxemic, relative to the other cities. 560 One case control study from a large health maintenance organization established that not having a written asthma management plan was a strong risk factor for asthma hospitalization (after adjusting for severity of asthma), but the use of anti-inflammatory medications was not. 561 Although these studies focused on children rather than adults, the results provide limited support for the construct validity of the asthma hospitalization rate as an indicator of access to high-quality outpatient care.

Fosters true quality improvement

We located no studies discussing the ability of this indicator to foster true quality improvement. One study from the United Kingdom 551 argues that area admission rates for asthma are not a good measure of quality, due to the confounding factors discussed above. While we located no studies examining the potential for gaming for this indicator, it is possible that patients who present to outpatient clinics or emergency rooms as candidates for admission would not be admitted, but rather treated in the outpatient or ER setting. There is little evidence currently to suggest that asthmatics are being inappropriately denied admission to the hospital, although this problem could emerge in the future.

Prior use

Most published indicators of asthma admission include pediatric as well as adult patients in a single indicator, though they are separate indicators in this report. Children under the age of 5 are sometimes excluded. Admission for asthma has been implicated as an ambulatory care sensitive condition, included in both the preventable hospitalization set of Weissman et al. 276 and the set of ACS conditions developed by Billings et al., 489 in conjunction with the United Hospital Fund of New York. The UK National Health Service has designated asthma admission as a High Level Performance Indicator. In addition, Healthy People 2010 has set a goal to reduce the nationwide asthma admission rate from 12.5 (in 1998) to 7.7 per 10,000 children and adults aged 5-65 years, and from 17.7 to 11 per 10,000 adults over 65 years of age. 5

Empirical Evidence
Test | Statistic | Rating
Precision
   Raw area level rate/standard deviation | 107.9, 81.7
   Systematic area-level standard deviation* | 0.04% | Moderate
   Area variation as a percentage of total variation* | 0.02% | Moderate
   Signal ratio* | 83.6% | High
   R-Square* | 84.2% | High
   * age- and gender-adjusted
Minimum Bias - age-sex risk adjustment
   Signal variance change with risk adjustment | No change | Good
   Absolute impact:
     Average absolute change (in %) | 3.8% | Very Good
   Relative impact:
     Rank correlation | 0.989 | Very Good
     Percent remaining in high decile/low decile | 86.4% / 95.5% | Very Good
     Percent changing more than 2 deciles | 0% | Very Good
Precision

This indicator is adequately precise, with a raw area level rate of 107.9 per 100,000 population and a standard deviation of 81.7. The systematic area level standard deviation is moderate, at 0.04%. The area level variation also accounts for a moderate percentage of total variation, at 0.02%. This suggests that, relative to other indicators, a lower proportion of variance occurs at the area level rather than the discharge level. The signal ratio is high, at 83.6%. This means that it is likely that the observed differences in area performance represent true differences in area performance, though some of the variation reflects unsystematic differences. The high R-square reflects the large proportion of signal that can be extracted using multivariate techniques, though it is lower than for other indicators. Such techniques do not have substantial additional impact.
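To make the signal ratio concrete, the sketch below estimates it with a simple method-of-moments split of observed variance into sampling noise and systematic (signal) variance. This is an illustrative stand-in, not the estimator used in this report. It assumes the ratio is signal variance divided by signal plus noise variance, that each area's sampling noise is binomial, and that rates are expressed as proportions. The function name and inputs are hypothetical.

```python
from statistics import mean, pvariance
from typing import Sequence


def signal_ratio(rates: Sequence[float], populations: Sequence[int]) -> float:
    """Share of observed area-level variance attributed to systematic differences.

    rates: admissions per person for each area (e.g. 0.001079 for 107.9 per 100,000).
    populations: population at risk for each area.
    """
    if len(rates) != len(populations) or len(rates) < 2:
        raise ValueError("need matching rates and populations for at least 2 areas")

    observed_var = pvariance(rates)
    if observed_var == 0:
        return 0.0  # no variation across areas, so nothing to attribute to signal

    # Assumed binomial sampling variance of each area's rate, averaged over areas.
    noise_var = mean(p * (1 - p) / n for p, n in zip(rates, populations))

    # Signal variance is what remains after removing noise, floored at zero.
    signal_var = max(observed_var - noise_var, 0.0)
    return signal_var / (signal_var + noise_var)
```

With large populations the noise term shrinks and the ratio approaches 1; with very small populations the noise term can exceed the observed variance and the ratio falls to 0.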

Bias

Signal variance does not change with risk adjustment. The indicator performs well on multiple measures of minimum bias. The rank correlation is very good at 0.989. Risk adjustment does not appear to impact the extremes of the distribution substantially. No providers change more than two relative deciles. The absolute magnitude of the impact is minimal, with an average change with risk adjustment (relative to the mean) of 3.8%.
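The minimum bias statistics reported above compare each area's rate before and after risk adjustment. The sketch below shows one plausible way to compute them from two parallel lists of area rates. It assumes that the rank correlation is Spearman's, that deciles are formed from ranks, and that the "average absolute change (relative to the mean)" is the mean absolute difference divided by the mean unadjusted rate. All names are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one series is constant")
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sxx * syy) ** 0.5


def bias_summary(raw: Sequence[float], adjusted: Sequence[float]) -> Dict[str, float]:
    """Compare unadjusted and risk-adjusted area rates."""
    n = len(raw)
    if n != len(adjusted) or n < 10:
        raise ValueError("need matching rates for at least 10 areas")

    rank_raw, rank_adj = average_ranks(raw), average_ranks(adjusted)
    # Decile 0 is the lowest tenth of areas, decile 9 the highest.
    dec_raw = [min(int((r - 1) * 10 / n), 9) for r in rank_raw]
    dec_adj = [min(int((r - 1) * 10 / n), 9) for r in rank_adj]
    high = [i for i in range(n) if dec_raw[i] == 9]
    low = [i for i in range(n) if dec_raw[i] == 0]

    return {
        "rank_correlation": pearson(rank_raw, rank_adj),  # Spearman's rho
        "pct_remaining_high": 100 * sum(dec_adj[i] == 9 for i in high) / len(high),
        "pct_remaining_low": 100 * sum(dec_adj[i] == 0 for i in low) / len(low),
        "pct_changing_more_than_2_deciles":
            100 * sum(abs(a - b) > 2 for a, b in zip(dec_raw, dec_adj)) / n,
        "avg_abs_change_pct":
            100 * mean(abs(a - r) for r, a in zip(raw, adjusted)) / mean(raw),
    }
```

Read this way, a rank correlation near 1 and high retention in both extreme deciles, as reported for this indicator, indicate that risk adjustment leaves the relative ordering of areas largely unchanged.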

Construct validity

Adult asthma is related to most other ACSC indicators.

Discussion

Asthma is one of the most common reasons for hospital admission and emergency room care. Nonetheless, it is widely accepted that most cases of asthma can be managed with proper ongoing therapy on an outpatient basis. Our literature review found some evidence that proper use of inhaled steroids may decrease asthma exacerbation, though this evidence is weak. Many studies have associated increased asthma hospitalization rates with lower socioeconomic status, though in many of these studies confounding factors were not controlled for. Surveys of patients admitted for asthma in low income areas have found inadequate outpatient care. One well-designed study noted that 70% of the variance in asthma admission rates is explainable by area self-rated access to care.

Given the large number of asthma hospitalizations, we would expect that this indicator would be adequately precise. Our empirical results confirmed that this indicator is measured with adequate precision for use as a quality indicator, though less than some other indicators. The high signal ratio suggests that most of the observed variance reflects actual differences in performance. Multivariate techniques do not have substantial additional impact, and as such either univariate or multivariate smoothing is recommended.

Our empirical tests demonstrate that this indicator is not subject to substantial bias. Risk adjustment by age and sex did not have substantial impact on the performance of areas. Nonetheless, some factors other than age and sex may vary systematically by area and may also impact the hospitalization rate for asthma. Environmental factors, such as air pollution, occupational exposure to irritants, or other exposure to allergens (e.g., cockroach, dust mite), have been shown to increase hospitalization rates or exacerbate asthma symptoms. Areas with high rates may wish to examine these factors relative to other areas when interpreting performance. While race has been shown to be associated with differences in admission rates, it is unclear whether this is due to differences in severity of disease or inadequate access to care.

Asthma is an avoidable hospitalization/ambulatory care sensitive condition indicator. These indicators are not measures of hospital quality, but rather measures of outpatient and other healthcare. These measures would be of most interest to comprehensive health care delivery systems, such as some health maintenance organizations, or public health officials. ACSC indicators are correlated with each other and may be used in conjunction as an overall examination of outpatient care.

Areas may wish to identify hospitals that contribute the most to the overall area rate for this indicator. The patient populations served by these hospitals may be a starting point for interventions.

Most published studies and the Healthy People 2010 indicator combine admission rates for children and adults, though in this report they remain separate indicators. Thus, areas may wish to examine this indicator together with the pediatric asthma indicator (indicator 31).

This indicator is recommended for inclusion in the HCUP II QI set. It received an empirical score of 16 out of 26. It is recommended with several caveats of use. As an ACSC indicator, this indicator may be viewed as a proxy for actual quality problems. Further, it is possible that providers may reduce admission rates without actually improving quality, by shifting care to an outpatient setting. Caution should be maintained for admission rates that are drastically below or above the average or recommended rates.

INDICATOR 23: ACSC: CHRONIC OBSTRUCTIVE PULMONARY DISEASE (COPD) ADMISSION RATE

Indicator: Area level admission rate for COPD.
Relationship to Quality: Proper outpatient treatment may reduce admissions for COPD. As such, lower rates represent better quality care.
Benchmark: State, regional, or peer group average.

* Rate can also be calculated for age 65 and older.

Method:

Quality Measure: Admissions for COPD per 100,000 population.
Outcome of Interest: Discharges with ICD-9-CM principal diagnosis code for COPD in any diagnosis field (see Appendix 6)

Age 18-64 years.*

Exclude transfer from other institution.
Exclude MDC 14 (pregnancy, childbirth, and puerperium) and MDC 15 (newborns and other neonates).
Population at Risk: Population in MSA or county, age 18-64 years.*

* Rate can also be calculated for age 65 and older.

Evidence from the literature

Chronic obstructive pulmonary disease (COPD) consists of three primary diseases: asthma, emphysema, and chronic bronchitis. Though each disease causes respiratory dysfunction, each has a somewhat distinct etiology, treatment, and outcome. Since admission for asthma is considered in a separate indicator, asthma will not be discussed in this section. Only the evidence for COPD as it relates to emphysema and chronic bronchitis will be discussed.

Face validity

The incidence of COPD has been increasing in the last decade, 562 as have admissions for COPD. Admissions for COPD include exacerbations of COPD, respiratory failure, and rarely lung volume reduction surgery or lung transplantation. COPD accounts for over $20 billion in health care expenditures. Data suggest that more than 2/3 of these costs are incurred by only 10% of persons with COPD. 563

Practice guidelines for COPD have been set forth and published over the past decade. 564 Based on consensus, the three major guidelines (European Respiratory Society, American Thoracic Society and British Thoracic Society) agree that appropriate care for COPD includes spirometry, medication, and monitoring. All also agree that advice to quit smoking is a critical intervention. Specific recommendations for the indications of drug use vary depending on the nature of disease, though they often include bronchodilators (combined beta2-agonist and anticholinergic), theophylline and corticosteroids. These guidelines are based on consensus statements, not empiric evaluation of the evidence, and call for more research on the effectiveness of treatments. 565

With appropriate outpatient treatment and compliance, hospitalizations for the exacerbations of COPD and decline in lung function should be minimized. COPD was selected by a physician panel as an ambulatory care sensitive condition. 275

Precision

COPD is a common disease that accounts for a substantial number of hospital admissions (>13% of admissions), 566 suggesting that reasonably precise area rate estimates are feasible. We were unable to identify studies that examined geographic variation in admission for COPD. Slight seasonal variations have been noted in younger age groups, though in patients over 65 no seasonal variations in hospital admissions have been noted. 567 Billings' original study from New York reported only 1.8-fold variation in COPD hospitalization rates, with a coefficient of variation of 0.742. 489

Minimum bias

Exacerbations of COPD and the rate of pulmonary function decline can be affected by patient characteristics. Cigarette smoking is a leading cause of COPD. Smoking after the development of COPD has been shown to accelerate the rate of pulmonary decline, with quitters sustaining less pulmonary decline than continuous smokers. 568 One study showed a small but significant increase in pulmonary decline in patients who quit smoking then restarted, compared with patients who never quit. 569 Thus, patient compliance is an important determinant of COPD admission rates. Actual smoking cessation by COPD patients is less than optimal, with many patients failing to comply despite ambulatory interventions. Even with intensive smoking cessation interventions, continuous quit rates remain at around 20%. 568 Lower rates have been reported in other studies with different patient populations, suggesting that patient characteristics may influence responsiveness to treatment.

Other factors have been associated with increased hospitalizations for COPD. For example, the association between lower socio-economic status and increased likelihood of hospital admission 570 may be due to higher occupational exposure to harmful substances, smoking rates, and infection rates postulated to exist among lower socio-economic groups. 570 However, Billings' original study from New York reported only 3% of variance explained by household income. 489 In addition, COPD is a progressive disease, and disease severity varies considerably across patients and over time. As lung function declines with time in COPD, older patients have higher rates of pulmonary decline, but heavier smoking and genetic factors also influence the rate of decline.

One environmental factor, daily increases in air pollution, has been associated with increased daily rates of COPD hospitalization.571-579 The robustness of the association, and the type and amount of air pollution affecting admissions, vary between studies. Few studies have examined air pollution differences between geographical areas and their association with admission for COPD.

Thus, in addition to direct measures of disease severity, smoking status, age, and socio-economic factors may increase the likelihood of admission for COPD. These factors are candidates for use in risk-adjustment models.

One study did indicate that COPD in younger populations might include some miscoding for acute bronchitis. 567 It is unknown whether such coding differences lead to biases in area-level estimates of COPD admission rates.

Construct validity

The extent to which the admission rate for COPD relates to the quality of the outpatient health care provided has not been widely evaluated, though some studies have examined readmission rates. Weissman et al. did not evaluate COPD, but Bindman et al. reported that self-reported access to care explained 27% of the variation in COPD hospitalization rates (i.e., less than for asthma, CHF, or diabetes) at the zip code cluster level. 284 Millman et al. 491 reported that low-income zip codes had 5.8 times more COPD hospitalizations per capita than high-income zip codes in 11 states in 1988. However, Billings et al.'s 489 findings were weaker; low-income zip codes in New York City (where at least 60% of households earned less than $15,000 in 1988, based on adjusted 1980 Census data) had 1.8 times more COPD hospitalizations per capita than high-income zip codes (where less than 17.5% of households earned less than $15,000). Household income explained only 3% of the variation in COPD hospitalization rates at the zip code level.

Some articles discuss adherence to practice guidelines and other therapies by both physicians and patients, though they do not generally examine the relationship between adherence rates and hospitalization rates for COPD. One study found varying rates of physician compliance with practices expected to improve quality for patients admitted for COPD. In this study, over 2/3 of physicians performed a complete history and physical examination and some nutritional screening. Almost all provided some sort of appropriate pharmacologic treatment (96%). However, few gave smoking cessation advice during the hospitalization (23%), and almost none provided appropriate discharge planning and education (0.2%). 580 Another study found that physicians often did not adhere to practice guidelines. In this study, smoking cessation counseling was provided in 14.3% of ambulatory visits. Physicians prescribed medication contrary to indications. More than ¼ of visits resulted in a prescription of theophylline, though this drug is considered for use as a "step 3" drug, and 5% of visits resulted in a prescription for ipratropium, the first-line therapy. 581 No information was provided regarding the patients' severity of disease.

As noted above, patient compliance also influences the effectiveness of therapy. One study found patient compliance with inhalers ranged from 40%-60%. Patients reported higher compliance than that found by weighing canisters. Both measures may overestimate compliance, according to the authors. Patient compliance decreased over the 5-year follow-up period. 568 Other studies have found similar rates of non-compliance. 581

Fosters true quality improvement

One study examined COPD as one of three diseases possibly affected by access to care. Increased access to care in this randomized trial was associated with increased admission rates, possibly because of more detection of significant respiratory impairments in the community. 582 Thus, higher rates of COPD admission may in part reflect improvements in access to care, rather than deficient ambulatory care per se. However, this finding may also reflect a decline in the threshold for admission of "marginal" COPD cases in areas with greater access to care. It is possible that changes in coding practices, for example, coding patients as acute patients or omitting COPD codes, may reduce observed rates of COPD admissions. Recent investigations by the Medicare program into coding practices involving respiratory disease admissions raise the possibility that some COPD admissions reflect "upcoding." If the measure were used as an indicator, a decline in COPD admission rates may simply reflect a reverse change in coding practices.

Prior use

This measure was originally developed by Billings and colleagues in conjunction with the Ambulatory Care Project of the United Hospital Fund of New York, 275 and was subsequently adopted by the Institute of Medicine. 583 It has been widely used in a variety of studies of avoidable or preventable hospitalizations. At least 6 states (MA, NE, UT, VA, MI, NY) are reportedly using this set of measures "as guidance for policy and as evaluation and decision aids." 584 Note that COPD was not among the conditions identified by Weissman et al. in 1992 as "avoidable hospital conditions." 276 A related indicator, admission for asthma, COPD, or pneumonia among patients with a prior diagnosis of COPD, was recently recommended as a measure of access to care for elderly Medicare beneficiaries. 585

Empirical Evidence
Test | Statistic | Rating
Precision
   Raw area level rate/standard deviation | 324.0, 203.8
   Systematic area-level standard deviation* | 0.10% | High
   Area variation as a percentage of total variation* | 0.05% | High
   Signal ratio* | 93.4% | Very High
   R-Square* | 93.5% | Very High
   * age- and gender-adjusted
Minimum Bias - APR-DRG risk adjustment
   Signal variance change with risk adjustment | Decreases | Fair
   Absolute impact:
     Average absolute change (in %) | 13.5% | Good
   Relative impact:
     Rank correlation | 0.933 | Good
     Percent remaining in high decile/low decile | 68.2% / 86.4% | Good
     Percent changing more than 2 deciles | 6.9% | Good
Precision

This indicator is very precise, with a raw area level rate of 324.0 per 100,000 and a standard deviation of 203.8. The systematic area level standard deviation is high, at 0.10%. The area level variation also accounts for a high percentage of total variation, at 0.05%. This means that relative to other indicators, a higher percentage of the variation occurs at the area level, rather than the discharge level. The signal ratio is very high, at 93.4%. This means that it is very likely that the observed differences in area performance represent true differences in area performance. The very high R-square reflects the large proportion of signal that can be extracted using multivariate techniques. Such techniques do not have substantial additional impact, primarily due to the already very high ratio.

Bias

Signal variance decreases by over 15% with risk adjustment, indicating that some of the true variation among providers is due to differences in patient demographic characteristics. The indicator performs well on multiple measures of minimum bias. The rank correlation is good at 0.933. Risk adjustment appears to affect the high decile disproportionately to the low decile, as 68.2% of areas in the highest decile and 86.4% of the lowest decile remain after risk adjustment. The absolute magnitude of the impact is moderate, as is the relative impact.
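The relative-impact measures cited here (rank correlation, percent remaining in the extreme deciles, percent moving more than 2 deciles) can be sketched as follows. This is an illustration under stated assumptions: the rank correlation is taken to be Spearman's, deciles are assigned by rank, and all function names and the toy rates are hypothetical rather than the report's actual procedure or data.

```python
def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def rank_correlation(raw, adjusted):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(raw), ranks(adjusted))


def decile(values):
    """Assign each value a decile from 1 (lowest rates) to 10 (highest)."""
    n = len(values)
    return [min(10, int((r - 1) * 10 / n) + 1) for r in ranks(values)]


def pct_remaining(raw, adjusted, which):
    """Percent of areas in decile `which` before adjustment that stay in it."""
    d_raw, d_adj = decile(raw), decile(adjusted)
    members = [i for i, d in enumerate(d_raw) if d == which]
    stay = sum(1 for i in members if d_adj[i] == which)
    return 100.0 * stay / len(members)


def pct_moving_more_than(raw, adjusted, n_deciles=2):
    """Percent of areas whose decile shifts by more than `n_deciles`."""
    d_raw, d_adj = decile(raw), decile(adjusted)
    moved = sum(1 for a, b in zip(d_raw, d_adj) if abs(a - b) > n_deciles)
    return 100.0 * moved / len(raw)


# Toy data: 20 hypothetical areas; adjustment swaps two high-rate areas.
raw = [float(i) for i in range(1, 21)]
adjusted = list(raw)
adjusted[17], adjusted[19] = raw[19], raw[17]

rho = rank_correlation(raw, adjusted)
high = pct_remaining(raw, adjusted, 10)
low = pct_remaining(raw, adjusted, 1)
moved = pct_moving_more_than(raw, adjusted)
print(round(rho, 3), high, low, moved)
```

In the toy data only the top of the distribution is disturbed, which mirrors the pattern described above: retention in the highest decile falls while the lowest decile is unchanged.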

Construct validity

COPD is related to most other ACSC indicators.

Discussion

COPD can often be controlled in an outpatient setting, negating the need for admission. As COPD is a chronic progressive disorder, some rate of hospitalization is appropriate. Guidelines have been established but are not widely adhered to, and it is unclear whether adherence to these guidelines actually reduces admissions. Patient compliance has been shown to be relatively low for this condition and may influence admission rates. Access to care explains 27% of the variation in COPD admission rates. However, another study found that household income did not substantially affect admission rates. The evidence for the validity of this indicator is equivocal. Areas may wish to examine the events precipitating admission, using means such as chart review, to understand more clearly whether admissions are due to poor quality care or other problems. Examination of processes of care in outpatient settings may also illuminate the extent to which COPD rates are due to poor quality care.

This indicator is measured with high precision, as would be expected by the high number of COPD admissions. The signal ratio for this indicator was particularly high, suggesting that the high variance noted is likely to reflect true differences in performance.

Our empirical analysis identified moderate bias when risk adjusting for age and sex. In particular, risk adjustment appears to affect the areas with the highest rates the most. Our literature review pointed out several more factors that may influence the progression of the disease and thus the admission rate for the disease. These include smoking and SES, and are likely to vary by area. Clinical factors may in turn be related to behavioral risk factors that vary by area. Risk adjustment for observable characteristics, such as smoking rates, is recommended. The extent to which the progression of the disease (and thus the development of certain clinical characteristics predisposing to hospitalization) can be slowed by proper outpatient care was beyond the scope of this project. However, such information may be particularly helpful in understanding the relationship between quality, bias and ACSC conditions.

The admission rate for COPD is an avoidable hospitalization/ambulatory care sensitive condition indicator. These indicators are not measures of hospital quality, but rather measures of outpatient and other healthcare. These measures would be of most interest to comprehensive health care delivery systems, such as some health maintenance organizations, or public health officials. ACSC indicators are correlated with each other and may be used in conjunction as an overall examination of outpatient care.

Areas may wish to identify hospitals that contribute the most to the overall area rate for this indicator. The patient populations served by these hospitals may be a starting point for interventions.

This indicator is recommended for inclusion in the HCUP II QI set. It received an empirical score of 17 out of 26. It is recommended with several caveats of use. As an ACSC indicator, this indicator may be viewed as a proxy for actual quality problems. As many factors influence COPD progression and hospitalization, additional risk adjustment for factors such as smoking rates may be desirable. Further, it is possible that providers may reduce admission rates without actually improving quality, by shifting care to an outpatient setting. Caution should be maintained for admission rates that are drastically below or above the average or recommended rates.

INDICATOR 24: ACSC: CONGESTIVE HEART FAILURE (CHF) ADMISSION RATE

Indicator: Area level admission rate for CHF.
Relationship to Quality: Proper outpatient treatment may reduce admissions for CHF. As such, lower rates represent better quality care.
Benchmark: State, regional, or peer group average.

* Rate can also be calculated for age 65 and older.

Method:

Quality Measure: Admissions for CHF per 100,000 population.
Outcome of Interest: Discharges with ICD-9-CM principal diagnosis code for CHF in any diagnosis field (see Appendix 6).

Age 18-64 years.*

Exclude discharges with specified cardiac procedure codes (see Appendix 6) in any field.
Exclude transfer from other institution.
Exclude MDC 14 (pregnancy, childbirth, and puerperium) and MDC 15 (newborns and other neonates).
Population at Risk: Population in MSA or county, age 18-64 years.*

* Rate can also be calculated for age 65 and older.
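The method above can be sketched as a small calculation: filter discharges by the inclusion and exclusion rules, then divide by the area population at risk. This is only an illustration. The `Discharge` record, its field names, and the sample data are hypothetical, and the actual ICD-9-CM and procedure code lists in Appendix 6 are represented here by boolean flags rather than reproduced.

```python
from dataclasses import dataclass


@dataclass
class Discharge:
    age: int
    has_chf_diagnosis: bool       # CHF code per Appendix 6 (list not reproduced)
    has_excluded_procedure: bool  # specified cardiac procedure code in any field
    is_transfer_in: bool          # transferred from another institution
    mdc: int                      # Major Diagnostic Category


def counts_toward_numerator(d: Discharge, min_age: int = 18, max_age: int = 64) -> bool:
    """Apply the inclusion and exclusion rules listed in the method table."""
    return (
        d.has_chf_diagnosis
        and min_age <= d.age <= max_age
        and not d.has_excluded_procedure
        and not d.is_transfer_in
        and d.mdc not in (14, 15)  # pregnancy/childbirth and newborns
    )


def admission_rate_per_100k(discharges, population_at_risk: int) -> float:
    """Qualifying admissions per 100,000 population in the area."""
    if population_at_risk <= 0:
        raise ValueError("population at risk must be positive")
    n = sum(1 for d in discharges if counts_toward_numerator(d))
    return 100_000 * n / population_at_risk


# Hypothetical discharges for one area; three of the eight qualify.
sample = [
    Discharge(45, True, False, False, 5),
    Discharge(70, True, False, False, 5),   # outside the 18-64 age range
    Discharge(50, True, True, False, 5),    # excluded cardiac procedure
    Discharge(30, True, False, True, 5),    # transfer from other institution
    Discharge(28, True, False, False, 14),  # MDC 14
    Discharge(60, True, False, False, 5),
    Discharge(18, True, False, False, 5),
    Discharge(40, False, False, False, 5),  # no CHF diagnosis
]
rate = admission_rate_per_100k(sample, population_at_risk=60_000)
print(rate)
```

For the age 65 and older variant noted in the footnote, the same function could be called with different age bounds and the matching population denominator.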

Evidence from the literature
Face validity

There are significant differences in physician management of patients with congestive heart failure, particularly depending on physician specialty (cardiologists vs. internists).586, 587 In the community-hospital setting, the clinical practices of cardiologists are more compatible with published treatment guidelines than the clinical practices of other physicians. The benefits of cardiology specialty care include lower CHF readmission rates and better post-discharge quality-of-life measures, rather than lower mortality rates, lower hospital charges, or shorter length of stay. 588 Despite clinical trial evidence demonstrating the effectiveness of ACE inhibitors, these drugs remain under-prescribed by most physicians. 589 Thus, it is plausible that such differences in community practices are reflected in differences in CHF admission rates. Because of the large numbers of patients with CHF and their substantial mortality, morbidity and cost of care, these differences may have a major impact on outcomes and health care costs.

Precision

Congestive heart failure is one of the leading diagnosis-related group (DRG) discharge diagnoses in the United States. 587 In the NIS, almost 3% of all discharges are for CHF, with roughly 200 discharges per hospital. Therefore, one can obtain relatively precise estimates of admission rates for CHF, although random variation may be important for small hospitals and rural areas. Billings' original study from New York reported up to 4.6-fold variation in CHF hospitalization rates, with a coefficient of variation of 0.646. 489

Minimum bias

Important determinants of patient outcomes with CHF include certain demographic variables (e.g., patient age), clinical measures (e.g., left ventricular ejection fraction and serum creatinine), management issues (e.g., documentation of left ventricular function and documentation of etiology of CHF), and treatment strategies (e.g., ancillary drug use). 589 These factors appear to be correlated with socioeconomic status. Billings' original study from New York found that 59% of the substantial cross-variance was associated with differences in household income. 489 Only limited evidence exists on the extent to which such factors, rather than access to and use of high-quality medical care, account for differences across areas.

Construct validity

Some evidence suggests that access to care influences CHF hospitalization rates. Weissman et al. found CHF hospitalization rates to be variably associated with lack of insurance (adjusted RR=1.17 and 1.81 in MA and MD, respectively) and Medicaid (adjusted RR=2.41 and 2.53 in MA and MD, respectively). 276 Bindman et al. reported that self-reported access to care explained 50% of the variation in CHF hospitalization rates (more than for any other condition) at the zip code cluster level. 284 Millman et al. 491 reported that low-income zip codes had 6.1 times more CHF hospitalizations per capita than high-income zip codes in 11 states in 1988. Billings et al. 489 found that low-income zip codes in New York City (where at least 60% of households earned less than $15,000 in 1988, based on adjusted 1980 Census data) had 4.6 times more CHF hospitalizations per capita than high-income zip codes (where less than 17.5% of households earned less than $15,000). Household income explained 59% of the variation in CHF hospitalization rates at the zip code level.

Fosters true quality improvement

Physician practice style varies across areas, but does not explain variation in admission rates for chronic medical conditions after adjusting for community sociodemographic factors. Outpatient interventions such as the use of protocols for ambulatory management of low-severity patients, and improvement of access to outpatient care, would most likely decrease inpatient admissions for CHF. 93 There is little evidence that lower rates of CHF admission would lead to worse patient outcomes. However, practice guidelines or utilization review intended to raise physicians' threshold for admission may not be effective in reducing hospitalizations for chronic medical conditions. 496

Prior use

This measure was originally developed by Billings and colleagues in conjunction with the Ambulatory Care Project of the United Hospital Fund of New York, 275 but a similar measure was developed contemporaneously by Weissman et al. 276 This measure was subsequently adopted by the Institute of Medicine, 583 and has been widely used in a variety of studies of avoidable or preventable hospitalizations. At least 6 states (MA, NE, UT, VA, MI, NY) are reportedly using this set of measures "as guidance for policy and as evaluation and decision aids." 584 Internationally, CHF admissions are tracked by the United Kingdom as part of the UK National Health Service High Level Performance Indicators. 590 A related measure is included in the DEMPAQ 591 measure set. A closely related indicator, nonelective admission for CHF, was recently recommended as a measure of access to care for elderly Medicare beneficiaries. 585

Empirical Evidence
| Test | Statistic | Rating |
| --- | --- | --- |
| Precision | | |
| Raw area level rate/standard deviation | 521.0, 286.5 | |
| Systematic area-level standard deviation* | 0.14% | High |
| Area variation as a percentage of total variation* | 0.04% | High |
| Signal ratio* | 93.0% | Very High |
| R-Square* | 93.2% | Very High |
| Minimum Bias - Age-sex risk adjustment | | |
| Signal variance change with risk adjustment | Decreases | Fair |
| Absolute impact: | | |
| Average absolute change (in %) | 19.6% | Good |
| Relative impact: | | |
| Rank correlation | 0.858 | Good |
| Percent remaining in high decile/low decile | 54.5% / 81.8% | Fair |
| Percent changing more than 2 deciles | 16.6% | Fair |

* age- and gender-adjusted

Precision

This indicator is very precise, with a raw area level rate of 521.0 per 100,000 and a standard deviation of 286.5. The systematic area level standard deviation is high, at 0.14%. The area level variation also accounts for a high percentage of total variation, at 0.04%. This means that relative to other indicators, a higher percentage of the variation occurs at the area level, rather than the discharge level. The signal ratio is very high, at 93.0%. This means that it is very likely that the observed differences in area performance represent true differences in area performance. The very high R-square reflects the large proportion of signal that can be extracted using multivariate techniques. Such techniques do not have substantial additional impact, primarily due to the already very high ratio.
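As a rough point of comparison with the coefficient of variation reported in the literature, the raw area-level mean and standard deviation quoted here can be turned into the same statistic. The sketch below is only arithmetic on the two figures above. The comparison is an assumption-laden one, since the published figure was computed at the zip code level in New York, whereas these figures are area-level (MSA or county) rates.

```python
def coefficient_of_variation(mean: float, std_dev: float) -> float:
    """Standard deviation expressed as a fraction of the mean."""
    if mean <= 0:
        raise ValueError("mean must be positive")
    return std_dev / mean


# Raw area-level CHF admission rate and standard deviation (per 100,000).
cv_chf = coefficient_of_variation(mean=521.0, std_dev=286.5)
print(f"CV = {cv_chf:.3f}")
```

The result, roughly 0.55, is lower than the 0.646 reported by Billings.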

Bias

Signal variance decreased by over 15% with risk adjustment, indicating that some of the true variation among providers reflects differences in patient demographic characteristics. The indicator's performance on multiple measures of minimum bias ranges from fair to good. The rank correlation is good at 0.858. Risk adjustment does appear to affect both extremes of the distribution substantially, with only 54.5% of providers in the highest decile and 81.8% in the lowest decile remaining after risk adjustment. Further, 16.6% of providers move more than 2 deciles after risk adjustment. The absolute magnitude of the impact is moderate.

Construct validity

CHF is related to most other ACSC conditions.

Discussion

Congestive heart failure can be controlled in an outpatient setting for the most part; however, the disease is a chronic progressive disorder for which some hospitalizations are appropriate. Our literature review found some evidence of face validity in that cardiologists have lower admission rates than general practitioners and ACE inhibitors are under-prescribed. Admission rates for CHF have been associated with lack of insurance, and access to care may account for as much as 50% of the variation in admission rates. As the causes for admissions may range from poor quality care to lack of patient compliance or problems accessing care, areas may wish to review CHF patient records to identify precipitating causes and potential targets for intervention.

Our empirical analysis showed that at a provider level this indicator is measured with high precision, with high systematic variation. The signal ratio is very high suggesting that any observed differences are likely to reflect true provider performance. Multivariate techniques do not have much additional impact; as such either univariate or multivariate smoothing is recommended.

This indicator is subject to moderate bias. Our empirical analysis (adjusting for age and sex) found that relative provider performance did change somewhat. In particular, risk adjustment appears to affect the areas with the highest and lowest raw rates. Our literature review noted that patient age, clinical measures such as heart function, and other management issues may affect admission rates. With the exception of age, few of these measures can be identified using administrative data. However, it is unclear which of these characteristics vary systematically by area, and account for bias. Areas with high rates may wish to examine the clinical characteristics of their patients to check for a more complex case mix. The extent to which the progression of the disease (and thus the development of certain clinical characteristics predisposing to hospitalization) can be slowed by proper outpatient care was beyond the scope of this project. However, such information may be particularly helpful in understanding the relationship between quality, bias and ACSC conditions. Some care of CHF complications occurs in emergency rooms, and would not appear in inpatient datasets. Examination of both emergency room data and inpatient data may give a more accurate picture of actual CHF complication rates.

Congestive heart failure admission rate is an avoidable hospitalization/ambulatory care sensitive condition indicator. These indicators are not measures of hospital quality, but rather measures of outpatient and other healthcare. These measures would be of most interest to comprehensive health care delivery systems, such as some health maintenance organizations, or public health officials.

Areas may wish to identify hospitals that contribute the most to the overall area rate for this indicator. The patient populations served by these hospitals may be a starting point for interventions.

ACSC conditions typically vary with socioeconomic status. Examination of the SES of an area's population, using estimates such as patient zip code or insurance status, may explain some of the area variation. However, SES is complexly related to poor access to care, so an area should not assume that none of the variance associated with SES is associated with poor access to care.

This indicator is recommended for inclusion in the HCUP II QI set. It received an empirical score of 14 out of 26. It is recommended with several caveats of use. As an ACSC indicator, this indicator may be viewed as a proxy for actual quality problems. Further, it is possible that providers may reduce admission rates without actually improving quality, by shifting care to an outpatient setting. Caution should be maintained for admission rates that are drastically below or above the average or recommended rates.

INDICATOR 25: ACSC: DIABETES - SHORT TERM COMPLICATIONS ADMISSION RATE

Indicator: Area level admission rate for short term complications of diabetes.
Relationship to Quality: Proper outpatient treatment and adherence to care may reduce the incidence of diabetic short term complications. As such, lower rates represent better quality care.
Benchmark: State, regional, or peer group average.

* Rate can also be calculated for age 65 and older.

Method:

Quality Measure: Admissions for diabetic short term complications per 100,000 population.
Outcome of Interest: Number of discharges with ICD-9-CM principal diagnosis code for short-term complications (uncontrolled diabetes, ketoacidosis, hyperosmolarity, coma) per 100,000 population (see Appendix 6).

Age 18-64 years.*

Exclude transfer from other institution.
Exclude MDC 14 (pregnancy, childbirth, and puerperium) and MDC 15 (newborns and other neonates).
Population at Risk: Population in MSA or county, age 18-64 years.*

* Rate can also be calculated for age 65 and older.

Evidence from the literature
Face validity

Diabetic ketoacidosis, hyperosmolarity (HHNS), and coma are life-threatening complications of diabetes mellitus, particularly type 1 or insulin dependent diabetes mellitus (IDDM). Diabetic emergencies arise when there is an excess of glucose or insulin. The balance of insulin and glucose is kept by proper administration of insulin, and may involve other activities such as home blood-glucose monitoring. It has been noted in an adolescent and young adult population that better adherence to treatment (actual insulin intake vs. prescribed intake) is associated with fewer admissions for ketoacidosis and other complications. 592 Education programs for patients with diabetes have mixed results on reducing admissions for diabetic emergencies, though some have been shown to be effective. 593 It is important to note that intensive treatment (continuous insulin infusion pump, or multiple insulin injections daily) has been associated with more admissions for hypoglycemia. 594 Such intensive treatment has not been shown to have an impact on admissions for hyperglycemic events, but does reduce the incidence of long-term complications. Both hypoglycemic and hyperglycemic events are included in this indicator.

Minimum bias

Previously this indicator was defined with a hospital level denominator. Since some hospitals may be referral centers for diabetes, or may treat more difficult patients, some hospitals may have artificially high rates, though a quality problem is not present. This indicator has been redefined as an area-level measure. Some areas may have higher rates of diabetes, due to ethnic or age composition. It would be expected that these areas would have higher admission rates for diabetic emergencies. Other factors, such as illness,595-597 may also predispose patients to be admitted for diabetic emergencies. However, it is unlikely that any one area would experience significantly higher rates of these factors.

Admissions for diabetic emergencies can occur both in patients with existing and treated diabetes and in patients with previously unknown diabetes. One New Zealand study of 196 patients admitted for DKA found that 20% of admissions were new onset diabetes. 598 Two separate US studies of an urban African-American population found that 25% and 17% of patients admitted for DKA reportedly had new onset diabetes.595, 596

Older age is associated with higher rates of underlying illness, more severe DKA, and better pre-hospitalization glycaemic control. This indicates that older patients may have fewer compliance issues and more complex cases. 597

Construct validity

Precipitating events leading to admission may include physiologic causes, as discussed above, or the cessation of treatment due to access to care or non-compliance issues. Evidence that such causes are or are not due to access to care contributes to the construct validity of this indicator. However, such evidence has not been strongly shown. Some studies outside the US, and a few inside the US, have examined the precipitating events of admission for diabetic emergencies. These studies often rely on self-report, which may be a biased measurement in and of itself. Of patients with previously known and treated diabetes, over 60% had made an error in insulin administration or had omitted insulin. Few of these patients also had underlying illness. Further, 25% of the original patients were readmitted within the 18-month study period. This study gives no indication of whether these errors were due to non-compliance, poor education, or access to care problems. 598 A Scottish study of young adult patients found that 42% of DKA admissions were due to lack of adherence to insulin treatment. 597

In a potentially underserved population of urban African-Americans, 2/3 of admissions were due to cessation of insulin therapy. Half of the patients stopping insulin treatment reported financial or other difficulties in obtaining insulin, while 21% reported inadequate understanding in adjusting dosages with food intake, and 14% were unsure about insulin management on sick days. Fourteen percent were clearly non-compliant. Most patients reported having been educated in diabetes care. 595 In a related study at a later date, 49% of patients with DKA, and 42% of patients with HHNS stopped or inadequately administered insulin prior to the diabetic emergency. 596

Access to care in relation to admissions has been explicitly studied and reported. Weissman 276 found that uninsured patients had a higher risk of admission for DKA and coma than privately insured patients (adjusted O.R. 2.18 - 2.77). Bindman 284 reported that an area's self-rated access to care report explained 46% of the variance in admissions for diabetes, though the analysis was not restricted to diabetic emergencies.

Several studies, including Billings 275 and Pappas, 281 showed that residents of low-income communities have a higher risk of "ambulatory care sensitive" admissions, including short-term diabetic complications, than residents of high-income communities. Of course, this is only indirect evidence of validity, because low income and high income communities may differ for many reasons other than access to care. In addition, these studies aggregated ambulatory-care sensitive admission rates across multiple conditions, so they do not clearly support the validity of component measures, such as admission rates for short-term diabetic complications. Two studies of ACSC indicators reported validation work for diabetes independent of measure sets. Millman et al. 491 reported that low-income zip codes had 4.1 times more diabetes hospitalizations per capita than high-income zip codes in 11 states in 1988. Billings et al. 489 found that low-income zip codes in New York City (where at least 60% of households earned less than $15,000 in 1988, based on adjusted 1980 Census data) had 6.3 times more diabetes hospitalizations per capita than high-income zip codes (where less than 17.5% of households earned less than $15,000). Household income explained 52% of the variation in short term diabetes complication hospitalization rates at the zip code level.

Fosters true quality improvement

We found no evidence regarding the gaming of this indicator. Since diabetic emergencies are potentially life-threatening, it is unlikely that hospitals would fail to admit patients requiring hospitalization. Since this indicator is an area-level indicator, diversion to a nearby hospital is a non-issue.

Prior use

Admission for diabetic emergencies was included in both Billings 489 and Weissman's 276 sets of avoidable hospitalization measures. The indicator was also identified as a promising measure of quality and access by the DEMPAQ (Developing and Evaluating Performance Measures for Ambulatory Care Quality) 591 project, supported by the US Health Care Financing Administration, and the ACE (Access to Care for the Elderly) project, supported by the Physician Payment Review Commission. However, the denominator for these latter measures is limited to patients known to have diabetes, based on inpatient or outpatient claims diagnoses. This indicator, defined as a provider-level indicator, is currently an HCUP I indicator.

Empirical Evidence
| Test | Statistic | Rating |
| --- | --- | --- |
| Precision | | |
| Raw area level rate/standard deviation | 36.0, 24.6 | |
| Systematic area-level standard deviation* | 0.01% | Moderate |
| Area variation as a percentage of total variation* | 0.002% | Moderate |
| Signal ratio* | 51.7% | Moderate |
| R-Square* | 54.3% | Moderate |
| Minimum Bias - Age-sex risk adjustment | | |
| Signal variance change with risk adjustment | No change | Good |
| Absolute impact: | | |
| Average absolute change (in %) | 0.6% | Very Good |
| Relative impact: | | |
| Rank correlation | 0.995 | Very Good |
| Percent remaining in high decile/low decile | 100% / 100% | Very Good |
| Percent changing more than 2 deciles | 0.5% | Very Good |

* age- and gender-adjusted

Precision

This indicator is moderately precise, with a raw area level rate of 36 per 100,000 population and a standard deviation of 24.6. The systematic area level standard deviation is moderate, at 0.01%. The area level variation also accounts for a moderate amount of total variation, at 0.002%. This means that relative to other indicators, a lower percentage of the variation occurs at the area level, rather than the discharge level. The signal ratio is also moderate, at 51.7%. This means that it is likely that some of the observed differences in area performance do not represent true differences in area performance. Multivariate techniques do not appear to improve the amount of extractable signal, as is reflected by the moderate R-square.

Bias

Signal variance does not change with risk adjustment. The indicator performs very well on the multiple measures of minimum bias. The rank correlation is very good at 0.995, and risk adjustment does not appear to change the composition of providers in the highest and lowest decile. The absolute magnitude of the impact is also minimal.

Construct validity

Diabetes short term complications rate is related to most other ACSC indicators.

Discussion

The diabetic emergencies of DKA, coma and hypoglycemia arise from the imbalance of glucose and insulin. While diabetic emergencies typically arise from deviations in proper care, many emergencies occur when patients misadminister insulin or fail to follow a proper diet. Some of these instances may be attributed to lack of education or access to care problems, in addition to other reasons for non-compliance. Thus areas with high rates of diabetic emergencies may want to examine education practices, access to care and other potential causes of non-compliance when interpreting this indicator. Further information regarding precipitating events to admission may be gathered through chart review.

This indicator performed satisfactorily in the empirical analysis, and is measured with moderate precision. The systematic area level variation is moderate; this variation accounts for a moderate percentage of the total variation, relative to other indicators. The moderate signal ratio suggests that some of the observed differences between areas are not likely to reflect true differences in performance.

Risk adjustment with age and sex does not impact the relative or absolute performance of areas. Nonetheless, it is recommended that this indicator be risk adjusted for age and sex. However, some areas may have higher rates of diabetes, due to different racial compositions and systematic differences in other risk factors. These areas may have higher area rates, without actually having a higher proportion of individuals with diabetes developing short term complications. Risk adjustment for any of these observable characteristics, such as race, is recommended.

This indicator is somewhat unique in its definition. Factors noted in the literature review may aid users of this indicator in the interpretation of results. First, the combination of emergencies contained within this indicator, while all representing short-term complications, does represent both hyperglycemic and hypoglycemic events. Intensive therapy, used to prevent long-term complications of diabetes, actually increases hypoglycemic events, while neither increasing nor decreasing hyperglycemic events. Areas with high rates of intensive therapy may have a high rate of short-term complications arising primarily from hypoglycemic events. While intensive therapy is appropriate, high rates of hypoglycemic events with intensive treatment are not without some concern. However, areas may consider examining the rates of hyperglycemic versus hypoglycemic events when interpreting this indicator.

Short-term diabetes admission rate is an avoidable hospitalization/ambulatory care sensitive condition indicator. These indicators are not measures of hospital quality, but rather measures of outpatient and other healthcare, and as such are defined on an area level. These measures would be of most interest to comprehensive health care delivery systems, such as some health maintenance organizations, or public health officials. ACSC indicators are correlated with each other and may be used in conjunction as an overall examination of outpatient care.

Areas may wish to identify hospitals that contribute the most to the overall area rate for this indicator. The patient populations served by these hospitals may be a starting point for interventions.

This indicator is recommended for inclusion in the HCUP II QI set. It received an empirical score of 14 out of 26. It is recommended with two potential caveats of use. First, as an ACSC indicator, this indicator may be viewed as a proxy for actual quality problems. Second, rates of diabetes may vary systematically by area, creating bias for this indicator.

INDICATOR 26: ACSC: UNCONTROLLED DIABETES ADMISSION RATE

Indicator: Area level admission rate for uncontrolled diabetes.
Relationship to Quality: Proper outpatient treatment and adherence to care may reduce the incidence of uncontrolled diabetes. As such, lower rates represent better quality care.
Benchmark: State, regional, or peer group average.

* Rate can also be calculated for age 65 and older.

Method:

Quality Measure: Admissions for uncontrolled diabetes per 100,000 population.
Outcome of Interest: Number of discharges with ICD-9-CM principal diagnosis code for uncontrolled diabetes, without mention of a short-term or long-term complication, per 100,000 population (see Appendix 6).

Age 18-64 years.*

Exclude transfer from other institution.
Exclude MDC 14 (pregnancy, childbirth, and puerperium) and MDC 15 (newborns and other neonates).
Population at Risk: Population in MSA or county, age 18-64 years.*

* Rate can also be calculated for age 65 and older.

Evidence from the literature

Healthy People 2010 5 has established a goal to reduce the hospitalization rate for uncontrolled diabetes in persons 18-64 years of age to 5.4 per 10,000 population. The current reported rate is 7.2 per 10,000 population. This measure corresponds closely with the measure of short-term diabetes developed by Billings et al. 489 and evaluated and recommended in this report. The key exception is the ICD-9 codes 25002 and 25003, uncontrolled diabetes. This indicator (uncontrolled diabetes) includes only these two codes.

This indicator (uncontrolled diabetes) is intended not as a stand-alone indicator, but for use with the short-term diabetes indicator. Combining the two indicators will result in the Healthy People 2010 measure. As such, this indicator was not subjected to additional literature review, as for the most part, the literature review for short-term diabetes applies.
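Because the Healthy People 2010 figures are stated per 10,000 population while the indicator rates in this chapter are stated per 100,000, combining the two indicators involves a unit conversion. The sketch below uses the raw area-level rates reported in this chapter for Indicators 25 and 26. It assumes the two indicators count disjoint sets of admissions so that their rates can simply be added, and it treats the average of area-level rates as a rough stand-in for a national rate, which it is not exactly. The function and variable names are hypothetical.

```python
def per_100k_to_per_10k(rate_per_100k: float) -> float:
    """Convert a rate per 100,000 population to a rate per 10,000."""
    return rate_per_100k / 10.0


# Raw area-level rates per 100,000 population, as reported in this chapter.
short_term_rate = 36.0    # Indicator 25: diabetes short-term complications
uncontrolled_rate = 34.7  # Indicator 26: uncontrolled diabetes

# Assumption: the indicators do not overlap, so the rates are additive.
combined_per_10k = per_100k_to_per_10k(short_term_rate + uncontrolled_rate)

healthy_people_target = 5.4  # per 10,000, ages 18-64
gap_to_target = combined_per_10k - healthy_people_target

print(f"combined rate: {combined_per_10k:.2f} per 10,000")
print(f"gap to target: {gap_to_target:.2f} per 10,000")
```

The combined figure of about 7.1 per 10,000 is close to the reported current rate of 7.2 per 10,000, which is consistent with the statement that the two indicators together correspond to the Healthy People 2010 measure.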

Empirical Evidence
Test | Statistic | Rating
Precision
   Raw area level rate/standard deviation | 34.7, 28.1 |
   Systematic area-level standard deviation* | 0.01% | Moderate
   Area variation as a percentage of total variation* | 0.007% | Moderate
   Signal ratio* | 72.6% | High
   R-Square* | 74.2% | High
   * age- and gender-adjusted
Minimum Bias - age-sex risk adjustment
   Signal variance change with risk adjustment | No change | Good
   Absolute impact:
     Average absolute change (in %) | 7.9% | Good
   Relative impact:
     Rank correlation | 0.976 | Very Good
     Percent remaining in high decile/low decile | 77.3% / 90.9% | Good
     Percent changing more than 2 deciles | 2.3% | Very Good
Precision

This indicator is moderately precise, with a raw area level rate of 34.7 per 100,000 population and a standard deviation of 28.1. The systematic area level standard deviation is moderate, at 0.01%. The area level variation also accounts for a moderate amount of total variation, at 0.007%. This means that relative to other indicators, a lower percentage of the variation occurs at the area level, rather than the discharge level. The signal ratio is high, at 72.6%. This means that it is likely that the observed differences in area performance represent true differences in area performance, though some represents noise. Multivariate techniques do not appear to improve the amount of extractable signal, as reflected by the high R-square.
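To make the signal ratio concrete, the sketch below assumes it is the share of the observed variance in area rates that is attributable to systematic (signal) variance rather than sampling noise. The function name, the simple moment-based estimate, and the input numbers are illustrative assumptions; the estimation method actually used in this report may differ.

```python
from statistics import mean, pvariance


def signal_ratio(observed_rates, noise_variances):
    """Estimate signal variance / (signal variance + noise variance).

    observed_rates: one observed rate per area.
    noise_variances: estimated sampling variance of each area's rate.
    """
    total_var = pvariance(observed_rates)  # observed variance across areas
    if total_var == 0:
        return 0.0
    noise_var = mean(noise_variances)      # average within-area noise
    signal_var = max(total_var - noise_var, 0.0)
    return signal_var / total_var


# Made-up inputs: four areas with equal sampling variance.
ratio = signal_ratio([10.0, 20.0, 30.0, 40.0], [25.0, 25.0, 25.0, 25.0])
```

In this toy case the observed variance is 125 and the average noise variance is 25, so 80% of the observed variation would be treated as signal.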

Bias

Signal variance does not change with risk adjustment. The indicator performs well on the multiple measures of minimum bias. The rank correlation is very good at 0.976. Risk adjustment changes the composition of providers in the highest and lowest decile moderately. The absolute magnitude of the impact is also moderate.
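The relative-impact measures in the table (rank correlation, percent remaining in the high and low deciles, and percent changing more than 2 deciles) can be illustrated with a short sketch comparing raw and risk-adjusted area rates. The function names, the tie handling, the decile assignment rule, and the toy data are assumptions made for illustration; the report's exact definitions may differ. The sketch expects at least 10 areas with rates that are not all identical.

```python
def ranks(values):
    """Rank values from 1 (lowest) upward; ties receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def deciles(values):
    """Assign each value a decile from 1 (lowest) to 10 (highest) by rank."""
    n = len(values)
    return [min(10, int((r - 1) * 10 // n) + 1) for r in ranks(values)]


def bias_summary(raw, adjusted):
    """Compare area rates before and after risk adjustment."""
    d_raw, d_adj = deciles(raw), deciles(adjusted)
    n = len(raw)

    def pct_remaining(decile):
        members = [i for i in range(n) if d_raw[i] == decile]
        if not members:
            return float("nan")
        stayed = sum(1 for i in members if d_adj[i] == decile)
        return 100.0 * stayed / len(members)

    moved = sum(1 for a, b in zip(d_raw, d_adj) if abs(a - b) > 2)
    return {
        # Spearman correlation: Pearson correlation of the rank vectors.
        "rank_correlation": pearson(ranks(raw), ranks(adjusted)),
        "pct_remaining_high": pct_remaining(10),
        "pct_remaining_low": pct_remaining(1),
        "pct_changing_more_than_2_deciles": 100.0 * moved / n,
    }


# Toy data: 20 areas; adjustment swaps the 18th and 19th ranked areas.
raw = [float(v) for v in range(1, 21)]
adjusted = list(raw)
adjusted[17], adjusted[18] = adjusted[18], adjusted[17]
summary = bias_summary(raw, adjusted)
```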

Construct validity

Diabetes short term complications rate is related to most other ACSC indicators.

Discussion

This indicator should be used in conjunction with indicator 25. See discussion for Indicator 25: Short Term Diabetes Complications.

This indicator is recommended for inclusion in the HCUP II QI set. It received an empirical score of 14 out of 26. It is recommended with several potential caveats of use. First, as an ACSC indicator, this indicator may be viewed as a proxy for actual quality problems. Second, rates of diabetes may vary systematically by area, creating bias for this indicator. Finally, areas may reduce admission rates without improving quality, by shifting care to an outpatient setting.

INDICATOR 27: ACSC: DIABETES - LONG TERM COMPLICATIONS ADMISSION RATE

Indicator: Area level admission rate for long term complications of diabetes.
Relationship to Quality: Proper outpatient treatment and adherence to care may reduce the incidence of diabetic long term complications. As such lower rates represent better quality care.
Benchmark: State, regional, or peer group average.

* Rate can also be calculated for age 65 and older.

Method:

Quality Measure: Admissions for diabetic long term complications per 100,000 population.
Outcome of Interest: Number of discharges with ICD-9-CM principal diagnosis code for long-term complications of diabetes (renal, eye, neurological, circulatory, or complications not otherwise specified) per 100,000 population. (see Appendix 6)

Age 18-64 years.*

Exclude transfer from other institution.
Exclude MDC 14 (pregnancy, childbirth, and puerperium) and MDC 15 (newborns and other neonates).
Population at Risk: Population in MSA or county, age 18-64 years.*

* Rate can also be calculated for age 65 and older.

Evidence from the literature
Face validity

Over 10.5 million people in the U.S. have been diagnosed with diabetes mellitus, with over 90% of those having type 2 diabetes (NIDDM). 599 Long term complications occur in the majority of diabetic patients to some degree, and include retinopathy (mild to proliferative), neuropathy, nephropathy, and microvascular disorders. 599

Several observational studies have linked improved glycemic control to substantially lower risks of developing complications (retinopathy, neuropathy and nephropathy) in both Type 1 and Type 2 diabetes. 600 One study estimated that reducing glycosylated hemoglobin levels (a measure of glycemic control) produces over a two percentage point decrease in lifetime risk of blindness, 601 though the reduction is less for elderly patients. Another study of Type 2 diabetes reported a decrease of 60%-100% in the incidence of retinopathy and macular edema, a 24%-50% decrease in gross proteinuria, and a 16%-100% decrease in lower-extremity amputation, depending on the age of onset and insulin treatment. 602 Other studies have confirmed the benefit of tighter glycemic control on complications of NIDDM. 603, 604 It has been recommended that near-normal glycemic control be maintained in both NIDDM and IDDM. 605

One mechanism of improving glycemic control is intensive therapy, which generally includes the use of a continuous insulin pump or multiple injections of insulin daily. The largest randomized controlled trial of intensive therapy for Type 1 diabetes (IDDM) was the Diabetes Control and Complications Trial. Intensive therapy slowed the progression and decreased the development of retinopathy by 54% and 47%, respectively. Also reduced were the incidences of microalbuminuria and clinical neuropathy (39% and 69%). 606 Four years post-trial, the differences between the intensive therapy group and the control group narrowed, but remained significant. 607

Given that appropriate adherence to therapy and consistent monitoring of glycemic control help to prevent complications, high-quality outpatient care should lower long-term complication rates. However, adherence to guidelines aimed at reducing complications (including eye and foot examinations, and diabetic education) has been described as modest, 608, 609 with only one-third of patients receiving all essential services. 610

Precision

Diabetes affects a large number of people, as do diabetic complications. Hospitalizations for amputations and other diabetic complications are not rare, 611 suggesting that reasonably precise estimates can be obtained. However, few studies have documented hospitalization rates for diabetic complications and the extent to which they vary across areas.

Minimum bias

It is possible that some sociodemographic characteristics of the population may lead to higher rates of long-term diabetic complications. Rates of diabetes are higher in Black, Hispanic and especially Native American populations than in other ethnic groups. Hyperglycemia appears to be particularly frequent among Hispanic and Native American individuals. 599 The duration of diabetes is positively associated with the development of complications. Since new-onset diabetes occurs more often in an elderly population, areas with older populations may have shorter durations of diabetes, and fewer long term complications. Though few studies have examined the validity of hospital diagnoses related to apparent long-term diabetic complications, one study found that administrative databases (including outpatient and lab data) had "disappointing" PPVs for long term complications (PPVs ranged from less than 50% to 88% for a three year period). 612 Whether such population differences and biases lead to substantial differences in diabetic complication rates across geographic areas is unclear.

Construct validity

As noted above, substantial evidence exists that compliance with treatment guidelines to prevent long-term complications of diabetes is low, and that long-term diabetic complications are common. However, the importance of problems in the quality of outpatient diabetes care in explaining variations in diabetic complication rates is less well understood. Compliance of physicians and patients is essential to achieve good outcomes, and it seems likely that problems with both access to and quality of care as well as patient compliance may contribute to the occurrence of complications.

Fosters true quality improvement

Little evidence exists on the impact of this quality improvement measure on the delivery of outpatient care for diabetes. Because the optimal hospitalization rate for this condition has not been defined, providers may decrease their rates by failing to hospitalize patients who would truly benefit from inpatient care. Although this concern cannot be dismissed, there is no published evidence of worse health outcomes in association with reduced hospitalization rates for long-term complications of diabetes. Such an effect seems implausible, given that only the most serious complications of diabetes are treated on an inpatient basis.

Prior use

This indicator, defined as a hospital-level indicator, is a current HCUP I indicator.

Empirical Evidence

Test | Statistic | Rating
Precision
   Raw area level rate/standard deviation | 80.8, 58.1 |
   Systematic area-level standard deviation* | 0.03% | Moderate
   Area variation as a percentage of total variation* | 0.009% | Moderate
   Signal ratio* | 75.6% | High
   R-Square* | 76.6% | High
   * age- and gender-adjusted
Minimum Bias - age-sex risk adjustment
   Signal variance change with risk adjustment | No change | Good
   Absolute impact:
     Average absolute change (in %) | 12.4% | Good
   Relative impact:
     Rank correlation | 0.926 | Good
     Percent remaining in high decile/low decile | 74.7% / 90.9% | Good
     Percent changing more than 2 deciles | 8.3% | Good
Precision

This indicator is moderately precise, with a raw area level rate of 80.8 per 100,000 population and a standard deviation of 58.1. The systematic area level standard deviation is moderate, at 0.03%. The area level variation also accounts for a moderate amount of total variation, at 0.009%. This means that relative to other indicators, a lower percentage of the variation occurs at the area level, rather than the discharge level. The signal ratio is high, at 75.6%. This means that it is likely that the observed differences in area performance represent true differences in area performance, though some is due to random noise. The high R-square (76.6%) reflects the high amount of signal that can be extracted using multivariate methods, though this amount is less than for other indicators. Because the R-square is only slightly higher than the signal ratio, multivariate techniques do not appear to appreciably improve the amount of extractable signal.

Bias

The signal variance does not change with age-sex risk adjustment. The indicator performs well on the multiple measures of minimum bias. The rank correlation is good at 0.926. Risk adjustment does not appear to impact the lowest or highest decile disproportionately. The absolute magnitude of the impact is moderate, and 8.3% of areas change more than two relative deciles with risk adjustment.

Construct validity

Long term diabetes is related to the other ACSC conditions.

Discussion

Long term diabetes complications are thought to arise from sustained long-term poor control of diabetes. Intensive treatment programs have been shown to decrease the incidence of long-term complications in both type 1 and type 2 diabetes. However, it is unclear whether poor glycemic control arises from poor quality medical care, non-compliance of patients, lack of education, or access to care problems. Areas with high rates may wish to examine these factors when interpreting this indicator. Further information regarding precipitating events to admission may be gathered through chart review.

This indicator is measured with moderate precision, as shown in our empirical analysis. Given the high signal ratio, it is likely that the variation observed reflects true differences in area performance, though some is due to random noise. Multivariate techniques do not have substantial additional impact. Therefore, either univariate or multivariate smoothing is recommended for this indicator.
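One common form of univariate smoothing is a shrinkage estimator that pulls each area's observed rate toward the overall mean in proportion to how noisy the rate is. The sketch below assumes this form and uses a single pooled signal ratio as the shrinkage weight. The function name and the example area are hypothetical, and the smoothing procedure actually used in this report may apply area-specific weights.

```python
def smooth_rate(observed_rate, overall_mean, signal_ratio):
    """Shrink an observed area rate toward the overall mean.

    signal_ratio near 1 keeps the observed rate almost unchanged;
    signal_ratio near 0 replaces it with the overall mean.
    """
    if not 0.0 <= signal_ratio <= 1.0:
        raise ValueError("signal_ratio must be between 0 and 1")
    return overall_mean + signal_ratio * (observed_rate - overall_mean)


# Hypothetical area with 150 admissions per 100,000, using this indicator's
# reported mean rate (80.8) and signal ratio (75.6%) as inputs.
smoothed = smooth_rate(observed_rate=150.0, overall_mean=80.8, signal_ratio=0.756)
```

With these inputs the smoothed rate is about 133.1 per 100,000, that is, roughly a quarter of the distance between the observed rate and the mean is attributed to noise.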

Our analysis of minimum bias showed that risk adjusting by age and sex had a moderate effect on the performance of areas. The absolute impact was moderate, as well. Sociodemographic characteristics of the population, such as race, may bias the indicator, since there are higher rates of diabetes and poor glycemic control among Native Americans and Hispanic Americans. The importance of these factors as they relate to admission rates is unknown. Risk adjustment for observable characteristics, such as racial composition of the population, is recommended for this indicator. Outpatient clinics may also care for long-term complications of diabetes. Thus, examining both inpatient and outpatient data may give a more accurate picture of this indicator.

Diabetes long term complications admission rate is an avoidable hospitalization/ ambulatory care sensitive condition indicator. These indicators are not measures of hospital quality, but rather measures of outpatient and other healthcare, and as such are defined on an area level. These measures would be of most interest to comprehensive health care delivery systems, such as some health maintenance organizations, or public health officials. ACSC indicators are correlated with each other and may be used in conjunction as an overall examination of outpatient care.

Areas may wish to identify hospitals that contribute the most to the overall area rate for this indicator. The patient populations served by these hospitals may be a starting point for interventions.

This indicator is recommended for inclusion in the HCUP II QI set. It received an empirical score of 11 out of 26. It is recommended with several potential caveats of use. First, as an ACSC indicator, this indicator may be viewed as a proxy for actual quality problems. Second, rates of diabetes may vary systematically by area, creating bias for this indicator. Third, providers could reduce admission rates without improving quality of care by shifting care to an outpatient setting. Caution should be maintained for rates that are drastically below or above average rates.

INDICATOR 28: ACSC: HYPERTENSION ADMISSION RATE

Indicator: Area level admission rate for hypertension.
Relationship to Quality: Proper outpatient treatment may reduce admissions for hypertension. As such lower rates represent better quality care.
Benchmark: State, regional, or peer group average.

* Rate can also be calculated for age 65 and older.

Method:

Quality Measure: Admissions for hypertension per 100,000 population.
Outcome of Interest: Discharges with ICD-9 principal code for hypertension per 100,000 population. (see Appendix 6)

Age 18-64 years.*

Exclude discharges with specified cardiac procedure codes (see Appendix 6) in any field.
Exclude transfer from other institution.
Exclude MDC 14 (pregnancy, childbirth, and puerperium) and MDC 15 (newborns and other neonates).
Population at Risk: Population in MSA or county, age 18-64 years.*

* Rate can also be calculated for age 65 and older.

Evidence from the literature
Face validity

Hypertension is a chronic condition that is often controllable in an outpatient setting with appropriate use of drug therapy. We found little literature on hypertension admission as an ambulatory care sensitive condition indicator. Hypertension was originally included in both John Billings' 489 set of indicators developed for the United Hospital Fund of New York, and in Weissman's 276 set of indicators. These sets were developed by physician panels. Evidence on sets of ambulatory care sensitive condition indicators is summarized at the beginning of this section, and should be referred to for this indicator.

Precision

Although hypertension is a common condition, hospitalizations for complications of hypertension are relatively uncommon. One study noted that hypertension accounted for only 0.5% of total admissions for ACSCs. 282

Minimum bias

We found very little evidence on potential biases in this indicator. It is possible that the age structure of the population may affect admission rates for this condition. Weissman et al. reported a reduction of 100% in relative risk for Medicaid patients when adjusting for age and sex. 276 Though it seems plausible that differences in socioeconomic status and comorbid conditions such as obesity would affect population rates of hypertension and its complications, we found no evidence on the effects of comorbidities or other risk factors that may vary systematically by area on admission rates for hypertension complications in the area.

Construct validity

Two studies of ACSC conditions reported the results for hypertension independently. Bindman et al. found that an area's self rated access to care explained 22% of admissions for hypertension. 284 Weissman et al. found that uninsured patients had a relative risk of admission for hypertension of 2.38 in Massachusetts after adjustment for age and sex, while Maryland had a corresponding relative risk of 1.93. Medicaid patients also had somewhat elevated risks (adj. RR = 1.56, 1.74). 276 Millman et al. 491 reported that low-income zip codes had 7.6 times more hypertension hospitalizations per capita than high-income zip codes in the same 11 states in 1988.

Fosters true quality improvement

Little evidence exists on the impact of this quality improvement measure on the delivery of outpatient care for hypertension. Because the optimal hospitalization rate for this condition has not been defined, providers may decrease their rates by failing to hospitalize patients who would truly benefit from inpatient care. Although this concern cannot be dismissed, there is no published evidence of worse health outcomes in association with reduced hospitalization rates for hypertension. Such an effect seems implausible, given that only the most serious episodes of accelerated or malignant hypertension are treated on an inpatient basis.

Prior use

This measure was originally developed by Billings and colleagues in conjunction with the Ambulatory Care Project of the United Hospital Fund of New York, 275 and was subsequently adopted by the Institute of Medicine. 583 It has been widely used in a variety of studies of avoidable or preventable hospitalizations. At least 6 states (MA, NE, UT, VA, MI, NY) are reportedly using this set of measures "as guidance for policy and as evaluation and decision aids." 584 This indicator was also included in Weissman's set of avoidable hospitalizations. 276

Empirical Evidence
Test | Statistic | Rating
Precision
   Raw area level rate/standard deviation | 37.1, 32.2 |
   Systematic area-level standard deviation* | 0.01% | Moderate
   Area variation as a percentage of total variation* | 0.006% | Moderate
   Signal ratio* | 69.9% | Moderate
   R-Square* | 71.2% | High
   * age- and gender-adjusted
Minimum Bias - age-sex risk adjustment
   Signal variance change with risk adjustment | No change | Good
   Absolute impact:
     Average absolute change (in %) | 9.1% | Very Good
   Relative impact:
     Rank correlation | 0.963 | Very Good
     Percent remaining in high decile/low decile | 72.7% / 100% | Good / Very Good
     Percent changing more than 2 deciles | 3.2% | Very Good
Precision

This indicator is moderately precise, with a raw area level rate of 37.1 per 100,000 population and a substantial standard deviation of 32.2. The systematic area level standard deviation is moderate, at 0.01%. The area level variation accounts for only a moderate percentage of total variation, at 0.006%. This means that relative to other indicators, a lower percentage of the total variation occurs at the area level, rather than the discharge level. The signal ratio is moderate, at 69.9%. This means that it is likely that some of the observed differences in area performance do not represent true differences in area performance. The high R-square denotes the high amount of signal that can be extracted using multivariate methods, though this is less than for other indicators. Multivariate methods do not appear to have substantial additional impact.

Bias

Signal variance does not change with risk adjustment. The indicator performs well on multiple measures of minimum bias. The rank correlation is very good at 0.963. Risk adjustment appears to affect the highest decile disproportionately to the lowest decile, as 72.7% of areas in the highest decile and 100% of the lowest decile remain after risk adjustment. Few providers change more than 2 deciles in relative performance with risk adjustment. The absolute magnitude of the impact is minimal.

Construct validity

Hypertension is related to the other ACSC conditions.

Discussion

Hypertension is a common disorder that can be effectively treated on an outpatient basis. Little evidence exists regarding the validity of this indicator. One study did relate admission rates to access to care problems.

Admission for hypertension is uncommon, suggesting that the indicator may be subject to some precision problems. However, our empirical results showed that this indicator is measured with adequate precision for use as a quality indicator. The high signal ratio (after multivariate smoothing) suggests that observed variation is likely to reflect true differences in quality of care, though some is also likely to reflect random noise. Multivariate techniques do not have substantial additional impact on the amount of extractable signal. As a result, either multivariate or univariate smoothing is recommended.

This indicator is subject to some minimal bias. Risk adjustment appears to impact providers with the highest rates the most, meaning that without risk adjustment, some providers may be misidentified as outliers. However, this bias is less substantial than it is for other indicators. Age and sex may be particularly important factors, and should be risk adjusted for. It is unknown how other clinical factors would impact this measure.

Hypertension is an avoidable hospitalization/ ambulatory care sensitive condition indicator. These indicators are not measures of hospital quality, but rather measures of outpatient and other healthcare. These measures would be of most interest to comprehensive health care delivery systems, such as some health maintenance organizations, or public health officials. ACSC indicators are correlated with each other and may be used in conjunction as an overall examination of outpatient care.

Areas may wish to identify hospitals that contribute the most to the overall area rate for this indicator. The patient populations served by these hospitals may be a starting point for interventions.

Overall, this indicator is recommended for inclusion in the HCUP II QI set. It received an empirical rating of 14 out of 26. This indicator is recommended with several caveats of use. As an ACSC indicator, this indicator may be viewed as a proxy for actual quality problems. Further, it is possible that providers may reduce admission rates without actually improving quality, by shifting care to an outpatient setting. Caution should be maintained for admission rates that are drastically below or above the average or recommended rates.

INDICATOR 29: ACSC: LOWER EXTREMITY AMPUTATION RATE

Indicator: Area level admission rate for lower extremity amputation.
Relationship to Quality: Proper and continued treatment and glucose control may reduce the incidence of lower extremity amputation. As such lower rates represent better quality care.
Benchmark: State, regional, or peer group average.

* Rate can also be calculated for age 65 and older.

Method:

Quality Measure: Admissions for lower extremity amputation in diabetics per 100,000 population.
Outcome of Interest: Discharges with ICD-9 procedure code for LE amputation in any field and diagnosis code for diabetes in any field per 100,000 population. (see Appendix 6)

Age 18-64 years.*

Exclude discharges with trauma (see Appendix 6).
Exclude transfer from other institution.
Exclude MDC 14 (pregnancy, childbirth, and puerperium) and MDC 15 (newborns and other neonates).
Population at Risk: Population in MSA or county, age 18-64 years.*

* Rate can also be calculated for age 65 and older.
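Unlike the principal-diagnosis indicators, this measure selects discharges by matching codes in any field. The sketch below illustrates that selection logic only. The `Discharge` layout and field names are hypothetical, the code sets are placeholders standing in for the lists in Appendix 6, and treating the trauma exclusion as a match in any diagnosis field is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Discharge:
    # Hypothetical record layout; field names are illustrative only.
    dx_codes: List[str]    # all diagnosis fields
    proc_codes: List[str]  # all procedure fields
    age: int
    mdc: int               # Major Diagnostic Category
    transferred_in: bool   # True if transferred from another institution


def is_diabetic_le_amputation(d, amputation_procs, diabetes_dxs, trauma_dxs):
    """True if the discharge counts toward the amputation numerator."""
    return (
        any(p in amputation_procs for p in d.proc_codes)   # procedure, any field
        and any(c in diabetes_dxs for c in d.dx_codes)     # diabetes, any field
        and not any(c in trauma_dxs for c in d.dx_codes)   # assumed: any field
        and 18 <= d.age <= 64
        and not d.transferred_in
        and d.mdc not in (14, 15)
    )


# Placeholder code sets; the real lists are given in Appendix 6.
AMPUTATION = {"AMP_PROC"}
DIABETES = {"DIABETES_DX"}
TRAUMA = {"TRAUMA_DX"}

counted = Discharge(["OTHER_DX", "DIABETES_DX"], ["AMP_PROC"], 60, 5, False)
trauma_case = Discharge(["DIABETES_DX", "TRAUMA_DX"], ["AMP_PROC"], 60, 21, False)
non_diabetic = Discharge(["OTHER_DX"], ["AMP_PROC"], 60, 5, False)

flags = [
    is_diabetic_le_amputation(d, AMPUTATION, DIABETES, TRAUMA)
    for d in (counted, trauma_case, non_diabetic)
]
```

The area rate would then be the count of flagged discharges divided by the area population aged 18-64, multiplied by 100,000.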

Evidence from the literature
Face validity

Lower-extremity amputation is a common complication of diabetes, affecting up to 15% of all diabetics in their lifetimes. 613 In the United States, diabetes is the leading cause of nontraumatic amputations (approximately 57,000 per year). 614 While the full etiology of amputation is unknown, it is believed that a combination of factors contributes to the high rate of amputation in the diabetic population. Neuropathy and the subsequent loss of sensation may lead to minor trauma to the feet. These lesions, including foot ulcers, may fail to heal due to poor circulation and other factors. Resulting infections may lead to gangrene. 613, 615 Each of these causes can be prevented to a certain extent, leading to the prevention of lower extremity amputation. Possible interventions include foot clinics, wearing proper footwear, and proper care of feet and foot ulcers. 615 Therefore, the American Diabetes Association (ADA) recommends that diabetics be educated in proper foot care, and that diabetics undergo a foot exam at least once a year. In addition, the Diabetes Control and Complications Trial (DCCT) 606 found that some of these factors, including neuropathy and microvascular disease, are preventable by maintaining blood glucose levels near normal using intensive insulin therapy. The DCCT has led to further recommendations that blood-glucose levels should be closely monitored in all diabetics (2-4 measurements per year), and glycosylated hemoglobin levels should be maintained below 8%.

There is substantial evidence that the ADA recommendations are not closely followed. One study of a California HMO noted that over half of patients did not have documented glycosylated hemoglobin levels (a measure of blood glucose control). In addition, almost 40% of those patients with documented levels had at least one high level noted (over 10%), denoting overall poor glycemic control. Almost all patients had no documented foot exams, though these may have occurred without documentation. 616 In the median state in 1998-99, only 71% of Medicare beneficiaries had a glycosylated hemoglobin test within the previous year, while only 69% and 57% had an eye exam and a lipid profile, respectively, within the previous 2 years. 617 Several older studies, involving both Medicare 618, 619 and Medicaid 620 recipients, showed similar deficiencies in processes of care that may help prevent amputations. In addition, the Medical Outcomes Study and the National Health and Nutrition Examination Survey 618 both showed that over 50% of diabetics are less than optimally controlled, and as many as 18.8% (in staff-model HMOs) to 32.4% (in fee-for-service care) may be at extremely high risk of complications, with a glycosylated hemoglobin of 12% or more. 621

Precision

Within the diabetic community, the incidence of lower extremity amputation has been reported as 375 per 100,000 person years for NIDDM, and 388 per 100,000 person years for IDDM. The twenty-five year cumulative risk for lower extremity amputation was 11%. 621

Although we located no studies discussing small-area variation of lower extremity amputation, worldwide numbers greatly vary from 2.8 per 100,000 population (Madrid, Spain) to 43.9 in the Navajo population in the United States. The other US study site, Montgomery, AL had an age adjusted incidence of 19.2 per 100,000. 622

Minimum bias

Several sociodemographic variables are associated with the risk of lower-extremity amputation in diabetics, including age, duration of diabetes, and sex.613, 623 Males have been found repeatedly to have higher risks of amputation (2.8-6.5 fold higher rates). 613 Age and sex distribution may vary systematically by area.

Race appears to be an important factor that may influence area rates. A study of hospital discharges in 1991 in California noted that African Americans had just under twice the rate of amputation as Whites (95.25 vs. 55.98 per 10,000 persons with diabetes, RR=1.72). Hispanics had a rate similar to that among Whites (44.43 per 10,000 persons with diabetes). 624 While minorities may have up to twice the amputation rates as whites, 613 it is unknown whether this association is due to differences in access to quality care, compliance, or biological risk factors. Another study of risk factors leading to amputation noted that in an insured HMO population, African-Americans did not have greater risk of amputation, 623 suggesting that the observed differences by race are due to access to care, and should not be adjusted for. However, rates of diabetes are uniformly higher in Black, Hispanic and especially Native American populations than in other ethnic groups. Hyperglycemia appears to be particularly frequent among Hispanic and Native American individuals. 599 Therefore, adjusting or stratifying by race may be advisable when this indicator is defined using all adults, rather than all adult diabetics, as the denominator population.

Two controlled studies have identified clinical risk factors for amputation among diabetics in an HMO 623 and in a VA medical center, 625 both settings with relatively good access to care. Selby and Zhang found that the level of glucose control, duration of diabetes, and baseline systolic blood pressure were major clinical predictors of amputation. Other diabetic complications, such as microvascular complications and history of stroke, were also predictive. 623 Reiber et al. controlled for sociodemographic factors, and found that sensory perception, circulation indices, and nutritional factors were associated with amputation risk. Patients with the most severe disease had a four-fold risk of amputation, compared to patients with less severe disease. 625 These studies suggest that some areas may have higher amputation rates than others, partially because of unmodifiable patient characteristics (e.g., duration of diabetes). However, many of the clinical factors associated with amputation are potentially modifiable in the long term, if excellent control of hyperglycemia and hypertension are maintained. Therefore, the magnitude of potential bias due to confounding depends somewhat on whether one takes a short-term or long-term perspective.

Construct validity

Several studies of intervention programs have noted a decrease in amputation risk. A 1989 prospective randomized study of a 1 hour foot care education program for high risk patients (patients with foot ulcers or previous amputation), noted a 3-fold greater risk of amputation 2 years after intervention in control patients. The education program informed patients of proper foot care and was supplemental to normal teaching regarding diet, exercise, weight and medication. 626 A more recent study noted a 1 year post-intervention decrease of 79% in amputations in a low-income African American population. The intervention varied by risk, with low risk patients receiving foot care education, and assistance in finding proper fitting footwear. High risk patients were provided with custom-molded orthoses and prescription footwear. Additional foot care was provided for patients with foot injuries. 627

One study examining the literature noted that provider and patient education may lead to a 72% decrease in amputations, multidisciplinary clinic care to a 47% decrease, and insurance coverage for therapeutic shoes to a 53.5% decrease. The authors calculated the potential economic benefits for the first year to be over $1.1 million, $750,000, and $850,000 for educational interventions, multidisciplinary clinics, and insurance coverage for footwear, respectively. However, most of the benefit would occur in individuals 70 years or older. 628

One observational study of the risk factors of lower-extremity amputation found that patients who receive no outpatient diabetes education have a three-fold higher risk of amputation than those receiving care. 625 Although there is no clear evidence that areas with higher amputation rates provide worse care to diabetic patients, the evidence certainly suggests that high-quality care can substantially decrease amputation rates among diabetics.

Fosters true quality improvement

We located no evidence regarding the ability of this indicator to foster true improvement. It is unlikely, given the severity of conditions requiring lower-extremity amputations, that patients requiring amputation would be denied care.

Prior use

This indicator is not widely used. Healthy People 2010 5 has set a goal of reducing the number of lower extremity amputations from 4.1 per 1,000 persons with diabetes (in 1997) to 1.8 per 1,000 persons with diabetes. In addition, this indicator is included in the DEMPAQ measure set for outpatient care.

Empirical Evidence
Test | Statistic | Rating
Precision
   Raw area level rate/standard deviation | 30.5, 42.7 (per 100,000 population) |
   Systematic area-level standard deviation* | 0.04% | Moderate
   Area variation as a percentage of total variation* | 0.001% | Moderate
   Signal ratio* | 68.5% | Moderate
   R-Square* | 70.2% | High
   * age- and gender-adjusted
Minimum Bias - age-sex risk adjustment
   Signal variance change with risk adjustment | No change | Good
   Absolute impact:
     Average absolute change (in %) | 13.1% | Good
   Relative impact:
     Rank correlation | 0.919 | Good
     Percent remaining in high decile/low decile | 59.1% / 90.9% | Good
     Percent changing more than 2 deciles | 7.4% | Good

Precision

This indicator is moderately precise, with a raw area level rate of 30.5 per 100,000 population and a substantial standard deviation of 42.7. The systematic area level standard deviation is moderate, at 0.04%. The area level variation accounts for only a moderate percentage of total variation, at 0.001%. This means that relative to other indicators, a lower percentage of the total variation occurs at the area level, rather than the discharge level. The signal ratio is moderate, at 68.5%. This means that some of the observed differences in area performance likely reflect random noise rather than true differences in area performance. The high R-square denotes the high amount of signal that can be extracted using multivariate methods, though this is less than for other indicators. Multivariate methods do not appear to have substantial additional impact.
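
The role of the signal ratio can be illustrated with a minimal sketch of univariate smoothing. The sketch assumes, as a simplification not spelled out in this section, that the signal ratio is the share of observed variance attributable to true area differences, and that a smoothed rate is the observed rate shrunk toward the overall mean by that ratio. The function name and the example area rate of 60 are hypothetical; the mean (30.5) and signal ratio (68.5%) come from the Empirical Evidence table.

```python
def smooth_rate(observed_rate, mean_rate, signal_ratio):
    """Shrink an observed area rate toward the overall mean.

    Assumption (not stated in the report section): smoothed = mean +
    ratio * (observed - mean), where signal_ratio is taken to be
    signal variance / (signal variance + noise variance).
    """
    if not 0.0 <= signal_ratio <= 1.0:
        raise ValueError("signal_ratio must be between 0 and 1")
    return mean_rate + signal_ratio * (observed_rate - mean_rate)


# Hypothetical area with an observed rate of 60 per 100,000 population.
smoothed = smooth_rate(observed_rate=60.0, mean_rate=30.5, signal_ratio=0.685)
print(f"Smoothed rate: {smoothed:.1f} per 100,000 population")
```

Under these assumptions, a ratio of 1 would leave the observed rate unchanged and a ratio of 0 would replace it with the mean. In practice the ratio would likely differ by area size, which a single table-level value does not capture.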

Bias

Signal variance does not change with risk adjustment. The indicator performs well on multiple measures of minimum bias. The rank correlation is good at 0.919. Risk adjustment appears to affect the highest and lowest decile somewhat, as 59.1% of areas in the highest decile and 90.9% of the lowest decile remain after risk adjustment. The absolute magnitude of the impact is moderate.
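
The relative-impact measures named in this paragraph can be sketched as follows. This is an illustration only: it assumes deciles are assigned by rank among areas and that ties are rare, neither of which is confirmed by this section, and all function names are hypothetical. It takes one unadjusted and one risk-adjusted rate per area and needs at least 10 areas.

```python
def ranks(values):
    """Return 0-based ranks; ties are broken by original order (assumes few ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result


def spearman(x, y):
    """Spearman rank correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mean = (len(x) - 1) / 2
    cov = sum((a - mean) * (b - mean) for a, b in zip(rx, ry))
    var = sum((a - mean) ** 2 for a in rx)
    return cov / var


def deciles(values):
    """Assign each area to a decile 0 (lowest) through 9 (highest) by rank."""
    n = len(values)
    return [rank * 10 // n for rank in ranks(values)]


def bias_summary(unadjusted, adjusted):
    """Summarize how much risk adjustment changes relative area performance."""
    if len(unadjusted) != len(adjusted) or len(unadjusted) < 10:
        raise ValueError("need matching lists with at least 10 areas")
    d0, d1 = deciles(unadjusted), deciles(adjusted)
    n = len(d0)
    high = [i for i in range(n) if d0[i] == 9]
    low = [i for i in range(n) if d0[i] == 0]
    return {
        "rank_correlation": spearman(unadjusted, adjusted),
        "pct_remaining_high": 100 * sum(d1[i] == 9 for i in high) / len(high),
        "pct_remaining_low": 100 * sum(d1[i] == 0 for i in low) / len(low),
        "pct_moving_over_2_deciles":
            100 * sum(abs(a - b) > 2 for a, b in zip(d0, d1)) / n,
    }


# Hypothetical data: 20 areas, with the lowest and highest areas trading places.
unadjusted_rates = [float(i) for i in range(20)]
adjusted_rates = unadjusted_rates[:]
adjusted_rates[0], adjusted_rates[19] = adjusted_rates[19], adjusted_rates[0]
summary = bias_summary(unadjusted_rates, adjusted_rates)
print(summary)
```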

Construct validity

Lower extremity amputation is slightly related to the other ACSC conditions, though it appears to be somewhat independent, as it loads on factor 2 more highly.

Discussion

Diabetes is a major risk factor for lower extremity amputation. Infection, neuropathy and microvascular disease are among the precipitating factors leading to lower-extremity amputation. Proper long term glucose control, diabetes education and foot care are just some of the interventions that have been reported to reduce the incidence of these factors. Some observational studies have shown that high quality education and care can reduce lower extremity amputation, though no studies have reported that low quality care is associated with increased lower extremity amputation rates.

This indicator is measured with moderate precision, as shown in our empirical analysis. Given the high signal ratio (using multivariate smoothing techniques), it is likely that the variation observed reflects true differences in area performance, though some is due to random noise. Multivariate techniques do not have substantial additional impact. Therefore, either univariate or multivariate smoothing is recommended for this indicator.

Studies have shown that lower extremity amputation varies with age and sex. Our analysis of minimum bias confirmed that risk adjusting by age and sex had a moderate effect on the relative performance of areas. The absolute impact was moderate, as well. Sociodemographic characteristics of the population, such as race, may bias the indicator, since there are higher rates of diabetes and poor glycemic control among Native Americans and Hispanic Americans. However, poor quality care may also vary systematically with racial composition of the population. Therefore, it is important when adjusting for race to interpret the results with caution. Clinical risk factors such as progression of disease also affect LE amputation risk. However, the decision to include these in a risk adjustment model primarily depends on the decision to take a long-term or short-term perspective, as progression of disease may be prevented through high quality care. Risk adjustment for observable characteristics, such as racial composition of the population, is recommended for this indicator.

The admission rate for lower extremity amputation in diabetics is an avoidable hospitalization/ambulatory care sensitive condition indicator. These indicators are not measures of hospital quality, but rather measures of outpatient and other healthcare, and as such are defined on an area level. These measures would be of most interest to comprehensive health care delivery systems, such as some health maintenance organizations, or public health officials. ACSC indicators are correlated with each other and may be used in conjunction as an overall examination of outpatient care.

Areas may wish to identify hospitals that contribute the most to the overall area rate for this indicator. The patient populations served by these hospitals may be a starting point for interventions.

The Healthy People 2010 goal to reduce lower extremity amputation is defined with a denominator of only diabetics. The proposed indicator has a denominator of total population, as data on diabetes rates in a population are not as readily available as census data. Nonetheless, areas with data on overall diabetes rates in the MSA or county may wish to consider this indicator in the context of these rates.
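
For areas that do have a local diabetes prevalence estimate, the change of denominator is simple arithmetic. The sketch below assumes every amputation counted by the indicator occurs in a person with diabetes, as the indicator definition implies; the function name and the 5% prevalence are hypothetical placeholders, not values from this report.

```python
def per_1000_diabetics(rate_per_100k_population, diabetes_prevalence):
    """Re-express a rate per 100,000 population as a rate per 1,000 persons with diabetes.

    diabetes_prevalence is the share of the area population with diabetes (0 to 1).
    """
    if not 0.0 < diabetes_prevalence <= 1.0:
        raise ValueError("diabetes_prevalence must be in (0, 1]")
    diabetics_per_100k = 100_000 * diabetes_prevalence
    return rate_per_100k_population / diabetics_per_100k * 1_000


# Raw area level rate from the Empirical Evidence table, with a hypothetical
# local prevalence of 5%.
converted = per_1000_diabetics(30.5, 0.05)
print(f"{converted:.1f} amputations per 1,000 persons with diabetes")
```

A figure on this scale can be read alongside the Healthy People 2010 values, which are stated per 1,000 persons with diabetes, though differences in data sources would still limit the comparison.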

Overall, this indicator is recommended for inclusion in the HCUP II QI set, though it is recommended that it be used in conjunction with other ACSC indicators. It received an empirical rating of 10 out of 26, and smoothing is recommended. This indicator is recommended with two caveats of use. As an ACSC indicator, this indicator may be viewed as a proxy for actual quality problems. Further, this indicator has unclear construct validity, as this indicator has not been validated except as part of a set of indicators.

INDICATOR 30: ACSC: LOW BIRTH WEIGHT RATE

Indicator: Area level low birthweight rate.
Relationship to Quality: Proper preventative care may reduce the incidence of low birthweight. As such, lower rates represent better quality care.
Benchmark: State, regional, or peer group average.

Method:

Quality Measure: Number of low birth weight infants per 100 births.
Outcome of Interest: Number of births with ICD-9 diagnosis code for birthweight less than 2500 grams per 100 births within area (see Appendix 6).

Exclude transfer from other institution.
Population at Risk: All births (discharges in MDC 15 - newborns and neonates) in MSA or county.
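
The definition above can be sketched as a small calculation over discharge records for one area. The record fields (`mdc`, `transfer_in`, `lbw_code`) are hypothetical names, with `lbw_code` standing in for "carries one of the Appendix 6 ICD-9 codes for birthweight under 2500 grams". The sketch also assumes that transfers are dropped from both numerator and denominator, which the definition does not state explicitly.

```python
def low_birth_weight_rate(births):
    """Return low birth weight births per 100 births for one area, or None if no births."""
    # Population at risk: MDC 15 discharges, excluding transfers in (assumed).
    eligible = [b for b in births if b["mdc"] == 15 and not b["transfer_in"]]
    if not eligible:
        return None
    low_weight = sum(1 for b in eligible if b["lbw_code"])
    return 100.0 * low_weight / len(eligible)


# Hypothetical records for one county.
example_births = [
    {"mdc": 15, "transfer_in": False, "lbw_code": True},
    {"mdc": 15, "transfer_in": False, "lbw_code": False},
    {"mdc": 15, "transfer_in": False, "lbw_code": False},
    {"mdc": 15, "transfer_in": False, "lbw_code": False},
    {"mdc": 15, "transfer_in": True, "lbw_code": True},   # excluded: transfer
]
lbw_rate = low_birth_weight_rate(example_births)
print(f"{lbw_rate:.1f} low birth weight births per 100 births")
```
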
Evidence from the literature
Face validity

Infants may be low birth weight due to inadequate intrauterine growth or premature birth. Once premature labor has commenced, it is often quite difficult to delay its progression for any significant amount of time. However, risk factors for low birth weight may be addressed with adequate prenatal care and education. Risk factors include many sociodemographic and behavioral characteristics, such as low income and tobacco use during pregnancy. Prenatal education and care programs have been established to help reduce low birth weight and other complications in high risk populations. Addressable risks include maternal undernutrition, genital tract infections, excessive physical exertion or stress, psychological stress, and adverse health habits such as nicotine, alcohol or illicit drug exposure or poor prenatal care. 629 Nonetheless, evidence of the effectiveness of these programs has been equivocal. It is unclear whether increasing prenatal care or education actually reduces low birth weights. Healthy People 2010 has set a goal to reduce the percentage of low birth weight infants to 0.9%. 5

Precision

Although low birth-weight births account for only a small fraction of total births, the large number of births suggests that this indicator should be precisely measurable for most areas.

Minimum bias

Of the risk factors for low birth weight, very few are related directly to patient care. It is unclear how many of the risk factors are in actuality indirect measures of problems accessing care. Socioeconomic measures such as parental education and income have been shown to be negatively associated with rates of low birth weight infants.630, 631 Demographic factors such as age and race also appear important, and may be correlated with socioeconomic factors. Very young mothers (under 17 years) and older mothers (over 35 years) are at a higher risk of having low-birth weight infants.630, 631 Other factors such as tobacco use, primiparity, complications of pregnancy or labor and delivery, and marital status have also been cited as risk factors for low birth weight. 630

Black race has been repeatedly shown to be a risk factor for low birth weight. Many studies have attempted to control for the many confounding risk factors discussed above. One study of all California singleton births in 1992 found that, after risk adjustment, having a black mother remained a significant risk factor (Adj. OR = 1.6). 630 In an attempt to delineate whether maternal birthplace affected low birth weight among black and other minority groups, David et al. 632 compared the relative risks of low birth weight among black mothers in Illinois born in the US, black mothers born in West Africa, and white mothers born in the US. They found that African born women had a 50% increase in relative risk as compared to white mothers, and US born black mothers had a 100% increase, when matching cases with respect to age, marital status, education and spouse's education, prenatal care, parity and previous prenatal loss. Another study that examined California births in 1992 found no difference between foreign born and US born black women when adjusting for maternal and infant characteristics. 633 As this study adjusted for more characteristics than the David et al. study, the difference found in the David et al. study may be due to confounding by characteristics that were not adjusted for. This study did find a difference between Latina US born and foreign born women, with US born women having higher rates of low-birth weight infants, even after adjustment for maternal and infant characteristics.

Low birth weight also varies systematically by metropolitan versus non-metropolitan areas. An analysis of 11 million births in all 50 states from 1985-1987 found that mothers residing in non-metropolitan counties were more likely to have low birthweight infants than those residing in metropolitan counties, before risk adjustment. However, after adjusting for maternal race, age less than 18, age over 35, nulliparity, parity greater than 4, single marital status, completion of high school and to a limited extent late prenatal care (this was considered an outcome as well), there were no longer statistical differences between metropolitan and non-metropolitan areas. This suggests that this indicator at an area-level could be potentially biased. 634

The interrelationships among all these risk factors, prenatal care, and complications are very complex and have not been studied in a large sample. One study of women in Baltimore noted that factors of potential social stress, such as crime rate or unemployment rate, may interact with other risk factors, creating a complicated web of risk factors. 631 Indeed, the picture of which factors lead to low birth weight is unclear because of these complex relationships.

Little evidence exists on the extent to which each of these factors contributes to differences in the rate of low birthweight births across geographic areas.

Construct validity

A number of studies have addressed the impact of prenatal care, or level of prenatal care among low birth weight babies. One study examined birth records in California to establish the relative risk of having a low birth weight baby as a function of use of prenatal care. Those with inadequate care (as calculated by Kotelchuck's Adequacy of Prenatal Care Use Index) had a 3.68 adjusted odds ratio of having a low birth weight infant, adjusting for maternal and infant characteristics as well as insurance status. It is important to note that those with more than adequate care had an adjusted odds ratio of 6.78, with the same adjustment. These are likely to be high-risk patients who were followed closely, which suggests that some of the relevant risk adjustment factors were not included in this model. 630

One randomized controlled trial examined the effect of prenatal care on reducing low birth weight rates in low risk women (women without past high-risk obstetrical complications, current high risk conditions, or significant comorbidities). Reducing prenatal care from 14 visits to 9 visits did not affect the rate of low birth weight infants. However, there is no indication of the socioeconomic profile of the participants and their non-clinical risk factors. 635

Finally, in one observational study, differences in the use of prenatal care accounted for less than 15% of the difference in low birth weight rates between black and white mothers enrolled in Kaiser-Permanente. However, increasing level of prenatal care was associated with lower rates of low birth weight, particularly in the black patient population. 636

One review of studies evaluating programs to reduce low birth weight notes that the studies have often been poorly designed, and this lack of rigor may account for some of the equivocal findings. The authors argue that prevention programs aimed at one specific risk factor in a population shown to have high rates of that factor do reduce low birth weight rates, while comprehensive programs in potentially high risk populations do not reduce low birth weight. They argue that this is potential evidence that prevention programs are simply misapplied and have poor designs. 629 Thus, while specific studies have demonstrated an impact of particular interventions, especially in high-risk populations, evidence on the impact of better prenatal care on low birthweight rates for area populations is less well developed.

Fosters true quality improvement

It seems unlikely that use of this indicator could lead to apparent reductions in the rate of low birthweight births that did not represent true reductions.

Prior use

Low birth weight has been used as a quality indicator on a limited basis, though interest in preventing low birth weight has been demonstrated through the literature and implementations of prevention programs. Low birth weight is an indicator in the HEDIS measure set for insurance groups, and is used by United Health Care and the University Hospital Consortium. The previous version of the HCUP I indicator set included both low birth weight and very low birth weight indicators. Healthy People 2010 has set a goal to reduce the percentage of low birth weight infants to 0.9%. 5

Empirical Evidence
Test | Statistic | Rating
Precision
   Raw area level rate/standard deviation | 3.9%, 2.3% |
   Systematic area-level standard deviation* | 1.18% | Very High
   Area variation as a percentage of total variation* | 0.27% | Very High
   Signal ratio* | 67.1% | Moderate
   R-Square* | 81.2% | High
   * gender-adjusted only
Minimum Bias - APR-DRG risk adjustment | Not applicable |

Precision

This indicator is precise, with a raw area level rate of 3.9% and a standard deviation of 2.3%. The systematic area level standard deviation is very high, at 1.18%. The area level variation accounts for a very high percentage of total variation, at 0.27%. This means that relative to other indicators, a higher percentage of the variation occurs at the area level, rather than the discharge level. The signal ratio is moderate, at 67.1%. This means that it is likely that some of the observed differences in area performance do not represent true differences in area performance. However, the high R-square reflects the high amount of signal that can be extracted using multivariate techniques, though this is still lower than for other indicators.

Bias

No analyses were conducted, since all newborns are the same age, and linkage to maternal records is not available.

Construct validity

Low birth weight is inversely related to the other ACSC indicators. It is positively related to perforated appendix rate.

Discussion

Low birth weight has been implicated as an indicator of access to prenatal care. Healthy People 2010 has set a goal to reduce the percentage of low birth weight infants to 0.9%. However, this indicator has unclear face validity, as noted in our literature review. While mothers who give birth to low-birth weight infants generally receive less prenatal care than others and inadequate prenatal care persists as a risk factor for low birth weight when adjusting for potential confounders, comprehensive care programs in high risk women have failed to reduce low birth weight rates. It is unclear what impact the health care system has on low-birth weight. Nevertheless, some studies suggest that specific counseling aimed at reducing a specific risk factor, in a population identified as having that risk factor, may have some impact on reducing low birth weight rates. One method of targeting populations would be to identify those hospitals contributing the most to the overall area rate. The populations served by those hospitals may be a starting place for interventions. Examination of processes of prenatal care may also help illuminate potential problem areas.

While the face validity of this indicator remains unclear, this indicator did perform well in our tests of precision. The area level standard deviation is very high. Using multivariate techniques increases the amount of signal that can be extracted for this indicator, and thus such techniques are highly recommended.

We were unable to adequately risk adjust using the data available. Adequate risk adjustment may require linkage to birth records, which record many of the sociodemographic and behavioral risk factors noted in the literature review (race, age, drug use, stress). Areas with high rates may wish to examine the prevalence of these risk factors in the population. However, some "risk factors," while not being a direct indication of quality of care, may suggest areas of potential interventions that may reduce low-birth weight rates if implemented properly. Other "risk factors" may require some form of risk adjustment. Where risk adjustment is not possible, it may help to consider results in light of measures of SES (as determined by insurance status or patient zip code), or other factors that may provide some guidance as to "case mix" in the area. However, the relationship between potentially preventable risk factors and SES is complex, and as such providers should not assume that all variance associated with SES is due to factors that cannot be influenced to reduce low birth weight. Birth records in some states are a rich source of information that could help to identify causes of low birthweight and help delineate potential areas of intervention.

Low birth weight is an avoidable hospitalization/ambulatory care sensitive condition indicator. These indicators are not measures of hospital quality, but rather measures of outpatient and other healthcare. These measures would be of most interest to comprehensive health care delivery systems, such as some health maintenance organizations, or public health officials. ACSC indicators are correlated with each other and may be used in conjunction as an overall examination of outpatient care.

Overall, this indicator is recommended for inclusion in the HCUP II QI set, though it is recommended that it be used in conjunction with other ACSC indicators. It received an empirical rating of 11 out of 16 (bias could not be evaluated for this indicator). This indicator is recommended with several caveats of use. As an ACSC indicator, this indicator may be viewed as a proxy for actual quality problems. Further, this indicator could have substantial bias that would require additional risk adjustment from birth records or clinical data. Finally, this indicator has unclear construct validity, as this indicator has not been validated except as part of a set of indicators.

INDICATOR 31: ACSC: PEDIATRIC ASTHMA ADMISSION RATE

Indicator: Area level admission rate for pediatric asthma.
Relationship to Quality: Proper outpatient treatment may reduce admissions for asthma in the pediatric population. As such, lower rates represent better quality care.
Benchmark: State, regional, or peer group average.

Method:

Quality Measure: Admissions for pediatric asthma per 100,000 population.
Outcome of Interest: Number of discharges with ICD-9-CM principal diagnosis code for asthma per 100,000 population (see Appendix 6).

Age less than 18 years.

Exclude transfer from other institution.
Exclude MDC 14 (pregnancy, childbirth, and puerperium) and MDC 15 (newborns and other neonates).
Population at Risk: Population in MSA or county, age less than 18 years.
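
The inclusion and exclusion rules above can be sketched as a filter over discharge records, with the area's census population under age 18 as the denominator. The field names are hypothetical, and `principal_dx_is_asthma` stands in for a lookup against the Appendix 6 code list, which is not reproduced here.

```python
def pediatric_asthma_admission_rate(discharges, population_under_18):
    """Return asthma admissions per 100,000 population under age 18 for one area."""
    if population_under_18 <= 0:
        raise ValueError("population_under_18 must be positive")
    admissions = sum(
        1
        for d in discharges
        if d["principal_dx_is_asthma"]   # hypothetical flag for the Appendix 6 codes
        and d["age"] < 18                # pediatric admissions only
        and not d["transfer_in"]         # exclude transfers from other institutions
        and d["mdc"] not in (14, 15)     # exclude obstetric and newborn discharges
    )
    return 100_000 * admissions / population_under_18


# Hypothetical discharges for one county with 200,000 residents under age 18.
example_discharges = [
    {"principal_dx_is_asthma": True, "age": 4, "transfer_in": False, "mdc": 4},
    {"principal_dx_is_asthma": True, "age": 9, "transfer_in": False, "mdc": 4},
    {"principal_dx_is_asthma": True, "age": 16, "transfer_in": False, "mdc": 4},
    {"principal_dx_is_asthma": True, "age": 30, "transfer_in": False, "mdc": 4},  # adult
    {"principal_dx_is_asthma": True, "age": 7, "transfer_in": True, "mdc": 4},    # transfer
    {"principal_dx_is_asthma": True, "age": 0, "transfer_in": False, "mdc": 15},  # newborn
]
asthma_rate = pediatric_asthma_admission_rate(example_discharges, 200_000)
print(f"{asthma_rate:.1f} admissions per 100,000 population under 18")
```
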
Evidence from the literature
Face validity

Asthma is the most common chronic disease in childhood and is one of the most frequent admitting diagnoses in children's hospitals.637, 638 In the United States, asthma affects an estimated 4.8 million children and adolescents, and in 1993 it was the cause of 198,000 admissions and 342 deaths in persons aged 24 years and younger. 637 There are effective ambulatory treatments for asthma as well as guidelines published by the National Heart, Lung, and Blood Institute, which are endorsed by the American Academy of Pediatrics. 520 These guidelines emphasize the importance of patients' access to care, appropriate diagnosis of asthma, establishment of a physician-patient relationship, timely management of asthma symptoms with appropriate medications, appropriate prophylactic and maintenance therapy, and adequate follow-up care. Healthy People 2010 has set a goal to reduce the admission rate for asthma to 25 per 10,000 population for children under 5 years, and 7.7 per 10,000 population for people aged 5-65 years. 5

Precision

Because asthma is one of the most common reasons for pediatric hospitalization, with an average rate of 28.0 per 10,000 children less than 15 years of age, 637 relatively precise estimates of asthma admission across areas or hospitals can be obtained. Admission rates for asthma tend to be higher during peak times of viral respiratory infections (Winter) and allergy seasons (Spring and Fall), so care must be taken to ensure a consistent time period for measurement. There is wide variation across areas in admission rates for asthma, so random variation from year to year may be important for less populated areas.

Minimum bias

Some admissions with asthma are unavoidable and appropriate. For example, some children have especially severe disease due to genetic factors, associated cystic fibrosis or bronchopulmonary dysplasia, or increased exposure to environmental triggers.275, 560, 637, 639-641 Indoor allergens such as cockroaches and dust mites may be more common in lower-income areas, and are probably associated with increased frequency and severity of asthma symptoms. 523 Tobacco smoke is the most important indoor irritant and is a major precipitant of asthma symptoms in both children and adults.520, 525-527, 529 Exposure to maternal tobacco smoke is a risk factor for the development of asthma in infancy 642 and childhood,643-649 although not for persistence of childhood asthma into adulthood. 650

Outdoor air pollution, especially respirable particulates, may also play a role.520, 528-532 In addition, ozone and SO2 have been associated with increased emergency department visits and hospitalization rates.531-538 Increasing air pollution has been specifically correlated with higher admission rates in London (Ozone, NO2, SO2, and black smoke) 539 and Seattle (ambient air pollution). 540

Race represents one of the most complex potentially biasing factors for this indicator. Black patients have consistently been shown to have higher asthma admission rates,544-546 even when stratifying for income and age. 547 One study examining differences in asthma health care utilization noted that African Americans made fewer asthma-related primary care and specialist visits than Caucasian patients (47.6% vs. 70.2% and 27% vs. 38.8%). There were no differences in hospitalization rates by race, but African-American patients had lower household incomes and made more emergency department visits (proxy for either access to care or asthma severity). 548 Similarly, Hispanics have been shown to have higher admission rates than non-Hispanic whites (or areas with higher percentages of Hispanics have been shown to have higher admission rates), although none of these studies controls for SES.545, 549, 550 To the extent that true differences in disease prevalence or severity are responsible for racial variation in hospitalization rates, race should be adjusted for in comparing asthma hospitalization rates across areas. On the other hand, to the extent that minority patients have less access to care or poorer quality of outpatient care, race should not be adjusted for.

Construct validity

Little evidence has been reported conclusively attaching poor quality of care to higher area admission rates. However, numerous studies have shown that asthma hospitalization rates are associated with socioeconomic factors, including median household income (at the area level) and lack of insurance (at the individual level). A study of asthma hospitalization rates in California in 1993 (ages 0-64) found that areas with median household incomes under $35,000 had hospitalization rates that were 1.5 times higher than areas with higher median incomes. 547 In Boston, in 1992, age and gender standardized hospitalization rates (all ages) were correlated with percentage poverty in an area (r=0.68), percentage holding a bachelor's degree (r=-0.61), and income (r=-0.51). 550 Within New York City in 1994, asthma hospitalization rates were negatively correlated with a zip code area's median household income (r=-0.67), and positively correlated with the percentage of minorities in the population (r=0.82). 549 These findings confirm an earlier study by Billings et al., 489 who reported 6.4-fold variation in asthma hospitalization rates at the zip code level in New York City in 1988, with 70% of this variation explainable by the percentage of households with annual income below $15,000. Millman et al. 491 reported that low-income zip codes had 5.8 times more asthma hospitalizations per capita than high-income zip codes in 11 states in 1988. Using New York State data, Lin et al. showed that hospitalization rates were higher in areas with higher poverty, unemployment, minority populations, and lower education levels. 545 Even in England, 45% of the variation in asthma hospitalization rates across 90 family health services authorities in 1990-95 was attributable to socioeconomic factors, plus the availability of secondary care. 551 To our knowledge, only one study has reported partial correlations; 552 it found that in New York City, the percentage of African-American residents was the strongest predictor, and median household income was the next strongest predictor, of asthma hospitalization rates.
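
The studies above report a mix of correlation coefficients and percentages of variation explained. As a rough aid to reading them side by side, the sketch below squares each reported correlation, which gives the share of variance explained only under the assumption of a simple linear relationship with a single predictor. The labels and function name are illustrative.

```python
# Correlations reported in the paragraph above (area-level, one predictor each).
reported_r = {
    "Boston, percentage poverty": 0.68,
    "Boston, percentage with bachelor's degree": -0.61,
    "Boston, income": -0.51,
    "New York City, median household income": -0.67,
    "New York City, percentage minorities": 0.82,
}


def percent_variance_explained(r):
    """Square a simple correlation to get the share of variance explained, in percent."""
    return round(100 * r * r, 1)


explained = {label: percent_variance_explained(r) for label, r in reported_r.items()}
for label, pct in explained.items():
    print(f"{label}: r = {reported_r[label]:+.2f}, about {pct}% of variance")
```

Because the studies differ in year, geography, and unit of analysis, these figures are only loosely comparable with one another.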

The observation that asthma admission rates are higher in areas with low SES has led some researchers to hypothesize that lack of access to care, or poor quality outpatient care, may lead to higher admission rates. Bindman et al. 284 showed that asthma hospitalization rates across 41 sampled areas in California were significantly correlated (r=0.47) with self-rated access to needed medical care, according to community telephone surveys. Although analyses of the National Health and Nutrition Examination Survey found that Medicaid enrollment and Spanish language preference were associated with inadequate asthma therapy, these deficiencies in care were not directly linked to hospitalizations. 554 Studies from other settings have shown that African-American asthmatics tend to have fewer scheduled primary care visits, and more hospitalizations and emergency room visits, than White asthmatics.555, 556 African-Americans' use of asthma medications may also be less consistent with current practice guidelines. 557

Few studies have directly linked high-quality processes of outpatient care with lower hospitalization rates at either the area or the individual level. An in-depth study of asthma treatment practices in New Haven, Boston, and Rochester found that the community with the highest asthma hospitalization rate (Boston) also had lower use of inhaled antiinflammatory agents and oral steroids. The threshold for admission also appeared to be lower in Boston, as fewer of the admitted patients were hypoxemic, relative to the other cities. 560 One case control study from a large health maintenance organization established that not having a written asthma management plan was a strong risk factor for asthma hospitalization (after adjusting for severity of asthma), but the use of anti-inflammatory medications was not. 561 With patient and parent education, good medical therapy, and outreach programs, adverse outcomes can be reduced considerably.561, 651

Fosters true quality improvement

Because the optimal hospitalization rate for this condition has not been defined, providers may decrease their rates by failing to hospitalize patients who would truly benefit from inpatient care, or by hospitalizing marginally appropriate patients with other conditions (to inflate the denominator). Although these concerns cannot be dismissed, there is no published evidence of worse health outcomes in association with reduced hospitalization rates for asthma. Indeed, given studies showing high rates of inappropriate hospitalization and poor adherence to professional guidelines, a shift to outpatient care may be entirely appropriate. 652 This is an area that should be further studied.

Prior use

This measure was originally developed by Billings and colleagues in conjunction with the Ambulatory Care Project of the United Hospital Fund of New York, 275 but a similar measure was developed contemporaneously by Weissman et al. 276 It was subsequently adopted by the Institute of Medicine, 583 and has been widely used in a variety of studies of avoidable or preventable hospitalizations. At least 6 states (MA, NE, UT, VA, MI, NY) are reportedly using this set of measures "as guidance for policy and as evaluation and decision aids." 276 The measure was developed to include all ages, but has since been adapted by Gadomski 277 and McConnochie 287 as a pediatric measure. South Carolina included asthma in the set of ambulatory care sensitive measures for pediatrics used by the South Carolina Department of Health Statistics. 278 Healthy People 2010 has set a goal to reduce the admission rate for asthma to 25 per 10,000 population for children under 5 years, and 7.7 per 10,000 population for people aged 5-65 years. 5

Empirical Evidence
Test | Statistic | Rating
Precision
   Raw area level rate/standard deviation | 154.1, 143.9 (per 100,000 population) |
   Systematic area-level standard deviation* | 0.11% | High
   Area variation as a percentage of total variation* | 0.005% | High
   Signal ratio* | 85.1% | High
   R-Square* | 85.6% | High
   * age- and gender-adjusted
Minimum Bias - age-sex risk adjustment
   Signal variance change with risk adjustment | No change | Good
   Absolute impact:
     Average absolute change (in %) | 5.3% | Very Good
   Relative impact:
     Rank correlation | 0.994 | Very Good
     Percent remaining in high decile/low decile | 100% / 95.5% | Very Good
     Percent changing more than 2 deciles | 0.0% | Very Good

Precision

This indicator is precise, with a raw area level rate of 154.1 and a standard deviation of 143.9. The systematic area level standard deviation is high, at 0.11%. The area level variation also accounts for a high percentage of total variation, at 0.005%. This means that relative to other indicators, a higher percentage of the variation occurs at the area level, rather than the discharge level, though still lower than for other indicators. The signal ratio is high, at 85.1%. This means that it is likely that the observed differences in area performance reflect true differences in performance, though some may reflect random noise. The high R-square reflects the high proportion of signal that can be extracted using multivariate methods, though lower than for other indicators. Multivariate techniques have little additional impact.
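One way to read the signal ratio is as the share of observed variance across areas that is not attributable to sampling noise. The sketch below illustrates that idea in Python with a simple method-of-moments calculation. It is not the estimator used in this report: it assumes binomial sampling noise, and the function name `signal_ratio` and its inputs are hypothetical.

```python
from statistics import mean, pvariance

def signal_ratio(rates, sizes):
    """Approximate share of observed variance that is signal.

    rates: observed rate in each area, as a proportion (assumed input)
    sizes: population at risk in each area (assumed input)
    """
    # Total observed variance of the rates across areas.
    total_var = pvariance(rates)
    if total_var == 0:
        return 0.0
    # Average sampling (noise) variance, assuming binomial noise.
    noise_var = mean(p * (1 - p) / n for p, n in zip(rates, sizes))
    # Signal variance is what remains, floored at zero.
    signal_var = max(total_var - noise_var, 0.0)
    return signal_var / total_var
```

With large populations at risk the noise term shrinks and the ratio approaches 1; with very small populations the ratio falls toward 0, which is why small areas are harder to compare.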

Bias

The signal variance does not change with risk adjustment. The indicator performs very well on multiple measures of minimum bias. Risk adjustment does not appear to affect the extremes of the distribution substantially. The rank correlation is very good at 0.994. No areas move more than 2 deciles in relative performance. The absolute magnitude of the impact is minimal.

Construct validity

Pediatric asthma is related to most other ACSC indicators.

Discussion

Pediatric asthma is a chronic disease with relatively easy treatment. This indicator is related to a Healthy People 2010 goal to reduce admissions to 25 per 10,000 population age less than 5 years, and 7.7 per 10,000 population age 5-65 years. It has been noted that adherence to the guidelines for asthma management has been associated with lower admission rates, and that 71% of variance in admissions can be explained by household income.

This indicator performed well on our tests of precision. It is measured with high precision, and its signal ratio is high, suggesting that the observed variance does reflect true differences in performance. Multivariate techniques do not appear to have substantial additional impact for this indicator, and as such, either multivariate or univariate smoothing is recommended.

This indicator does not appear to be substantially biased. Risk adjustment does not appear to affect the extremes of the distribution, suggesting that without risk adjustment, areas are not likely to be mislabeled as outliers, assuming that age and sex adjustment is adequate. Our literature review noted that some children may be at risk for admission due to comorbidities, genetic factors, and environmental triggers. It is unclear which of these factors vary by area, and the impact of parental compliance, which may vary systematically by area, is also unclear.

Pediatric asthma is an avoidable hospitalization/ ambulatory care sensitive condition indicator. These indicators are not measures of hospital quality, but rather measures of outpatient and other healthcare. These measures would be of most interest to comprehensive health care delivery systems, such as some health maintenance organizations, or public health officials.

Overall, this indicator is recommended for inclusion in the HCUP II QI set. It received an empirical rating of 18 out of 26. This indicator is recommended with several caveats of use. As an ACSC indicator, this indicator may be viewed as a proxy for actual quality problems. Further, it is possible that providers may reduce admission rates without actually improving quality, by shifting care to an outpatient setting. Caution should be maintained for admission rates that are drastically below or above the average or recommended rates.

INDICATOR 32: ACSC: PEDIATRIC GASTROENTERITIS ADMISSION RATE

Indicator: Area level admission rate for pediatric gastroenteritis.
Relationship to Quality: Proper outpatient treatment may reduce admissions for gastroenteritis in the pediatric population. As such, lower rates represent better quality care.
Benchmark: State, regional, or peer group average.

Method:

Quality Measure: Admissions for pediatric gastroenteritis per 100,000 population.
Outcome of Interest: Discharges with ICD-9-CM principal diagnosis code for gastroenteritis (see Appendix 6).

Age less than 18 years.

Exclude transfer from other institution.
Exclude MDC 14 (pregnancy, childbirth, and puerperium) and MDC 15 (newborns and other neonates).
Population at Risk: Population in MSA or county, age less than 18 years.
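The definition above can be sketched as a small Python routine. This is an illustrative sketch only, not the HCUP software: the record layout (`Discharge`), the function name, and the `dx_codes` argument are hypothetical, and the actual ICD-9-CM code list is the one given in Appendix 6.

```python
from dataclasses import dataclass

@dataclass
class Discharge:
    area: str          # MSA or county identifier (hypothetical field)
    age: int           # age in years
    principal_dx: str  # ICD-9-CM principal diagnosis code
    transfer_in: bool  # True if transferred from another institution
    mdc: int           # Major Diagnostic Category

def area_admission_rates(discharges, population_under_18, dx_codes, per=100_000):
    """Admissions per `per` population under 18, by area."""
    counts = {area: 0 for area in population_under_18}
    for d in discharges:
        if d.area not in counts:
            continue
        # Age restriction and the two exclusions from the definition.
        if d.age >= 18 or d.transfer_in or d.mdc in (14, 15):
            continue
        # Only the principal diagnosis is checked.
        if d.principal_dx not in dx_codes:
            continue
        counts[d.area] += 1
    return {
        area: per * counts[area] / pop
        for area, pop in population_under_18.items()
        if pop > 0
    }
```

The denominator is the resident population under 18 years, not the number of discharges, which is what makes this an area-level rather than a provider-level measure.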
Evidence from the literature
Face validity

Gastroenteritis is a common illness in childhood, resulting in nearly 200,000 hospitalizations annually (nearly 10% of all admissions of children under 5 years of age). 653 There are effective ambulatory treatments for gastroenteritis and clear guidelines published both by the Centers for Disease Control and the American Academy of Pediatrics (AAP). 653 These guidelines emphasize the importance of appropriate oral rehydration therapy for mild to moderate dehydration resulting from gastroenteritis, to avoid the need for hospitalization. Adherence to these guidelines is poor; only about 33% of children treated for gastroenteritis received the glucose-electrolyte solutions recommended by the AAP. Most physicians instead recommend clear liquids, which have been shown to be ineffective in treating dehydration due to gastroenteritis, 654-656 and withhold solid food longer than the 24 hours recommended by the AAP. A physician panel agreed that timely and effective ambulatory care would reduce the risk of hospitalization for gastroenteritis. 275

Precision

Because gastroenteritis is one of the most common reasons for pediatric hospitalization, with small area rates of 200-400 per 100,000 children, relatively precise estimates of gastroenteritis admission across areas or hospitals can be obtained. Gastroenteritis is known to vary seasonally, with about half of hospitalizations occurring between February and April. 657 Thus the stability of the measure over time will be influenced by seasonal fluctuation in disease prevalence, so care must be taken to ensure a consistent time period for measurement. There is wide variation across areas in admission rates for gastroenteritis (14- to 18-fold differences), 657 so random variation in a particular year may be considerable for less populated areas and smaller hospitals.

Minimum bias

Some admissions with gastroenteritis are unavoidable and appropriate. For example, some children with gastroenteritis also suffer from a chronic disease, or from another infection such as gingivostomatitis or tonsillitis, that inhibits oral intake of liquids. 652 These "mandatory admissions" may be difficult, if not impossible, to identify from HCUP data. However, most (73%) children admitted with gastroenteritis appear to have no underlying problems, and most (79%) are re-hydrated within 12 hours. One study suggests that complicated gastroenteritis admissions may be more common among children of low socioeconomic status. 640 If true, this finding suggests that clinical characteristics may explain some of the observed variation in admission rates for pediatric gastroenteritis.

Construct validity

No published studies have specifically addressed the construct validity of this indicator. Billings' original study from New York reported 1.87-fold variation in gastroenteritis hospitalization rates, with a coefficient of variation of 0.438 and 22% of variance explained by household income. 489 Millman et al. 491 reported that low-income zip codes had 1.9 times more pediatric gastroenteritis hospitalizations per capita than high-income zip codes in the same 11 states in 1988.

Fosters true quality improvement

Because the optimal hospitalization rate for this condition has not been defined, providers may decrease their rates by failing to hospitalize patients who would truly benefit from inpatient care, or by hospitalizing marginally appropriate patients with other conditions (to inflate the denominator). Although these concerns cannot be dismissed, there is no published evidence of worse health outcomes in association with reduced hospitalization rates for gastroenteritis. Indeed, given studies showing high rates of inappropriate hospitalization and poor adherence to professional guidelines, a shift to outpatient care may be entirely appropriate. 652 This is an area that should be further studied.

One evaluation of an intervention to improve access reported specifically on pediatric gastroenteritis. Kaestner et al. found no narrowing of the differences in "discretionary" infant (<2 year) hospitalization rates between low, middle, and high-income zip codes, during a period of substantial Medicaid eligibility expansion (1988-1992). 503 Disaggregation of gastroenteritis hospitalizations did not alter this finding.

Prior use

This measure was originally developed by Billings and colleagues in conjunction with the Ambulatory Care Project of the United Hospital Fund of New York. 275 It was subsequently adopted by the Institute of Medicine, 583 and has been widely used in a variety of studies of avoidable or preventable hospitalizations. At least 6 states (MA, NE, UT, VA, MI, NY) are reportedly using this set of measures "as guidance for policy and as evaluation and decision aids." 584 The measure was developed to include all ages, but has since been adapted by Gadomski 277 and McConnochie 287 as a pediatric measure. South Carolina included gastroenteritis in the set of ambulatory care sensitive measures for pediatrics used by the South Carolina Department of Health Statistics. 278

Empirical Evidence
Test | Statistic | Rating
Precision
   Raw area level rate/standard deviation | 98.5, 10.1 |
   Systematic area-level standard deviation* | 0.05% | High
   Area variation as a percentage of total variation* | 0.03% | Moderate
   Signal ratio* | 77.8% | High
   R-Square* | 78.8% | High
   * age- and gender-adjusted
Minimum Bias - age-sex risk adjustment
   Signal variance change with risk adjustment | No change | Good
   Absolute impact:
     Average absolute change (in %) | 5.5% | Very Good
   Relative impact:
     Rank correlation | 0.995 | Very Good
     Percent remaining in high decile/low decile | 90.9% / 100% | Very Good
     Percent changing more than 2 deciles | 0.00% | Very Good
Precision

This indicator is precise, with a raw area level rate of 98.5 and a standard deviation of 10.1. The systematic area level standard deviation is high, at 0.05%. The area level variation accounts for a moderate percentage of total variation, at 0.03%. This means that relative to other indicators, a lower percentage of the variation occurs at the area level, rather than the discharge level. The signal ratio is high, at 77.8%. This means that it is likely that the observed differences in area performance represent true variation in performance, though some may also reflect random noise. The high R-square reflects the high proportion of signal that can be extracted using multivariate techniques, though multivariate techniques do not have substantial additional impact.

Bias

Signal variance does not change with risk adjustment. The rank correlation is very good at 0.995. Risk adjustment does not appear to impact extremes of the distribution substantially, as 90.9% of areas in the highest decile and 100% of the lowest decile remain after risk adjustment. No areas move more than two deciles in relative performance. The absolute impact is also minimal, with an average change in performance relative to the mean of 5.5%.
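The bias measures named in this paragraph can be computed from each area's rate before and after risk adjustment. The Python sketch below shows one plausible way to do so. It is an assumption-laden illustration rather than the report's method: ties are broken by position, deciles are formed from ranks, the average absolute change is expressed relative to the mean unadjusted rate (assumed to be positive), and all function names are hypothetical.

```python
def ranks(values):
    """Rank from 0 (lowest) upward; ties broken by position (a simplification)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order):
        result[i] = rank
    return result

def deciles(values):
    """Decile 0 (lowest) to 9 (highest) for each value, based on rank."""
    n = len(values)
    return [10 * r // n for r in ranks(values)]

def bias_summary(unadjusted, adjusted):
    n = len(unadjusted)
    if n < 10 or len(adjusted) != n:
        raise ValueError("need at least 10 areas and equal-length inputs")
    ru, ra = ranks(unadjusted), ranks(adjusted)
    du, da = deciles(unadjusted), deciles(adjusted)

    # Spearman rank correlation (valid here because ranks have no ties).
    d2 = sum((a - b) ** 2 for a, b in zip(ru, ra))
    rank_corr = 1 - 6 * d2 / (n * (n * n - 1))

    def pct_remaining(decile):
        members = [i for i in range(n) if du[i] == decile]
        return 100 * sum(da[i] == decile for i in members) / len(members)

    mean_rate = sum(unadjusted) / n
    avg_abs = sum(abs(a - u) for u, a in zip(unadjusted, adjusted)) / n
    return {
        "rank_correlation": rank_corr,
        "pct_remaining_high": pct_remaining(9),
        "pct_remaining_low": pct_remaining(0),
        "pct_moving_more_than_2_deciles":
            100 * sum(abs(a - b) > 2 for a, b in zip(du, da)) / n,
        "avg_abs_change_pct": 100 * avg_abs / mean_rate,
    }
```

If adjustment leaves every rate unchanged, the rank correlation is 1, both decile retention figures are 100%, and the two change measures are 0, which is the pattern this indicator comes close to.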

Construct validity

Pediatric gastroenteritis is related to most other ACSC indicators.

Discussion

Pediatric gastroenteritis can be treated on an outpatient basis. Guidelines for this condition have been established, yet are not widely adhered to. However, there is little compelling evidence that adherence to these guidelines reduces admission rates. In fact, many of the admissions appear to be discretionary and possibly inappropriate admissions. Our literature review noted that 22% of variance in admission rates is explained by household income. Areas may wish to examine several factors when interpreting the results of this indicator. Admissions may be precipitated by poor quality care, lack of compliance with care, or poor access to care or may be due to environmental causes. Areas may wish to examine the causes of admissions through means such as chart review. The appropriateness of admissions may also be examined, to ascertain whether the admission threshold is lower in one area than another. Examination of processes of care in outpatient settings may also illuminate the extent to which gastroenteritis rates are due to poor quality care.

This indicator is measured with precision, and the signal ratio is high. This suggests that the observed variance is likely to reflect true differences in area performance. Multivariate techniques do not appear to improve substantially the amount of signal that can be extracted, and thus either univariate or multivariate smoothing is recommended.

We did not identify substantial bias in our empirical analyses. Our literature review identified socioeconomic status to be a large factor in admission rates. Parental compliance, and increases in discretionary admissions with low parental coping resources may also influence admission rates. Most of these factors could not be identified using administrative data, and may vary systematically by area. Areas with high rates may want to identify disease severity by looking at the degree of dehydration of patients and comorbidities to establish whether or not admissions are discretionary, appropriate or due to poor quality care.

Pediatric gastroenteritis is an avoidable hospitalization/ ambulatory care sensitive condition indicator. These indicators are not measures of hospital quality, but rather measures of outpatient and other healthcare. These measures would be of most interest to comprehensive health care delivery systems, such as some health maintenance organizations, or public health officials. ACSC indicators are correlated with each other and may be used in conjunction as an overall examination of outpatient care.

Areas may wish to identify hospitals that contribute the most to the overall area rate for this indicator. The patient populations served by these hospitals may be a starting point for interventions.

Admission for pediatric gastroenteritis, like many ACSC conditions, typically varies with socioeconomic status (SES). Examination of the SES of an area's population, using estimates such as patient zip code or insurance status, may explain some of the area variation. However, SES is complexly related to poor access to care, so an area should not assume that none of the variance associated with SES is associated with poor access to care.

Overall, this indicator is recommended for inclusion in the HCUP II QI set, though it is recommended that it be used in conjunction with other ACSC indicators. It received an empirical rating of 17 out of 26. This indicator is recommended with several caveats of use. As an ACSC indicator, this indicator may be viewed as a proxy for actual quality problems. This indicator has unclear construct validity, as it has not been validated except as part of a set of indicators. Further, it is possible that providers may reduce admission rates without actually improving quality, by shifting care to an outpatient setting. Caution should be maintained for admission rates that are drastically below or above the average or recommended rates.

3.E.5. In-Hospital Mortality Measures

INDICATOR 33: ACUTE MYOCARDIAL INFARCTION (AMI) MORTALITY RATE

Indicator: Provider level mortality rate for AMI.
Relationship to Quality: Better processes of care may reduce mortality for AMI. As such, lower rates represent better quality care.
Benchmark: State, regional, or peer group average.

Method:

Quality Measure: Number of deaths per 100 discharges with diagnosis code for AMI.
Outcome of Interest: Number of deaths with diagnosis code for AMI (see Appendix 6) in any field.
Population at Risk: All discharges with diagnosis codes for AMI in any field (see Appendix 6).

Age 18 years and older.

Exclude transfers to other institution.
Exclude MDC 14 (pregnancy, childbirth, and puerperium) and MDC 15 (newborns and other neonates).
Evidence from the literature
Face validity

Acute myocardial infarction (AMI) affects 1.5 million people each year and approximately one-third die in the acute phase of the heart attack. 658 Many clinical and observational studies have been conducted showing processes of care linked to survival improvements. These research findings have resulted in detailed practice guidelines covering all phases of AMI management. 486 Starting in 1992, the Health Care Financing Administration implemented a national initiative to gather data for quality improvement. The project, "the Cooperative Cardiovascular Project (CCP)," focuses on improving treatment of AMI patients.

Precision

The precision of AMI mortality rate estimates may be problematic for medium and small hospitals. About 13% of AMI patients in the California Hospital Outcomes Project died during hospitalization, or within 30 days of admission. 659 Since 19.5% of AMI patients were transferred from their initial hospital to another acute care facility, the percentage of deaths in an unlinked episode of care would have been somewhat less. In a study using the 1992 MedisGroups Comparative Database with 100 hospitals, mostly in Pennsylvania and the southern United States, the in-hospital mortality rate was 13.2% for AMI patients. In elderly Medicare AMI patients, 30-day mortality rates varied from 18% in Connecticut to 23% in Alabama. 660 Although these outcome rates are high, the number of AMI patients varies widely across hospitals, based on the size and risk profile of each hospital's catchment area.

Minimum bias

Starting in 1990, ICD-9-CM included a fifth digit for AMI codes to distinguish treatment during the "initial episode of care" from subsequent treatment related to the same AMI (within 8 weeks of the event). In studies comparing chart and administrative data since this time, the agreement in identification of new AMI cases has been shown to be at least 93%, and as high as 98%.659, 661 The California Hospital Outcomes Project found that "unlikely" AMI patients had significantly higher mortality than patients with definite or possible AMI. 659 However, there was no evidence of systematic bias across hospitals; high-mortality and low-mortality hospitals had similar proportions of "unlikely" AMI patients.

About 19.5% of AMI patients were transferred from the initial hospital to another acute care facility in the California Hospital Outcomes Project, and these transfer rates varied across hospitals. 659 In studies using unlinked data, hospitals transferring a large proportion of their AMI patients may have lower death rates than hospitals that do not regularly transfer patients. A related bias results from the fact that many deaths related to AMI occur after hospital discharge, but within 30 days. 662 Thus, as described below (under "Fosters true quality improvement"), hospitals with long mean LOS may appear to have higher mortality rates than hospitals with shorter mean LOS. Investigating hospital LOS and transfer rates in conjunction with AMI mortality may help resolve these concerns.

Risk adjustment

Numerous studies have established the importance of risk adjustment for AMI patients. As a result, researchers have developed a number of risk adjustment models. Normand et al developed and validated two models, one of which was based on conditions likely to be present on admission and therefore applicable to comparisons of hospital-based care. 84 The claims-based model included 25 comorbidities not related to treatment. Hypertension (18.3%), diabetes (13.8%), and pulmonary disease (11.2%) were the most frequent comorbidities in an AMI Medicare cohort of 164,427 patients. Examples of frequent comorbidities that were considered possibly related to hospital treatment, and therefore omitted from their model, included congestive heart failure (33.9%), chronic angina (27.4%), and arrhythmias (25.2%). The same team developed another model using the clinical predictors available from the Cooperative Cardiovascular Project. 84 From these and numerous other studies, the most important predictors of short-term AMI mortality have been shown to include age, previous AMI, tachycardia, pulmonary edema and other signs of congestive heart failure, hypotension and cardiogenic shock, anterior wall and Q-wave infarction, cardiac arrest, and serum creatinine or urea nitrogen. Fewer studies have addressed whether adjusting for potential complications as well as comorbidities, or adjusting only for predictors available from administrative data, leads to bias in comparisons across hospitals.

Krumholz et al compared seven models including a newly developed 7-variable clinical/demographic risk adjustment model for 30-day mortality in AMI patients. 123 The models based on clinical data demonstrated better discrimination and calibration than two models 660 based on ICD-9-CM codes (area under the receiver operating curve 0.74-0.78 versus 0.70-0.71, respectively). In addition, the clinical models classified hospital performance somewhat differently than the models based on administrative data. Such differences were further explored by Iezzoni and colleagues, who used several proprietary products to estimate risk-adjusted AMI mortality, and found 40-60% disagreement in identifying the 10 best and 10 worst hospitals in a nationwide sample. 9, 204 Adding full clinical data to administrative data for risk-adjustment, Pine found that 73% of Cleveland hospitals' expected mortality rates changed by less than one standard deviation, and 100% changed by less than two. 19 In St. Louis, 95% of hospitals' expected mortality rates changed by less than 0.5 standard deviations, and 100% changed by less than one. These estimates were better than those for other major medical conditions, including pneumonia, stroke, and congestive heart failure. 205 In the California Hospital Outcomes Project, the addition of clinical risk factors to a reestimated model based on reabstracted ICD-9-CM codes had a minimal effect on the difference in risk-adjusted mortality between low-mortality and high-mortality hospitals, although individual hospitals were affected. 659 In summary, these studies found that the method of risk-adjustment does affect which specific hospitals are identified as mortality outliers, but that the correlations within pairs of risk-adjusted or expected mortality rates are generally high (e.g., 0.80 12 to 0.94 205), and higher for AMI than for other medical conditions.

When risk adjustment models include ICD-9-CM conditions that may represent consequences of poor care, then discrimination is exaggerated. 123 Romano and Chan compared an administrative data set to a re-abstraction of diagnoses present at admission, with two versions of the All Patient Refined-Diagnosis-Related Groups (APR-DRG), Risk of Mortality (ROM) and Severity of Illness (SOI). 113 The authors showed empirically that APR-DRGs predicted 30-day mortality better when all diagnoses were included than when only diagnoses present at admission were included. Hospitals' expected mortality rates based on all reabstracted ICD-9-CM codes were moderately correlated (r=0.72-0.77) with expected mortality rates based only on diagnoses present at admission. However, 2 of the 3 hospitals classified as having higher than expected mortality, 8 of the 23 hospitals classified as having neither higher nor lower than expected mortality, and 0 of the 4 hospitals classified as having lower than expected mortality, switched categories when diagnoses not present at admission were excluded from risk-adjustment.
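To put the category switches reported above on a common footing, the counts can be tallied across the three groups. The short Python sketch below only re-adds the numbers given in the paragraph; the variable and function names are illustrative.

```python
# Counts from the paragraph above:
# (hospitals that switched category, hospitals originally in the category).
SWITCHED = {
    "higher than expected": (2, 3),
    "as expected": (8, 23),
    "lower than expected": (0, 4),
}

def switch_summary(counts):
    """Return (number switched, total hospitals, percent switched)."""
    switched = sum(s for s, _ in counts.values())
    total = sum(n for _, n in counts.values())
    return switched, total, 100 * switched / total

switched, total, pct = switch_summary(SWITCHED)
print(f"{switched} of {total} hospitals switched category ({pct:.1f}%)")
```

This prints `10 of 30 hospitals switched category (33.3%)`, so about one third of hospitals were classified differently despite the moderate correlation in expected mortality rates.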

Construct validity

Numerous randomized controlled trials have conclusively demonstrated that early administration of aspirin and thrombolytic agents can reduce AMI mortality.663-667 Similarly, early revascularization by percutaneous coronary angioplasty reduces mortality in high-risk patients.668-670 Angiotensin converting enzyme inhibitors reduce mortality among post-infarction patients with impaired left ventricular function.671-673 Therefore, there is clear evidence at the patient level that specific processes of care improve patient outcomes. Furthermore, numerous studies based on large regional or national samples have shown substantial practice variation in AMI patients, with underutilization of clearly beneficial therapies and overutilization of harmful treatments.95, 674, 675

Over the last several years, substantial evidence for construct validity at the hospital level has emerged. In the first study of this type, Park et al. estimated the contribution of differences in severity of illness and quality of care to the classification of some hospitals as having unexpectedly high inpatient death rates (age and gender adjusted). 38 Not unexpectedly, severity of illness (using chart data) accounted for some of the variation. However, a quality score derived from an explicit set of process measures did not explain differences between low-mortality and high-mortality hospitals. In fact, the relationship was in the opposite direction from the authors' expectation under several analysis scenarios.

More favorable evidence came from Meehan and colleagues, who evaluated coding accuracy, severity of illness, and process-based quality of care in Connecticut hospitals. 661 Three process measures were selected by an expert panel based on medical literature and local practice patterns: 1) administration of thrombolytic therapy, 2) discharged on aspirin if no contraindication, and 3) discharged on a beta blocker if no contraindication. The hospitals with the highest risk-adjusted mortality had significantly lower utilization of beneficial therapies than the other hospitals in the sample. Although the Medicare Prospective Payment System Quality of Care study did not focus on specific therapeutic interventions, it also demonstrated significantly higher risk-adjusted mortality rates (using risk factors derived by chart review) among hospitals with "poor" processes of care than among hospitals with "good" or "medium" processes of care (30.1% versus 22.0% and 23.9%, respectively). 676 Chen 146 showed that the hospitals designated by US News and World Report as "America's Best Hospitals" in cardiology, based on risk-adjusted mortality (using APR-DRGs) and reputation among physicians, had lower risk-adjusted mortality (using clinical predictors) among Medicare patients (15.6% versus 18.3-18.6%) and used aspirin and beta blockers more often than hospitals that were not so designated. Similarly, major teaching hospitals in the same Medicare data set had 20% lower risk-adjusted 30-day mortality than nonteaching hospitals; about half of this difference was attributable to greater use of beneficial therapies. 677 In the RAND PPS Quality of Care study in 1990, patients with higher process scale scores for AMI demonstrated significantly lower risk-adjusted 30-day AMI mortality on four out of five subscales and on an overall process scale. 676 In the California Hospital Outcomes Project, hospitals with low risk-adjusted AMI mortality were more likely to give aspirin within 6 hours of arrival in the emergency room, more likely to perform cardiac catheterization and revascularization procedures within 24 hours, and more likely to give heparin to prevent thromboembolic complications. However, there were no differences between low and high-mortality hospitals in the use or timing of thrombolytic or beta blocker therapy. 659

These somewhat conflicting findings may relate to the general insensitivity of mortality rates to process measures. Mant and Hicks conducted a systematic review of the literature to estimate the effect sizes for therapies proven effective for AMI patients, based on clinical trials and meta-analyses. 43 The therapies assessed were beta blockade, aspirin, fibrinolysis, and angiotensin converting enzyme inhibitors. Using the best estimates of effect size and the proportion of patients eligible for treatment, the authors calculated the absolute risk reduction for low and high baseline mortality situations, with a resulting range of 5.1% to 16.4%. Given this range, they simulated the number of patients required to detect differences in care using either a "perfect system" for risk-adjusted mortality or a process-based quality of care audit. Using the same population of AMI patients, the difference in lives lost was detectable with one year of data collection on mortality or only two weeks of data collection on process of care.

The widespread recognition of the exceptionally strong evidence base supporting specific processes of care for AMI patients has led to numerous professional guidelines, guideline implementation projects,678, 679 and regional and national quality improvement initiatives. Through its Cooperative Cardiovascular Project and Sixth Scope of Work, the Health Care Financing Administration has focused considerable attention on improving processes of care for AMI, as a way to improve mortality and other outcomes. Hospitals in the four pilot states involved in this project (AL, CT, IA, WI) significantly improved their performance on each process indicator between 1992 and 1995, and simultaneously achieved a greater reduction in 30-day mortality (19.9% to 17.6%) than hospitals in other states (19.6% to 18.2%). This finding suggests, but does not prove, that hospitals can lower their AMI mortality rates by improving adherence to evidence-based guidelines.
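The comparison of mortality reductions in the pilot and other states can be restated in absolute and relative terms. The Python sketch below works only from the four percentages quoted above; the function name is illustrative, and the calculation says nothing about statistical significance.

```python
def mortality_change(before, after):
    """Return (absolute reduction in percentage points, relative reduction in %)."""
    absolute = before - after
    relative = 100 * absolute / before
    return absolute, relative

# 30-day mortality (%), 1992 versus 1995, as quoted in the text.
pilot_abs, pilot_rel = mortality_change(19.9, 17.6)
other_abs, other_rel = mortality_change(19.6, 18.2)

print(f"Pilot states: {pilot_abs:.1f} points ({pilot_rel:.1f}% relative)")
print(f"Other states: {other_abs:.1f} points ({other_rel:.1f}% relative)")
print(f"Difference in absolute reductions: {pilot_abs - other_abs:.1f} points")
```

The pilot states show a reduction of 2.3 percentage points (about 11.6% relative) against 1.4 points (about 7.1% relative) elsewhere, a difference of 0.9 points.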

Fosters true quality improvement

In general, physicians and hospitals have little discretion in their decisions to admit AMI patients, so it seems unlikely that the use of this indicator would impede access to needed care. However, a few patients who fail to respond to, or are ineligible for, resuscitative efforts in the emergency room may not be admitted if there is pressure to reduce inpatient mortality. Although such practices might bias comparisons of risk-adjusted inpatient mortality across hospitals, they would be unlikely to compromise patient outcomes (as resuscitative measures that fail in an emergency room would also fail in a coronary care unit). It is conceivable that patients could be discharged early to die at home or in a nursing home, although this may be unlikely due to the acute nature of the condition. Patient transfers to other hospitals will also have a greater effect on inpatient mortality rates, as noted in the OSHPD study, because hospitals vary widely in their transfer rates. Typically, 30-day overall mortality rates and 30-day inpatient mortality rates have been considered more valid than inpatient mortality rates based only on the initial hospitalization for AMI. The rank correlation between standardized AMI mortality measures based on inpatient deaths and measures based on 30-day deaths (at the hospital level) was 0.79 in a study of Medicare data. 289 This finding suggests that changes in length of stay may modestly alter the ranking of hospital performance using this measure.

Prior use

Inpatient AMI mortality, based on administrative data, has recently been used as a hospital quality indicator by the University Hospital Consortium, 370 the California Hospital Outcomes Project, 680 HealthGrades.com, 377 the Michigan Hospital Association (aggregated with congestive heart failure and angina), 373 and the Greater New York Hospital Association. 372 In addition, the following organizations have used this indicator with risk-adjustment based on clinical data obtained through review of medical records: the Pennsylvania Health Care Cost Containment Council 681 and Cleveland Health Quality Choice. 374 The Joint Commission on Accreditation of Healthcare Organizations has adopted AMI mortality (from the MEDSTAT Corporation) as one of its core hospital performance measures. 443 AMI mortality is also a High-Level Performance Indicator for the United Kingdom's National Health Service. 590

Empirical Evidence
Test | Statistic | Rating
Precision
   Raw provider level rate/standard deviation | 24.4%, 16.1% |
   Systematic provider-level standard deviation** | 3.4% | High
   Provider variation as a percentage of total variation** | 0.8% | High
   Signal ratio** | 42.8% | Moderate
   R-Square** | 59.0% | Moderate
   **APR-DRG, age-, gender-adjusted
Minimum Bias - APR-DRG risk adjustment
   Signal variance change with risk adjustment | Decreases | Fair
   Absolute impact:
     Average absolute change (in %) | 32.4% | Fair
   Relative impact:
     Rank correlation | 0.747 | Fair
     Percent remaining in high decile/low decile | 36.3% / 67.3% | Fair
     Percent changing more than 2 deciles | 29.0% | Fair
Precision

This indicator is precise, with a raw provider level mean of 24.4% and a standard deviation of 16.1%. The systematic provider level standard deviation is high, at 3.4%. The provider level variation also accounts for a high percentage of total variation, at 0.8%. This means that, relative to other indicators, a higher percentage of the variation occurs at the provider level rather than the discharge level, although for some other indicators the provider-level share is higher still. The signal ratio is only moderate, at 42.8%. This means that it is likely that some of the observed differences in provider performance do not represent true differences in provider performance. The moderate R-square (59%) reflects the higher proportion of signal that can be extracted using multivariate techniques.
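
As a rough illustration of what a signal ratio measures, the sketch below splits the observed variance in provider rates into a noise component and a signal component. It is a simplified method-of-moments estimate that assumes binomial sampling noise; it is not the estimation procedure used in this report, and the function name and inputs are hypothetical.

```python
from statistics import mean, pvariance


def signal_ratio(deaths, discharges):
    """Method-of-moments signal ratio: signal variance / observed variance.

    Assumption (not from the report): each provider's sampling (noise)
    variance is binomial, p * (1 - p) / n, evaluated at the provider's
    own observed rate.
    """
    rates = [d / n for d, n in zip(deaths, discharges)]
    observed_var = pvariance(rates)
    noise_var = mean(p * (1 - p) / n for p, n in zip(rates, discharges))
    signal_var = max(observed_var - noise_var, 0.0)  # truncate at zero
    return signal_var / observed_var if observed_var > 0 else 0.0
```

Under these assumptions, a ratio near 1 would indicate that most of the observed spread in provider rates is systematic, while a ratio near 0 would indicate that it is mostly sampling noise.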

Bias

Signal variance decreases by over 25% with risk adjustment, indicating that some of the true variation among providers reflects differences in patient characteristics. The indicator performs fairly on the multiple measures of minimum bias. The rank correlation is fair at 0.747. The impact on the extremes is large. Only 36.3% of providers in the highest decile remain, and only 67.3% in the lowest decile remain, after risk adjustment. Similarly, the number of providers moving at least two deciles in relative rank is also high. The absolute magnitude of risk adjustment is also substantial.
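
The relative-impact measures discussed above (rank correlation, percent remaining in the high decile, and percent changing more than two deciles) can be sketched as follows. This is an illustrative reconstruction rather than the report's code: the decile assignment rule, the use of Spearman's rank correlation, and all function names are assumptions.

```python
def ranks(values):
    """Average ranks (1-based); tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def deciles(values):
    """Decile (1..10) of each provider, assigned by rank (assumed rule)."""
    n = len(values)
    return [min(int((r - 1) * 10 / n) + 1, 10) for r in ranks(values)]


def bias_summary(raw, adjusted):
    """Compare provider rates before and after risk adjustment."""
    d_raw, d_adj = deciles(raw), deciles(adjusted)
    top = [i for i, d in enumerate(d_raw) if d == 10]
    stay = sum(d_adj[i] == 10 for i in top) / len(top) if top else float("nan")
    moved = sum(abs(a - b) > 2 for a, b in zip(d_raw, d_adj)) / len(raw)
    return {
        # Spearman correlation = Pearson correlation of the ranks
        "rank_correlation": pearson(ranks(raw), ranks(adjusted)),
        "pct_remaining_high_decile": 100 * stay,
        "pct_changing_more_than_2_deciles": 100 * moved,
    }
```

Here `raw` and `adjusted` are hypothetical lists of unadjusted and risk-adjusted provider rates in the same provider order.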

Construct validity

AMI mortality does not load substantially on any of the three extracted factors. However, AMI mortality is correlated with several other indicators, including Bi-lateral catheterization (r=-.16, p<.0001), mortality for CHF (r=.46, p<.0001), pneumonia (r=.46, p<.0001), CABG (r=.50, p<.0001), stroke (r=.40, p<.0001), and GI hemorrhage (r=.38, p<.0001).

Discussion

Reductions in the mortality rate for acute myocardial infarction at both the patient level and the provider level have been related to better processes of care. Timely and effective treatments are essential for patient outcomes and include appropriate use of thrombolytic therapy and, when appropriate, revascularization. The evidence surrounding the validity of AMI mortality as a quality indicator is substantial.

AMI mortality rate is measured with adequate precision, with high systematic variation, and a moderate signal ratio. This suggests that some of the observed variance may not actually reflect true differences in performance. Multivariate techniques help in extracting additional signal and are recommended. Using smoothed estimates (MSX) may help to avoid precision problems due to random noise.
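
To illustrate why smoothing helps with random noise, the sketch below applies a simple univariate shrinkage estimator that pulls each provider's rate toward the pooled rate, more strongly for low-volume providers. This is a simplified stand-in for the multivariate MSX approach, not the MSX method itself; the function name, the binomial noise model, and the requirement that the caller supply a positive signal variance are all assumptions.

```python
def smooth_rates(deaths, discharges, signal_var):
    """Shrink each provider's raw rate toward the pooled rate.

    Assumptions (not from the report): signal_var > 0 is supplied by the
    caller, and noise variance is binomial at the pooled rate.
    """
    pooled = sum(deaths) / sum(discharges)
    smoothed = []
    for d, n in zip(deaths, discharges):
        noise_var = pooled * (1 - pooled) / n
        weight = signal_var / (signal_var + noise_var)  # reliability weight
        smoothed.append(pooled + weight * (d / n - pooled))
    return smoothed
```

For example, `smooth_rates([5, 500], [10, 2000], signal_var=0.034 ** 2)` uses the 3.4% systematic standard deviation reported above as the assumed signal; the 10-discharge provider is shrunk far more than the 2,000-discharge provider.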

Risk adjustment may be important for this indicator. Our empirical results show substantial impact of risk adjustment, particularly at the extremes. This means that without risk adjustment, some providers may be mislabeled as outliers. In addition, some of the potential risk adjustment factors, such as clinical measures, may not be available using administrative data. Methods such as chart review may help illuminate the need for more detailed risk adjustment, and potential case-mix differences between providers. Since AMI is an urgent medical condition, this indicator is not expected to be subject to selection bias.

Hospital discharge practices differ, with some hospitals discharging patients earlier than others. For this reason, this indicator should be considered in conjunction with length of stay and transfer rates (though transfers are excluded in this indicator).

Overall, this indicator is recommended for inclusion in the HCUP II QI set. It received an empirical rating of 5 out of 26, and smoothing is highly recommended. This indicator is recommended with two major caveats of use. First, thirty-day mortality may be significantly different from in-hospital mortality, leading to information bias. Second, risk adjustment for clinical factors, or at minimum APR-DRGs, is recommended due to the confounding bias for this indicator.

INDICATOR 34: CONGESTIVE HEART FAILURE (CHF) MORTALITY RATE

Indicator: Provider level mortality rate for CHF.
Relationship to Quality: Better processes of care may reduce short-term mortality for CHF. As such, lower rates represent better quality care.
Benchmark: State, regional, or peer group average.

Method:

Quality Measure: Number of deaths per 100 discharges with principal diagnosis code of CHF.
Outcome of Interest: Number of deaths with principal diagnosis code for CHF (see Appendix 6).
Population at Risk: All discharges with principal diagnosis code of CHF (see Appendix 6).

Age 18 and older.

Exclude discharges with cardiac procedure codes in any field (see Appendix 6).
Exclude transfers to other institution.
Exclude MDC 14 (pregnancy, childbirth, and puerperium) and MDC 15 (newborns and other neonates).
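
The definition above can be sketched as a simple calculation over discharge records. This is a minimal illustration only: the record field names are hypothetical, and the Appendix 6 diagnosis and procedure code lists are represented by precomputed boolean flags rather than reproduced here.

```python
from collections import defaultdict


def chf_mortality_rates(discharges):
    """Deaths per 100 eligible CHF discharges, by provider.

    Each record is a dict with hypothetical keys; the flags stand in for
    the Appendix 6 code lists, which are not reproduced in this sketch.
    """
    deaths = defaultdict(int)
    eligible = defaultdict(int)
    for rec in discharges:
        if not rec["principal_dx_is_chf"]:
            continue  # population at risk: principal diagnosis of CHF
        if rec["age"] < 18:
            continue  # age 18 and older
        if rec["has_cardiac_procedure"] or rec["transferred_out"]:
            continue  # exclusions: cardiac procedures, transfers out
        if rec["mdc"] in (14, 15):
            continue  # exclusions: MDC 14 and MDC 15
        eligible[rec["provider"]] += 1
        deaths[rec["provider"]] += int(rec["died"])
    return {p: 100.0 * deaths[p] / n for p, n in eligible.items()}
```
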
Evidence from the literature
Face validity

Admission for heart failure is common with 400,000 new cases 682 and 274,000 deaths 683 each year in the United States. Approximately 2 million persons in the U.S. have heart failure and this number will increase as the population ages. 682 Population data from the United States have not indicated a decline in mortality over the last 20 years. 684 However, a recent study from Scotland has suggested that case-fatality mortality rates have improved between 1986 and 1995. 685

The accuracy of ICD-9 coding for heart failure has been questioned. Although the specificity of a principal diagnosis of heart failure is high (>95%), the sensitivity is low. 686 Even when the principal or secondary diagnoses are used, the sensitivity is only 63% and the positive predictive value is 83.5%. 686 Others have found lower positive predictive values (62.5%) but higher sensitivities (89.9%) for the combined use of principal and secondary diagnoses. 687 Face validity will be maximized by limiting analyses to patients with a principal diagnosis of heart failure.
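
For readers less familiar with these coding-accuracy measures, the sketch below shows how sensitivity, specificity, and positive predictive value are derived from a comparison of coded diagnoses against a reference standard such as chart review. The function name and the counts in the usage note are hypothetical and are not taken from the cited studies.

```python
def coding_accuracy(tp, fp, fn, tn):
    """Accuracy of a coded diagnosis against a reference standard.

    tp: coded and truly present     fp: coded but truly absent
    fn: not coded but present       tn: not coded and absent
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases that are coded
        "specificity": tn / (tn + fp),  # share of non-cases left uncoded
        "ppv": tp / (tp + fp),          # share of coded cases that are true
    }
```

With hypothetical counts such as `coding_accuracy(tp=63, fp=12, fn=37, tn=888)`, sensitivity can be low even when specificity and positive predictive value are high, which is the pattern described above.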

Precision

Reported rates of short-term mortality include 6%, 688 13%, 689 14%, 690 and 19.9%. 685

In-hospital survival appears to have improved recently. The 3-day mortality rate decreased by 41% in a study of 29,500 elderly patients in Oregon from 1991 to 1995. 691 These data suggest that hospitals have improved care for heart failure patients.

There is substantial variation in hospital mortality. In an analysis of 6 hospitals, mortality ranged from 3.6% to 11.3%; however, no hospital had a statistically significantly higher rate. 692

Minimum bias

Mortality from heart failure is greatly influenced by patient characteristics. Mortality has been reported to be higher in older patients, males, 693 and possibly whites. 694 Using administrative data, the c-statistic was 0.68 for logistic models of in-hospital survival using the Charlson comorbidity (non-specific) score and 0.78 for a heart failure specific score. 695 The variables associated with increased in-hospital mortality in the heart failure specific score were age (relative weight 1), transfer, 683 cerebrovascular disease, 683 chronic obstructive pulmonary disease, 683 hyponatremia, 683 other hydro-electrolytic disturbance, 683 metastatic disease, 683 moderate to severe renal disease, 685 ventricular arrhythmia, 687 mild liver disease, 687 malignancy, 687 hypotension and shock.694, 695

Another multivariable model of in-hospital mortality using clinical data (blood pressure, electrocardiogram) demonstrated a c-statistic of 0.9. 692 When six hospitals were evaluated using this clinical model, expected mortality varied from 4% to 9% with one hospital having a significantly lower predicted rate (P = .01). The observed-expected mortality ranged from -3.8% to +4.7% (all NS).
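
The two quantities used in this comparison, the c-statistic of a risk model and the observed-minus-expected mortality for a hospital, can be sketched as follows. This is a generic illustration under assumed inputs (a list of model-predicted death probabilities and a matching list of 0/1 outcomes); it is not the cited study's model.

```python
def c_statistic(predicted, died):
    """Share of (death, survivor) pairs in which the patient who died had
    the higher predicted risk; ties count as one half."""
    pos = [p for p, d in zip(predicted, died) if d]
    neg = [p for p, d in zip(predicted, died) if not d]
    concordant = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return concordant / (len(pos) * len(neg))


def observed_minus_expected(predicted, died):
    """Observed minus expected mortality for one hospital, in percentage
    points. Expected mortality is the mean predicted probability."""
    n = len(died)
    observed = 100.0 * sum(died) / n
    expected = 100.0 * sum(predicted) / n
    return observed - expected
```

A c-statistic of 0.5 corresponds to a model that discriminates no better than chance, and 1.0 to perfect discrimination. A positive observed-minus-expected value means more deaths occurred than the model predicted for that hospital's patients.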

Construct validity

There were no studies that specifically examined the construct validity of in-hospital mortality from heart failure. We did identify several studies that provide information on the validity of this indicator. At the patient level, processes of care have been shown to decrease mortality; it is unknown how implementing these processes of care would actually affect provider-level mortality rates.

Survival is known to be improved for patients with heart failure and low ejection fraction if they receive ACE inhibitors. 696 There is a wide variation in the use of ACE inhibitors during hospitalization. 697 A measure of left ventricular ejection fraction (e.g. echocardiography) is recommended to determine which patients have depressed ventricular function and would benefit from life-prolonging medical therapy. Measures of left ventricular function have also been found to vary widely within hospitals. 697 Use of echocardiography has been associated with more ACE inhibitor use and improved survival. 698 In their PPS Quality of Care study in 1990, RAND reported better process of care for CHF on all five subscales and overall. On four of five process subscales and on the overall scale, patients demonstrated significantly lower risk-adjusted CHF mortality. For patients with a poor overall process scale score (in the lowest 25%), 30-day risk-adjusted mortality was 19% while that for patients with medium process scale scores was 13% and for patients with high scale scores (in the highest 25%) 30-day risk-adjusted mortality was 11%. 676

Fosters true quality improvement

Risk-adjusted measures of mortality may lead to an increase in coding of comorbidities. Patients may be discharged early to a lower level of care (nursing home) so that the death does not occur in-hospital. All in-hospital mortality measures may create perverse incentives to reduce hospital mortality by discharging patients earlier, thereby shifting deaths to skilled nursing facilities or outpatient settings. This phenomenon may also lead to biased comparisons among hospitals with different mean lengths of stay. The rank correlation between standardized mortality measures based on inpatient deaths and measures based on 30-day deaths (at the hospital level) was 0.71 and 0.78 in studies of Medicare 289 and all-payer Cleveland 699 data, respectively. This finding suggests that changes in length of stay may modestly alter the ranking of hospital performance using this measure. However, Rosenthal et al. found no evidence that hospitals with lower in-hospital standardized mortality had higher (or lower) early post-discharge mortality.

Prior use

Mortality for congestive heart failure has been widely used as a quality indicator. In addition to the use in the literature listed above, HealthGrades.com, 377 the University Hospital Consortium 370 and the Greater New York Hospital Association 372 have used this measure. The Maryland Hospital Association includes this measure in its Maryland QI Project Indicator set. 369 Cleveland Health Quality Choice 374 includes CHF mortality in its cardiovascular care measure. Likewise, the Michigan Hospital Association 373 includes CHF in an aggregated mortality measure.

Empirical Evidence
Test | Statistic | Rating
Precision
   Raw provider level rate/standard deviation | 7.5%, 9.5% |
   Systematic provider-level standard deviation** | 2.1% | Moderate
   Provider variation as a percentage of total variation** | 0.7% | Moderate
   Signal ratio** | 53.5% | Moderate
   R-Square** | 69.7% | Moderate
   **APR-DRG, age-, gender-adjusted
Minimum Bias - APR-DRG risk adjustment
   Signal variance change with risk adjustment | No change | Good
   Absolute impact:
     Average absolute change (in %) | 13.7% | Good
   Relative impact:
     Rank correlation | 0.794 | Good
     Percent remaining in high decile/low decile | 39.8% / 72.9% | Fair
     Percent changing more than 2 deciles | 23.5% | Fair
Precision

This indicator is precise, with a raw provider level mean of 7.5% and a standard deviation of 9.5%. The systematic provider level standard deviation is moderate, at 2.1%. The provider level variation also accounts for a moderate percentage of total variation, at 0.7%. This means that, relative to other indicators, a lower percentage of the variation occurs at the provider level rather than the discharge level. The signal ratio is only moderate, at 53.5%. This means that it is likely that some of the observed differences in provider performance do not represent true differences in provider performance. The moderate R-square (69.7%) reflects the higher proportion of signal that can be extracted using multivariate techniques.

Bias

Signal variance does not change with risk adjustment. The indicator performs fairly to well on the multiple measures of minimum bias. The rank correlation is good at 0.794. The impact on the extremes is large. Only 39.8% of providers in the highest decile remain, and only 72.9% in the lowest decile remain, after risk adjustment. Similarly, the number of providers moving at least two deciles in relative rank is also high. The absolute magnitude of risk adjustment is moderate.

Construct validity

Congestive heart failure mortality loads on factor 1. It is positively related to other medical mortality indicators, such as pneumonia, GI hemorrhage, and stroke, and to a lesser extent, hip fracture.

Discussion

Congestive heart failure is a progressive chronic disease. Short-term mortality is substantial and varies from provider to provider. Certain treatments have been shown to decrease short-term mortality on a patient level, but it is unknown whether or not such practices decrease provider-level mortality.

CHF is a relatively common admission, with a relatively high short-term mortality rate. Our empirical tests showed that this indicator is precise, with moderate systematic variation, and a moderate percentage of that variation is provider-level. The signal ratio is moderate, suggesting that some of the differences observed do not reflect true differences in performance. Multivariate techniques improve the amount of signal that can be extracted for this indicator, and as such MSX smoothed estimates are recommended to avoid problems with precision due to random noise.

While CHF mortality has not been studied extensively as an indicator, some have developed risk models for short-term death. Comorbidities and some clinical factors appear important in predicting death. Our empirical analysis confirmed that risk adjustment impacts this indicator. Relative performance is the most impacted, particularly for providers with high and low rates. Therefore, it is important to use some sort of risk adjustment for this indicator, as otherwise some providers may be mislabeled as outliers. Another source of bias is the outpatient management of some patients. Some providers may admit only the most severely ill patients and handle other patients in an outpatient setting, while others admit a broader range of patients. The former practice results in a more severe case mix among admitted patients. Providers may wish to examine rates of outpatient care.

Overall, this indicator is recommended for inclusion in the HCUP II QI set. It received an empirical rating of 6 out of 26, and smoothing is highly recommended. This indicator is recommended with three major caveats of use. First, some CHF care occurs in an outpatient setting, and selection bias may be a problem for this indicator. Second, thirty-day mortality may be significantly different than in-hospital mortality, leading to information bias. Third, risk adjustment for clinical factors, or at minimum APR-DRGs, is recommended due to the confounding bias for this indicator.

INDICATOR 35: GASTROINTESTINAL (GI) HEMORRHAGE MORTALITY RATE

Indicator: Provider level mortality rate for GI hemorrhage.
Relationship to Quality: Better processes of care may reduce mortality for GI hemorrhage. As such, lower rates represent better quality care.
Benchmark: State, regional, or peer group average.

Method:

Quality Measure: Number of deaths per 100 discharges with principal diagnosis code of GI hemorrhage.
Outcome of Interest: Number of deaths with principal diagnosis code for GI hemorrhage (see Appendix 6).
Population at Risk: All discharges with gastrointestinal hemorrhage in principal diagnosis field (see Appendix 6).

Age 18 years or older.

Exclude transfers to other institution.
Exclude MDC 14 (pregnancy, childbirth, and puerperium) and MDC 15 (newborns and other neonates).
Evidence from the literature
Face validity

Admission for gastrointestinal hemorrhage is fairly common (100/100,000 adults). Mortality rates for hemorrhage vary greatly, and lower mortality has been associated with more use of treatments such as early endoscopy (within 24-48 hours of presentation), though the strength of this relationship has not been established, with some studies failing to find significant relationships (see construct validity section). Mortality rates in large population-based databases have not changed since the 1940s, although the ages and comorbidities of patients have increased. 700

Precision

Rates of mortality in gastrointestinal hemorrhage vary from 0-29%, with most studies reporting rates of 3.5%-11%.700-704

Minimum bias

Mortality from gastrointestinal hemorrhage is highly influenced by patient comorbidities and other factors complicating the bleed, as well as the nature and severity of the bleed itself, which all vary substantially across patients with the condition. One study noted that some endoscopic findings, hemodynamic characteristics, and comorbidities were highly predictive of life-threatening events. 705 The same study found at reassessment that the strongest predictors of life-threatening events were recurrent bleeding (3-6 times risk of mortality) and unstable comorbid diseases. Mortality rates in patients with gastrointestinal hemorrhage have been shown to increase with age. In one study, the overall mortality in the absence of comorbidity was 4%, while patients under the age of 60 years experienced a mortality of 0.1%. Concurrent malignancy and organ failure increased the mortality rate for patients under 60 years to 0.8%. 704

Patients who develop bleeding in-hospital have higher mortality rates than patients admitted with gastrointestinal bleeding that began outside the hospital. One study reported a difference in crude mortality rates of 33% versus 11%.702, 704

One study tested the effect of risk adjustment on hospital ranking for gastrointestinal hemorrhage mortality. Risk-adjusting for age, shock, and comorbidity (characteristics that are often reported on discharge abstracts) changed 30 hospitals' rankings by more than 10. Adding diagnosis, endoscopy findings, and rebleed status changed 32 hospitals' rankings by more than 10. 700

Construct validity

We located no studies explicitly evaluating the construct validity of this indicator. At the patient level, processes of care have been shown to decrease mortality. However, it is unknown how implementing these processes of care would actually affect provider-level mortality rates.

A number of medical treatments have been shown to be associated with bleeding control, though evidence on association with mortality is more limited. Endoscopy has been shown inconsistently to be associated with mortality. One meta-analysis showed a slight advantage for early endoscopy, 706 while another study found that endoscopy was not related to mortality in either the bivariate or multivariate analyses. 701

Many of the deaths reported are not associated with bleeding per se. One study found that only one death was related to bleeding, and that patient had several severe comorbidities. 702 Thus, in many cases, the deaths in patients with a diagnosis of gastrointestinal hemorrhage are probably not actually due to the bleed itself.

Fosters true quality improvement

Risk-adjusted measures of mortality may lead to an increase in coding of comorbidities or upcoding of diagnoses. All in-hospital mortality measures may create perverse incentives to reduce hospital mortality by discharging patients earlier, thereby shifting deaths to skilled nursing facilities or outpatient settings. This phenomenon may also lead to biased comparisons among hospitals with different mean lengths of stay. We found no published evidence about whether the difference between inpatient and 30-day mortality for GI hemorrhage is substantial enough to cause concern.

Prior use

GI hemorrhage mortality is currently used by Cleveland Health Quality Choice 369 and the Maryland Hospital Association (in the Maryland QI Project). 369 GI hemorrhage is also included in the Michigan Hospital Association's aggregated mortality measure. 373

Empirical Evidence
Test | Statistic | Rating
Precision
   Raw provider level rate/standard deviation | 4.6%, 5.7% |
   Systematic provider-level standard deviation** | 1.1% | Moderate
   Provider variation as a percentage of total variation** | 0.3% | Moderate
   Signal ratio** | 20.2% | Low
   R-Square** | 55.5% | Moderate
   **APR-DRG, age-, gender-adjusted
Minimum Bias - APR-DRG risk adjustment
   Signal variance change with risk adjustment | Decreases | Fair
   Absolute impact:
     Average absolute change (in %) | 10.5% | Good
   Relative impact:
     Rank correlation | 0.803 | Good
     Percent remaining in high decile/low decile | 48.9% / 32.2% | Fair
     Percent changing more than 2 deciles | 35.5% | Fair
Precision

This indicator is precise, with a raw provider level mean of 4.6% and a standard deviation of 5.7%. The systematic provider level standard deviation is moderate, at 1.1%. The provider level variation also accounts for a moderate percentage of total variation, at 0.3%. This means that, relative to other indicators, a lower percentage of the variation occurs at the provider level rather than the discharge level. The signal ratio is low, at 20.2%. This means that it is very likely that some of the observed differences in provider performance do not represent true differences in provider performance. The moderate R-square (55.5%) reflects the higher proportion of signal that can be extracted using multivariate techniques.

Bias

Signal variance decreases by more than 25% with risk adjustment, suggesting that some of the observed variance is due to differences in patient characteristics. The indicator performs fairly to well on the multiple measures of minimum bias. The rank correlation is good at 0.803. The impact on the extremes is large. Only 48.9% of providers in the highest decile remain, and only 32.2% in the lowest decile remain, after risk adjustment. Similarly, the number of providers moving at least two deciles in relative rank is also high. The absolute magnitude of risk adjustment is moderate.

Construct validity

GI hemorrhage mortality loads on factor 1. It is positively related to mortality indicators, such as pneumonia, stroke, and congestive heart failure, and to a lesser extent, hip fracture.

Discussion

GI hemorrhage may lead to death when uncontrolled. However, our literature review noted that the bleed itself is rarely the cause of death, calling into question the face validity of this indicator as a measure of quality of care for hemorrhage. The ability to manage severely ill patients with comorbidities may influence the mortality rate for "GI hemorrhage," though we found no evidence for this hypothesis.

GI hemorrhage mortality rate is measured with adequate precision, with moderate provider systematic variation. Though mortality due to the bleed itself is low, mortality in patients with GI hemorrhage along with other comorbidities is high enough and varies enough to expect adequately precise measurement. The signal ratio for this indicator is low, suggesting that some of the observed differences likely do not reflect true differences in performance. However, multivariate techniques do improve the amount of signal that can be extracted, and are recommended. Using smoothed estimates (MSX) may help to avoid precision problems due to random noise.

The extreme influence of comorbidities on the survival rate of patients with GI hemorrhage, as well as the influence of age and timing of onset (pre- or post-hospitalization), raises questions about the potential bias for this indicator. While we found no published evidence that these factors vary systematically by provider, it seems likely that hospitals may vary in their treatment of geriatric or severely ill patients. Our empirical analysis confirmed this potential bias, particularly when identifying overall changes in provider performance or the providers with the lowest and highest mortality rates. Providers should risk adjust for comorbidities. In addition, providers with high rates may want to examine their case mix for higher complexity of cases (patients over 60, more comorbidities).

Hospital discharge practices differ, with some hospitals discharging patients earlier than others. For this reason, this indicator should be considered in conjunction with length of stay and transfer rates (though transfers are excluded in this indicator).

Overall, this indicator is recommended for inclusion in the HCUP II QI set. It received an empirical rating of 5 out of 26, and smoothing is highly recommended. This indicator is recommended with two caveats of use. First, risk adjustment for clinical factors, or at minimum APR-DRGs, is recommended due to the substantial confounding bias for this indicator. Second, limited evidence supports the construct validity of this indicator.

INDICATOR 36: HIP FRACTURE MORTALITY RATE

Indicator: Provider level mortality rate for hip fracture.
Relationship to Quality: Better processes of care may reduce mortality for hip fracture. As such, lower rates represent better quality care.
Benchmark: State, regional, or peer group average.

Method:

Quality Measure: Number of deaths per 100 discharges with principal diagnosis code of hip fracture.
Outcome of Interest: Number of deaths with principal diagnosis code for hip fracture (see Appendix 6).
Population at Risk: All discharges with principal diagnosis code of hip fracture (see Appendix 6).

Age 18 or older.

Exclude transfers to other institution.
Exclude MDC 14 (pregnancy, childbirth, and puerperium) and MDC 15 (newborns and other neonates).
Evidence from the literature
Face validity

Hip fractures are a common cause of morbidity and functional decline among elderly persons. In addition, hip fractures are associated with a significant increase in the subsequent risk of mortality, which persists for a minimum of 3 months among the oldest and most impaired individuals at baseline,707, 708 and perhaps up to several years among younger and less impaired individuals.709, 710 Fractures of the femoral neck or intertrochanteric region are usually caused by minimal trauma (e.g., fall on a step, on a level surface, or from a chair) in the setting of osteoporosis, a common condition characterized by demineralization and weakening of weight-bearing bones. About 89% of hip fracture patients are elderly, and they often have multiple comorbidities and pre-fracture functional impairments. As a result, they are at significant risk of such postoperative complications as pneumonia, myocardial ischemia, arrhythmias, and deep vein thrombosis. If these complications are not recognized and effectively treated, life-threatening problems such as respiratory failure, myocardial infarction, and pulmonary embolus may ensue.

Precision

Hip fracture is the most common type of fracture requiring hospitalization; about 382,000 discharges in 1998 listed a diagnosis of hip fracture, of which 329,000 listed it as the principal diagnosis (12.0 per 10,000 persons). 711 Based on the all-payer database in California, each hospital admitted an average of 51.6 elderly hip fracture patients requiring surgical repair per year in 1995-96. 712 The largest published study of in-hospital mortality reported a rate of 4.9% in 1979-88. 713 These data suggest that mortality rates are likely to be relatively reliable at the hospital level.

Minimum bias

There is relatively little potential for selection bias, because almost all patients with hip fracture are hospitalized. However, hip fracture patients may tend to be sicker at some hospitals than at others, due to variations in health status across the catchment areas of different facilities. The known predictors of in-hospital or 30-day mortality can be divided into several categories. Demographic predictors include age, male sex, and prior residence in a nursing home. Comorbidity predictors include malnutrition, venous disease, digestive diseases, cardiovascular diseases (including congestive heart failure, angina, and atrial fibrillation), neoplasms, disorientation or delirium, chronic obstructive pulmonary disease, prior hospitalization within one month, the Charlson comorbidity index (or the number of chronic medical conditions), and the ASA (American Society of Anesthesiology) physical status score. The only proven functional predictor of short-term mortality is dependency in any activity of daily living. Finally, fracture site may be a significant predictor, although probably more for long-term outcomes than for short-term outcomes. To the extent that these factors are more prevalent at some hospitals than at others, hip fracture mortality rates may be susceptible to confounding bias. In the absence of studies explicitly comparing models with and without clinical data elements, it is difficult to assess whether administrative data contain sufficient information to remove bias.

Construct validity

There is conflicting evidence on the construct validity of this indicator. The association between risk-adjusted mortality (using clinical data obtained by chart abstraction) and implicit and explicit process criteria was explored as part of RAND's Prospective Payment System Quality of Care study. Whereas Medicare patients with poor "process of care" had higher risk-adjusted 30-day mortality than those with good "process of care" for four medical conditions, there was no difference for hip fracture (4.6% versus 5.1%, RR=0.90). None of the process subscales (physician cognitive, nurse cognitive, technical diagnostic, technical therapeutic, monitoring) was associated with risk-adjusted 30-day mortality. 676 A more recent British study identified one East Anglian hospital with significantly lower than average risk-adjusted mortality (at 90 days); patients at this hospital "were routinely treated by a designated multidisciplinary team for fracture of the hip, with early assessment and surgery, much of which was performed by one surgeon," routine thromboembolism prophylaxis, and early mobilization. 714

There is very little evidence supporting an association between hospital volume and mortality following hip fracture repair. (Following Halm, Lee, and Chassin 80 , we did not find this evidence to be sufficiently strong to recommend total hip fracture volume as a separate volume indicator.) Using administrative data from Florida, without any risk-adjustment, Lavernia 715 found no association between surgeon volume and in-hospital mortality. They did not report the effect of hospital volume, if any. A study of Medicare data from 1979 and 1980 showed no association with 60-day mortality, after adjusting for age, sex, and region, 716 although a more recent study of 1988 and 1990 Medicare data found higher than expected in-hospital mortality at low-volume hospitals. 717 Maerki 718 and Luft 239 found "no clear pattern" in the relationship between hospital volume and hip fracture mortality, using 1972 data from the Commission on Professional and Hospital Activities (CPHA), although there was a suggested effect among low risk patients.719, 720 Hughes and colleagues 52 used 1982 data from CPHA and did find a significant association between volume and inpatient mortality in a hospital-level regression analysis. These inconsistencies provide limited support for the construct validity of mortality as a quality indicator.

There is also substantial evidence that at least two major causes of death among hip fracture patients are partially preventable. Perez et al. 721 reviewed 581 autopsy reports on hip fracture patients who died in a single British hospital between 1953 and 1992, and reported that 80 (14%) and 55 (9%) deaths were attributable to pulmonary emboli and acute myocardial infarction, respectively. The most common cause of death after hip fracture was bronchopneumonia (46%); no medical intervention has been shown to reduce the incidence of, or mortality due to, this complication. Thromboembolic prophylaxis using either unfractionated or low molecular weight heparin reduces the incidence of radiographically documented deep vein thrombosis from 39% to 24% (OR=0.41, 95% CI=0.31-0.55), whereas physical devices reduce the incidence from 19% to 6% (OR=0.24, 95% CI=0.13-0.44). 722 Although meta-analysis suggests a reduction in fatal pulmonary emboli with heparin prophylaxis (OR=0.39, 95% CI 0.14-1.09), there are insufficient data to draw firm conclusions regarding symptomatic emboli. A recent controlled trial suggests that aspirin reduces the incidence of symptomatic deep vein thrombosis (RR=0.71, 95% CI=0.52-0.97), symptomatic pulmonary emboli (RR=0.57, 95% CI=0.40-0.82), and fatal pulmonary emboli (RR=0.42, 95% CI=0.24-0.73) after hip fracture. 723 These experimental data are supported by population-based observational data from at least two areas.714, 724 One randomized controlled trial that included high-risk patients undergoing noncardiac surgery suggested that perioperative use of beta blockers may reduce the incidence of postoperative AMI. 725 A recent meta-analysis reported a statistically nonsignificant reduction in AMI (RR=0.70, 95% CI=0.64-3.57) and a marginally significant reduction in total mortality at one month (RR=0.72, 95% CI=0.51-1.00) with regional anesthesia.726, 727 Nutritional supplementation may be a useful strategy to prevent postoperative complications, but no effect on mortality has been demonstrated. 728 Finally, several aspects of surgical technique may be associated with higher short-term mortality, including the use of hemiarthroplasty instead of internal fixation for displaced femoral neck fractures,729-732 the use of cement 731 or a posterior approach 732 in hemiarthroplasty for such fractures, and delayed surgical fixation.729, 730, 733-735 However, these findings come from observational studies that are susceptible to bias. The deleterious effect of hemiarthroplasty disappears with longer follow-up.732, 736 Nonetheless, these studies give providers with high hip fracture mortality rates some ideas for investigation or intervention.

Fosters true quality improvement

One possible adverse effect of in-hospital mortality measures is to encourage earlier postoperative discharge. This is a definite concern, for several reasons. First, 30-day mortality for hip fracture is substantially higher than in-hospital mortality (4.6-6.8% versus 3.3-4.9% in the largest published studies), suggesting that a relatively modest decrease in mean length of stay could significantly decrease inpatient mortality. Second, mean length of stay decreased more dramatically for hip fracture (i.e., by 5.6 days) than for any of the other four conditions studied after Medicare introduced prospective payment. 737 This decrease was associated with a disproportionate decrease in risk-adjusted in-hospital mortality (from 5.7% to 3.3%, versus a decrease in 30-day mortality from 5.3% to 4.6%) and an increase in instability at discharge (from 18.8% to 23.1%), which was correlated with higher post-discharge mortality. 738 There is conflicting evidence about whether the decrease in mean LOS has led to more prolonged nursing home stays.737, 739-742 Another potential response would be to avoid operating on high-risk patients, although there is such consensus in favor of surgical management of hip fracture that avoidance of high-risk patients seems unlikely.

Prior use

In-hospital mortality following hip fracture repair has not been widely used as a quality indicator, although it is subsumed within a University Hospital Consortium indicator (mortality for DRG 209). 370 The United Kingdom National Health Service also uses hip fracture mortality as a High Level Performance Indicator. 590

Empirical Evidence
Test | Statistic | Rating
Precision
   Raw provider level rate/standard deviation | 14.4%, 16.0%
   Systematic provider-level standard deviation** | 7.8% | High
   Provider variation as a percentage of total variation** | 6.0% | Very High
   Signal ratio** | 54.3% | Moderate
   R-Square** | 65.0% | Moderate
   **APR-DRG, age-, gender-adjusted
Minimum Bias - APR-DRG risk adjustment
   Signal variance change with risk adjustment | No change | Good
   Absolute impact:
     Average absolute change (in %) | 22.5% | Fair
   Relative impact:
     Rank correlation | 0.880 | Good
     Percent remaining in high decile/low decile | 55.6% / 78.8% | Good / Fair
     Percent changing more than 2 deciles | 15.1% | Fair
Precision

This indicator is precise, with a raw provider level mean of 14.4% and a standard deviation of 16.0%. The systematic provider level standard deviation is high, at 7.8%. The provider level variation also accounts for a very high percentage of total variation, at 6.0%. This means that relative to other indicators, a higher percentage of the variation occurs at the provider level, rather than the discharge level. The signal ratio is only moderate, at 54.3%. This means that it is likely that some of the observed differences in provider performance do not represent true differences in provider performance. The moderate R-square (65%) reflects the higher proportion of signal that can be extracted using multivariate techniques.
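
The idea behind a signal ratio can be illustrated with a simple method-of-moments decomposition. The sketch below is an assumption for illustration only, not the estimator used to produce the table: it treats the variance of observed provider rates as the sum of binomial sampling noise and systematic (signal) variance, and the function name and inputs are hypothetical.

```python
from statistics import mean, pvariance


def signal_ratio(deaths, discharges):
    """Share of between-provider variance in observed rates that is signal.

    deaths[i] and discharges[i] are counts for provider i (hypothetical inputs).
    """
    rates = [d / n for d, n in zip(deaths, discharges)]
    pooled = sum(deaths) / sum(discharges)  # overall mortality rate
    # Average binomial sampling variance of a provider rate under the pooled rate.
    noise = mean(pooled * (1 - pooled) / n for n in discharges)
    observed = pvariance(rates)  # total variance of the observed provider rates
    signal = max(observed - noise, 0.0)  # systematic variance cannot be negative
    return signal / observed if observed > 0 else 0.0


# Four hypothetical providers with 100 discharges each.
print(signal_ratio([10, 30, 20, 40], [100, 100, 100, 100]))
```

With small denominators the noise term grows, so the same spread in observed rates would yield a lower ratio.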

Bias

Signal variance does not change with risk adjustment. The indicator performs well to fairly on the multiple measures of minimum bias. The rank correlation is good at 0.880, but compared to other indicators, risk adjustment does seem to affect the lowest decile disproportionately relative to the highest decile: 78.8% of providers in the low decile without risk adjustment remain there after risk adjustment (rated fair), versus 55.6% in the high decile (rated good). Similarly, 15.1% of providers change more than 2 deciles in performance. The absolute magnitude of the impact of risk adjustment is substantial.

Construct validity

Hip fracture mortality loads on factor 1. It is positively related to other medical mortality indicators, including pneumonia, stroke, GI hemorrhage, and CHF mortality.

Discussion

Hip fracture occurs frequently in the elderly population. Complications of fracture and treatments sometimes include embolism, pneumonia, and myocardial ischemia. These conditions and other comorbidities lead to a relatively high mortality rate, and there is some evidence that some of these complications are preventable.

Such high mortality rates suggest that this indicator will be measured with good precision. Our empirical analyses found that this indicator is very precise, with high systematic variation. However, the signal ratio is moderate, suggesting that some of the observed variance does not reflect true differences in performance. Multivariate techniques improve the ability to extract signal for this indicator, and as such smoothing using MSX or other methods is advised.
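
The effect of smoothing can be sketched with a univariate shrinkage estimator. This is a deliberate simplification and not the MSX method itself, which is multivariate; the function name is hypothetical, and the population rate (14.4%) and systematic standard deviation (7.8%) are borrowed from the table above purely as illustrative inputs.

```python
def smooth_rate(observed_rate, n, population_rate, signal_variance):
    """Shrink a provider's observed rate toward the population rate.

    The weight on the observed rate is its reliability: signal variance
    divided by signal plus binomial sampling variance for n discharges.
    """
    noise_variance = population_rate * (1 - population_rate) / n
    reliability = signal_variance / (signal_variance + noise_variance)
    return population_rate + reliability * (observed_rate - population_rate)


POPULATION_RATE = 0.144        # illustrative input
SIGNAL_VARIANCE = 0.078 ** 2   # illustrative input

# The same observed rate is pulled further toward the mean at a small provider.
small = smooth_rate(0.30, 20, POPULATION_RATE, SIGNAL_VARIANCE)
large = smooth_rate(0.30, 500, POPULATION_RATE, SIGNAL_VARIANCE)
print(round(small, 3), round(large, 3))
```

A provider with few hip fracture discharges contributes little information, so its smoothed estimate stays close to the population rate, while a high-volume provider keeps most of its observed rate.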

Though all patients with hip fracture are hospitalized, some specialty centers may admit more clinically severe or frail patients. Patient age, sex, comorbidities, fracture site, and functional status are all predictors of functional impairment, though there is little evidence that these vary systematically by hospital. Our empirical analyses confirmed that this indicator has some potential bias. The impact of APR-DRG risk adjustment is moderate to substantial for both absolute and relative performance. Risk adjustment with age and sex and APR-DRGs is highly recommended. Further, detailed chart reviews may identify differences in functional status or other clinical factors not accounted for in discharge data.

Hospital discharge practices differ, with some hospitals discharging patients earlier than others, and some transferring patients to rehabilitation or sub-acute care facilities. For this reason, this indicator should be considered in conjunction with length of stay and transfer rates (though transfers are excluded in this indicator).

Overall, this indicator is recommended for inclusion in the HCUP II QI set. It received an empirical rating of 10 out of 26, and smoothing is highly recommended. This indicator is recommended with three caveats of use. First, thirty-day mortality may be somewhat different than in-hospital mortality, leading to information bias. Second, risk adjustment for clinical factors, or at minimum APR-DRGs, is recommended due to the confounding bias for this indicator. Third, there is limited evidence for the construct validity of this indicator.

INDICATOR 37: PNEUMONIA MORTALITY RATE

Indicator: Pneumonia mortality rate
Relationship to Quality: Inappropriate treatment for pneumonia may increase mortality
Benchmark: State, regional or peer group average

Method:

Quality Measure: Mortality in discharges with principal diagnosis code of pneumonia
Outcome of Interest: Number of deaths with principal diagnosis code per 100 discharges
Population at Risk: All discharges with principal diagnosis code for pneumonia (see Appendix 6).

Age 18 years and older.

Exclude transfers to other institution.
Exclude MDC 14 (pregnancy, childbirth, and puerperium) and MDC 15 (newborns and other neonates).
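
The definition above can be sketched as a filter over discharge records followed by a rate calculation. This is a minimal illustration under stated assumptions: the record fields and function name are hypothetical, and the diagnosis code set passed in is a placeholder rather than the Appendix 6 list.

```python
from dataclasses import dataclass


@dataclass
class Discharge:
    principal_dx: str      # principal diagnosis code
    age: int               # age in years
    mdc: int               # Major Diagnostic Category
    transferred_out: bool  # transferred to another institution
    died: bool             # in-hospital death


def mortality_rate_per_100(discharges, dx_codes):
    """Deaths per 100 discharges in the population at risk, or None if empty."""
    at_risk = [
        d for d in discharges
        if d.principal_dx in dx_codes   # principal diagnosis of interest
        and d.age >= 18                 # adults only
        and not d.transferred_out       # exclude transfers to other institution
        and d.mdc not in (14, 15)       # exclude pregnancy and newborn MDCs
    ]
    if not at_risk:
        return None
    deaths = sum(d.died for d in at_risk)
    return 100.0 * deaths / len(at_risk)


# Placeholder code set and toy records, for illustration only.
records = [
    Discharge("486", 70, 4, False, True),
    Discharge("486", 55, 4, False, False),
    Discharge("486", 10, 4, False, True),   # excluded: under 18
    Discharge("486", 75, 4, True, False),   # excluded: transfer
]
print(mortality_rate_per_100(records, {"486"}))
```
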
Evidence from the literature
Face validity

Pneumonia constitutes the sixth leading cause of death in the United States, 743 causing approximately one million adult hospitalizations per year and approximately 75,000 in-hospital deaths, according to the NIS database. Patient characteristics are relatively important predictors of in-hospital mortality, though the performance of specific processes of care may also lead to better patient outcomes.

Precision

Although pneumonia admissions are common, the high degree of patient heterogeneity suggests that pneumonia mortality indicators will be imprecise. One study 39 examined the impact of choice of statistical methods for handling random variation in hospital mortality rates for pneumonia in patients under 65 years of age. Using the prognostic model (MedisGroups™) employed by a managed care company in reporting mortality for 22 Pennsylvania hospitals, the authors reanalyzed the data adjusting for multiple comparisons and other analytic errors produced by "simplistic" statistical methods. Their analysis indicated that the "simplistic" analysis resulted in a 60% chance that at least one hospital would be incorrectly labeled a high outlier for mortality.
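
The multiple-comparisons problem described here can be illustrated with a short calculation. The sketch assumes independent tests at a common per-hospital significance level, which is a simplification chosen for illustration and not the method of the cited study; the function name and the 0.05 level are hypothetical.

```python
def prob_at_least_one_false_outlier(n_hospitals, alpha):
    """Chance that at least one of n average hospitals is flagged by chance alone.

    Assumes independent tests, each with false-positive probability alpha.
    """
    return 1.0 - (1.0 - alpha) ** n_hospitals


# 22 hospitals, each tested at an assumed 0.05 level.
unadjusted = prob_at_least_one_false_outlier(22, 0.05)
# Bonferroni-style correction: divide the level by the number of comparisons.
corrected = prob_at_least_one_false_outlier(22, 0.05 / 22)
print(round(unadjusted, 3), round(corrected, 3))
```

Under these assumptions the uncorrected chance of at least one false high outlier is well above one half, while the corrected procedure keeps it below the nominal level.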

Minimum bias

Hospitals and area physicians vary in their threshold to admit patients with pneumonia, with some admitting more "low risk" patients than others. 93 Comparison of hospital death rates with population death rates suggests that selection bias due to such differences in admission practice influences observed hospital mortality rates for pneumonia. 76 Hospitals may also transfer pneumonia patients with severe chronic comorbid illnesses to nonacute facilities or home for their terminal care. Thus, population death rates from pneumonia (in particular, non-inpatient deaths) may be an important supplement to indicators based on hospital mortality.

Risk adjustment

Generic risk adjustment models do not perform as well as disease-specific models for pneumonia mortality. 744 The variables used by the pneumonia PORT scoring system to predict mortality 121 suggest that administrative data generally do not contain enough information for highly accurate risk adjustment. Important predictors of outcome not reliably captured in administrative databases include the microbial etiology, 745 certain radiographic patterns,745, 746 and pre-hospital functional status. 747 Comparisons of models that include additional clinical risk adjusters with administrative risk adjustment bear out these concerns.13, 18

Construct validity

While the impact of patient factors on hospital mortality rates for pneumonia must be emphasized, it is also true that processes of care contribute to pneumonia outcomes, including mortality. However, it is unknown how implementing these processes of care, which have been shown to affect patient-level outcomes, would actually affect provider-level mortality rates. One mechanism is through the choice of antibiotics. In their PPS Quality of Care study in 1990, RAND reported better process of care for pneumonia on all five subscales and overall. On four of five process subscales and on the overall scale, patients with better process scores demonstrated significantly lower risk-adjusted pneumonia mortality. For the technical therapeutic process subscale, mortality varied from 15% (good process scale score, i.e., in the highest 25%) to 21% (low process scale score, in the lowest 25%). 676 A recent study reported an association between choice of antibiotics and 30-day mortality for patients hospitalized with pneumonia. 748 Compared to recommended regimens involving second or third-generation cephalosporins, use of beta-lactam/beta-lactamase inhibitors plus a macrolide, or an aminoglycoside plus another agent, was associated with an increased 30-day mortality. Based on their spectrum of antimicrobial activity, the choice of regimens associated with increased mortality might reflect concern on the part of treating physicians for greater severity of illness. The authors used a previously validated, clinically based prognostic model 121 to adjust for differences in patient risk of death. Risk-adjustment using this model did not suggest greater severity of illness for patients on the non-recommended regimens. Other results in the study, however, suggest that elements of patient risk remain unaccounted for by this model 749 (e.g., patients on the non-recommended regimens had a higher rate of ICU treatment within 24 hours of hospital arrival, which seems likely to reflect disease severity and not quality of care). Confounding by indication likely explains the results of another recent study of macrolide use in the treatment of community-acquired pneumonia (CAP). 750

More basic than the choice of a particular antibiotic regimen is the timely administration of any antibiotic to the patient presenting to the hospital with community-acquired pneumonia. A retrospective study of Medicare patients with pneumonia showed that patients who received antibiotics within 8 hours and blood cultures within 24 hours of presentation had significantly lower mortality than patients for whom these processes were not performed. 751 Timely delivery of antibiotics bears a plausible connection to improved outcomes. However, the infrequency with which blood cultures alter therapy for patients with pneumonia suggests that this association reflects the performance of other correlated but hard-to-measure aspects of care. Although a causal connection in the case of rapid performance of blood cultures is unlikely, these findings do suggest that some quality problems existed at the hospitals with higher mortality rates.

Probably for similar reasons, structural measures that seem plausibly associated with better processes of care are also associated with better outcomes. For instance, teaching status correlated with higher quality scores for treatment of pneumonia in one study. 218 The quality scores, however, involved compliance with process measures only, not outcome differences. A different study that did examine outcomes reported that teaching hospitals achieved no better risk-adjusted 30-day mortality and also incurred higher costs. 752

Reflecting the limitations of the evidence on the impact of therapy on pneumonia outcomes, relatively few practice guidelines have focused on reducing mortality. Rather, guidelines for pneumonia are more likely to be concerned with the implications of antibiotic choice for future drug resistance.753, 754 Improved long-term community outcomes through a lower prevalence of resistant microorganisms, along with lower costs of treatment, represent the goals of these pneumonia guidelines.748, 751

Fosters true quality improvement

One study successfully improved timely delivery of antibiotics to patients with pneumonia, 755 but the intervention was not prompted by mortality data, nor did mortality due to pneumonia change as a result of the intervention.

All in-hospital mortality measures may create perverse incentives to reduce hospital mortality by discharging patients earlier, and thereby shifting deaths to skilled nursing facilities or outpatient settings. This phenomenon may also lead to biased comparisons among hospitals with different mean lengths of stay. Although we found no data on the sensitivity of standardized pneumonia mortality measures (at the hospital level) to the inclusion or exclusion of post-discharge deaths, inpatient mortality among Medicare patients in California was only 77% of 30-day mortality in 1985. 106 This gap has presumably expanded over the last 15 years. This finding suggests that changes in length of stay may modestly alter the ranking of hospital performance using this measure.

Prior use

Pneumonia mortality is currently widely discussed in the literature. In addition it is a commonly used mortality indicator. Users include: the University Hospital Consortium, 370 HealthGrades.com, 377 Greater New York Hospital Association, 372 Maryland Hospital Association (as part of the Maryland QI Project), 369 the Pennsylvania Health Care Cost Containment Council and the California Hospital Outcomes Project. 681 In addition, the Michigan Hospital Association includes pneumonia in an aggregated mortality measure. 373

Empirical Evidence
Test | Statistic | Rating
Precision
   Raw provider level rate/standard deviation | 13.8%, 10.2%
   Systematic provider-level standard deviation** | 3.7% | High
   Provider variation as a percentage of total variation** | 1.2% | High
   Signal ratio** | 62.9% | Moderate
   R-Square** | 78.1% | High
   **APR-DRG, age-, gender-adjusted
Minimum Bias - APR-DRG risk adjustment
   Signal variance change with risk adjustment | Decreases | Moderate
   Absolute impact:
     Average absolute change (in %) | 21.9% | Fair
   Relative impact:
     Rank correlation | 0.596 | Fair
     Percent remaining in high decile/low decile | 30.6% / 60.3% | Fair
     Percent changing more than 2 deciles | 40.8% | Fair
Precision

This indicator is precise, with a raw provider level mean of 13.8% and a standard deviation of 10.2%. The systematic provider level standard deviation is high, at 3.7%. The provider level variation also accounts for a high percentage of total variation, at 1.2%. This means that relative to other indicators, a higher percentage of the variation occurs at the provider level, rather than the discharge level, though some of the variance remains at the discharge level. The signal ratio is only moderate, at 62.9%. This means that it is likely that some of the observed differences in provider performance do not represent true differences in provider performance. The high R-square (78.1%) reflects the higher proportion of signal that can be extracted using multivariate techniques.

Bias

Signal variance decreases by more than 25% with risk adjustment, suggesting that some of the observed variance is due to differences in patient characteristics. The indicator performs fairly on the multiple measures of minimum bias. The rank correlation is fair at 0.596. The impact on the extremes is large. Only 30.6% of providers in the highest decile remain, and only 60.3% in the lowest decile remain, after risk adjustment. Similarly, the number of providers moving at least two deciles in relative rank is also high. The absolute magnitude of risk adjustment is also substantial.
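
The relative-impact measures discussed here (rank correlation, percent remaining in the extreme deciles, and percent changing more than 2 deciles) can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the code used for the report: the function names are hypothetical, the "high" decile is taken to be the providers with the highest rates, and at least 10 providers are assumed.

```python
def ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def decile(rank, n):
    """Map a 1-based rank among n providers to a decile from 1 to 10."""
    return min(10, int((rank - 1) * 10 // n) + 1)


def risk_adjustment_impact(raw, adjusted):
    """Summarize how provider ordering changes between raw and adjusted rates."""
    n = len(raw)
    d_raw = [decile(r, n) for r in ranks(raw)]
    d_adj = [decile(r, n) for r in ranks(adjusted)]

    def share_remaining(target):
        members = [i for i in range(n) if d_raw[i] == target]
        if not members:
            return None
        return sum(d_adj[i] == target for i in members) / len(members)

    return {
        "rank_correlation": spearman(raw, adjusted),
        "remain_high_decile": share_remaining(10),
        "remain_low_decile": share_remaining(1),
        "moved_more_than_2_deciles":
            sum(abs(a - b) > 2 for a, b in zip(d_raw, d_adj)) / n,
    }


# Toy example: 20 hypothetical providers whose ordering is unchanged.
raw_rates = [i / 100 for i in range(1, 21)]
print(risk_adjustment_impact(raw_rates, raw_rates))
```

When risk adjustment leaves the ordering untouched, the rank correlation is 1 and every provider stays in its decile; the lower the correlation, the more providers in the extreme deciles are relabeled.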

Construct validity

Pneumonia mortality loads on factor 1. It is positively related to mortality indicators, such as stroke, GI hemorrhage, and congestive heart failure, and to a lesser extent, hip fracture.

Discussion

Pneumonia is a leading cause of death in the United States. Treatment with appropriate antibiotics may reduce mortality. Hospitals with high mortality rates appear to have underlying quality problems when examining processes of care.

Though pneumonia mortality is common, patients are highly heterogeneous, potentially reducing the precision with which the indicator can be measured. Our empirical analysis found that the precision for this indicator was high, with high systematic variation and a moderate signal ratio. Multivariate techniques improved the amount of signal that can be extracted for this indicator. Thus, MSX smoothed estimates are recommended for this indicator to aid with precision problems due to random noise.

Our empirical analysis found substantial bias for this indicator, and it is likely that without risk adjustment, providers may be mislabeled as poor quality. The literature review notes that generic risk adjustment models derived from administrative data usually do not perform well with this indicator. Nonetheless, the potential for admitting more patients who are less severely ill, and thereby artificially deflating the mortality rate, again raises the potential need for risk adjustment. Providers with particularly high and low mortality rates should examine the case-mix of their patients for comorbidities, age and clinical characteristics to test for simple or complex case-mixes. Chart reviews may be helpful in determining whether differences truly arise from quality of care, or patient level differences in coding, comorbidities, or severity of disease. Another source of bias is the outpatient management of some patients. Some providers may admit only the most severely ill patients and handle other patients in an outpatient setting, while others do not do this. This results in a more severe case-mix at the former providers. Providers may wish to examine rates of outpatient care.

Hospital discharge practices differ, with some hospitals discharging patients earlier than others. For this reason, this indicator should be considered in conjunction with length of stay and transfer rates (though transfers are excluded in this indicator).

Overall, this indicator is recommended for inclusion in the HCUP II QI set. It received an empirical rating of 7 out of 26, and smoothing is highly recommended. This indicator is recommended with three major caveats of use. First, some pneumonia care occurs in an outpatient setting, and selection bias may be a problem for this indicator. Second, thirty-day mortality may be somewhat different than in-hospital mortality, leading to information bias. Third, risk adjustment for clinical factors, or at minimum APR-DRGs, is recommended due to the confounding bias for this indicator.

INDICATOR 38: ACUTE STROKE MORTALITY RATE

Indicator: Provider level mortality rate for stroke.
Relationship to Quality: Better processes of care may reduce short-term mortality for stroke. As such, lower rates represent better quality care.
Benchmark: State, regional, or peer group average.

Method:

Quality Measure: Number of deaths per 100 discharges with principal diagnosis code of stroke.
Outcome of Interest: Number of deaths with principal diagnosis code for stroke (see Appendix 6).
Population at Risk: All discharges with principal diagnosis code for stroke (see Appendix 6).

Age 18 years and older.

Exclude transfers to other institution.
Exclude MDC 14 (pregnancy, childbirth, and puerperium) and MDC 15 (newborns and other neonates).
Evidence from the literature
Face validity

Stroke remains the third leading cause of death in the U.S. 756 Based on two population studies,757, 758 Broderick et al. conservatively estimate that approximately 725,000 acute strokes occur each year in the U.S. 758 Analysis of the 1995 HCUP database indicates that approximately 700,000 hospitalizations each year represent acute stroke. 176 However, hospital care has a relatively modest impact on patient survival, and most stroke deaths occur after the initial acute hospitalization.

Precision

Although strokes are very common, only 10-15% of stroke patients die during hospitalization.176, 757, 759 Because stroke severity has a large effect on acute mortality, hospital mortality rates may be subject to considerable random variation.

Minimum bias

Four studies have shown significant inaccuracies in ICD-9 codes for identifying stroke patients.758, 760-763 A previous study using the HCUP database 176 pooled these results to estimate the positive predictive value (PPV) of primary or secondary ICD-9 codes 430-438, which might potentially be used to identify stroke patients. The authors' estimates suggested that code 431 performed relatively well, although 14% of patients with this code will not have had an acute stroke. Moreover, this code only accounts for a minority of all strokes. Overall, codes 431, 434, and 436 probably provide the best combination of sensitivity and specificity. However, approximately 20% of patients with these codes actually have non-stroke diagnoses, and some strokes will be missed. Many of these patients will have undergone procedures related to stroke, such as cerebral angiography or carotid endarterectomy. 762 These groups of patients have different mortality risks than those with acute stroke. Although such "false positive" patients might be excluded based on having one of these procedures on the admission date (highly unlikely, especially for carotid endarterectomy), we found no studies that investigated further steps to improve the predictive power of the discharge information. We also found no studies documenting cross-hospital variations in these coding practices, but significant variation seems possible.
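
The way coding inaccuracy can distort an observed mortality rate can be illustrated with a simple mixture calculation. The sketch below is hypothetical: the function name is invented, the PPV of 0.80 follows from the approximately 20% non-stroke share noted above, and both mortality rates are illustrative assumptions rather than estimates from the literature.

```python
def observed_mortality(ppv, true_stroke_mortality, false_positive_mortality):
    """Mortality among coded 'stroke' discharges as a mix of two patient groups.

    ppv is the share of coded discharges that are true acute strokes.
    """
    return ppv * true_stroke_mortality + (1 - ppv) * false_positive_mortality


TRUE_STROKE_MORTALITY = 0.12     # assumed, for illustration
FALSE_POSITIVE_MORTALITY = 0.02  # assumed, for illustration

# Two hospitals with identical care but different coding accuracy.
accurate = observed_mortality(0.90, TRUE_STROKE_MORTALITY, FALSE_POSITIVE_MORTALITY)
liberal = observed_mortality(0.70, TRUE_STROKE_MORTALITY, FALSE_POSITIVE_MORTALITY)
print(round(accurate, 3), round(liberal, 3))
```

Under these assumptions the hospital with less accurate coding appears to have lower stroke mortality even though outcomes for true stroke patients are identical, which is why cross-hospital variation in coding practice would matter if it exists.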

The relative lack of effective medical therapy for acute stroke in the past led to considerable variation in hospital admitting practices. Geographic variations in admitting practice exert a significant effect on hospital mortality rates, far more than do variations in case-fatality rates. 764 Although it is possible that recent evidence on stroke treatments may have altered these historic patterns of variation, it seems likely that they remain large. Thus, comparisons within geographic areas are likely to be more valid than cross-area comparisons. In addition, because of the increased interest in the care of acute stroke patients, more patients with transient ischemic attacks (TIA) are likely to be admitted at some hospitals, whereas previously many such patients were considered safely managed as outpatients. 765 Because patients experiencing TIAs are much less likely to die in hospital than patients suffering acute stroke, hospitals with more liberal admitting policies may appear to have lower mortality rates, if these "rule-out" stroke patients are incorrectly coded as true strokes. Little evidence exists on the extent to which such coding anomalies exist.

Although strokes are associated with significant one-year mortality, early deaths tend to occur only in a subset of patients with the severest strokes (e.g., those in a coma at presentation). The majority of stroke deaths reflect medical complications in the weeks to months following admission.757, 759, 766 Thus, hospitals with longer LOS, especially those that keep patients in hospital for rehabilitation therapy rather than discharging them to subacute care, may appear to have higher mortality rates. Investigating hospital LOS and transfer rates to rehabilitation hospitals in conjunction with stroke mortality may provide some evidence on this question.

Risk adjustment

Studies of the impact of risk adjustment based on discharge data suggest that a large number of hospitals identified as outliers using one system will have acceptable mortality rates using others.10, 11 However, as noted above, mortality rates are a relatively noisy measure of hospital quality. Thus, it is possible that the different comorbidity systems were simply picking up the fact that the "true" quality of a hospital is not very well identified using simple adjusted stroke mortality rates. A recent prognostic model combined clinical and imaging variables to predict morbidity and mortality after acute stroke with excellent receiver operating characteristic (ROC) curve performance. 767 Importantly, this model did not include "Do Not Resuscitate" orders as a predictor, a factor that detracted from the plausibility of previous mortality prediction models for stroke. 768 Most of the seven predictor variables in this recent model 767 are not reliably captured in current hospital discharge databases.

Some key mortality predictors might be captured. Coma at presentation confers a markedly increased risk of death in acute stroke. 10 Use of mechanical ventilation on the first day of hospitalization quite reliably identifies stroke patients with coma at presentation. 769 Thus, if reliably measured, the use of mechanical ventilation on the first day of admission is a potentially important and feasible risk adjuster. A history of previous stroke substantially increases the mortality of patients admitted with stroke, 766 but non-longitudinal discharge data will not be able to distinguish first strokes from recurrent strokes.

Despite the smaller effect of aspirin on both primary and secondary prevention of stroke, patients with acute stroke who already are taking aspirin (e.g., for any other indication) tend to have better outcomes.770, 771 Given these recent results, prior aspirin use might be an important omitted variable. However, no data exist on the extent to which prior aspirin use differs systematically across stroke patients admitted to different hospitals.

Construct validity

We located no articles specifically addressing construct validity at a hospital level. On a patient level, processes of care have been shown to decrease mortality. However, it is unknown how implementing these processes of care would actually affect provider-level mortality rates. The publicity surrounding one randomized trial showing a benefit for thrombolytic therapy in acute stroke 772 generated considerable attention as ushering in a new era of stroke therapy. Importantly, though, the patients treated with thrombolytic therapy in all of the randomized trials represent only a small percentage of all stroke patients. 773 To derive benefit, patients must receive thrombolytic therapy within 3 hours of symptom onset, 774 but the majority of patients do not even arrive at the hospital within three hours.775-777 Even among experienced neurologists and radiologists, inter-rater agreement is only moderate for characterizing computed tomography findings that qualify patients for thrombolytic therapy. 778

A recent study involving 57 U.S. medical centers managed to enroll only 389 patients over an almost 2 year study period, 779 and 13% of these patients in fact received thrombolysis after the 3 hour time window stipulated in the study protocol. Another recent study reported protocol violations (compared to national guidelines) in approximately 50% of patients, and a higher mortality overall for patients who received thrombolytic therapy. 780 Other studies have replicated the reported benefits of thrombolysis for acute stroke, but the existence of these conflicting results and the small percentage of stroke patients who receive thrombolysis suggest that more or less effective use of this treatment is likely to have only a modest impact on hospital mortality.

Other treatments have also shown only limited benefits for acute mortality. In contrast to the situation with acute myocardial infarction, the clinical benefit of aspirin appears modest.781-784 Two large studies have demonstrated a significant benefit for aspirin in acute stroke,785, 786 but again in contrast to acute myocardial infarction, little evidence exists on whether aspirin is underused in stroke management. Moreover, the magnitude of the impact of aspirin on acute stroke mortality is much smaller.785-787

In the RAND PPS Quality of Care study in 1990, patients with higher stroke process scale scores demonstrated significantly lower risk-adjusted 30-day stroke mortality on three out of four measured process subscales and on an overall process scale. 676

The use of stroke units to care for patients with acute stroke (as opposed to caring for them on general medical wards) likely lowers mortality and improves functional outcomes.788, 789 Overviews of the literature on stroke units reveal that the impact of stroke units has mostly been demonstrated in Europe.788, 790 Evidence on their impact in the United States is more limited. Taken together, these factors suggest that inpatient care is unlikely to have a large impact on stroke mortality rates.

Stroke admission rates and mortality might be a more valid indicator of the quality of preventive ambulatory care. The underuse of anticoagulation to prevent embolic strokes in patients with atrial fibrillation does (in contrast to aspirin) represent an important quality problem. 791 Underuse of coumadin represents a failure of previous care, not the hospitalization for the acute stroke. Similarly, inadequate detection and control of hypertension, hypercholesterolemia, and other risk factors may lead to more frequent and more severe stroke admissions. One difficulty with using stroke outcomes for evaluating ambulatory care is that the coding of stroke subtypes (hypertensive, embolic, etc.) in discharge data is not very reliable. 762

Fosters true quality improvement

The relatively low early compared to late mortality raises the possibility of a perverse incentive for institutions to lower their stroke mortality by discharging patients to die in nursing homes. Thus, investigating rates of discharge to nonacute facilities or transfers to other hospitals might be important to consider in conjunction with in-hospital stroke mortality. "Overcoding" transient ischemic attacks as strokes may also result in decreasing stroke mortality rates; although in principle this should not occur, we found little evidence on whether such coding errors were frequent.

All in-hospital mortality measures may create perverse incentives to reduce hospital mortality by discharging patients earlier, and thereby shifting deaths to skilled nursing facilities or outpatient settings. This phenomenon may also lead to biased comparisons among hospitals with different mean lengths of stay. Although we found no data on the sensitivity of standardized stroke mortality measures (at the hospital level) to the inclusion or exclusion of post-discharge deaths, inpatient mortality among Medicare patients in California was only 76% of 30-day mortality in 1985. 106 This gap has presumably expanded over the last 15 years. This finding suggests that changes in length of stay may modestly alter the ranking of hospital performance using this measure.

Prior use

Stroke mortality has been used in the literature to a limited extent, and evaluations have focused on the limitations of the measure described above.10, 63

Several organizations have used stroke mortality indicators, including the University Hospital Consortium, 370 HealthGrades.com, 377 Maryland Hospital Association Quality Indicator Project, 369 and the Greater New York Hospital Association. 372 The Cleveland Health Quality Choice 374 project also includes a stroke mortality measure; however, this indicator is risk adjusted using detailed clinical data. The Michigan Hospital Association 373 includes stroke mortality in its aggregate measure of in-hospital mortality.

Empirical Evidence
Test | Statistic | Rating
Precision
   Raw provider level rate/standard deviation | 21.3%, 13.7% |
   Systematic provider-level standard deviation** | 5.3% | High
   Provider variation as a percentage of total variation** | 1.7% | High
   Signal ratio** | 51.9% | Moderate
   R-Square** | 70.7% | High
   **APR-DRG, age-, gender-adjusted
Minimum Bias - APR-DRG risk adjustment
   Signal variance change with risk adjustment | Decreases | Fair
   Absolute impact:
     Average absolute change (in %) | 13.1% | Good
   Relative impact:
     Rank correlation | 0.803 | Good
     Percent remaining in high decile/low decile | 63.8% / 62.9% | Good / Fair
     Percent changing more than 2 deciles | 24.4% | Fair

Precision

This indicator is precise, with a raw provider level mean of 21.3% and a standard deviation of 13.7%. The systematic provider level standard deviation is high, at 5.3%. The provider level variation also accounts for a high percentage of total variation, at 1.7%. This means that relative to other indicators, a higher percentage of the variation occurs at the provider level, rather than the discharge level, though some of the variance remains at the discharge level. The signal ratio is only moderate, at 51.9%. This means that it is likely that some of the observed differences in provider performance do not represent true differences in provider performance. The high R-square (70.7%) reflects the higher proportion of signal that can be extracted using multivariate techniques.
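As a rough illustration of what the signal ratio is meant to capture, the sketch below estimates the share of between-provider variance that exceeds binomial sampling noise. It is a simplified method-of-moments calculation and is not the estimator used to produce the table above. The function name `signal_ratio` and the example counts are hypothetical, and no risk adjustment is applied.

```python
from statistics import mean, pvariance


def signal_ratio(deaths, discharges):
    """Estimate signal / (signal + noise) for provider-level mortality rates.

    Assumption: each provider's deaths follow a binomial distribution around
    the pooled rate, so sampling noise is pooled * (1 - pooled) / n.
    """
    rates = [d / n for d, n in zip(deaths, discharges)]
    pooled = sum(deaths) / sum(discharges)
    # Average sampling (noise) variance across providers.
    noise = mean(pooled * (1 - pooled) / n for n in discharges)
    # Total observed variance of provider rates (signal plus noise).
    total = pvariance(rates)
    if total == 0:
        return 0.0
    # Whatever exceeds the expected noise is treated as systematic signal.
    signal = max(0.0, total - noise)
    return signal / total


# Hypothetical counts for five providers (not data from this report).
example_deaths = [40, 15, 90, 22, 60]
example_discharges = [400, 300, 450, 250, 200]
print(f"signal ratio: {signal_ratio(example_deaths, example_discharges):.2f}")
```

A ratio near 1 would mean that most observed variation between providers is systematic; a ratio near 0 would mean that it is mostly sampling noise.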

Bias

Signal variance decreases substantially with risk adjustment, indicating that some of the true variation among providers reflects differences in patient characteristics. The indicator performs fairly to well on the measures of minimum bias. The rank correlation is good at 0.803, and risk adjustment impacts the lowest decile disproportionately to the highest, relative to other indicators. Further, 24.4% of providers change more than 2 deciles with risk adjustment. The absolute impact of risk adjustment is moderate.
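The relative-impact measures discussed here (rank correlation and the share of providers moving more than two deciles) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, deciles are assigned from simple ranks, and inputs are assumed to vary across providers (a constant list would make the correlation undefined). It is not the report's own code.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def decile(rank, n):
    """Map a 1-based rank among n providers to a decile index 0..9."""
    return min(9, int((rank - 1) * 10 / n))


def share_moving_more_than_two_deciles(unadjusted, adjusted):
    """Fraction of providers whose decile shifts by more than 2 after adjustment."""
    n = len(unadjusted)
    before, after = ranks(unadjusted), ranks(adjusted)
    moved = sum(abs(decile(b, n) - decile(a, n)) > 2 for b, a in zip(before, after))
    return moved / n
```

Applied to paired lists of unadjusted and risk-adjusted provider rates, a high correlation together with a small share of large decile moves would indicate that risk adjustment changes relative performance only modestly.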

Construct validity

Stroke mortality loads highly on factor 1. It is positively related to mortality indicators, such as pneumonia, GI hemorrhage, and congestive heart failure, and to a lesser extent, hip fracture.

Discussion

Acute stroke mortality has been the subject of study as a potential indicator. Our literature review noted some potential problems with the indicator, stemming from this literature. Quality treatment must be timely and efficient to prevent potentially fatal brain tissue death. Patients may not present until after the fragile window of time has passed. Further, many deaths occur out of the hospital, suggesting that linkage to death records for patients post-discharge may be a good addition to this indicator.

This indicator is measured with adequate precision, having high provider systematic variation and a moderate signal ratio. Some of the observed variance may not in fact be true differences in performance. Mortality for stroke may be a rare event at small hospitals, reducing the precision for these providers. Multivariate techniques improve the amount of signal that can be extracted for this indicator. As such, MSX smoothing of data is recommended for this indicator, to help reduce precision problems due to random noise.
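To show why smoothing matters most for small hospitals, the following sketch applies a simple univariate shrinkage estimator that pulls each provider's observed rate toward the overall mean in proportion to its sampling noise. This is a stand-in for illustration only and is not the MSX method itself, which is multivariate. The function name is hypothetical, and reusing the 21.3% raw mean and the 5.3% systematic standard deviation from the table above as inputs is an illustrative assumption, since those two figures were not derived on the same basis.

```python
def shrink_rate(observed_rate, n_discharges, overall_rate, signal_sd):
    """Shrink a provider's observed rate toward the overall rate.

    Assumption: binomial sampling noise around the overall rate, and a known
    systematic (signal) standard deviation across providers.
    """
    noise_variance = overall_rate * (1 - overall_rate) / n_discharges
    signal_variance = signal_sd ** 2
    # Reliability weight: near 1 for large providers, near 0 for small ones.
    weight = signal_variance / (signal_variance + noise_variance)
    return weight * observed_rate + (1 - weight) * overall_rate


# Two hypothetical hospitals with the same observed 40% stroke mortality.
small = shrink_rate(0.40, n_discharges=10, overall_rate=0.213, signal_sd=0.053)
large = shrink_rate(0.40, n_discharges=500, overall_rate=0.213, signal_sd=0.053)
print(f"small hospital smoothed rate: {small:.3f}")
print(f"large hospital smoothed rate: {large:.3f}")
```

The small hospital's estimate moves most of the way back toward the overall rate, while the large hospital's estimate stays close to what was observed, which is the behavior that reduces erroneous outlier labeling.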

Our empirical analyses found some bias, when adjusting for age, sex and APR-DRGs. Risk adjustment is recommended for this indicator. Other risk adjustment systems have been reported to change relative provider performance for this indicator. Clinical factors of severity upon presentation, including mechanical ventilation on the first day, may vary systematically by hospital and do influence mortality. Providers with high rates may wish to examine the case-mix for these potentially complicating factors. Further, hospitals with rehabilitation programs may have higher rates than hospitals that discharge to sub-acute care. Providers may want to use this indicator in conjunction with length of stay for their hospitals and for surrounding areas. Coding appears suboptimal for acute stroke and may lead to bias. As such, some caution in interpreting this indicator is warranted.

Overall, this indicator is recommended for inclusion in the HCUP II QI set. It received an empirical rating of 10 out of 26, and smoothing is highly recommended. This indicator is recommended with three major caveats of use. First, some stroke care occurs in an outpatient setting, and selection bias may be a problem for this indicator. Second, thirty-day mortality may be somewhat different than in-hospital mortality, leading to information bias. Third, risk adjustment for clinical factors, or at minimum APR-DRGs, is recommended due to the confounding bias for this indicator.

3.E.6. Post-Procedural Mortality Measures

INDICATOR 39: ABDOMINAL AORTIC ANEURYSM (AAA) REPAIR MORTALITY RATE

Indicator: Provider level mortality rate for AAA repair.
Relationship to Quality: Better processes of care may reduce mortality for AAA repair. As such, lower rates represent better quality care.
Benchmark: State, regional, or peer group average.

Method:

Quality Measure: Number of deaths per 100 discharges with procedure code of AAA repair.
Outcome of Interest: Number of deaths with procedure code for AAA repair (see Appendix 6).
Population at Risk: All discharges with procedure code of AAA repair and diagnosis of AAA (see Appendix 6) in any field.
Exclude MDC 14 (pregnancy, childbirth, and puerperium) and MDC 15 (newborns and other neonates).
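The definition above can be expressed as a short calculation over discharge records. The sketch below is illustrative only: the `Discharge` record layout and the function name are hypothetical, and the code sets are placeholders, because the actual procedure and diagnosis codes are listed in Appendix 6 and are not reproduced here.

```python
from dataclasses import dataclass

# Placeholder code sets (assumption): substitute the Appendix 6 code lists.
AAA_PROCEDURE_CODES = {"PROC_AAA_REPAIR"}
AAA_DIAGNOSIS_CODES = {"DX_AAA"}
EXCLUDED_MDCS = {14, 15}  # pregnancy/childbirth/puerperium and newborns


@dataclass
class Discharge:
    procedures: list  # procedure codes in any field
    diagnoses: list   # diagnosis codes in any field
    mdc: int          # Major Diagnostic Category
    died: bool        # in-hospital death


def aaa_mortality_rate(discharges):
    """Deaths per 100 discharges in the population at risk, or None if empty."""
    at_risk = [
        d for d in discharges
        if AAA_PROCEDURE_CODES & set(d.procedures)
        and AAA_DIAGNOSIS_CODES & set(d.diagnoses)
        and d.mdc not in EXCLUDED_MDCS
    ]
    if not at_risk:
        return None
    deaths = sum(1 for d in at_risk if d.died)
    return 100.0 * deaths / len(at_risk)
```

Returning `None` for a provider with no qualifying discharges keeps an undefined rate distinct from a true rate of zero.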
Evidence from the literature
Face validity

Abdominal aortic aneurysm repair requires technical proficiency with the use of complex equipment. Technical errors may lead to clinically significant complications, such as arrhythmias, acute myocardial infarction, colonic ischemia, and death. Mortality is relatively high, especially if the aneurysm has already ruptured. Recent studies using North American population-based data sets have reported 3.5-6.2% in-hospital mortality after elective repair of intact aneurysms188, 196, 294-296, 300, 301 and 40-55% in-hospital mortality after emergent repair of ruptured aneurysms.62, 196, 299-301 These data suggest that improved quality of care could have a substantial impact on public health, despite the fact that the condition is relatively uncommon.

Precision

Abdominal aortic aneurysmectomy is not as common as the other cardiovascular procedures described in this report; only about 48,600 were performed in the USA in 1997 (1.8 per 10,000 persons). 293 Based on state all-payer databases, the mean annual frequency of abdominal aortic aneurysmectomies was 16.4-18.3 per hospital in Florida (unruptured only) in 1992-1996, 294 8.4 per hospital in New York (unruptured only) in 1990-1995, 295 and 13.8 per hospital in Maryland in 1990-1995. 296 The relatively small number of abdominal aortic aneurysm resections performed by each hospital suggests that mortality rates are likely to be unreliable at the hospital level.
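A back-of-the-envelope calculation shows why such volumes make single-year rates unreliable. The sketch below computes the binomial standard error of an observed mortality rate at the mean annual volumes quoted above. The 5% underlying mortality is an assumption chosen from within the 3.5-6.2% elective range cited earlier, the function name is hypothetical, and the Maryland volume is not restricted to unruptured cases, so the comparison is only indicative.

```python
from math import sqrt


def rate_standard_error(true_rate, n_cases):
    """Binomial standard error of an observed rate based on n_cases."""
    return sqrt(true_rate * (1 - true_rate) / n_cases)


ASSUMED_MORTALITY = 0.05  # assumption, within the cited elective range

# Mean annual hospital volumes quoted in the text above.
for state, volume in [("New York", 8.4), ("Maryland", 13.8), ("Florida", 16.4)]:
    se = rate_standard_error(ASSUMED_MORTALITY, volume)
    print(f"{state}: volume {volume}, standard error {100 * se:.1f} percentage points")
```

At each of these volumes the standard error is larger than the assumed 5% rate itself, so one year of data from a typical hospital cannot distinguish a good performer from a poor one.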

Minimum bias

Any measure based on an elective procedure, rather than a patient diagnosis, holds the potential for selection bias caused by the decision to elect surgery. Theoretically, one could account for the patient characteristics that are selected upon, but it is unlikely that administrative data are rich enough to do this comprehensively. Most previous studies of mortality following abdominal aortic aneurysmectomy have used administrative data, so these studies have been unable to identify residual bias. The known predictors of in-hospital mortality include whether the aneurysm is intact or ruptured (see above), age, female gender, admission through an emergency room,188, 294-296 various comorbidities such as renal failure and dysrhythmias,195, 300 and Charlson's comorbidity index. 301 The largest study involving clinical data identified several additional predictors, the first three of which were more powerful than age: American Society of Anesthesiologists' risk class, leukocyte count, blood urea nitrogen, weight loss, and serum albumin. 298 A history of myocardial infarction and the number of myocardial segments with reversible thallium defects were independent predictors of major cardiac events after abdominal aortic aneurysmectomy in one study, 792 but the latter variable was non-significant in another study. 793 In the absence of studies explicitly comparing models with and without these additional clinical data elements, it is difficult to assess whether administrative data contain sufficient information to remove bias.

Construct validity

Several lines of evidence support the construct validity of this indicator. First, numerous studies (summarized above) have reported an association between hospital volume and mortality following abdominal aortic aneurysmectomy. The consistent association between volume and risk-adjusted mortality supports the validity of both measures of performance, and is consistent with the hypothesis that more experience leads to improved technical skills and better outcomes.

Second, previous studies have identified several other hospital or surgeon characteristics that are associated with lower mortality, after adjustment for patient characteristics ascertainable from administrative data. These characteristics include surgeon volume,294, 296 especially for ruptured aneurysms,62, 195, 299 board certification as a vascular surgeon,294, 299 and having daily rounds by an intensive care physician. 197 Although two studies failed to show significant effects of surgeon volume,195, 197 the correlation between hospital/physician characteristics and in-hospital mortality in most studies supports the validity of in-hospital mortality as a measure of quality.

Third, excessive blood loss was identified in one study as the most important predictor of mortality after elective abdominal aortic aneurysmectomy. 794 Excessive blood loss is a potentially preventable complication of surgery. Similarly, perioperative beta adrenergic blockade may help prevent cardiac deaths and nonfatal myocardial infarctions among high-risk patients undergoing major peripheral vascular surgery, including abdominal aortic aneurysmectomy. 795

Fosters true quality improvement

One possible adverse effect of in-hospital mortality measures is to encourage earlier postoperative discharge. We are aware of no data on the likelihood or consequences of premature discharge after abdominal aortic aneurysmectomy. Another potential response would be to avoid operating on high-risk patients, although it is unclear to what extent providers could actually recognize and avoid high-risk patients. Indeed, many high-risk patients would actually benefit from being transferred to a more experienced center.

All in-hospital mortality measures may create perverse incentives to reduce hospital mortality by discharging patients earlier, and thereby shifting deaths to skilled nursing facilities or outpatient settings. This phenomenon may also lead to biased comparisons among hospitals with different mean lengths of stay. We found no published evidence about whether the difference between inpatient and 30-day mortality for AAA mortality is substantial enough to cause concern.

Prior use

Abdominal aortic aneurysmectomy mortality has not been widely used as an indicator of quality. It is used by HealthGrades.com. 377 Pennsylvania Health Care Cost Containment Council includes AAA repair in their "Other major vessel operations except heart (DRG 110)" indicator, though the indicator is defined using clinical risk adjustment. 681

Empirical Evidence
Test | Statistic | Rating
Precision
   Raw provider level rate/standard deviation | 21.5%, 26.8% |
   Systematic provider-level standard deviation* | 6.7% | High
   Provider variation as a percentage of total variation* | 3.5% | High
   Signal ratio* | 30.7% | Low
   R-Square* | 57.1% | Moderate
   * age-, gender-adjusted
Minimum Bias - APR-DRG risk adjustment
   Signal variance change with risk adjustment | Decreases | Fair
   Absolute impact:
     Average absolute change (in %) | 23.4% | Fair
   Relative impact:
     Rank correlation | 0.890 | Good
     Percent remaining in high decile/low decile | 70.8% / 64.6% | Good / Fair
     Percent changing more than 2 deciles | 14.1% | Good

Precision

This indicator is precise, with a raw provider level mean of 21.5% and a substantial standard deviation of 26.8%. The systematic provider level standard deviation is high, at 6.7%. The provider level variation also accounts for a high percentage of total variation, at 3.5%. This means that relative to other indicators, a higher percentage of the variation occurs at the provider level, rather than the discharge level. Finally, the signal ratio is low, at 30.7%. This means that it is likely that some of the observed differences in provider performance do not represent true differences in provider performance. Additional signal can be extracted using multivariate techniques, as reflected by the moderate R-square. However, this signal remains lower than for other indicators.

Bias

Signal variance decreases by over 25% with risk adjustment, indicating that some of the true variation among providers reflects differences in patient characteristics. Risk adjustment for APR-DRGs was not available for this indicator, due to its distribution. Therefore, only age and sex risk adjustment was performed. The indicator performs fairly to well on the multiple measures of minimum bias. The rank correlation is high at 0.890, and both the high and low decile are impacted by risk adjustment. The percentage of providers moving at least two deciles in relative rank is moderate (14.1%). The absolute magnitude of the impact of risk adjustment is high.

Construct validity

AAA repair does not load substantially on any of the three retained factors. However, it does appear to be positively related to other post-procedural mortality measures, such as craniotomy (r=.28, p<.0001) and CABG (r=.17, p<.01).

Discussion

AAA repair is a technically difficult procedure, with a relatively high mortality rate. The main evidence for the validity of this indicator arises from the volume-outcome literature. Higher volume hospitals have been noted to have lower mortality rates, suggesting some difference in the processes of care between lower and higher volume hospitals, resulting in better outcomes. What those processes are, if they truly exist, is not known.

AAA repair is an infrequent procedure, calling into question the precision of this indicator. Our empirical analyses confirmed this suspicion, though smoothing the indicator appeared to help tremendously, and is recommended for this indicator. After smoothing, this indicator is precise, with substantial systematic variation. However, the R-square is only moderate, suggesting that some of the observed variation does not reflect true differences in performance.

Little literature has been published on the potential bias of this indicator. It is known that comorbidities and clinical factors such as anesthesia risk and laboratory results do influence mortality rates. Due to the distribution of this indicator, we were not able to adjust using APR-DRGs. Our empirical analyses of demographic risk adjustment noted some potential bias for this indicator. Use of smoothed data (via MSX or other procedures) may help to avoid the erroneous labeling of a hospital as an outlier. Additional medical chart review and/or analyses of laboratory data may be helpful in determining whether more detailed risk adjustment is necessary.

Hospital discharge practices differ, with some hospitals discharging patients earlier than others. For this reason, this indicator should be considered in conjunction with length of stay and transfer rates (though transfers are excluded in this indicator).

Overall, this indicator is recommended for inclusion in the HCUP II QI set. It received an empirical rating of 8 out of 26, and smoothing is highly recommended. This indicator is recommended with two major caveats of use. First, risk adjustment for clinical factors is recommended due to the confounding bias for this indicator. Second, little evidence exists supporting the construct validity of this indicator.

INDICATOR 40: CORONARY ARTERY BYPASS GRAFT (CABG) MORTALITY RATE

Indicator: Provider level mortality rate for CABG.
Relationship to Quality: Better processes of care may reduce mortality for CABG. As such, lower rates represent better quality care.
Benchmark: State, regional, or peer group average.

Method:

Quality Measure: Number of deaths per 100 discharges with procedure code of CABG.
Outcome of Interest: Number of deaths with procedure code for CABG (see Appendix 6).
Population at Risk: All discharges with ICD-9-CM code 36.1 in any procedure field (see Appendix 6).
Age 40 years and older.
Exclude transfers to other institution.
Exclude MDC 14 (pregnancy, childbirth, and puerperium) and MDC 15 (newborns and other neonates).
Evidence from the literature
Face validity

Post-CABG mortality rates have recently become the focus of several state public reporting initiatives.40, 796, 797 Although the effect of the reporting on reducing post-CABG mortality and influencing cardiologists' referral patterns may not be substantial796, 797 or may even have had unintended consequences, 162 studies suggest that the reports are widely read and serve as the basis for discussions between physicians and patients about the risks associated with cardiac surgery.

Precision

One study that specifically considered the precision of the post-CABG mortality reports found that there is a significant amount of random variation, particularly for smaller hospitals due to reduced sample sizes. 40 By applying hierarchical statistical models intended to remove the random noise, the authors were able to detect some outliers that appeared to differ in performance not just because of chance. Without applying such sophisticated statistical methods, it is likely that hospitals will be identified as outliers as a result of random variations (patient and other factors beyond the hospital's control).

Minimum bias

Any measure based on an elective procedure, rather than a patient diagnosis, holds the potential for selection bias caused by the decision to elect surgery.40, 162 Theoretically, one could account for the patient characteristics that are selected upon, but it is unlikely that administrative data are rich enough to do this comprehensively. On the basis of numerous studies using large databases from New York's Cardiac Surgery Reporting System, 798 the Northern New England Cardiovascular Disease Study Group, 799 the Cleveland Clinic, 800 and the Society of Thoracic Surgeons (among others), 801 there is general consensus that cardiac function, coronary disease severity, and the urgency of surgery are powerful predictors of mortality. 101 Hannan and colleagues demonstrated only moderate correlations (0.69-0.75) in hospital risk-adjusted mortality rates between a model based on detailed clinical data and models based on either Medicare claims 799 or hospital discharge abstracts; 799 classification of hospital outliers differed substantially. These differences were largely attributable to two or three risk factors not available from administrative data. In another study, comparisons of hospital performance were relatively insensitive to the severity adjustment method, with correlations greater than 0.9, 204 and code-based measures (e.g., APR-DRGs) had better statistical performance than two measures based on physiologic data. 17 These findings may be misleading, because the code-based measures included complications of care as well as comorbidities, and because the physiologic measures were not designed in accord with professional consensus. 799

Construct validity

Several lines of evidence provide limited support for the construct validity of this indicator. First, numerous studies (summarized above) have reported an association between hospital volume and mortality following coronary bypass surgery. Sowden et al. 319 systematically reviewed 15 studies of the volume-outcome relationship for CABG; six used non-overlapping data and reported effect estimates for fixed volume categories. Among these six studies, the apparent benefit of high CABG volume (>200 cases per year) diminished as casemix adjustment improved. Because casemix adjustment was generally more complete in more recent studies, the authors could not exclude the possibility that the benefit of high volume actually decreased between 1972 and 1991. In addition, analyses using instrumental variables suggest that much, if not all, of the volume effect may be due to "selective referral" of patients to high-quality centers.230, 239 These findings raise doubts about whether we can use the volume-outcome association as evidence of the construct validity of risk-adjusted mortality as a quality indicator. Studies of surgeon volume are similarly difficult to interpret, because of diminishing effects over time or counter-intuitive findings.227, 233, 321

The second source of evidence is that aortic crossclamp or perfusion time has been repeatedly associated with postoperative mortality, adjusting for a variety of patient characteristics.802-804 In addition, longer cross-clamp times are associated with a higher incidence of postoperative atrial fibrillation. 805 Experienced surgeons and surgical teams should be able to reduce aortic crossclamp or perfusion time, thereby improving postoperative mortality. Mathew et al. also found that specific surgical techniques, including pulmonary vein venting and bicaval venous cannulation, may increase the risk of postoperative atrial fibrillation, which may in turn increase the risk of death. Perioperative use of beta blockers may have a cardioprotective effect, whereas use of nitrates may have a deleterious effect on in-hospital mortality. 806 Of course, patient-level reduction in mortality does not necessarily correspond with provider-level mortality. It is unknown how implementing these processes of care would actually affect provider-level mortality rates. Finally, several authors have reported on the experience of individual hospitals that responded to unfavorable risk-adjusted mortality data by identifying specific process failures. When these process failures were corrected, risk-adjusted mortality160, 807 or length of stay 808 improved. None of these studies systematically compared processes of care between low-mortality and high-mortality hospitals, and all used risk-adjustment models that included physiologic predictors.

Fosters true quality improvement

One response physicians might make to public reporting of procedure-based mortality rates is to avoid operating on high-risk patients. Given the fact that high-risk patients may benefit the most from coronary bypass surgery, adverse selection is a serious concern. One study from the Cleveland Clinic reported a modest increase in the average annual volume of referrals from New York, from 61.4 before risk-adjusted mortality reports (1980-88) to 96.2 thereafter (1989-93). These patients were unusually high risk, with an expected mortality rate 28% to 37% higher than that of Ohio residents. 160 In Pennsylvania, a drop from the 75th to the 25th percentile in the standardized, population-based rate of CABG surgery was associated with a 10% increase in the ratio of observed to expected mortality. 799 Nearly half (46%) of cardiothoracic surgeons surveyed in New York reported that one or more of their patients was "refused surgery last year, with the NYS CSRS (Cardiac Surgery Reporting System) being an integral part of the decision-making process." 809 Similarly, 63% of cardiothoracic surgeons surveyed in Pennsylvania reported that they were "less willing" to operate on the most severely ill patients since mortality data were first released; 59% of cardiologists reported that it had become "more difficult" to find surgeons willing to operate on such patients. 160 Yet according to Medicare data, the percentage of New York residents receiving bypass surgery out-of-state actually decreased in the early 1990's, with a concurrent increase (paralleling national trends) in the use of CABG after myocardial infarction. 810 Hence, there is no convincing evidence that providers are actually avoiding high-risk patients in states with public reporting. 
Although in-hospital mortality after CABG surgery decreased significantly in states that adopted outcome reporting (e.g., New York, 160 Northern New England 160 ), in-hospital mortality after CABG surgery has also decreased in neighboring states despite the absence of statewide outcome reporting (e.g., Massachusetts). 160 A more recent analysis based on Medicare data found that risk-adjusted 30-day mortality after CABG surgery declined more rapidly in New York (10.3% per year) than in the rest of the nation (5.8% per year) between 1987 and 1992; New York and Northern New England had the lowest risk-adjusted CABG mortality rates in the US in 1992. 160 A better understanding of how and whether public reporting leads to improved quality, or what other factors may be at work, is required to understand the effect of ongoing statewide outcome studies. 204

All in-hospital mortality measures may create perverse incentives to reduce hospital mortality by discharging patients earlier, and thereby shifting deaths to skilled nursing facilities or outpatient settings. This phenomenon may also lead to biased comparisons among hospitals with different mean lengths of stay. Although we found no data on the sensitivity of standardized CABG mortality measures (at the hospital level) to the inclusion or exclusion of post-discharge deaths, inpatient mortality among Medicare patients in California was over 90% of 30-day mortality in 1987-88. 811 Although this gap may have expanded over the last 15 years, the data suggest that changes in length of stay may have relatively little effect on the ranking of hospital performance using this measure. 812

Prior use

Post-CABG mortality is publicly reported by several state health agencies, including the California CABG Mortality Reporting Project, 813 Pennsylvania Health Care Cost Containment Council, 681 New Jersey Department of Health and Senior Services, 799 and the New York Dept. of Health. 814 However, all of these programs are based on detailed clinical data systems. The Northern New England Cardiovascular Disease Study Group uses a similar data system, but reports risk-adjusted outcomes only to participating providers. 799 Other recent users of CABG mortality rates as a quality indicator, with more limited risk-adjustment, include the University Hospital Consortium, 370 JCAHO's IMSystem, HealthGrades.com 377 , Maryland Hospital Association (as part of the Maryland QI Project) 369 , and Greater New York Hospital Association. 372

Empirical Evidence
Test | Statistic | Rating
Precision
   Raw provider level rate/standard deviation | 5.1%, 6.2% |
   Systematic provider-level standard deviation** | 1.4% | Moderate
   Provider variation as a percentage of total variation** | 0.5% | Moderate
   Signal ratio** | 54.5% | Moderate
   R-Square** | 69.8% | Moderate
   **APR-DRG, age-, gender-adjusted
Minimum Bias - APR-DRG risk adjustment
   Signal variance change with risk adjustment | Decreases | Fair
   Absolute impact:
     Average absolute change (in %) | 15.2% | Good
   Relative impact:
     Rank correlation | 0.743 | Fair
     Percent remaining in high decile/low decile | 41.4% / 65.5% | Fair
     Percent changing more than 2 deciles | 30.1% | Fair

Precision

This indicator is precise, with a raw provider level mean of 5.1% and a standard deviation of 6.2%. The systematic provider level standard deviation is moderate, at 1.4%. The provider level variation also accounts for a moderate percentage of total variation, at 0.5%. This means that relative to other indicators, a lower percentage of the variation occurs at the provider level, rather than the discharge level. The signal ratio is only moderate, at 54.5%. This means that it is likely that some of the observed differences in provider performance do not represent true differences in provider performance. The moderate R-square (69.8%) reflects the higher proportion of signal that can be extracted using multivariate techniques.

Bias

Signal variance decreases by more than 25% with risk adjustment, suggesting that some of the observed variance is due to differences in patient characteristics. The indicator performs fairly on the multiple measures of minimum bias. The rank correlation is fair at 0.743. The impact on the extremes is large. Only 41.4% of providers in the highest decile remain, and only 65.5% in the lowest decile remain, after risk adjustment. Similarly, the number of providers moving at least two deciles in relative rank is also high. However, the absolute magnitude of risk adjustment is moderate.

Construct validity

CABG mortality loads on factor 2. It is positively related to bilateral catheterization, and negatively related to laparoscopic cholecystectomy.

Discussion

CABG mortality is one of the most widely used and publicized post-procedural mortality indicators. However, we found limited evidence regarding the face or construct validity of this indicator.

The precision estimates for this indicator are somewhat lower than those of some of the other post-procedural indicators. However, CABG is a very common procedure, which increases the importance of this indicator. The variance and precision are adequate for its use as a quality indicator. Multivariate techniques improve the ability to extract signal for this indicator. As such, use of smoothed estimates will help avoid erroneous labeling of outlier hospitals, and is recommended.

This indicator is subject to substantial bias, as shown in our empirical analyses. Demographics, comorbidities, and clinical characteristics of severity of disease are important predictors of outcome that may vary systematically by provider. One study did note that APR-DRGs had the best statistical performance of any risk-adjustment system in predicting mortality, though this could be an artifact of including all codes that could be potential complications. Providers should risk adjust this indicator and may wish to investigate potential case-mix differences using medical chart review when interpreting the results of this indicator. Chart review may also help distinguish comorbidities from complications, a potential pitfall of this indicator. Further, providers using APR-DRGs for risk adjustment may wish to screen for high rates of diagnoses that are potential complications of care, rather than comorbidities.

Hospital discharge practices differ, with some hospitals discharging patients earlier than others. For this reason, this indicator should be considered in conjunction with length of stay and transfer rates (though transfers are excluded in this indicator). While inpatient and 30-day mortality appear to be similar, providers may wish to track 30-day mortality if possible.

As with other mortality measures, there is concern that measuring mortality rates would result in access problems for higher risk patients. If possible, it may be useful to monitor the outcomes for CABG patients that do not undergo surgery. Chart review may also help to identify "best practices" to ensure real quality improvement.

Overall, this indicator is recommended for inclusion in the HCUP II QI set. It received an empirical rating of 5 out of 26, and smoothing is highly recommended. This indicator is recommended with four major caveats of use. First, as CABG is an elective procedure, some selection of the patient population may lead to bias. Second, providers may inflate the denominator by performing more CABG procedures on less clinically complex patients with questionable indications. Third, risk adjustment for clinical factors, or at minimum APR-DRGs, is recommended due to the confounding bias for this indicator. Finally, the evidence for the construct validity of this indicator is limited.

INDICATOR 41: CRANIOTOMY MORTALITY RATE

Indicator: Provider level mortality rate for craniotomy.
Relationship to Quality: Better processes of care may reduce mortality for craniotomy. As such lower rates represent better quality care.
Benchmark: State, regional, or peer group average.

Method:

Quality Measure: Number of deaths per 100 discharges with procedure code of craniotomy.
Outcome of Interest: Number of deaths with procedure code for craniotomy except for trauma (see Appendix 6).
Population at Risk: All discharges with procedure code for craniotomy, except for trauma (see Appendix 6).

Age 18 years or older.

Exclude transfers to other institution.
Exclude MDC 14 (pregnancy, childbirth, and puerperium) and MDC 15 (newborns and other neonates).
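
The definition above can be sketched as a filter over discharge records followed by a rate calculation. The record layout below is hypothetical. In particular, the boolean fields `craniotomy` and `trauma` stand in for the procedure and diagnosis code lists in Appendix 6, which are not reproduced here.

```python
from dataclasses import dataclass

@dataclass
class Discharge:
    age: int
    mdc: int               # Major Diagnostic Category
    craniotomy: bool       # stand-in for the Appendix 6 procedure codes
    trauma: bool           # stand-in for the Appendix 6 trauma codes
    transferred_out: bool  # transferred to another institution
    died: bool

def in_population(d):
    """Apply the inclusion and exclusion rules for the population at risk."""
    return (d.craniotomy and not d.trauma and d.age >= 18
            and not d.transferred_out and d.mdc not in (14, 15))

def mortality_per_100(discharges):
    """Deaths per 100 at-risk discharges, or None if no one is at risk."""
    at_risk = [d for d in discharges if in_population(d)]
    if not at_risk:
        return None
    return 100.0 * sum(d.died for d in at_risk) / len(at_risk)

# Hypothetical records for one provider.
records = [
    Discharge(67, 1, True, False, False, True),    # at risk, died
    Discharge(54, 1, True, False, False, False),   # at risk
    Discharge(45, 1, True, False, False, False),   # at risk
    Discharge(72, 1, True, False, False, False),   # at risk
    Discharge(16, 1, True, False, False, True),    # excluded: under 18
    Discharge(60, 1, True, True, False, True),     # excluded: trauma
    Discharge(58, 1, True, False, True, False),    # excluded: transfer
    Discharge(30, 14, True, False, False, False),  # excluded: MDC 14
]
rate = mortality_per_100(records)
print(rate)
```

Note that excluded discharges are dropped from both the numerator and the denominator, so deaths among trauma or transferred patients do not count against the provider.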
Evidence from the literature
Face validity

Craniotomy for the treatment of subarachnoid hemorrhage (SAH) and/or cerebral aneurysm entails significant risk, with post-operative patient mortality and complications resulting from stroke, intracranial hypertension, systemic infections, hypoxia, pulmonary embolus, and cardiac arrhythmias. 815 The procedure requires significant technical skill, and the ability to identify the most appropriate cases. Together with measures of volume and utilization, post-operative mortality rates will give a comprehensive perspective on provider performance for this condition.

Precision

Most providers perform a relatively high number of procedures, with hospitals averaging 70 procedures per year for treatment of subarachnoid hemorrhage (SAH), and 30 per year for cerebral aneurysm.816, 817 Post-operative mortality rates are also relatively high, averaging nearly 14% for patients 65+, 818 which will improve precision of the provider estimates.

Minimum bias

Studies have shown significantly higher post-craniotomy mortality rates by age group (from 3% for 23 to 39 year olds to 17% for 70+) for patients undergoing treatment for subarachnoid hemorrhage (SAH).816, 819 Other measures of health status and resource use, for example APACHE II score and ICU days, did not differ by age group, although these measures were generally higher for the elderly (65+) than the non-elderly (<65). For SAH, older patients generally present with more severe illness, including lower levels of consciousness, worse grade, a thicker subarachnoid clot, intraventricular hemorrhage, and hydrocephalus on admission. Older patients also present with higher comorbidity rates, including diabetes, hypertension, and pulmonary, myocardial, and cerebrovascular disease. 819 Age seems to have an independent effect on outcomes apart from these pre-existing conditions, suggesting that the aging brain does not recover as well after initial bleeding. In summary, controlling for age, a data element available on discharge abstracts, could also control for many additional sources of bias across providers.

Construct validity

We located no evidence specifically evaluating the construct validity of this indicator. However, some evidence may lend support to the validity of this indicator. Because the procedure risk is so high, provider skill may also be an important determinant in outcome. 817 Considering a post-operative mortality measure in conjunction with a volume and utilization measure may offer the most comprehensive perspective on provider quality. Providers that perform more than 30 procedures per year have lower mortality than providers performing fewer than 30,817, 818 although as we state in the introduction the volume-outcome relationship may be a product of patient selection. In one study, patients who were referred to a large medical center for SAH were less likely to have died early, and had fewer severe indications, including lower clinical grade, rate of coma, diastolic blood pressure, and younger patient age. 820 A utilization measure might help identify providers with greater propensity to perform the procedure.

Fosters true quality improvement

All in-hospital mortality measures may create perverse incentives to reduce hospital mortality by discharging patients earlier, and thereby shifting deaths to skilled nursing facilities or outpatient settings. This phenomenon may also lead to biased comparisons among hospitals with different mean lengths of stay. We found no published evidence about whether the difference between inpatient and 30-day mortality for craniotomy is substantial enough to cause concern.

Prior use

Post-operative mortality for non-trauma-related craniotomy is a measure used by the University Hospital Consortium (UHC). 370

Empirical Evidence
Test | Statistic | Rating
Precision
   Raw provider level rate/standard deviation | 16.2%, 18.5% |
   Systematic provider-level standard deviation** | 3.7% | High
   Provider variation as a percentage of total variation** | 1.5% | High
   Signal ratio** | 28.9% | Low
   R-Square** | 49.0% | Moderate
   **APR-DRG, age-, gender- adjusted
Minimum Bias - APR-DRG risk adjustment
   Signal variance change with risk adjustment | Decreases | Fair
   Absolute impact:
     Average absolute change (in %) | 33.8% | Fair
   Relative impact:
     Rank correlation | 0.786 | Good
     Percent remaining in high decile/low decile | 29.6% / 61.1% | Fair
     Percent changing more than 2 deciles | 27.4% | Fair
Precision

This indicator is precise, with a raw provider level mean of 16.2% and a substantial standard deviation of 18.5%. The systematic provider level standard deviation is high, at 3.7%. The provider level variation also accounts for a high percentage of total variation, at 1.5%. This means that relative to other indicators, a higher percentage of the variation occurs at the provider level, rather than the discharge level, though some remains at the discharge level. The signal ratio is low, at 28.9%. This means that it is very likely that some of the observed differences in provider performance do not represent true differences in provider performance. The moderate R-square (49.0%) reflects the higher proportion of signal that can be extracted using multivariate techniques, though this remains lower than other indicators.

Bias

Signal variance decreases by more than 25% with risk adjustment. The indicator performs fairly on the multiple measures of minimum bias. The rank correlation is high at 0.786. Risk adjustment affects the extreme ends of the distribution substantially. Only 29.6% of providers in the highest decile remain after risk adjustment, and 61.1% in the lowest decile remain. Similarly, the number of providers moving at least two deciles in relative rank is also high.

Construct validity

Craniotomy does not load substantially on any of the three retained factors. However, it does appear to be positively related to other post-procedural mortality measures, such as AAA repair (r=.28, p<.0001) and CABG (r=.23, p<.0001), as well as stroke mortality (r=.49, p<.0001).

Discussion

Craniotomy is a complex procedure requiring surgical skill. Providers with high volumes have better outcomes, though this may be an artifact of patient selection. We found little further evidence on the face validity or construct validity of this indicator.

This indicator is measured with good precision, with very high provider systematic variation. However, the signal ratio is low, suggesting that some of the observed variation may not reflect true differences in performance. Post-operative mortality rates are relatively high, and the surgery is relatively common for most providers. Such high provider variation suggests that improving quality of care could potentially greatly improve outcomes. Multivariate techniques improve the ability to extract signal for this indicator and are recommended. Use of smoothed estimates (via MSX or other methods) will aid in reducing precision problems due to random noise, such as the erroneous labeling of providers as "outliers."
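
To illustrate why smoothing guards against mislabeling small providers as outliers, the sketch below applies a simple univariate shrinkage estimator. This is not the multivariate MSX method referred to above; it is a simplified stand-in in which each provider's observed rate is pulled toward the overall mean in proportion to how noisy it is. The function name is hypothetical, the noise model is assumed to be binomial, and the inputs (a 16.2% mean and a 3.7% systematic standard deviation) are borrowed from the table above purely as illustrative values.

```python
def smoothed_rate(observed, n, overall, signal_var):
    """Shrink an observed rate toward the overall mean.

    Assumption: sampling noise is binomial, overall * (1 - overall) / n.
    The weight on the observed rate is signal / (signal + noise).
    """
    noise_var = overall * (1 - overall) / n
    weight = signal_var / (signal_var + noise_var)
    return weight * observed + (1 - weight) * overall

overall = 0.162          # illustrative overall mortality rate
signal_var = 0.037 ** 2  # illustrative systematic provider variance

# Two hypothetical providers, each with an observed rate of 50%.
small = smoothed_rate(0.5, 10, overall, signal_var)   # 10 procedures
large = smoothed_rate(0.5, 200, overall, signal_var)  # 200 procedures
print(f"small provider: {small:.3f}, large provider: {large:.3f}")
```

The provider with 10 procedures is pulled most of the way back to the overall mean, while the provider with 200 procedures retains much more of its observed rate.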

Using our empirical analyses we identified substantial bias for this indicator, particularly for age. Our literature review also suggests that age is an important factor to consider for risk adjustment. Older patients also have more severe illness and comorbidities, though these do not explain all of the higher mortality rates observed in an elderly population. Providers should risk adjust for age and comorbidities. Further examination of other patient characteristics that increase case-mix complexity may be completed through medical chart reviews or analyses of laboratory tests.

As with other mortality measures, there is concern that measuring mortality rates would result in access problems for higher risk patients. If possible, it may be useful to monitor the outcomes for subarachnoid hemorrhage or cerebral aneurysm patients that do not undergo surgery.

Overall, this indicator is recommended for inclusion in the HCUP II QI set. It received an empirical rating of 6 out of 26, and smoothing is highly recommended. This indicator is recommended with two major caveats of use. First, risk adjustment for clinical factors, or at minimum APR-DRGs, is recommended due to the confounding bias for this indicator. Second, little evidence exists supporting the construct validity of this indicator.

INDICATOR 42: ESOPHAGEAL RESECTION MORTALITY RATE

Indicator: Provider level mortality rate for esophageal resection.
Relationship to Quality: Better processes of care may reduce mortality for esophageal resection. As such lower rates represent better quality care.
Benchmark: State, regional, or peer group average.

Method:

Quality Measure: Number of deaths per 100 patients with discharge procedure code of esophageal resection.
Outcome of Interest: Number of deaths with procedure code for esophageal resection (see Appendix 6).
Population at Risk: All discharges with procedure code of esophageal resection (see Appendix 6) and diagnosis code for esophageal cancer in any field.

Exclude transfers to other institution.
Exclude MDC 14 (pregnancy, childbirth, and puerperium) and MDC 15 (newborns and other neonates).
Evidence from the literature
Face validity

Esophageal resection is a complex procedure that requires technical skill. The primary evidence for this indicator arises from the volume-outcome literature. Several studies have found that hospitals that perform more procedures have better mortality rates than lower volume hospitals. The magnitude of this relationship is relatively large as compared to other procedures. A full review of this literature can be found in the discussion of pancreatic resection as a volume indicator. This relationship suggests that there may be some differences in processes of care that result in better outcomes. Those processes have not been identified and are subject to controversy, as it is unclear what the causal relationship is, if there truly is one, between hospital volume and mortality.

Precision

Esophageal resection is a relatively uncommon procedure, which may impact the precision of the indicator. Patti et al. 198 noted that most hospitals perform 10 or fewer procedures during a 5-year period. Utilizing several years of data, which has been done in some of the volume-outcome research, may help improve the precision of this indicator.

Minimum bias

Although we located no studies specifically addressing the need for risk adjustment, most of the volume-outcome studies published have used some sort of risk adjustment, suggesting that risk adjustment may be important for this procedure. Most of those studies used administrative data for risk adjustment.

Construct validity

Beyond the volume-outcome relationship we found no evidence for the construct validity of this procedure. Two studies have examined hospital volume as compared to in-hospital mortality rates. Patti et al. 198 used five volume categories, finding decreasing mortality rates of 17%, 19%, 10%, 16%, and 6% (1-5, 6-10, 11-20, 21-30, and >30 procedures during the 5-year study period). Gordon et al. 322 combined all complex gastrointestinal procedures, finding that low volume (11-20 procedures per year) hospitals had an adjusted odds of death of 4.0 as compared to the one high volume hospital.

Fosters true quality improvement

Though we found no evidence on whether or not this indicator would stimulate true improvement in quality, it is possible that high risk patients may be denied surgery.

Prior use

Esophageal resection has not been widely used as a quality indicator.

Empirical Evidence
Test | Statistic | Rating
Precision
   Raw provider level rate/standard deviation | 20.2%, 36.6% |
   Systematic provider-level standard deviation** | 2.4% | Moderate
   Provider variation as a percentage of total variation** | 0.8% | Moderate
   Signal ratio** | 8.9% | Low
   R-Square** | 21.0% | Low
   **age-, gender- adjusted
Minimum Bias - APR-DRG risk adjustment
   Signal variance change with risk adjustment | |
   Absolute impact:
     Average absolute change (in %) | 16.8% | Good
   Relative impact:
     Rank correlation | 0.858 | Good
     Percent remaining in high decile/low decile | 66.7% / 100% | Good / V.G.
     Percent changing more than 2 deciles | 9.5% | Good
Precision

This indicator is precise, with a raw provider level mean of 20.2% and a substantial standard deviation of 36.6%. The systematic provider level standard deviation is moderate, at 2.4%. The provider level variation also accounts for a moderate percentage of total variation, at 0.8%. This means that relative to other indicators, a smaller percentage of the variation occurs at the provider level, rather than the discharge level. Finally, the signal ratio is low, at 8.9%. This means that it is very likely that some of the observed differences in provider performance do not represent true differences in provider performance. Multivariate techniques improve the amount of signal that can be extracted, although the R-square is still low relative to other indicators.

Bias

Signal variance decreases by over 25% with risk adjustment, indicating that some of the true variation among providers reflects differences in patient characteristics. Due to the distribution of this indicator, APR-DRG risk adjustment was not available. Thus, only age and sex risk adjustment was performed. The indicator performs well on the multiple measures of minimum bias. The rank correlation is high at 0.858. Risk adjustment does seem to impact disproportionately at the extreme high end, as there is no impact at the low end. The absolute impact of risk adjustment is moderate.

Construct validity

Since the distribution of this indicator violates the assumptions of factor analysis, this indicator was not included in our analysis of construct validity.

Discussion

Esophageal resection is a complex cancer surgery. Several studies have noted that providers with higher volumes have lower mortality rates for the procedure than providers with lower volumes. This suggests that perhaps providers with higher volumes have some characteristics, either structurally or with regard to processes, that influence mortality after this procedure. However, if these characteristics do indeed exist, what they are is unclear.

This indicator has moderate provider systematic variation, and the signal ratio is quite low. This indicates that some observed differences are not true differences in performance. Smoothing is somewhat helpful for this indicator and is recommended, though the amount of extractable signal remains low. As this procedure is performed only by a select number of hospitals, a majority of hospitals will have no cases in a year. The low numbers of these procedures on a provider level may compromise the precision of this indicator. Providers may wish to examine several consecutive years to potentially increase the precision of this indicator.
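
The gain from pooling several years can be sketched with the binomial standard error of a rate. The example assumes a hospital with a true mortality rate of 20% (close to the raw mean above) that performs 2 procedures a year, which is a hypothetical volume consistent with 10 procedures over 5 years. The function name is illustrative.

```python
from math import sqrt

def rate_standard_error(rate, cases):
    """Binomial standard error of an observed rate based on `cases` procedures."""
    return sqrt(rate * (1 - rate) / cases)

one_year = rate_standard_error(0.20, 2)     # a single year of data
five_years = rate_standard_error(0.20, 10)  # five years pooled
print(f"1 year: {one_year:.3f}, 5 years: {five_years:.3f}")
```

Pooling five years reduces the standard error by a factor of the square root of 5, although with so few cases the pooled estimate remains imprecise.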

This indicator generally performed well on our tests of minimum bias, with some moderate bias identified. However, due to the distribution of this indicator only demographic risk adjustment was applied. All studies reviewed in the literature review risk adjusted the mortality rate in some manner, suggesting that risk adjustment is considered important. However, as the aims of the studies were to establish volume-outcome relationships, the extent to which adjusting affected provider performance was not reported. It is recommended that this indicator be risk adjusted.

Overall, this indicator is recommended for inclusion in the HCUP II QI set. It received an empirical rating of 8 out of 26, and smoothing is highly recommended. This indicator is recommended with two major caveats of use. First, risk adjustment for clinical factors is recommended due to the confounding bias for this indicator. Second, little evidence exists supporting the construct validity of this indicator.

INDICATOR 43: HIP REPLACEMENT MORTALITY RATE

Indicator: Provider level mortality rate for hip replacement.
Relationship to Quality: Better processes of care may reduce mortality for hip replacement. As such lower rates represent better quality care.
Benchmark: State, regional, or peer group average.

Method:

Quality Measure: Number of deaths per 100 patients with discharge procedure code of partial or full hip replacement.
Outcome of Interest: Number of deaths with procedure code for hip replacement (see Appendix 6).
Population at Risk: All discharges with procedure code of partial or full hip replacement in any field (see Appendix 6).

Include only discharges with uncomplicated cases: diagnosis or procedure codes for osteoarthrosis of hip in any field (see Appendix 6).

Exclude transfers to other institution.
Exclude MDC 14 (pregnancy, childbirth, and puerperium) and MDC 15 (newborns and other neonates).
Evidence from the literature
Face validity

Total hip arthroplasty (without hip fracture) is an elective procedure performed to improve function and relieve pain among patients with chronic osteoarthritis, rheumatoid arthritis, or other degenerative processes involving the hip joint. Mortality is very low, as it should be for a procedure that is designed to improve function rather than extend survival. However, patients who undergo total hip arthroplasty are often elderly, with multiple comorbidities. As a result, they are at significant risk of such postoperative complications as pneumonia, osteomyelitis, myocardial ischemia, and deep vein thrombosis. If these complications are not recognized and effectively treated, life-threatening problems such as sepsis, myocardial infarction, and pulmonary embolus may ensue. The ICD-9-CM definition of this procedure (81.51) is limited to primary arthroplasties; revisions are assigned a different code (81.53) that does not distinguish hemiarthroplasties from total arthroplasties.

Precision

Primary total hip arthroplasty is one of the most frequent types of major orthopedic surgery; about 160,000 were performed in the USA in 1998 (5.9 per 10,000 persons). 711 Based on state all-payer databases, the mean frequency of primary total hip arthroplasties was 72.8 per hospital in Ontario 821 and 106.9 per hospital in Florida (including total knee arthroplasties) in 1992. 234 However, the in-hospital or 30-day postoperative mortality rate in various studies ranged from 0.10% at New York's Hospital for Special Surgery 822 and 0.15% at Massachusetts General Hospital 823 to 1.97% (0.95% for indications other than hip fracture) in a 5% random sample of Medicare beneficiaries over 65 years of age. 824 The relatively small number of deaths following total hip arthroplasty at each hospital suggests that mortality rates are likely to be unreliable at the hospital level. For example, age and sex standardized postoperative mortality varied 4.8 fold across 10 hospitals in the Oxford region, but this variation was not statistically significant. 825
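
A short calculation shows why so few deaths per hospital make the rate unreliable. The sketch assumes deaths are independent and combines two figures quoted above from different studies, about 73 procedures per hospital per year and a 0.95% mortality rate, so it is illustrative only. The function name is hypothetical.

```python
def prob_no_deaths(mortality_rate, procedures):
    """Probability of observing zero deaths, assuming independent outcomes."""
    return (1 - mortality_rate) ** procedures

p0 = prob_no_deaths(0.0095, 73)
print(f"probability of zero deaths in a year: {p0:.2f}")
```

Under these assumptions, about half of such hospitals would record no deaths at all in a given year, so a single death can move a hospital from an apparent rate of 0% to well above average.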

Minimum bias

Any measure based on an elective procedure, rather than a patient diagnosis, holds the potential for selection bias caused by the decision to elect surgery. Theoretically, one could account for the patient characteristics that are selected upon, but it is unlikely that administrative data are rich enough to do this comprehensively. The known predictors of in-hospital mortality include age, hip fracture, and the presence of any significant comorbidity.716, 821, 824 Failure to adjust for hip fracture has been shown to introduce bias against certain hospitals in the Medicare Hospital Information Project. 826 Indication for surgery (other than hip fracture) and race have not been shown to predict postoperative mortality, whereas there is conflicting evidence on the effect of gender. 825 We are not aware of any studies that tested the effect of physiologic factors (available only from clinical data) on mortality. In the absence of studies explicitly comparing models with and without clinical data elements, it is difficult to assess whether administrative data contain sufficient information to remove bias.

Construct validity

We located no studies explicitly evaluating the construct validity of this indicator. However, there is limited evidence supporting an association between hospital volume and mortality following total hip arthroplasty. (Following Halm, Lee, and Chassin, 80 we did not find this evidence to be sufficiently strong to recommend total hip arthroplasty volume as a separate volume indicator.) Using administrative data from Florida, without any risk-adjustment, Lavernia and Guzman 234 found no association between hospital volume and mortality. However, surgeons with fewer than 10 cases per year showed a significant increase in the death rate, and hospitals with fewer than 10 cases per year showed a significant increase in the complication rate. In a similar analysis of Ontario data, 821 surgeon and hospital volumes were not significantly associated with mortality, postoperative infection, serious complications, or revision. A study of all Medicare claims from 1993 and 1994 showed lower in-hospital and in-hospital plus 30-day mortality after total hip arthroplasty at hospitals with higher Medicare volume for DRG 209 ("major joint and limb reattachment procedures, including primary and revision hip, knee, shoulder, and wrist arthroplasties"). 236 By contrast, another study of Medicare data from 1979 and 1980 showed no association with 60-day mortality, after adjusting for age, sex, hip fracture, and medical school affiliation. 716 Older studies are also inconsistent. Maerki 718 and Luft 239 found a hospital volume effect using 1972 data from the Commission on Professional and Hospital Activities, whereas Farley 230 did not, using 1980-87 data from HCUP. These inconsistencies provide very limited support for the construct validity of mortality as a quality indicator.

More persuasive, perhaps, is mounting evidence that thromboembolic prophylaxis substantially reduces the incidence of symptomatic pulmonary embolism after elective total hip arthroplasty (e.g., 0.16% with warfarin, 0.26% with pneumatic compression, 0.36% with low molecular weight heparin, 1.51% with placebo). 716

Although pulmonary embolism is known to be a major cause of death after hip arthroplasty, 827 there is still no clear evidence that thromboembolic prophylaxis (or any other specific process of care) reduces mortality after total hip arthroplasty. One observational study attributed a decrease in postoperative mortality from 0.36% in 1981-85 to 0.10% in 1987-91 to changes in perioperative care, such as reduced intraoperative blood loss, more aggressive arterial and oximetric monitoring, and increased use of epidural instead of general anesthesia. 822

Fosters true quality improvement

One possible adverse effect of in-hospital mortality measures is to encourage earlier postoperative discharge. We are aware of no data on the likelihood or consequences of premature discharge after primary total hip arthroplasty. Another potential response would be to avoid operating on high-risk patients, although it is unclear to what extent providers could actually recognize and avoid high-risk patients. Indeed, many high-risk patients would actually benefit from being transferred to a more experienced center.

Prior use

In-hospital mortality following total hip arthroplasty is a current indicator in the HCUP I QI set. It is also reported by the HCFA in the Medicare Quality of Care Surveillance System, 828 and used by HealthGrades.com, 377 and the Greater New York Hospital Association. 372 Hip replacement is combined with all hip procedures in an indicator used by the Pennsylvania Health Care Cost Containment Council. 681 Finally, the University Hospital Consortium combines both knee replacement and hip replacement in a mortality indicator. 370

Empirical Evidence
Test | Statistic | Rating
Precision
   Raw provider level rate/standard deviation | 1.2%, 5.7% |
   Systematic provider-level standard deviation** | 0.9% | Moderate
   Provider variation as a percentage of total variation** | 1.2% | High
   Signal ratio** | 20.0% | Low
   R-Square** | 21.6% | Low
   **APR-DRG, age-, gender- adjusted
Minimum Bias - APR-DRG risk adjustment
   Signal variance change with risk adjustment | Decreases | Fair
   Absolute impact:
     Average absolute change (in %) | 48.9% | Fair
   Relative impact:
     Rank correlation | 0.642 | Fair
     Percent remaining in high decile/low decile | 27.8% / 70.8% | Fair
     Percent changing more than 2 deciles | 36.1% | Fair
Precision

This indicator is adequately precise, with a raw provider level mean of 1.2% and a substantial standard deviation of 5.7%. The systematic provider level standard deviation is moderate, at 0.9%. The provider level variation also accounts for a high percentage of total variation, at 1.2%. This means that relative to other indicators, a high percentage of the variation occurs at the provider level, rather than the discharge level. The signal ratio is low, at 20.0%. This means that it is very likely that some of the observed differences in provider performance do not represent true differences in provider performance. The R-square remains low, reflecting the minimal impact of multivariate signal extraction techniques.

Bias

Signal variance decreases by more than 25% with risk adjustment. The indicator performs fairly on the multiple measures of minimum bias. The rank correlation is fair at 0.642. Risk adjustment affects the extreme ends of the distribution substantially. Only 27.8% of providers in the highest decile remain after risk adjustment, and 70.8% in the lowest decile remain. Similarly, the number of providers moving at least two deciles in relative rank is also high. The average absolute change in performance relative to the mean is over 78%.
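
The decile-based measures of relative impact can be sketched as follows. The code assigns each provider to a decile by rank before and after risk adjustment, then reports the percentage of top-decile providers that stay in the top decile and the percentage of all providers that move more than two deciles. The function names and the data are hypothetical, ties are broken by input order, and at least 10 providers are assumed so that the top decile is not empty.

```python
def decile(values):
    """Assign each value a decile from 1 (lowest) to 10 (highest) by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order):
        result[index] = position * 10 // len(values) + 1
    return result

def decile_shift_summary(unadjusted, adjusted):
    """Return (% of top decile remaining, % moving more than two deciles)."""
    before, after = decile(unadjusted), decile(adjusted)
    top = [i for i, d in enumerate(before) if d == 10]
    stayed = sum(after[i] == 10 for i in top)
    moved = sum(abs(a - b) > 2 for a, b in zip(before, after))
    return 100 * stayed / len(top), 100 * moved / len(before)

# Hypothetical rates for 20 providers; adjustment swaps two of them.
unadjusted = list(range(20))
adjusted = list(range(20))
adjusted[19], adjusted[10] = adjusted[10], adjusted[19]
summary = decile_shift_summary(unadjusted, adjusted)
print(summary)
```

In this toy example one of the two top-decile providers drops to the sixth decile after adjustment, which is the kind of movement at the extremes that can lead to erroneous outlier labels.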

Construct validity

Hip replacement mortality is not strongly related to other indicators.

Discussion

Hip replacement is an elective surgery with relatively low mortality rates. However, the main recipients of hip replacement are elderly individuals, with increased risk for complications and morbidity from surgery.

The low mortality rate is likely to affect the precision for this indicator. Our empirical analyses confirmed that this indicator is measured with low precision. The signal ratio is especially low. It is likely that some of the variation seen does not reflect true differences in performance. Multivariate techniques do not improve the ability to extract signal for this indicator, but are recommended, as this indicator is very noisy. Nonetheless, this indicator has adequate precision for use as a quality indicator.

As hip replacement is an elective procedure, it is subject to selection bias. Patient characteristics such as age and comorbidities may influence the mortality rate for the procedure (particularly a diagnosis of hip fracture), and bias has been documented for this indicator. Our empirical analyses also identified substantial bias in this indicator, especially for providers at the extremes. This may result in the erroneous labeling of outlier providers. Risk adjustment is highly recommended for this indicator. Given the concerns raised in the literature, providers desiring to use this indicator may want to examine the case mix of their population.

Hospital discharge practices differ, with some hospitals discharging patients earlier than others. For this reason, this indicator should be considered in conjunction with length of stay and transfer rates (though transfers are excluded in this indicator).

Overall, this indicator is recommended for inclusion in the HCUP II QI set. It received an empirical rating of 3 out of 26, and smoothing is highly recommended. This indicator is recommended with three major caveats of use. First, as hip replacement is an elective procedure, some selection of patient population may create bias. Second, risk adjustment for clinical factors, or at minimum APR-DRGs, is recommended due to the confounding bias for this indicator. Finally, the evidence supporting the construct validity of this indicator is limited.

INDICATOR 44: PANCREATIC RESECTION MORTALITY RATE

Indicator: Provider level mortality rate for pancreatic resection.
Relationship to Quality: Better processes of care may reduce mortality for pancreatic resection. As such, lower rates represent better quality care.
Benchmark: State, regional, or peer group average.

Method:

Quality Measure: Number of deaths per 100 patients with discharge procedure code of pancreatic resection.
Outcome of Interest: Number of deaths with procedure code for pancreatic resection in any field (see Appendix 6).
Population at Risk: All discharges with procedure code of pancreatic resection (see Appendix 6) and diagnosis code for cancer in any field.

Exclude transfers to other institution.
Exclude MDC 14 (pregnancy, childbirth, and puerperium) and MDC 15 (newborns and other neonates).
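
The definition above can be sketched as a small calculation. The following is a minimal sketch under stated assumptions, not the HCUP software: the record fields (`provider`, `died`, `has_resection`, `has_cancer_dx`, `transferred_out`, `mdc`) are hypothetical names standing in for the code lists in Appendix 6.

```python
from collections import defaultdict

# MDC 14: pregnancy, childbirth, and puerperium; MDC 15: newborns and other neonates
EXCLUDED_MDCS = {14, 15}


def in_population(d):
    """Return True if a discharge record belongs to the population at risk.

    All field names are hypothetical; the real code lists are in Appendix 6.
    """
    return (
        d["has_resection"]
        and d["has_cancer_dx"]
        and not d["transferred_out"]
        and d["mdc"] not in EXCLUDED_MDCS
    )


def mortality_per_100(discharges):
    """Deaths per 100 at-risk discharges, keyed by provider."""
    deaths = defaultdict(int)
    cases = defaultdict(int)
    for d in discharges:
        if in_population(d):
            cases[d["provider"]] += 1
            if d["died"]:
                deaths[d["provider"]] += 1
    return {p: 100.0 * deaths[p] / cases[p] for p in cases}
```

Providers with no qualifying discharges do not appear in the result, which avoids dividing by zero for the many hospitals that perform no resections in a given year.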
Evidence from the literature
Face validity

Pancreatic resection is a complex procedure that requires technical skill. The primary evidence for this indicator arises from the volume-outcome literature. Several studies have found that hospitals that perform more procedures (which sometimes included other complex gastrointestinal procedures) have better mortality rates than lower volume hospitals. The magnitude of this relationship is relatively large as compared to other procedures. A full review of this literature can be found in the discussion of pancreatic resection as a volume indicator. This relationship suggests that there may be some differences in processes of care that result in better outcomes. Those processes have not been identified and are subject to controversy, as it is unclear what the causal relationship is, if there truly is one, between hospital volume and mortality.

Precision

Pancreatic resection is a relatively uncommon procedure, which may impact the precision of the indicator. Glasgow et al. 199 found that in California most hospitals perform 10 or fewer procedures during a 5-year period. However, the mortality rate is high, ranging from 4%-13%.65, 238 Pooling several years of data, as some of the volume-outcome studies have done, may help improve the precision of this indicator.
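
The effect of pooling years on precision can be illustrated with the binomial standard error of an observed rate. This is a minimal sketch, assuming a simple binomial model; the hospital (2 cases per year, 10% underlying mortality) is hypothetical and was chosen only to fall within the ranges quoted above.

```python
import math


def binomial_se(rate, n_cases):
    """Standard error of an observed proportion under a binomial model."""
    return math.sqrt(rate * (1.0 - rate) / n_cases)


# Hypothetical hospital: 2 resections per year, underlying mortality of 10%.
for years in (1, 5):
    n = 2 * years
    print(years, "year(s):", n, "cases, SE =", round(binomial_se(0.10, n), 3))
```

Under these assumptions the standard error falls from roughly 21 percentage points with one year of data to roughly 9.5 with five years, which is still large relative to a 10% rate.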

Minimum bias

Although we located no studies specifically addressing the need for risk adjustment, most of the volume-outcome studies published have used some sort of risk adjustment, suggesting that risk adjustment may be important for this procedure. Most of those studies used administrative data for risk adjustment.

Construct validity

Beyond the volume-outcome relationship we found no evidence for the construct validity of this indicator. Ten studies have examined the volume relationship with in-hospital mortality. See the literature review for Pancreatic Resection Volume for details. Glasgow and Mulvihill 199 estimated risk-adjusted mortality rates (although the risk adjustment was limited) of 14%, 10%, 9%, 7%, 8%, and 4% across six hospital volume categories (1-5, 6-10, 11-20, 21-30, 31-50, and >50 procedures during the 5-year study period). Gordon et al. 322 estimated that the adjusted odds of death at minimal-volume (<11 "complex gastrointestinal procedures"/year) and low-volume (11-20 procedures/year) hospitals were 12.5 and 10.4 times that at a high-volume hospital (214 procedures/year). However, the generalizability of these results is limited by the fact that the last category included only one hospital.

Lieberman et al. 324 used 1984-91 hospital discharge data from New York State to analyze the association between mortality after pancreatic cancer resection and both physician and hospital volumes. The standardized mortality rate was 19%, 12%, 13%, and 6% at minimal (<10 patients during the 8-year study period), low (10-50 patients), medium (51-80 patients), and high-volume (>80 patients) hospitals, respectively. Surgeon volume was less significantly associated with mortality (6-13% risk-adjusted mortality across 3 volume categories); this effect disappeared in a model that included both physician and hospital volume, as confirmed by Sosa et al. 325 Studies using administrative data from Ontario, 326 the United Kingdom, 327 and Medicare 328 have generated results similar to those from California and New York.

Gordon et al. 331 estimated that 61% of the observed reduction in statewide deaths among patients undergoing the Whipple procedure was attributable to the increasing market share of one facility, from 20.7% to 58.5% between 1984 and 1995.

Fosters true quality improvement

Though we found no evidence on whether or not this indicator would stimulate true improvement in quality, it is possible that high risk patients may be denied surgery.

Prior use

Pancreatic resection has not been widely used as a quality indicator.

Empirical Evidence
Test | Statistic | Rating
Precision
   Raw provider level rate/standard deviation | 15.4%, 31.3% |
   Systematic provider-level standard deviation** | 4.2% | High
   Provider variation as a percentage of total variation** | 3.2% | High
   Signal ratio** | 16.5% | Low
   R-Square** | 34.7% | Low
   ** age-, gender-adjusted
Minimum Bias - APR-DRG risk adjustment
   Signal variance change with risk adjustment | Decreases | Fair
   Absolute impact:
     Average absolute change (in %) | 41.9% | Fair
   Relative impact:
     Rank correlation | 0.540 | Fair
     Percent remaining in high decile/low decile | 71.4% / 28.6% | Good / Fair
     Percent changing more than 2 deciles | 38.1% | Fair
Precision

This indicator is moderately precise, with a raw provider level mean of 15.4% and a standard deviation of 31.3%. The systematic provider level standard deviation is high, at 4.2%. The provider level variation also accounts for a high percentage of total variation, at 3.2%. The signal ratio is low, at 16.5%. This means that it is very likely that some of the observed differences in provider performance do not represent true differences in provider performance. Multivariate extraction techniques do extract additional signal, although the R-square remains low relative to other indicators.
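
One way to read the signal ratio is as the share of a provider's observed variance that reflects systematic variation rather than sampling noise. The sketch below assumes a simple univariate shrinkage estimator with binomial noise, which is not necessarily the multivariate signal extraction method used in this report; the function names, the case counts, and the observed rate of 40% are hypothetical.

```python
def signal_ratio(signal_var, rate, n_cases):
    """Share of observed variance that is signal, for one provider.

    Assumes binomial noise, rate * (1 - rate) / n_cases. This is an
    illustrative simplification, not the report's estimator.
    """
    noise_var = rate * (1.0 - rate) / n_cases
    return signal_var / (signal_var + noise_var)


def smoothed_rate(observed, overall_mean, ratio):
    """Shrink an observed rate toward the overall mean by the signal ratio."""
    return overall_mean + ratio * (observed - overall_mean)


signal_var = 0.042 ** 2   # systematic provider-level SD of 4.2% (from the table)
overall = 0.154           # raw provider-level mean of 15.4% (from the table)

for n in (5, 50):         # hypothetical case counts
    r = signal_ratio(signal_var, overall, n)
    print(n, round(r, 3), round(smoothed_rate(0.40, overall, r), 3))
```

In this sketch a provider with few cases has a small ratio, so its smoothed rate stays close to the overall mean; a provider with more cases keeps more of its observed deviation.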

Bias

Signal variance decreases by over 25% with risk adjustment, indicating that some of the true variation among providers reflects differences in patient characteristics. Due to the distribution of this indicator, APR-DRG risk adjustment was not available. Thus, only age and sex risk adjustment was performed. The indicator performs fairly on the multiple measures of minimum bias. The rank correlation is low, indicating substantial impact of risk adjustment on relative performance. Risk adjustment affects the lowest decile disproportionately to the highest decile, with only 28.6% remaining in the lowest decile after risk adjustment, and 71.4% remaining in the highest decile. The absolute impact of risk adjustment is also large.
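
The relative-impact measures in the table (rank correlation, share of providers remaining in an extreme decile) can be computed from two lists of provider rates, before and after risk adjustment. The following is a minimal sketch in plain Python; the function names are hypothetical, and the decile rule (top tenth by count, at least one provider) is an assumption, since the report does not spell out its tie and rounding conventions here.

```python
def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def rank_correlation(unadjusted, adjusted):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(unadjusted), ranks(adjusted))


def top_decile(values):
    """Indices of the providers with the highest tenth of rates."""
    k = max(1, len(values) // 10)
    return set(sorted(range(len(values)), key=lambda i: values[i])[-k:])


def share_remaining_in_top_decile(unadjusted, adjusted):
    """Fraction of top-decile providers that stay there after adjustment."""
    before = top_decile(unadjusted)
    return len(before & top_decile(adjusted)) / len(before)
```

A rank correlation near 1 and a high share remaining would indicate that risk adjustment changes little about which providers look best or worst; the values reported for this indicator are well below that.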

Construct validity

Since the distribution of this indicator violates the assumptions of factor analysis, this indicator was not included in our analysis of construct validity.

Discussion

Pancreatic resection is a complex cancer surgery. Several studies have noted that providers with higher volumes have lower mortality rates for the procedure than providers with lower volumes. This suggests that perhaps providers with higher volumes have some characteristics, either structurally or with regard to processes, that influence mortality after this procedure. However, if these characteristics do indeed exist, what they are is unclear.

This indicator has high provider systematic variation, though the signal ratio is quite low. This suggests that some of the observed variation does not reflect true differences in performance. Multivariate techniques do improve the amount of signal that can be extracted, and as a result smoothing is recommended. However, the amount of extractable signal remains lower than for other indicators. As this procedure is performed only by a select number of hospitals, a majority of hospitals will have no cases in a year. The low numbers of these procedures on a provider level may compromise the precision of this indicator. Providers may wish to examine several consecutive years to potentially increase the precision of this indicator.

This indicator generally performed poorly on our tests of minimum bias, suggesting that this indicator is subject to substantial bias. In addition, due to the distribution of this indicator, only age-sex risk adjustment was performed. All studies reviewed in the literature review risk adjusted the mortality rate in some manner, suggesting that risk adjustment is considered important. However, as the aims of the studies were to establish volume-outcome relationships, the extent to which adjusting affected provider performance was not reported.

Overall, this indicator is recommended for inclusion in the HCUP II QI set. It received an empirical rating of 5 out of 26, and smoothing is highly recommended. This indicator is recommended with two major caveats of use. First, risk adjustment for clinical factors is recommended due to the confounding bias for this indicator. Second, little evidence exists supporting the construct validity of this indicator.

INDICATOR 45: PEDIATRIC HEART SURGERY MORTALITY RATE

Indicator: Provider level mortality rate for pediatric heart surgery.
Relationship to Quality: Better processes of care may reduce mortality for pediatric heart surgery. As such, lower rates represent better quality care.
Benchmark: State, regional, or peer group average.

Method:

Quality Measure: Number of deaths per 100 patients with discharge procedure code of pediatric heart surgery.
Outcome of Interest: Number of deaths with diagnosis code for pediatric heart surgery (see Appendix 6).
Population at Risk: All discharges, age <18 years, with 1) procedure code of specified pediatric heart surgery in any field or 2) any heart surgery and a diagnosis for hypoplastic left heart syndrome (see Appendix 6).

See Appendix 6 for additional exclusions.
Exclude transfers to other institution.
Exclude MDC 14 (pregnancy, childbirth, and puerperium).
Evidence from the literature
Face validity

Pediatric cardiac surgery requires technical proficiency with the use of complex equipment. Technical errors may lead to clinically significant complications, such as arrhythmias, congestive heart failure, and death. It is thought that postoperative mortality rates vary considerably across hospitals, in a manner that reflects quality of care. Studying provider volume and mortality together would offer a comprehensive perspective on provider performance for pediatric cardiac surgery.

Precision

Previous studies suggest that pediatric cardiac surgery is highly concentrated at a relatively small number of facilities (e.g., 16 hospitals in New York, 37 in California and Massachusetts together). Although some of these facilities have very high volumes, a significant number (e.g., 16 hospitals in California and Massachusetts) perform fewer than 10 cases per year. The highly skewed volume distribution may have an adverse effect on the precision of this measure.

Minimum bias

Pediatric cardiac surgery represents a composite of numerous procedures performed to repair or palliate numerous congenital anomalies. The extreme heterogeneity among these procedures, and among the underlying anomalies, makes bias a serious concern. For example, among procedures with at least 100 cases in New York's Cardiac Surgery Reporting System 194 in 1992-1995, in-hospital mortality varied from 0.4% for repair of atrial septal defect (ASD) to 34.2% for Norwood repair of hypoplastic left ventricle. Even for a single procedure at major centers, such as the Fontan operation for tricuspid atresia or single ventricle, mortality depends heavily on physiological and functional factors, such as asplenia, atrioventricular valvular function, and mean pulmonary artery pressure.829, 830 Technical factors such as the dimension of the native pulmonary arteries may also be important. 831 Because these factors are not available in administrative data sets, and because the most complex patients are likely to be referred to selected centers, unmeasured risk factors could seriously confound inter-provider performance comparisons based on administrative data.

Construct validity

The evidence for the construct validity of this indicator comes from two sources. First, three studies (including one that used prospectively collected clinical data) have reported an association between hospital volume and mortality following pediatric cardiac surgery. Using a multivariate model that included age, complexity category, and four comorbidities, Hannan et al. 194 found 8.26% risk-adjusted mortality at hospitals with fewer than 100 cases per year, versus 5.95% at higher volume hospitals (an effect limited to surgeons who performed at least 75 cases per year). Two other studies using hospital discharge data from California and Massachusetts found similar effects of hospital volume.295, 332 The consistent association between volume and risk-adjusted mortality supports the validity of both measures of performance, and is consistent with the hypothesis that more experience leads to improved technical skills and better outcomes. Other studies from single centers have confirmed this hypothesis by demonstrating improvements in mortality over time for a variety of procedures.832-834

The second source of evidence is that cardiopulmonary bypass or aortic crossclamp time has been repeatedly associated with postoperative mortality, adjusting for a variety of patient characteristics.830, 835-837 This relationship has been demonstrated not just for the Fontan procedure, but also for the Norwood procedure for hypoplastic left heart syndrome. 838 Experienced surgeons and surgical teams should be able to reduce cardiopulmonary bypass or aortic crossclamp time, thereby improving postoperative mortality. It should be noted that patient-level reduction in mortality does not necessarily correspond with provider-level mortality. It is unknown how implementing these processes of care would actually affect provider-level mortality rates.

Fosters true quality improvement

One potential response by physicians to public reporting of procedure mortality rates would be to avoid operating on high-risk patients. Given that the risk factors for adverse outcomes after the more frequent procedures are well known to pediatric cardiac surgeons, and that many of these risk factors are not available from administrative data, avoidance of high-risk cases is a genuine concern. Although such behavior may lead to bias in estimating provider-specific performance, it would be unlikely to worsen population outcomes, because the indications for surgery are generally clear and many high-risk patients would actually benefit from being transferred to a more experienced center.

Another potential response by physicians to reporting in-hospital mortality would be to discharge patients earlier. A recent report 839 suggests that selected patients with a broad spectrum of congenital heart disease may enjoy same-day admission, limited sternotomy, immediate extubation, and very early discharge (25%, 74%, and 82% were discharged, respectively, at <24, <48, <72 hours from admission). It is unclear whether such efforts to reduce length of stay may have unintended negative consequences, such as increased complications and readmissions.

Prior use

Pediatric cardiac surgery mortality has not been widely used as an indicator of quality.

Empirical Evidence
Test | Statistic | Rating
Precision
   Raw provider level rate/standard deviation | 7.2%, 1.7% |
   Systematic provider-level standard deviation** | 1.5% | Moderate
   Provider variation as a percentage of total variation** | 0.3% | Moderate
   Signal ratio** | 22.2% | Low
   R-Square** | 37.9% | Low
   ** APR-DRG, age-, gender-adjusted
Minimum Bias - APR-DRG risk adjustment
   Signal variance change with risk adjustment | Decreases | Fair
   Absolute impact:
     Average absolute change (in %) | 12.8% | Good
   Relative impact:
     Rank correlation | 0.674 | Fair
     Percent remaining in high decile/low decile | 16.7% / 66.7% | Fair
     Percent changing more than 2 deciles | 35.1% | Fair
Precision

This indicator is adequately precise, with a raw provider level mean of 7.2% and a substantial standard deviation of 1.7%. The systematic provider level standard deviation is moderate, at 1.5%. The provider level variation also accounts for a moderate percentage of total variation, at 0.3%. This means that relative to other indicators, a lower percentage of the variation occurs at the provider level, rather than the discharge level. The signal ratio is low, at 22.2%. This means that it is very likely that some of the observed differences in provider performance do not represent true differences in provider performance. The R-square is substantially higher, but remains low relative to other indicators.
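
The relationship between the systematic provider-level standard deviation and the provider share of total variation can be illustrated with a short calculation. This is a sketch under two assumptions that the report does not state here: that total discharge-level variance of a binary outcome is rate * (1 - rate), and that the raw provider-level mean of 7.2% approximates the discharge-level rate. The function name is hypothetical.

```python
def provider_share_of_variation(systematic_sd, rate):
    """Systematic provider-level variance over total discharge-level variance.

    Assumes the total variance of a binary outcome is rate * (1 - rate).
    """
    return systematic_sd ** 2 / (rate * (1.0 - rate))


share = provider_share_of_variation(0.015, 0.072)
print(f"{100 * share:.2f}%")
```

Under these assumptions the share comes out at about 0.3%, in line with the figure in the table, which shows how a provider-level standard deviation of 1.5 percentage points can still be a very small fraction of discharge-level variation.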

Bias

Signal variance decreases by more than 25% with risk adjustment, suggesting that some of the observed variance is due to differences in patient characteristics. The indicator performs fairly on the multiple measures of minimum bias. The rank correlation is fair at 0.674. The impact on the extremes is large, especially in the highest decile. Only 16.7% of providers in the highest decile remain, and only 66.7% in the lowest decile remain, after risk adjustment. Similarly, the number of providers moving at least two deciles in relative rank is also high. However, the absolute magnitude of risk adjustment is moderate.

Construct validity

Pediatric heart surgery does not load substantially on any of the three retained factors.

Discussion

Pediatric heart surgeries include a diverse set of operations ranging from fairly straightforward to rather complex procedures. The mortality for the set of operations has been used as an outcome measure in the volume-outcome literature. Higher volume hospitals have been noted to have lower mortality rates, suggesting some difference in the processes of care between lower and higher volume hospitals, resulting in better outcomes. What those processes are, if they truly exist, is not known.

Relatively few hospitals perform pediatric heart surgery frequently. This could affect the precision for many hospitals with low volumes. Our empirical tests found this indicator has lower precision relative to most other indicators. The signal ratio is low, indicating that some of the observed differences may not reflect true differences in performance. Multivariate techniques do improve the ability to extract signal for this indicator somewhat, though the proportion that is extractable remains low relative to other indicators. Smoothing is recommended for this indicator to prevent the misidentification of outliers due to random noise.

Given the large variety in operations, and the varying risks associated with each, as well as the rather heterogeneous population receiving heart surgeries, it is likely that risk adjustment will be very important. Many volume-outcome studies used complicated clinical models for risk adjustment. Such adjustment is not available using APR-DRGs. Our empirical tests found substantial bias for this indicator, especially for providers with the highest mortality rates. Further, given the complex mixture of procedures included in this definition, it is likely that APR-DRG risk adjustment is not adequate and that providers may need to supplement it. Providers who wish to use this indicator may consider examining their case mixes, as well as the breakdown in the types of surgeries performed. Medical chart review may be helpful in determining whether more detailed risk adjustment affects hospital performance.

Hospital discharge practices differ, with some hospitals discharging patients earlier than others. For this reason, this indicator should be considered in conjunction with length of stay and transfer rates (though transfers are excluded in this indicator).

As with other mortality measures, there is concern that measuring mortality rates would result in access problems for higher risk patients. If possible, it may be useful to monitor whether operative risk declines with the implementation of this indicator.

Overall, this indicator is recommended for inclusion in the HCUP II QI set, when used in conjunction with volume measures. It received an empirical rating of 3 out of 26, and smoothing is highly recommended. This indicator is recommended with two major caveats of use. First, risk adjustment for clinical factors is recommended due to the substantial confounding bias for this indicator. Second, evidence supporting the construct validity of this indicator is limited.
