As value-based payment models have become more popular, a growing body of research has evaluated whether these models achieve their goals of improving care quality and saving money. The results are…mixed. Bundling payments for certain procedures like joint replacement appears to be saving money without reducing quality, and some global spending programs have been successful, especially during Covid-19. But other programs have been criticized for not going far enough to incentivize value over volume. And value-based programs have also been shown to penalize hospitals and clinicians that care for more people of color.
The Merit-based Incentive Payment System (MIPS) in particular has come under increased scrutiny for not achieving its intended goals. MIPS is part of a mandatory value-based payment program that gives clinicians either penalties or bonuses (up to 4% of their Medicare reimbursement) based on their performance. A significant amount of time and money has gone into developing and reporting these quality measures (an estimated 15 hours per week per physician on reporting, and more than $1 billion spent by CMS on development), so there is a lot riding on these metrics being valuable.
Unfortunately, the evidence increasingly shows a disconnect between MIPS results and meaningful outcomes. In a recent study in JAMA Network Open, researchers at the University of Rochester School of Medicine compared clinicians’ MIPS scores with hospital-level outcomes. They looked at performance at both the individual and hospital level for several specialties, including cardiac surgery, anesthesiology, critical care medicine, general surgery, and orthopedics.
They found no connection between MIPS quality scores and hospitals’ rates of postoperative complications for any specialty. For hospital performance on failure to rescue (FTR, the inability to prevent death after a complication), only anesthesiologists showed an association between low MIPS performance and greater risk of FTR. Another exception to the rule was in cardiac surgery: the researchers found that low MIPS scores for cardiac surgeons were associated with higher hospital-level rates of mortality and readmissions after coronary artery bypass grafting.
Why don’t physician performance scores match hospital-level outcomes more closely? The authors point to several issues within the MIPS program that could explain this discrepancy:
First, although physicians are required to report on 6 quality measures, they may select any 6 measures from the list of 271 available MIPS measures. Unlike Hospital Compare, in which hospital performance is rated using a standard set of uniform metrics, such as mortality and readmissions, physician performance in MIPS is measured using a composite score based on self-selected metrics that vary between physicians. Second, physicians are free to report the measures on which they perform best, rather than those that may best reflect their overall quality of care. Third, of these 6 measures, only 1 is required to be an outcome measure, while the others can be process measures. Process measures only reflect quality of care if they are anchored in best practices that lead to better outcomes.
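To see why self-selected measures can inflate composite scores, here is a minimal sketch. The per-measure scores and the averaging are hypothetical stand-ins, not the actual MIPS scoring rules; the point is only to compare a composite built from a clinician’s 6 best measures against one built from a fixed, uniform set of 6:

```python
import random

# Hypothetical illustration, not the real MIPS formula: assume a clinician
# has an underlying performance score on each of the 271 available measures,
# drawn here from a uniform distribution.
random.seed(0)
all_measures = [random.uniform(0.5, 1.0) for _ in range(271)]

# Self-selected reporting: submit the 6 measures with the highest scores.
best_six = sorted(all_measures, reverse=True)[:6]

# Uniform reporting (Hospital Compare-style): a fixed set of 6 measures,
# chosen at random here to stand in for mortality, readmissions, etc.
fixed_six = random.sample(all_measures, 6)

print(f"Composite from self-selected measures: {sum(best_six) / 6:.2f}")   # near 1.00
print(f"Composite from a fixed measure set:    {sum(fixed_six) / 6:.2f}")  # ~0.75
print(f"True mean across all 271 measures:     {sum(all_measures) / 271:.2f}")  # ~0.75
```

The self-selected composite sits near the ceiling almost regardless of the clinician’s average performance, which is exactly the pattern Dutton describes below: uniformly high scores that reveal little about actual variation in care.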
In an accompanying editorial, Dr. Richard Dutton, Chief Quality Officer for US Anesthesia Partners, explains more of the complexities behind MIPS measures. While physicians report performance individually, in reality health care is a team effort, and it’s not always clear whose “fault” a bad outcome is. Because there is money on the line, physicians are less likely to accept blame for a bad outcome if they can help it. “Applying financial incentives to the process motivates participation but also inspires gamesmanship at multiple levels,” writes Dutton.
Additionally, Dutton notes that many of the MIPS metrics are designed to make physicians look good, because the metrics are developed by physician groups themselves. The process of developing MIPS metrics takes so much time and money that only groups that stand to gain something want to go through the process. “The result is a set of measures that might be reassuring to the public because performance is uniformly high but do nothing to demonstrate variations in care that might enable quality improvement,” Dutton writes.
So what to do about MIPS? Are there changes CMS can make to the program to make it more meaningful? Dutton notes some “glimmers of hope” in MIPS, such as the uptake of more evidence-based practices. However, there is a long way to go. “As presently constructed, MIPS does little but contribute to the 34% of US health care dollars spent on administrative activities, with only marginal gains in quality improvement,” Dutton writes. He suggests that clinical registries fueled by interoperable medical records would be more useful for improving outcomes and transparency than the current patchwork system that physicians struggle with every day.