Multilevel Models to Explore the Impact of the Affordable Care Act’s Shared Savings Program, Part II

This post is the technical accompaniment to Multilevel Models to Explore the Impact of the Affordable Care Act’s Shared Savings Program, Part I.

Aggregated Financial Impact by Year

To create an initial overview of the program’s financial impact, I used all of the publicly available results data for the Shared Savings Program. If you caught my Data Wrangling: Combining Slightly Different Datasets post, in which I imported and cleaned the data for this analysis, you might recall that there were a few ACOs for which I wasn’t able to find a unique identifier to track them across years. I kept these observations in the dataset, however, and they are included in this aggregated look at the financial impact of the Shared Savings Program by year.

To aggregate by year and by whether the ACO received a payout for coming in under its benchmark, I used R’s dplyr package to generate the equivalent of an Excel pivot table. This produces a table that can easily be exported to MS Excel for further manipulation.

library(dplyr)
library(scales)

# Table 1
table.1 <- data %>%
  group_by(year, payout) %>%
  summarise(Number_of_ACOs = n(),
            Savings_relative_to_benchmark = sum(savings, na.rm = TRUE),
            Total_Payouts = sum(earned_shared_savings_payments_owe_losses3_4, na.rm = TRUE),
            Total_Savings = Savings_relative_to_benchmark - Total_Payouts)
print("Table 1")
table.1
cat(rep("\n", 3))

library(WriteXLS)  # output the table to Excel
testPerl(perl = "perl", verbose = TRUE)  # confirm a working Perl installation for WriteXLS
WriteXLS(c("table.1"),
         ExcelFileName = "SSP_Financial_Savings.xlsx",
         SheetNames = c("SSP Financial Savings over Time"),
         perl = "perl",
         verbose = FALSE,
         row.names = FALSE, col.names = TRUE,
         AdjWidth = TRUE, BoldHeaderRow = TRUE,
         FreezeRow = 1, FreezeCol = 1)
remove(table.1)

 

Change in Savings Rate over Time

To explore whether ACOs increased their savings rates the longer they were in the program, we need to estimate the relationship between year-in-the-program and savings rate. I'm working with savings rate (the amount the ACO was under or over its benchmark, divided by that benchmark) rather than the absolute dollar value saved, because the ACOs vary in size and I want a metric that is comparable across differently sized ACOs. Please note that the savings rate variable I'm working with here doesn't factor in any payouts the ACO earned - we've already seen that once payouts are included, the Shared Savings Program is in the red. With these analyses, I'm asking a different question: do ACOs change over their time in the program?
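
Just to make the metric concrete, the savings rate can be reconstructed directly from the raw dollar columns. Here's a minimal sketch, assuming the column names carried over from the data-wrangling post (savings and total_benchmark_expenditures); savings.rate.check is just a throwaway name for the example:

# Sketch: savings rate = dollar savings relative to benchmark, divided by the benchmark
# (assumes the column names carried over from the data-wrangling post)
savings.rate.check <- data$savings / data$total_benchmark_expenditures
summary(savings.rate.check - data$savings.rate)  # differences should be ~0 if the definitions match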

The first thing I do here is rescale some of the predictors, so the intercepts in the models are more interpretable. The scaling I’ve selected below means that the intercept will reflect the model-implied savings rate for an ACO in its first year of the program with the average number of assigned patients. (Medicare calls the assigned patients “beneficiaries”, and I kept that variable name so it would be easier to map this work back to the raw data.)

# Scale predictors so intercepts are meaningful in the models
data$year_in_program_scaled <- data$year_in_program - 1
data$beneficiaries_scaled <- scale(data$total_assigned_beneficiaries, center = TRUE, scale = FALSE)

I then check that the dependent variable is reasonably normally distributed, which it is. Technically, it's the model residuals that need to be normally distributed for the kind of model we're going to estimate, but normally distributed residuals are more likely when the dependent variable itself is normally distributed.

# confirm that the dependent variable is normally distributed
library(ggplot2)
options(scipen = 999)
ggplot(data, aes_string(x = "savings.rate")) +
  geom_histogram() +
  scale_x_continuous(labels = percent) +
  theme(plot.title = element_text(hjust = 0.5)) +  # center plot title
  ggtitle("savings.rate Frequency Distribution")

 

We can see from the graph that although most observations fall between -20% and +20% or so, there are two outliers at the low end. Below, I investigate these observations a little more closely.

# one ACO had a savings.rate of -40% in their first year and then improved; another had a savings.rate of -30% in their second year
delete <- subset(data, savings.rate < -0.20)
delete1 <- subset(data, 
                  aco_num == delete[1,1], 
                  select=c("aco_name","year","year_in_program","total_benchmark_expenditures","total_expenditures",
                           "savings","savings.rate"))
delete2 <- subset(data, 
                  aco_num == delete[2,1], 
                  select=c("aco_name","year","year_in_program","total_benchmark_expenditures","total_expenditures",
                           "savings","savings.rate"))
remove(delete, delete1, delete2)

 

I'm particularly concerned about the ACO with a savings rate of -40% in its first year, which then improved dramatically. This ACO has the potential to be a very influential case in estimating the impact of time on savings rate. The ACO with a savings rate of -30% in its second year is probably less of a concern, but to prevent these two observations from obscuring what's happening in the rest of the data, I omit them both.

mlm.data <- subset(data, savings.rate > -0.20)

 

We’re now ready to fit a model estimating the impact of year-in-the-program on savings rate. One way to do this would be to simply fit a single regression to the whole sample and ignore the fact that each ACO may contribute up to three observations to the data (if it started the program in 2013). The problem with this approach is that regression assumes each observation is independent and that the resulting model residuals are uncorrelated. If we have three observations from the same ACO, those three observations are probably more similar to each other than they are to other observations in the data. The test for the impact of time on savings rate won’t account for this, though, and will likely overstate statistical significance. This issue is called “nesting,” and it can be addressed with a modeling approach that explicitly separates the between-group variability (in this case, the variability between ACOs) from the within-group variability (the variability within an ACO across its observations).
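
For contrast, here's what that naive pooled regression would look like - shown only as a sketch for comparison, since it treats every ACO-year as independent and its standard errors are likely too optimistic:

# Naive pooled regression that ignores the nesting of observations within ACOs
# (for comparison only; not one of the reported models)
naive.model <- lm(savings.rate ~ year_in_program_scaled, data = mlm.data)
summary(naive.model)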

A quick way to determine whether you have significant nesting in your data is to calculate the intraclass correlation (ICC), the ratio of between-group variability to total variability. Values close to 0 indicate that differences between groups (in this case, ACOs) don’t account for much of the total variance; values close to 1 suggest they do. In the code sample below, I run an “intercept only” model, with no predictors, and use a code snippet I’ve borrowed from Kenny and Holt (2013) to estimate the intraclass correlation. The ICC function returns a value of 0.60, suggesting that there’s quite a bit of nesting in the Shared Savings Program data and we’ll definitely want to use multilevel modeling.

library(nlme)
model.0 <- lme(savings.rate ~ 1,           # intercept-only model
               data = mlm.data,
               random = ~ 1 | aco_num,     # random intercept
               na.action = "na.omit")

# Function to calculate the intraclass correlation from an lme model
ICC.lme <- function(out) {
  varests <- as.numeric(VarCorr(out)[1:2])  # between-ACO (intercept) and residual variances
  return(paste("ICC =", varests[1] / sum(varests)))
}

ICC.lme(model.0)

 

When I imported the data and prepared it for analysis, I spent a good bit of time identifying the same ACO across different years of the program. (Originally, the Shared Savings Program results data only identified ACOs by name, and sometimes those names were entered differently in different years.) Here, you can see why having a way to identify the same ACO over time was important - we need that identifier ("aco_num") to correctly partition the between- and within-group variance. Note that observations lacking an aco_num are excluded from the analysis.
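
As a quick check on how much data that exclusion affects, we can count the observations in the modeling dataset whose aco_num is missing (they're coded as NA):

# How many observations lack an ACO identifier and will be dropped from the multilevel models?
sum(is.na(mlm.data$aco_num))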

The two most commonly used R packages for multilevel modeling are nlme and lme4. I prefer nlme because its lme() function includes p-values in its output and because it can incorporate a correlation structure (such as a temporal correlation, if we think assessments taken closer together in time will be more strongly related than assessments taken farther apart).
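
To illustrate that second point (this isn't one of the models reported below), a first-order autoregressive structure for the within-ACO errors can be added through nlme's corAR1; the model name model.ar1 is just for this example:

# Illustration only: random-intercept model with an AR(1) structure on the within-ACO
# residuals, so observations closer together in time can be more strongly correlated
model.ar1 <- lme(savings.rate ~ 1 + year_in_program_scaled,
                 data = mlm.data,
                 random = ~ 1 | aco_num,
                 correlation = corAR1(form = ~ year_in_program_scaled | aco_num),
                 na.action = "na.omit")
summary(model.ar1)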

Below, I use the nlme package to estimate a series of models. In the first, I allow the savings rate in the first year of the program (that is, the intercept) to vary by ACO, but I assume a constant effect of time on savings rate. In the second, I allow both the intercept and the effect of time to vary across ACOs, and then I use an anova to test the change in model fit between the two models. This test is significant, which suggests that we do want to allow the effect of time to vary across ACOs.

# year_in_program: random intercept model
model.1 <- lme(savings.rate ~ 1 + year_in_program_scaled,
               data = mlm.data,
               random = ~ 1 | aco_num,     # random intercept
               na.action = "na.omit")
summary(model.1)

# year_in_program: random intercept model and random effect of year_in_program
model.2 <- lme(savings.rate ~ 1 + year_in_program_scaled,
               data = mlm.data,
               random = ~ year_in_program_scaled | aco_num,  # random intercept AND random slope
               na.action = "na.omit")
summary(model.2)

# test of impact of random slope; fixed effects must be the same in both models
anova(model.1, model.2)

model.2 results showing significant, positive effect of year_in_program on savings.rate

Lastly, I run a third model that includes the number of patients (or beneficiaries) assigned to the ACO in each year as a time-varying covariate. This predictor isn’t significant, but including it confirms that the results aren’t just a function of changing the number of beneficiaries in the program.

# Add number of assigned beneficiaries as a time-varying covariate
model.3 <- lme(savings.rate ~ 1 + year_in_program_scaled + beneficiaries_scaled,
               data = mlm.data,
               random = ~ year_in_program_scaled | aco_num,  # random intercept AND random slope
               na.action = "na.omit")
summary(model.3)

The results indicate that ACOs typically exhibit a savings rate of about 0.3% (technically, 0.2677731%) in their first year of the program, and that this rate rises by roughly 0.4 percentage points (technically, 0.4127823) with each additional year in the program. This effect is statistically significant. The number of beneficiaries assigned to an ACO doesn't significantly predict its savings rate.
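
To make that concrete, the fixed effects imply a savings rate of roughly 0.27% + 2 × 0.41% ≈ 1.1% for an average-sized ACO in its third program year. A quick sketch of that calculation from the fitted model:

# Model-implied savings rate for an average-sized ACO (beneficiaries_scaled = 0)
# in its third program year (year_in_program_scaled = 2)
fe <- fixef(model.3)
fe["(Intercept)"] + 2 * fe["year_in_program_scaled"]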

We can graph each ACO's individual savings rate over time, together with results for the sample as a whole, as follows. 

graph.data <- subset(mlm.data, !is.na(mlm.data$aco_num))
full.plot <- ggplot() +
  # Graph a separate regression line for each ACO
  geom_smooth(data = graph.data,
              aes(x = year_in_program, y = savings.rate, group = aco_num),
              method = 'lm', se = FALSE, colour = "darkgrey", size = 0.2) +
  # Graph the overall trend (pooled regression line) for the sample as a whole
  geom_smooth(data = graph.data,
              aes(x = year_in_program, y = savings.rate),
              method = 'lm', se = FALSE, colour = "black", size = 0.5) +
  # Plot formatting
  theme(plot.title = element_text(hjust = 0.5)) +  # center plot title
  scale_y_continuous(labels = percent) +
  scale_x_continuous(breaks = c(1, 2, 3)) +
  xlab("Year in Shared Savings Program") +
  ylab("Savings Rate") +
  ggtitle("Savings Rate by Year-in-the-Program\n(for each ACO (grey lines) and sample as a whole (black line))")
full.plot

zoom.plot <- full.plot + coord_cartesian(ylim=c(0, 0.015))
zoom.plot

remove(full.plot, zoom.plot)
full.plot - full range of y-axis

zoom.plot - zoomed in on 0-1.5% on y-axis

Change in Quality of Care over Time

Change in quality of care over time was examined using a separate multilevel model for each of 25 quality-of-care measures. There were 33 quality-of-care measures administered each year, but 8 of them changed in 2015, so only 25 are available for all of 2013, 2014, and 2015. Multilevel modeling works best when there’s the possibility of at least three observations nested within each level-2 unit, even if there’s missing data, so I restrict the quality-of-care analyses to these 25 measures. When I downloaded and prepared the data for analysis, I confirmed that for each quality-of-care measure, the variable ranges and quartile values were similar in each year, suggesting that they’d been measured and scored using the same logic.
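
One quick way to double-check that restriction is to count, for each retained measure, the non-missing observations in each year - a sketch, assuming qoc.items holds the 25 retained variable names from the data-preparation step:

# Sketch: non-missing observations per measure per year; each of the 25 retained
# measures should have data in 2013, 2014, and 2015
sapply(qoc.items, function(col) tapply(!is.na(data[[col]]), data$year, sum))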

Below, I use a for loop to cycle through all 25 quality-of-care measures and check their distributions. The distributions suggest that most of the variables are normally distributed, but five (i.e., aco_11, aco_17, aco_27, aco_30, and aco_31) will probably need to be transformed.

# QOC distributions
library(ggplot2)
library(e1071)  # for skewness()

for (i in 1:length(qoc.items)) {

  column <- qoc.items[i]
  print(noquote(column))

  # subset to remove NA values before the histogram
  hist.data <- data[!is.na(data[, column]), ]

  # obtain a recommended binwidth (range of the variable divided into 30 bins)
  recommended.binwidth <- diff(range(hist.data[, column])) / 30

  title <- paste(column, "Frequency Distribution")
  hist.graph <- ggplot(hist.data, aes_string(x = column)) +
    geom_histogram(binwidth = recommended.binwidth) +
    ggtitle(title)
  print(hist.graph)
  remove(hist.data, title, recommended.binwidth, hist.graph)

  print("Skewness = ")
  print(skewness(data[, column], na.rm = TRUE))

  cat(rep("\n", 1))

  remove(column)

}

remove(i)

# Transform skewed variables
data$aco_11_squared <- (data$aco_11)^2
data$aco_17_squared <- (data$aco_17)^2
data$aco_27_log <- log(data$aco_27)
data$aco_30_squared <- (data$aco_30)^2
data$aco_31_squared <- (data$aco_31)^2

# Swap the transformed versions into qoc.items at each measure's position
qoc.items[[11]] <- "aco_11_squared"
qoc.items[[16]] <- "aco_17_squared"
qoc.items[[21]] <- "aco_27_log"
qoc.items[[23]] <- "aco_30_squared"
qoc.items[[24]] <- "aco_31_squared"
qoc.items
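
As a sanity check on those transformations, we can re-compute the skewness of the transformed variables (a quick sketch, re-using e1071's skewness from the loop above):

# Sketch: confirm the transformations reduced the skew
library(e1071)
transformed.items <- c("aco_11_squared", "aco_17_squared", "aco_27_log",
                       "aco_30_squared", "aco_31_squared")
sapply(transformed.items, function(col) skewness(data[[col]], na.rm = TRUE))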

 

We’ll now check the intraclass correlations.

# Re-use the ICC function to check nesting for each quality-of-care measure
ICC.lme <- function(out) {
  varests <- as.numeric(VarCorr(out)[1:2])
  return(paste("ICC =", varests[1] / sum(varests)))
}

library(nlme)

for (i in 1:length(qoc.items)) {

  column <- qoc.items[i]
  print(noquote(column))

  f <- as.formula(paste(column, "~ 1"))
  print(f)

  model.0 <- lme(f,                        # fixed effects formula
                 data = data,
                 random = ~ 1 | aco_num,   # random intercept
                 na.action = "na.omit")
  print(ICC.lme(model.0))
  remove(column, f, model.0)

  cat(rep("\n", 1))

}

remove(i, ICC.lme)

 

The ICC values range from 0.22 to 0.75 and suggest that we’ll want to use multilevel modeling for all of the quality-of-care models. That doesn’t tell us, for each measure, whether we want a random-intercept-only model or a model with both a random intercept and a random slope, though. To handle this, I set up another for loop that estimates both models for each measure, tests the difference, and returns the results from the random-intercept model if adding the random slope doesn’t significantly improve model fit, or from the random-slope model if it does.

I’ve also included a tryCatch({ }) call in the for loop below. tryCatch is fantastic: if there’s an error on any given iteration, it lets the for loop jump to the next iteration rather than breaking and stopping entirely. Adding the random slopes creates convergence issues for five of the models; with tryCatch, we still get results for all the others, plus an error message for the models that broke so we can go back and handle them separately.

for (i in 1:length(qoc.items)) {

  tryCatch({

    column <- qoc.items[i]
    print(column)

    # customize the formula for each QOC item
    f <- as.formula(paste(column, "~ 1 + year_in_program_scaled"))
    print(f)

    model.1 <- lme(f,                        # fixed effects formula
                   data = data,
                   random = ~ 1 | aco_num,   # random intercept
                   na.action = "na.omit")

    model.2 <- lme(f,                        # fixed effects formula
                   data = data,
                   random = ~ year_in_program_scaled | aco_num,  # random intercept and random slope
                   na.action = "na.omit")

    # test of impact of random slope; fixed effects must be the same in both models
    anova.test <- anova(model.1, model.2)

    # if the random slope is significant, print the results from that model;
    # if not, print results from the random intercept model
    if (anova.test$p[2] < 0.05) {
      print("model.2 Results")
      print(summary(model.2))
    } else {
      print("model.1 Results")
      print(summary(model.1))
    }

    cat(rep("\n", 3))
    remove(column, f, model.1, model.2, anova.test)

  }, error = function(e) {cat("ERROR :", conditionMessage(e), "\n")})

}

remove(i)


We received either model.1 or model.2 results (labeled as such) for all of the quality-of-care measures except for aco_4, aco_6, aco_21, aco_28, and aco_31. In each of these cases, the problem was a convergence issue when we estimated a random slope. Below, I quickly re-run the random intercept-only models for these five quality-of-care measures.

# Run random intercept-only models for the remaining items
qoc.items <- c("aco_4", "aco_6", "aco_21", "aco_28", "aco_31")

for (i in 1:length(qoc.items)) {

  column <- qoc.items[i]
  print(column)

  # customize the formula for each QOC item
  f <- as.formula(paste(column, "~ 1 + year_in_program_scaled"))
  print(f)

  model.1 <- lme(f,                        # fixed effects formula
                 data = data,
                 random = ~ 1 | aco_num,   # random intercept
                 na.action = "na.omit")
  print(summary(model.1))

  cat(rep("\n", 3))
  remove(column, f, model.1)

}

remove(i)

We now have results for each quality-of-care measure. In Part I, I included a table summarizing the results from all of the models. The results suggest that ACOs in the Shared Savings Program do improve on most of the quality-of-care measures over time.

Complete code for this post can be found on GitHub at: https://github.com/kc001/Blog_code/blob/master/2017.01%20Multilevel%20modeling.R