Automating Repetitive Tasks for Efficiency: For Loops

It’s incredibly useful to be able to automate an analysis or set of analyses that you want to perform multiple times in exactly the same way. For example, if you’re working in industry, you might want to draw separate conclusions about the performance of individual stores, regions, products, customers, or employees. If you’re working in academia, you might want to examine several dependent variables separately. Frequently, this entails several distinct steps: subsetting the data, performing the analysis or set of analyses, and generating well-labeled output.
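In the academic case, the units you loop through are variable names rather than stores, but the pattern is the same. The sketch below is a hypothetical illustration using R’s built-in mtcars data set. It treats mpg, hp, and qsec as outcomes and runs a two-sample t-test of each one against the transmission variable am. The choice of variables and test is an assumption made for demonstration only.

```r
# A minimal, hypothetical sketch: run the same analysis on several dependent
# variables (columns of the built-in mtcars data set), one per loop iteration.
dvs <- c("mpg", "hp", "qsec")      # hypothetical choice of outcome variables
p.values <- numeric(length(dvs))   # preallocate one slot per result
names(p.values) <- dvs

for (i in seq_along(dvs)) {
    dv <- dvs[i]
    # Build the formula from the variable name, e.g. mpg ~ am
    model.formula <- as.formula(paste(dv, "~ am"))
    result <- t.test(model.formula, data = mtcars)  # am (0/1) defines the two groups
    p.values[dv] <- result$p.value
    cat("Finished", dv, "\n")      # progress message, like print(i) in the store loop
}

round(p.values, 4)
```

Building the formula with paste() and as.formula() is what lets a single piece of code serve every variable in the list.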

This post presents one approach: give R a list of units to loop through, then perform the same set of tasks for each unit in turn. This example uses time-series data organized in “long” (or “person-period”) format, meaning there is one row per store per week. I pretend the data are weekly sales figures for 20 separate stores during the first 6 weeks of 2015.
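To make “long” format concrete, here is a tiny hypothetical data set (2 stores, 3 weeks, invented sales figures) laid out the same way as the fake data generated below. Each row is one store in one week, so a store’s history runs down several rows rather than across several columns.

```r
# Hypothetical long-format data: 2 stores x 3 weeks = 6 rows,
# one row per store-week (the "person-period" layout).
long <- data.frame(
    ID    = rep(c("A", "B"), each = 3),
    date  = rep(as.Date(c("2015-01-05", "2015-01-12", "2015-01-19")), times = 2),
    sales = c(9800, 10150, 10400, 11020, 10760, 10930),  # invented values
    stringsAsFactors = FALSE
)
print(long)

# Each store contributes one row per week, so each ID appears 3 times
table(long$ID)
```

Because every store’s rows share the same ID value, subsetting on ID (as the loop does) pulls out one store’s complete history.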

The code sample begins by generating fake data. After this:

  1. The first task is to provide R with a list of whatever you want it to loop through (e.g., variables, participants, employees, stores, etc.). You can individually type these out or you can tell R to go pluck all unique values of the thing you’re looping through, as I do here.
  2. The next step is to create an index variable (i) that steps through the positions of the list from step 1: 1, 2, 3, and so on, up to the number of items in the list.
  3. You then create a temporary variable, updated on each pass as i advances, that holds the item from the step 1 list for the current iteration of the loop. In this example, this variable is “store” and holds the unique identifier of one of the stores in the data frame.
  4. Using “store,” we can now create generic code that will work for each of the stores in our list. The code below subsets the data by store, generates a unique graph title for each store, and graphs each store’s data in a separate graph.
# Generate fake time-series data, with 20 units (could be participants, stores, customers, etc.) assessed at 6 timepoints:
set.seed(2015) # make the random numbers (and therefore the graphs) reproducible
ID <- LETTERS[1:20] # a unique identifier (numeric or non-numeric) for each unit
date <- as.Date(c("2015-01-05","2015-01-12","2015-01-19","2015-01-26","2015-02-02","2015-02-09"))
# merge() with no columns in common returns every ID-date combination: 20 x 6 = 120 rows
ID.dates <- merge(as.data.frame(ID), as.data.frame(date), all=TRUE)
variable1 <- rnorm(nrow(ID.dates), mean=10000, sd=1000) # one fake weekly sales figure per row
df <- cbind(ID.dates, variable1) # cbind() with a data frame already returns a data frame
rm(ID, date, ID.dates, variable1)

# Assuming you will be using this code on real data, identify the units to loop through.
# In this example, I pretend that the units referred to with the "ID" variable are stores.
stores <- unique(df$ID)

# The goal is to automate some task that you wish to do separately for each of the units in your dataset.
# In this example, I generate separate graphs for each "store," including customized graph titles.

# Load the plotting packages once, before the loop, rather than on every iteration
library(ggplot2)
library(scales) # provides dollar() for currency-formatted axis labels

for (i in seq_along(stores)) {   # seq_along() also behaves correctly if 'stores' is empty
    
    store <- stores[i]
    individual.store <- subset(df, ID == store)
    
    graph.title <- paste("Sales for Store", store)

    # Refer to columns by name inside aes(); ggplot looks them up in individual.store
    sales.graph <- ggplot(individual.store, aes(x = date, y = variable1)) +
        geom_point(shape = 20) +                # use filled circles
        geom_line() +                           # add a line connecting the dots
        scale_y_continuous(labels = dollar) +   # format the dependent variable as currency
        ylab("Sales") + xlab("Week") + ggtitle(graph.title)
    print(sales.graph) # inside a loop, a ggplot is only displayed if explicitly printed
    
    print(i) # this will let you know where the code is in the loop
    
}

# Clean up the loop variables
rm(i, store, graph.title, individual.store, sales.graph)

rm(stores, df)
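Graphs are only one kind of output. As a minimal sketch, assuming the same fake-data setup as above (rebuilt here with expand.grid() so the block runs on its own), the loop below fills a preallocated results table with one row of summary statistics per store. The column names mean.sales, total.sales, and best.week are hypothetical.

```r
# Rebuild the fake data: 20 stores x 6 weekly dates, one sales figure per row
set.seed(1)
df <- expand.grid(ID = LETTERS[1:20],
                  date = seq(as.Date("2015-01-05"), by = "week", length.out = 6),
                  stringsAsFactors = FALSE)
df$variable1 <- rnorm(nrow(df), mean = 10000, sd = 1000)

stores <- unique(df$ID)
# stores <- c("A", "C", "F")  # alternatively, type out the units by hand

# Preallocate the results table: one row per store (hypothetical column names)
results <- data.frame(store = stores,
                      mean.sales = NA_real_,
                      total.sales = NA_real_,
                      best.week = as.Date(NA),
                      stringsAsFactors = FALSE)

for (i in seq_along(stores)) {
    store <- stores[i]
    individual.store <- subset(df, ID == store)

    results$mean.sales[i]  <- mean(individual.store$variable1)
    results$total.sales[i] <- sum(individual.store$variable1)
    # Date of the week with this store's highest sales
    results$best.week[i]   <- individual.store$date[which.max(individual.store$variable1)]
}

head(results)
```

Preallocating results and filling row i on each pass avoids growing a data frame inside the loop, and the finished table can be sorted, printed, or written to a file like any other data frame.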