Parallel Processing for Memory-Intensive Maps and Graphics

Rendering graphics typically takes R some time, so if you’re going to be producing a large number of similar graphics, it makes sense to leverage R's parallel processing capabilities. However, if you’re looking to collect and return the graphics together in a sorted object – as we were in the previous post on animated choropleths – there’s a catch. R has to keep the whole object in random access memory (RAM) during parallel processing. As the number of graphics increases, you risk exceeding the available RAM, which will cause parallel processing to slow dramatically (or crash). In contrast, a good, old-fashioned sequential for loop can write each update to the object in the global environment after every iteration, clearing RAM for the next iteration. Paradoxically, then, parallel processing can take longer than sequential processing in this situation: for the animated choropleths in the previous post, parallel processing took 21 minutes, whereas sequential processing took 11 minutes.

This post presents code to combine the efficiency and speed of parallel processing with the RAM-clearing benefits of sequential processing when generating graphics.
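
One way to picture the idea is the minimal sketch below. It is not the code from the full post: it assumes a hypothetical helper `render_one()` that draws a placeholder plot, and a hypothetical wrapper `render_in_batches()` that runs each batch in parallel with `parallel::mclapply()` while an ordinary for loop steps through the batches. Workers write each graphic to disk and hand back only the file path, so no batch leaves a large object in RAM.

```r
library(parallel)

# Hypothetical helper: draw one placeholder plot to a file and return its path.
# pdf() is used because it works on any R build; png() would be more typical
# for animation frames.
render_one <- function(i, out_dir) {
  path <- file.path(out_dir, sprintf("plot_%03d.pdf", i))
  pdf(path)
  on.exit(dev.off())
  plot(seq_len(10), (seq_len(10) * i) %% 7, main = paste("Frame", i))
  path
}

# Hypothetical wrapper: parallel within a batch, sequential across batches.
render_in_batches <- function(ids, out_dir, batch_size = 4, cores = 2L) {
  # mclapply() relies on forking, which is unavailable on Windows.
  if (.Platform$OS.type == "windows") cores <- 1L
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

  # Split the ids into consecutive chunks of at most batch_size.
  batches <- split(ids, ceiling(seq_along(ids) / batch_size))

  paths <- character(0)
  for (batch in batches) {
    batch_paths <- mclapply(batch, render_one, out_dir = out_dir,
                            mc.cores = cores)
    paths <- c(paths, unlist(batch_paths))
    rm(batch_paths)
    gc()  # release memory before the next batch starts
  }
  paths
}

frame_paths <- render_in_batches(1:10, file.path(tempdir(), "frames"))
```

The batch size is the tuning knob in this sketch: larger batches give the workers more to do at once, while smaller batches cap how much has to be held in memory at any moment.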


Automating Repetitive Tasks for Efficiency: For Loops

It’s incredibly useful to be able to automate an analysis or set of analyses that you want to perform multiple times in exactly the same way. For example, if you’re working in industry, you might want to perform analyses that allow you to draw separate conclusions about the performance of individual stores, regions, products, customers, or employees. If you’re working in academia, you might want to examine several different dependent variables separately. This often entails several distinct steps, such as subsetting the data, performing the analysis or set of analyses, and generating well-labeled output.

This post presents one approach for feeding R a list of units to loop through, and then iteratively performing the same set of tasks for each unit.
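
As a rough illustration of that pattern (a sketch only, not the code from the full post), the example below treats each species in R's built-in `iris` data as a "unit". The regression, the variable names, and the output columns are all assumptions chosen for illustration; the point is the shape of the loop: subset, analyze, label, store.

```r
# The list of units to loop through (here, the three iris species).
units <- unique(as.character(iris$Species))

# Pre-allocate a named list to hold one row of results per unit.
results <- vector("list", length(units))
names(results) <- units

for (u in units) {
  # Step 1: subset the data to the current unit.
  unit_data <- iris[iris$Species == u, ]

  # Step 2: run the same analysis for every unit.
  fit <- lm(Sepal.Length ~ Petal.Length, data = unit_data)

  # Step 3: store well-labeled output.
  results[[u]] <- data.frame(
    unit  = u,
    n     = nrow(unit_data),
    slope = unname(coef(fit)["Petal.Length"])
  )
}

# Combine the per-unit rows into one summary table.
summary_table <- do.call(rbind, results)
print(summary_table)
```

Swapping in a different list of units or a different analysis only requires changing the `units` vector and the body of the loop.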


Descriptive statistics – Numeric variable

This post presents code to give the user a quick overview of a numeric variable with one function call. The code, which can easily be modified for your specific needs, currently includes information about the amount of missing data, the mean and standard deviation (appropriate when the variable is approximately normally distributed), the median and deciles, the unique values of the variable, and the shape of the distribution.
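
A stripped-down sketch of such a function might look like the following. The name `describe_numeric()` and the returned fields are hypothetical, and a moment-based skewness value stands in here for the post's look at the shape of the distribution; the actual function may differ.

```r
# Hypothetical one-call summary of a numeric vector.
describe_numeric <- function(x) {
  stopifnot(is.numeric(x))

  n_missing <- sum(is.na(x))
  x_obs <- x[!is.na(x)]   # observed (non-missing) values only
  centered <- x_obs - mean(x_obs)

  list(
    n           = length(x),
    n_missing   = n_missing,
    pct_missing = 100 * n_missing / length(x),
    mean        = mean(x_obs),
    sd          = sd(x_obs),
    median      = median(x_obs),
    deciles     = quantile(x_obs, probs = seq(0.1, 0.9, by = 0.1)),
    n_unique    = length(unique(x_obs)),
    # Moment-based skewness: positive values indicate a longer right tail.
    skewness    = mean(centered^3) / mean(centered^2)^1.5
  )
}

x <- c(1, 2, 2, 3, NA, 10)
res <- describe_numeric(x)
str(res)
```

Returning a list keeps each piece of the summary individually accessible (for example, `res$deciles`), which makes it easy to add or drop elements for a specific need.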
