Crime maps interest just about everyone. Government officials are interested in the need for and success of intervention programs, law enforcement officials are interested in policing needs, and private citizens are concerned about their safety and the safety of loved ones. This post presents code to create an interactive Shiny application that lets the user specify an address, the type of crime, and the time of day (or skip all of that and simply zoom around as curiosity dictates) and see mapped crime incidents, along with annual crime statistics that adjust dynamically to the area in view.
The data for this post come from the Baton Rouge, Louisiana Crime Incidents dataset.
Customer segmentation is a deceptively simple-sounding concept. Broadly speaking, the goal is to divide customers into groups that share certain characteristics. There is an almost infinite number of characteristics by which you could divide customers, however, and the optimal characteristics and analytic approach vary depending upon the business objective. This means that there is no single correct way to perform customer segmentation.
In this post, I work through a practical example that, in my experience, closely mirrors the challenges of performing this kind of analysis with real data.
Shiny by RStudio is a really lovely, interactive way to present analyses to users. Conveniently, there's a free and open-source community version of the package. This post introduces code to create a complete, interactive Shiny app with employment data and forecasts for New Orleans, LA.
Rendering graphics typically takes R some time, so if you’re going to be producing a large number of similar graphics, it makes sense to leverage R's parallel processing capabilities. However, if you’re looking to collect and return the graphics together in a sorted object – as we were in the previous post on animated choropleths – there’s a catch. R has to keep the whole object in random access memory (RAM) during parallel processing. As the number of graphics files increases, you risk exceeding the available RAM, which will cause parallel processing to slow dramatically (or crash). In contrast, a good, old-fashioned sequential for loop can write updates to the object to the global environment after each iteration, clearing RAM for the next iteration. Paradoxically, then, parallel processing can take longer than sequential processing in this situation. In the case of the animated choropleths in the previous post, parallel processing took 21 minutes, whereas sequential processing took 11 minutes.
This post presents code to combine the efficiency and speed of parallel processing with the RAM-clearing benefits of sequential processing when generating graphics.
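One way to get that combination is to split the work into small batches, run each batch in parallel, and write every graphic to disk as soon as it is rendered, so that only file paths are held in memory between batches. The sketch below is an illustration under assumptions, not the code from the post: `render_one()` and `render_in_batches()` are hypothetical names, the plot is a placeholder, and PDF output is used only because it works on any R installation (swap in `png()` or another device as needed).

```r
library(parallel)

# Hypothetical stand-in for an expensive plotting step. It writes one file per
# index and returns only the file path, so the graphic itself never accumulates
# in the R session's memory.
render_one <- function(i, out_dir) {
  path <- file.path(out_dir, sprintf("frame_%03d.pdf", i))
  pdf(path, width = 5, height = 5)
  plot(seq_len(10), seq_len(10)^(i / 10), type = "l", main = paste("Frame", i))
  dev.off()
  path
}

# Sequential loop over batches; parallel rendering within each batch.
render_in_batches <- function(n_frames, batch_size, out_dir, n_cores = 2) {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  ids <- seq_len(n_frames)
  batches <- split(ids, ceiling(ids / batch_size))

  cl <- makeCluster(n_cores)
  on.exit(stopCluster(cl), add = TRUE)

  paths <- character(0)
  for (batch in batches) {
    # parLapply returns results in input order, so the frames stay sorted
    batch_paths <- parLapply(cl, batch, render_one, out_dir = out_dir)
    paths <- c(paths, unlist(batch_paths))
  }
  paths
}

frames <- render_in_batches(
  n_frames = 12, batch_size = 4,
  out_dir = file.path(tempdir(), "frames")
)
print(basename(frames))
```

The batch size is the tuning knob here: larger batches mean fewer synchronization points but more results in flight at once, so a sensible value depends on how large each graphic is and how much RAM is available.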
This post demonstrates how to map change in a variable over time in a geographic area, allowing the user to scroll through time and selectively view dates of interest. It produces an interactive choropleth map, as the last post did, but whereas the last post was interactive in the sense that the user could zoom in on a specific geographic area, this map is interactive in the sense that the user can ‘zoom in’ on a specific point in time.
This map: In the wake of Hurricane Katrina, multiple New Orleans committees generated plans to rebuild the city; in some cases, these plans involved shifting the city’s footprint to move citizens out of more topographically vulnerable areas. The sequence of maps produced here answers two questions: how quickly did various New Orleans ZIP codes repopulate after Hurricane Katrina, and how does the city’s current address density relate to pre-Katrina levels?
This post produces an interactive map that features analysis output mapped onto the geographical region to which it applies. This specific map answers the question: which areas in and around New Orleans exhibited the most growth (or loss) in active addresses over the past two years?
It’s incredibly useful to be able to automate an analysis or set of analyses that you want to perform multiple times in exactly the same way. For example, if you’re working in industry, you might want to perform analyses that allow you to draw separate conclusions about the performance of individual stores, regions, products, customers, or employees. If you’re working in academia, you might want to examine several dependent variables separately. This often entails several distinct steps, such as subsetting the data, performing the analysis or set of analyses, and generating well-labeled output.
This post presents one approach for feeding R a list of units to loop through, and then iteratively performing the same set of tasks for each unit.
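As a rough illustration of that pattern (not the post's own code), the sketch below loops over a list of units and runs the same subset, analyze, and label steps for each one. The built-in `mtcars` data stands in for real data, the `cyl` column plays the role of the unit (a store, region, or product), and the regression of `mpg` on `wt` is an arbitrary placeholder analysis.

```r
# The list of units to loop through; here, the distinct cylinder counts
units <- sort(unique(mtcars$cyl))

# Pre-allocate a named list to hold one labeled result per unit
results <- vector("list", length(units))
names(results) <- paste0("cyl_", units)

for (i in seq_along(units)) {
  unit <- units[i]

  # 1. Subset the data to the current unit
  unit_data <- mtcars[mtcars$cyl == unit, ]

  # 2. Perform the analysis (placeholder: simple linear regression)
  fit <- lm(mpg ~ wt, data = unit_data)

  # 3. Store well-labeled output
  results[[i]] <- data.frame(
    unit      = unit,
    n         = nrow(unit_data),
    slope_wt  = unname(coef(fit)["wt"]),
    r_squared = summary(fit)$r.squared
  )
}

# Combine the per-unit results into one summary table
summary_table <- do.call(rbind, results)
print(summary_table)
```

Collecting results in a pre-allocated list and binding them once at the end avoids growing a data frame inside the loop, which gets slow as the number of units increases.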
This post presents code to give the user a quick overview of a numeric variable with one function call. The code, which can easily be modified for your specific needs, currently reports the amount of missing data, the mean and standard deviation (most informative when the variable is approximately normally distributed), the median and deciles, the unique values of the variable, and the shape of the distribution.
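A minimal sketch of what such a function might look like follows. It is an assumption-laden illustration rather than the post's implementation: `describe_numeric()` and its `max_unique` argument are hypothetical names, and a moment-based skewness value is used here as a simple numeric stand-in for the shape of the distribution.

```r
# Hypothetical one-call overview of a numeric variable
describe_numeric <- function(x, max_unique = 10) {
  stopifnot(is.numeric(x))
  observed <- x[!is.na(x)]
  m <- mean(observed)

  # Moment-based skewness: about 0 for symmetric data, > 0 for a long right tail
  skew <- mean((observed - m)^3) / mean((observed - m)^2)^1.5

  list(
    n             = length(x),
    n_missing     = sum(is.na(x)),
    pct_missing   = 100 * mean(is.na(x)),
    mean          = m,
    sd            = sd(observed),
    median        = median(observed),
    deciles       = quantile(observed, probs = seq(0.1, 0.9, by = 0.1)),
    n_unique      = length(unique(observed)),
    # Show only the first few unique values to keep the output readable
    unique_values = head(sort(unique(observed)), max_unique),
    skewness      = skew
  )
}

# Example on a built-in variable that contains missing values
overview <- describe_numeric(airquality$Ozone)
str(overview)
```

Adding `hist(x)` inside the function would give a visual check of the shape alongside the skewness value.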