News on Assertions in R

2015-07-05

How many times have you written R functions that start with a bunch of code that looks like this?

my_funct <- function(dob, enddate = as.Date("2015-07-05")) {
  if (!inherits(dob, "Date") || !inherits(enddate, "Date")) {
    stop("Both dob and enddate must be Date class objects")
  }
  ...
}

Because R was designed to be interactive, it is incredibly tolerant of bad user input. Functions are not type safe, meaning function arguments do not have to conform to specified data types. But most of my R code is not run interactively. I have to trust my code to run on servers, on schedules or on demand, as a part of production systems. So I find myself frequently writing code like the above, manually writing type checks for safety.
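One way to cut down on that repetition in base R is to factor the check into a small helper. The sketch below is only an illustration: `assert_class` and `age_in_days` are hypothetical names, not functions from any package.

```r
# Hypothetical helper (not from any package): stop unless `x` inherits from `cls`.
assert_class <- function(x, cls) {
  # Capture the expression the caller passed, so the message names the argument.
  arg_name <- paste(deparse(substitute(x)), collapse = "")
  if (!inherits(x, cls)) {
    stop(sprintf("`%s` must be a %s object, not %s",
                 arg_name, cls, class(x)[1]), call. = FALSE)
  }
  invisible(x)
}

# Hypothetical example function using the helper for both arguments.
age_in_days <- function(dob, enddate = as.Date("2015-07-05")) {
  assert_class(dob, "Date")
  assert_class(enddate, "Date")
  as.numeric(enddate - dob)  # Date subtraction yields a difference in days
}
```

Calling `age_in_days(as.Date("2015-07-01"))` returns the number of days between the two dates, while `age_in_days("2015-07-01")` stops with an error that names `dob`.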

There has been some great action in the R community around assertive programming, as you can see in the link. My favorite development, by far, is type-safe functions in the ensurer package. The above function definition can now be written like this:

my_funct <- function_(dob ~ Date, enddate ~ Date: as.Date("2015-07-05"), {
  ...
})

All the type-checking is done.

I really like the reuse of the formula notation ~ and the use of : to indicate default values.
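Part of why this notation works is that a formula is an ordinary R object whose pieces can be inspected. The sketch below is not how ensurer implements `function_`; `parse_arg_spec` is a hypothetical helper that only shows how a spec such as `dob ~ Date` or `enddate ~ Date: as.Date("2015-07-05")` could be taken apart in base R.

```r
# Hypothetical helper, NOT ensurer's implementation: split an argument spec
# written as `name ~ Type` or `name ~ Type: default` into its parts.
parse_arg_spec <- function(spec) {
  stopifnot(inherits(spec, "formula"), length(spec) == 3L)
  lhs <- spec[[2]]  # left-hand side: the argument name
  rhs <- spec[[3]]  # right-hand side: the type, optionally `Type: default`

  has_default <- is.call(rhs) && identical(rhs[[1]], as.name(":"))
  list(
    name        = as.character(lhs),
    type        = as.character(if (has_default) rhs[[2]] else rhs),
    has_default = has_default,
    # The default is kept as an unevaluated expression; NULL when absent.
    default     = if (has_default) rhs[[3]] else NULL
  )
}

dob_spec <- parse_arg_spec(dob ~ Date)
end_spec <- parse_arg_spec(enddate ~ Date: as.Date("2015-07-05"))
```

Because `:` binds more tightly than `~`, the default ends up nested inside the right-hand side of the formula, so a single formula can carry the name, the type, and the default.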

Along with packages like testthat, R is really growing up and modernizing.

This entry was tagged as rstats

Jan 23, 2015

A PostgreSQL Cheat Sheet for OSX and R

I keep this on my desktop.

Install:

brew install postgresql
initdb /usr/local/var/postgres -E utf8
gem install lunchy
### Start postgres with lunchy
mkdir -p ~/Library/LaunchAgents
cp /usr/local/Cellar/postgresql/9.3.3/homebrew.mxcl.postgresql.plist ~/Library/LaunchAgents/

Setup DB from SQL file:

### Setup DB
lunchy ...

Apr 02, 2014

Symlinking Your Data

I frequently work with private data. Sometimes, it lives on my personal machine rather than on a database server. Sometimes, even if it lives on a remote database server, it is better that I use locally cached data than query the database each time I want to do analysis on ...

Mar 10, 2014

Expressiveness Counts

Education data often come in annual snapshots. Each year, students are able to identify anew, and while student identification numbers may stay the same, names, race, and gender can often change. Sometimes, even data that probably should not change, like a date of birth, is altered at some point. While ...

Feb 18, 2014

Appreciating the Beauty of dplyr

Hadley Wickham has once again made R ridiculously better. Not only is dplyr incredibly fast, but the new syntax allows for some really complex operations to be expressed in a ridiculously beautiful way.

Consider a data set, course, with a student identifier, sid, a course identifier, courseno, a quarter ...