I frequently work with private data. Sometimes it lives on my personal machine rather than on a database server. Sometimes, even if it lives on a remote database server, it is better to use locally cached data than to query the database each time I want to analyze the data set. I have always dealt with this by creating encrypted disk images with strong passwords (stored in 1Password). This is a nice extra layer of protection for private data stored on a laptop, and it adds little complication to my workflow. I just have to remember to mount and unmount the disk images.
However, it can be inconvenient from a project perspective to refer to data in a distant location like /Volumes/ClientData/Entity/facttable.csv. In most cases, I would prefer the data “reside” in cache/ “inside” of my project directory.
Luckily, there is a great way that allows me to point to data/facttable.csv in my R code without actually having facttable.csv reside there: symlinking.
A symlink is a symbolic link file that sits in the preferred location and references the file path to the actual file. This way, when I refer to data/facttable.csv, the file system knows to direct all of that activity to the actual file on the mounted disk image.
From the command line, a symlink can be generated with a simple command:
ln -s target_path link_path
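As a minimal sketch of that command in action, the script below builds a throwaway directory tree and links a file into a project folder. The temporary directory, the stand-in "volume" path, and the two-line CSV are all hypothetical placeholders for illustration, not the real encrypted volume.

```shell
set -eu

# Hypothetical stand-ins for the encrypted volume and the project directory.
workdir=$(mktemp -d)
volume="$workdir/Volumes/ClientData/Entity"
project="$workdir/project"
mkdir -p "$volume" "$project/data"

# A tiny placeholder file playing the role of the real fact table.
printf 'id,value\n1,10\n' > "$volume/facttable.csv"

# ln -s target_path link_path
ln -s "$volume/facttable.csv" "$project/data/facttable.csv"

# The link itself only stores the path to the target...
link_target=$(readlink "$project/data/facttable.csv")
# ...but reading through the link returns the target's contents.
first_line=$(head -n 1 "$project/data/facttable.csv")

echo "link points to: $link_target"
echo "first line:     $first_line"
```

If the disk image is unmounted, the link stays in place but reads through it fail until the volume is mounted again.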
R offers a function that does the same thing:
file.symlink(target_path, link_path)
where target_path and link_path are both strings surrounded by quotation marks.
One of the first things I do when setting up a new analysis is add common data storage file extensions like .xls to my .gitignore file so that I do not mistakenly put any data in a remote repository. The second thing I do is set up symlinks to the mount location of the encrypted data.
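Those two setup steps could be scripted roughly as follows. This is only a sketch: the project and mount paths are hypothetical temporary directories, and the list of ignore patterns (*.csv and *.xls) is an assumption that should be extended to whatever formats a project actually uses.

```shell
set -eu

# Hypothetical project directory and mount point, created in a temp dir.
workdir=$(mktemp -d)
project="$workdir/project"
mount_point="$workdir/Volumes/ClientData"
mkdir -p "$project/data" "$mount_point/Entity"
touch "$mount_point/Entity/facttable.csv"   # placeholder for the real data

# Step 1: ignore common data file extensions (assumed list, extend as needed).
for pattern in '*.csv' '*.xls'; do
  # Append the pattern only if .gitignore does not already contain it.
  grep -qxF "$pattern" "$project/.gitignore" 2>/dev/null \
    || printf '%s\n' "$pattern" >> "$project/.gitignore"
done

# Step 2: link the data file on the mounted volume into the project.
ln -s "$mount_point/Entity/facttable.csv" "$project/data/facttable.csv"
```

Because the link's own name ends in .csv, the same ignore pattern should keep the link itself out of the repository as well.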
A brief discussion on the complexity of determining the number of schools a student has attended within a single school year using a minimal set of information.
One of the most challenging aspects of being a data analyst is translating programmatic terms like “student mobility” into precise business rules. Almost any simple statistic involves a series of decisions that are often opaque to the ultimate users of that statistic.
Documentation of business rules is a critical aspect ...
My analysis on Nesi’s Notes depended entirely on the National Center for Education Statistics’ Common Core Data. The per pupil amounts reported to NCES may look a bit different from state sources of this information. There are several explanations for this. First, the enrollment counts used to generate per ...