
Step 0 Revisited: Doing it in R

June 25, 2011

A nerd friend of mine kindly rewrote my Python scripts in R and produced similar-looking graphs.  I downloaded R from here, and one cool thing is that once it's installed, if you open an R source file (ending with ".R"), an R console pops up automatically and you can just start working.  Here's the code:

gdata <- read.csv('large_data_glucose.csv', header=TRUE)
#We can open a spreadsheet-style editor to inspect the data. edit() returns the edited copy, so assign the result to keep any changes:
gdata <- edit(gdata)
#We are interested in the glucose sensor data in column 31, but its name is awkward to work with, so let's rename it:
colnames(gdata)[31] <- "GSensor"

#Let's plot the glucose sensor data:
plot(gdata$GSensor, col="darkblue")

#Here's a histogram plot:
hist(gdata$GSensor, breaks=100, col="darkblue")
#And now let's plot the logarithm of the data:
hist(log(gdata$GSensor), breaks=100, col="darkblue")

And here are the plots:

Sensor_Glucose_plot

Sensor_Glucose_histogram

Log_Sensor_Glucose_histogram

One thing my friend mentions is that R's plotting functions (plot and hist above) automatically skip missing values, whereas we had to deal with them directly in Python.  He also mentions that there are other ways to handle missing values, and to learn more we should check out this site.
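To make the missing-value point concrete, here is a minimal sketch in R. The vector sensor and its values are made up for illustration and do not come from the glucose file. It shows that hist() drops NA entries before binning, while a summary function such as mean() returns NA unless you pass na.rm=TRUE:

```r
# Toy vector standing in for the sensor column (values are invented).
sensor <- c(110, NA, 95, 142, NA, 88)

# hist() discards non-finite values (including NA) before binning,
# so only the 4 observed readings are counted. plot=FALSE skips drawing.
h <- hist(sensor, plot = FALSE)
n_binned <- sum(h$counts)

# Summary functions do not skip NA by default; they propagate it.
mean_default <- mean(sensor)                # NA
mean_clean   <- mean(sensor, na.rm = TRUE)  # mean of the observed readings

# Counting and removing the missing readings explicitly.
n_missing <- sum(is.na(sensor))
observed  <- sensor[!is.na(sensor)]
```

So the skipping is a property of the individual functions rather than of R as a whole, which is worth keeping in mind once we move from plotting to computing statistics.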

R seems to be really good at this kind of thing, that is to say, doing the first thing you think of with a data set.  I wonder how it compares to Python when you really have to start cleaning and processing the data before plotting.  We shall see!


						