
Step 0: Installing python and visualizing data

June 22, 2011

A friend of mine has Type 1 diabetes, and lots of data (glucose levels every five minutes) from his monitor.  We’ve talked on and off about how to model future (as in one hour hence) glucose levels using information on the current level, insulin intake, and carb intake.  He was kind enough to allow me to work on this project on this blog.  It’s an exciting and potentially really useful project, and it will be great to use as an example for each step of the modeling process.

To be clear:  I don’t know if I will be able to successfully model glucose levels (or, even better, be able to make suggestions for how much insulin or carbs to take in order to keep glucose levels within a reasonable range), but it’s exciting and totally worth a try.  I’m counting on you to give me suggestions if I’m being dumb and missing something!

I decided to use python to do my modeling, and I went to this awesomely useful page and followed the instructions to install python and matplotlib on my oldish MacBook. It worked perfectly (thanks, nerd who wrote that page!).

The data file, which contains 3 months of data, is a csv (comma-separated values) file, with the first line giving the names of the fields in the lines below it:

Index,Date,Time,Timestamp,New Device Time,BG Reading (mg/dL),Linked BG Meter ID,Temp Basal Amount (U/h),Temp Basal Type,Temp Basal Duration (hh:mm:ss),Bolus Type,Bolus Volume Selected (U),Bolus Volume Delivered (U),Programmed Bolus Duration (hh:mm:ss),Prime Type,Prime Volume Delivered (U),Suspend,Rewind,BWZ Estimate (U),BWZ Target High BG (mg/dL),BWZ Target Low BG (mg/dL),BWZ Carb Ratio (grams),BWZ Insulin Sensitivity (mg/dL),BWZ Carb Input (grams),BWZ BG Input (mg/dL),BWZ Correction Estimate (U),BWZ Food Estimate (U),BWZ Active Insulin (U),Alarm,Sensor Calibration BG (mg/dL),Sensor Glucose (mg/dL),ISIG Value,Daily Insulin Total (U),Raw-Type,Raw-Values,Raw-ID,Raw-Upload ID,Raw-Seq Num,Raw-Device Type
1,12/15/10,00:00:00,12/15/10 00:00:00,,,,,,,,,,,,,,,,,,,,,,,,,,,,,28.4,ResultDailyTotal,"AMOUNT=28.4, CONCENTRATION=null",5472682886,50184670,236,Paradigm 522
2,12/15/10,00:04:00,12/15/10 00:04:00,,,,,,,,,,,,,,,,,,,,,,,,,,,120,16.54,,GlucoseSensorData,"AMOUNT=120, ISIG=16.54, VCNTR=null, BACKFILL_INDICATOR=null",5472689886,50184670,4240,Paradigm 522
3,12/15/10,00:09:00,12/15/10 00:09:00,,,,,,,,,,,,,,,,,,,,,,,,,,,116,16.21,,GlucoseSensorData,"AMOUNT=116, ISIG=16.21, VCNTR=null, BACKFILL_INDICATOR=null",5472689885,50184670,4239,Paradigm 522
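A note on the format before we parse it (a toy sketch using a shortened line, not the real export): the runs of consecutive commas are empty fields, and the double-quoted Raw-Values field contains commas that the csv quoting protects, so a csv parser won't split it apart.

```python
# Toy sketch of the pump export's quoting (a shortened made-up line, not
# the real file): commas inside double quotes belong to a single field,
# and runs of bare commas are empty fields.
import csv
import io

line = '1,12/15/10,,"AMOUNT=28.4, CONCENTRATION=null",Paradigm 522\n'
fields = next(csv.reader(io.StringIO(line)))
print(fields)
```

Note that the quoted field comes through as one value, commas and all.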

I made a new directory below my home directory for this file and the python scripts to live in, and I started up python from the command line inside that directory.  Then I opened emacs (it could have been TextEdit or any other editor you like) to write a simple script to see my data.

A really easy way of importing this kind of file into python is to use a DictReader.  DictReader expects a file formatted exactly the way this one is, and it’s easy to use.  I wrote this simple script to take a look at the values in the “Sensor Glucose” field (note there are sometimes gaps, and I had to decide what to do in that case):

#!/usr/bin/env python
import csv
from matplotlib.pylab import *

# The Raw-Values field wraps its commas in double quotes, so we leave
# DictReader's default quotechar ('"') alone rather than overriding it.
dataReader = csv.DictReader(open('Jason_large_dataset.csv', 'rU'), delimiter=',')
datalist = []
for i, row in enumerate(dataReader):
    print i, row["Sensor Glucose (mg/dL)"]
    if row["Sensor Glucose (mg/dL)"] == "":
        datalist.append(-1)  # mark a gap with -1 so it still shows in the plot
    else:
        datalist.append(float(row["Sensor Glucose (mg/dL)"]))
print min(datalist), max(datalist)
scatter(arange(len(datalist)), datalist)
show()
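One nice side effect of the -1 convention is that it makes the gaps easy to count. A quick sketch with made-up stand-in values (since I can't post the real file):

```python
# Stand-in values: -1 marks a sensor gap, just like in the script above.
datalist = [120.0, -1, 116.0, -1, -1, 118.0, 121.0, -1]

gaps = datalist.count(-1)
fraction = gaps / float(len(datalist))
print("%d gaps out of %d rows (%.0f%%)" % (gaps, len(datalist), 100 * fraction))
```

Running the same two lines against the real datalist would tell us exactly how thick that line at -1 is.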

And this is the picture that popped out:

Taking a quick look at the Glucose levels

I don’t know how easy it is to see, but there are lots of gaps (when there’s a gap I plotted a dot at -1, and the line at -1 looks pretty thick).  Moreover, it’s clear this data is being kept in a pretty tight range (probably good news for my friend).  Another thing you might notice is that the data looks more likely to be in the lower half of the range than in the upper half.  To get at this we will draw a histogram of the data, but this time we will *not* fill in the gaps with a bunch of fake "-1"s, since that would throw off the histogram.  Here are the lines I added to the code:

#!/usr/bin/env python
import csv
from matplotlib.pylab import *

dataReader = csv.DictReader(open('Jason_large_dataset.csv', 'rU'), delimiter=',')
datalist = []
skip_gaps_datalist = []  # real readings only, for the histogram
for i, row in enumerate(dataReader):
    print i, row["Sensor Glucose (mg/dL)"]
    if row["Sensor Glucose (mg/dL)"] == "":
        datalist.append(-1)
    else:
        datalist.append(float(row["Sensor Glucose (mg/dL)"]))
        skip_gaps_datalist.append(float(row["Sensor Glucose (mg/dL)"]))
print min(datalist), max(datalist)
figure()
scatter(arange(len(datalist)), datalist)
figure()
hist(skip_gaps_datalist, bins=100)
show()

And this is the histogram that resulted:

This is a pretty skewed, long right-tailed distribution.  Since we know the data is always positive (it’s measuring the presence of something in the bloodstream), and since the distribution is skewed, this makes me consider using the log values instead of the actual values.  This is because, as a rule of thumb, it’s better to use variables that are more or less normally distributed.  To picture this I replaced one line in my code:

skip_gaps_datalist.append(log(float(row["Sensor Glucose (mg/dL)"])))

And this is the new histogram:

This is definitely more normal.
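If you want a number rather than an eyeball judgment, you can compare sample skewness before and after taking logs (zero means symmetric, positive means a right tail). A sketch on simulated lognormal-ish values, since I'm not posting my friend's data; the center and spread here are made up:

```python
# Compare the skewness of a right-tailed dataset to the skewness of its
# logs. The data is simulated: exp of a gaussian, centered near 110 mg/dL.
import math
import random

random.seed(0)
data = [math.exp(random.gauss(4.7, 0.25)) for _ in range(1000)]

def skewness(xs):
    """Sample skewness: third central moment over variance^(3/2)."""
    n = len(xs)
    m = sum(xs) / n
    s2 = sum((x - m) ** 2 for x in xs) / n
    s3 = sum((x - m) ** 3 for x in xs) / n
    return s3 / s2 ** 1.5

print(skewness(data))                          # clearly positive: right tail
print(skewness([math.log(x) for x in data]))   # much closer to zero
```

The logged values land much nearer zero skewness, which matches what the two histograms show.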

Next time we will talk more about cleaning this data and what other data we will use for the model.
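On that note, one cleaning option I'm already weighing for the gaps (a sketch with made-up stand-in values, not the real file): store them as NaN instead of -1. matplotlib leaves NaN points out of a plot entirely, and it's harder to accidentally treat a gap as a real reading of -1 mg/dL.

```python
import math

# Stand-in column values: an empty string means a sensor gap.
raw = ["120", "", "116", "", "118"]

# NaN marks a gap without pretending to be a measurement.
datalist = [float(v) if v != "" else float("nan") for v in raw]

# NaNs have to be filtered out before min/max, which is a good reminder
# that they aren't data.
clean = [v for v in datalist if not math.isnan(v)]
print(min(clean), max(clean))
```

The tradeoff is that the thick line at -1, which made the gaps visible at a glance, disappears from the scatter plot.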

  1. Aaron
    June 22, 2011 at 9:17 am

    This is cool. But one thing I don’t understand. Your first histogram looks pretty Poisson-like. Why mess with that?

  2. June 28, 2011 at 1:34 pm

    It’s all about normal distributions, baby. Hopefully this will become clear when I start going more in-depth with the modeling.

  3. June 16, 2012 at 3:03 am

    Howdy! I’m working on a library that fetches data from devices for diabetics automatically. It currently works for several Lifescan meters, and for bits and pieces of paradigm series pumps. Would love to have your methods here incorporated into insulaudit.

    One of the features we plan to build is a slippy map, like at glucosurfer. I can’t decide if getting matplotlib to generate tiles for a slippy map, or using something like d3.js would be more effective. Either way, turning your functions into a module for insulaudit would be a huge help to people.

    https://github.com/bewest/insulaudit

    -bewest

    • June 16, 2012 at 5:58 am

      Interesting!

      I’ve been neglecting this project but hope to get back to it in August. Can we touch base then?

      Cathy
