Duke DataFest

 
Duke DataFest 2014 Travel Report
James Brofos, Ajay Kannan, Rui Shu

 

Overview

Participating in Duke DataFest 2014 was a great experience and contributed significantly to our understanding of how to make sense of challenging data. The objective of the competition was to analyze a dataset provided by a company and to present our findings. We performed well despite being the smallest team, receiving an Honorable Mention for our project. We wish to thank the Neukom Institute for funding our travel to the competition.

Competition Details

GridPoint, a company that collects and analyzes energy usage data for other businesses, contributed a dataset covering various aspects of commercial buildings' energy usage. The dataset was unlike any we had worked with before. It consisted predominantly of time series data, with significant portions of missing data and an enormous number of data points, and it introduced us to some of the challenges of modern data analysis.

After exploring several project ideas, we ultimately chose to examine the strength of the link between weather-related variables and power outages. We first used t-Distributed Stochastic Neighbor Embedding (t-SNE) to visualize the data, which suggested a strong association between weather conditions and power outages. We then built a two-class Support Vector Machine to predict when electrical failures occur, which would allow business owners to better prepare for power outages. Our model performed well, and we capped off the weekend by presenting our work to a panel of industry and academic judges.
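To make the classification step concrete, here is a minimal sketch of a two-class linear SVM trained by stochastic sub-gradient descent on the regularized hinge loss. It is not our competition code: the report does not record the features, kernel, or library we used, so the two "weather" features, the synthetic data generator, and all function names and parameters below are assumptions made purely for illustration.

```python
import random


def make_synthetic_data(n=300, seed=0):
    """Generate hypothetical data: two weather-like features per sample.

    Label +1 stands for "outage", -1 for "no outage". The data is synthetic
    and only loosely mimics the idea that outages cluster under harsher weather.
    """
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < 0.5 else -1
        center = 2.0 * label  # assumed class centers at (+2, +2) and (-2, -2)
        X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
        y.append(label)
    return X, y


def decision_value(w, b, x):
    """Signed score w.x + b; its sign is the predicted class."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, lam=0.01, lr=0.01, epochs=20, seed=0):
    """Fit a linear soft-margin SVM by stochastic sub-gradient descent.

    Minimizes (lam / 2) * ||w||^2 + mean(max(0, 1 - y * (w.x + b))).
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * decision_value(w, b, X[i])
            # Step along the gradient of the L2 regularizer.
            w = [wj - lr * lam * wj for wj in w]
            # The hinge loss contributes only when the margin is violated.
            if margin < 1:
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b


def predict(w, b, x):
    """Return +1 (outage) or -1 (no outage)."""
    return 1 if decision_value(w, b, x) >= 0 else -1


def accuracy(w, b, X, y):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for xi, yi in zip(X, y) if predict(w, b, xi) == yi)
    return correct / len(X)


if __name__ == "__main__":
    X, y = make_synthetic_data()
    w, b = train_linear_svm(X[:200], y[:200])  # train on the first 200 samples
    print("held-out accuracy:", accuracy(w, b, X[200:], y[200:]))
```

Real sensor data would additionally need the missing values handled and the time series aligned with weather records before any such model could be trained; those steps are omitted here.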

Take-Aways

One of the most valuable facets of competing at Duke was putting classroom statistical and machine learning training into practice. Real datasets are not always well-behaved, and understanding the complex causes of the phenomena observed in them takes more thought than problem sets or class projects usually require. In addition, we learned the importance of combining datasets from different sources to provide context for observations that were originally puzzling. Learning about other teams' approaches was valuable as well, giving us more perspective on different ways to tackle obstacles. Finally, speaking with industry experts about how to interpret sensor data gave us a much better understanding of energy data.

We believe that a similar competition held at Dartmouth would garner high interest and be extremely valuable to students. Having experienced Duke's DataFest firsthand, we have insight into how to organize such a contest, and we hope to bring DataFest to Dartmouth next year.

 

Last Updated: 4/2/14