After Ben's initial idea presentation, we got to work on the data. The Tims in the IT department gave us three types of log files for each day of a one-week period. These logs record the average and maximum upload and download usage for our primary internet provider, our secondary provider, and our internal network. To load the data into R and RStudio, we wrote a script called createDF and another called updateDF, which merges in the new data each day. Both scripts, along with the rest of our code and data frames, can be found in the Code section.
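To illustrate the daily merge step, here is a small R sketch. It is not the project's createDF or updateDF, which are in the Code section. The function names `build_traffic_df` and `append_day`, the column names `time`, `avg_in`, and `max_in`, and the toy values are all hypothetical, because the actual log format is not described here. The sketch tags each log with the link it came from, stacks the logs into one data frame, and drops rows that are already present so that overlapping files are not double-counted.

```r
# Hypothetical sketch: column names (time, avg_in, max_in) and link labels
# are assumptions, not the real IT log format.

# Tag one log data frame with the link it came from.
tag_log <- function(log, link) {
  log$link <- link
  log
}

# Combine a named list of logs (e.g. primary, internal) into one data frame,
# sorted by link and timestamp.
build_traffic_df <- function(logs) {
  tagged <- Map(tag_log, logs, names(logs))
  combined <- do.call(rbind, unname(tagged))
  combined <- combined[order(combined$link, combined$time), ]
  rownames(combined) <- NULL
  combined
}

# Append a new day's logs, keeping only the first copy of any (link, time)
# pair so overlapping files are not double-counted.
append_day <- function(traffic, new_logs) {
  combined <- rbind(traffic, build_traffic_df(new_logs))
  combined <- combined[!duplicated(combined[, c("link", "time")]), ]
  combined <- combined[order(combined$link, combined$time), ]
  rownames(combined) <- NULL
  combined
}

# Toy example with made-up timestamps and usage values.
to_time <- function(x) as.POSIXct(x, tz = "UTC")
day1 <- list(
  primary  = data.frame(time = to_time(c("2000-01-01 00:00", "2000-01-01 00:05")),
                        avg_in = c(12.1, 10.4), max_in = c(30.2, 25.0)),
  internal = data.frame(time = to_time(c("2000-01-01 00:00", "2000-01-01 00:05")),
                        avg_in = c(3.2, 2.9), max_in = c(8.8, 7.5))
)
day2 <- list(
  # The 00:05 primary row overlaps day1 and is dropped by append_day().
  primary  = data.frame(time = to_time(c("2000-01-01 00:05", "2000-01-01 00:10")),
                        avg_in = c(10.4, 9.7), max_in = c(25.0, 22.1)),
  internal = data.frame(time = to_time("2000-01-01 00:10"),
                        avg_in = 2.5, max_in = 6.9)
)

traffic1 <- build_traffic_df(day1)
traffic2 <- append_day(traffic1, day2)
print(traffic2)
```

Deduplicating on the (link, time) pair means the update can be re-run on the same file without inflating the totals.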
Using R, we sorted the data, created visualizations, and analyzed our results. In our final analysis, we focused most heavily on answering three specific questions:
To learn more about our conclusions and analysis, see our Final Presentation.
The initial proposal to research Cornell's network traffic was presented by Ben Oakley for the CSC/STA 255 "Dealing with Data" class. After hearing the idea, Jeffrey Klow, Paul German, and Emily Andrulis joined him to dig into the data. With help from the Tims in the IT department, the team used log files that track network usage on Cornell's campus at high resolution for one week and at lower resolution going back to November.
Our goals in this project were to examine our network usage as a whole and over specific time frames. We wanted to find out when we were most active with uploads and downloads and when we were drastically less busy. From this information, we wanted to build a picture of what an average day of network traffic looks like at Cornell. We were also able to identify how often, and when, over the past few months we came close to reaching our maximum bandwidth capacity. With this information, we could extrapolate when we might want to consider buying more bandwidth, since on average we keep getting closer to our bandwidth cap.
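The following R sketch shows one possible way to flag near-capacity periods and extrapolate toward the cap. It is an illustration only. The cap value, the 90% threshold, the use of a straight-line trend, and the toy daily peaks are all assumptions. They are not our measured figures, and this is not necessarily the method we used. The sketch fits a line with `lm()` and solves for where that line meets the cap, which gives a rough estimate of when the cap would be reached. If the trend is flat or falling, the function returns `NA`, because such a trend never reaches the cap.

```r
# Hypothetical sketch: `cap`, the 90% threshold, and the daily peaks are
# made-up illustration values, not Cornell's real bandwidth figures.

# TRUE for each interval whose maximum usage is at least `threshold` of the cap.
near_cap <- function(max_usage, cap, threshold = 0.9) {
  max_usage >= threshold * cap
}

# Fit a linear trend to daily peak usage and return the (fractional) day
# index at which the fitted line reaches `cap`. Returns NA when the slope
# is not positive, since such a trend never reaches the cap.
cap_day_estimate <- function(day, peak, cap) {
  b <- coef(lm(peak ~ day))
  slope <- b[["day"]]
  if (slope <= 0) return(NA_real_)
  (cap - b[["(Intercept)"]]) / slope
}

cap <- 100                               # assumed capacity, e.g. in Mbps
interval_max <- c(50, 91, 95, 89.9)      # toy per-interval maximums
flags <- near_cap(interval_max, cap)
print(which(flags))                      # intervals 2 and 3 are near the cap

day  <- 1:5
peak <- c(50, 52, 54, 56, 58)            # toy daily peaks, rising 2 per day
est  <- cap_day_estimate(day, peak, cap)
print(est)                               # about day 26
```

The trend is fitted to daily peaks rather than to every raw interval. This keeps the line from being dominated by the day/night cycle. A straight line is still a rough tool, though, so any estimate from it is best treated as a prompt for closer review rather than a firm deadline.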
The Network Traffic Project was one of the six main projects presented in the "Dealing with Data" class. Cross-listed as both a computer science and a statistics course, this was the first time "Dealing with Data" had been offered at Cornell College. Co-taught by Professors Ann Cannon and Ross Sowell, the class had a prerequisite of either CSC 140 or STA 201, the introductory classes in their respective fields. Because students came in with a variety of computer science and statistics backgrounds, the professors matched them up so that each group had a mix of different strengths and weaknesses. The class consisted of three big projects: the first two were week-long projects completed in pairs, and the main block-long assignment was done in groups of three or four. The Network Traffic Project is our main project. We have given presentation check-ins on it for the past two weeks, culminating in our Final Presentation this Tuesday.