To better understand the data we collected and cleaned, we created a variety of visualizations that tell the story of what is happening with Cornell's network traffic. To split the work among group members, we categorized our visualizations as static, animated, or interactive graphs. Below, you can view all our final graphs in these same categories.
The static graphs include four plots: all network traffic over one week, overlaid on a single 24-hour day; all three types of log files over one day; our initial projection of average daily download usage; and a corrected projection of average daily usage that takes into account the recent increase in bandwidth.
The first graph shows all network traffic during the main week in which we collected data. This is the highest-resolution data we had, with a new data point arriving consistently for every 5-minute interval throughout each day. The days of the week are represented by different colors that run in rainbow order through the week. Two obvious curves stand out in the graph, and most of the data follows them rather closely: the top curve is download usage and the bottom curve is upload usage. Dotted lines represent the maximum over each 5-minute interval and solid lines represent the average, but the two often follow each other very closely.
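The overlay described above depends on placing every sample on a shared 24-hour axis and grouping it by day of the week. A minimal sketch of that grouping step in Python, assuming each record is a hypothetical `(timestamp, average, maximum)` tuple; the function name and record layout are illustrative assumptions, not the format of the actual log files:

```python
from collections import defaultdict
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def group_by_weekday(records):
    """Group (timestamp, avg, peak) records by weekday.

    Returns a dict mapping weekday name to a list of (slot, avg, peak),
    where slot is the index of the 5-minute interval within the day
    (0 for 00:00 through 287 for 23:55), sorted by slot.
    """
    series = defaultdict(list)
    for ts, avg, peak in records:
        slot = (ts.hour * 60 + ts.minute) // 5  # 288 slots per day
        series[DAY_NAMES[ts.weekday()]].append((slot, avg, peak))
    for points in series.values():
        points.sort()
    return dict(series)
```

Each weekday's list can then be drawn as one colored line pair, solid for the average column and dotted for the maximum column.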
The three types of log files we collected came from the primary internet provider, the secondary internet provider, and the internal network, whose traffic goes through the firewall and includes backups. This graph is meant to show how closely the internal and primary provider data follow each other, except for the spikes around 9 AM and midnight caused by system backups. For this reason, of the three log files we were given, the primary provider data seems to represent Cornell's network traffic most accurately.
To look at our future usage, we created a scatter plot of the average download usage for each day in our data set. When we added the lines from a linear regression model, we saw that average daily download usage is clearly increasing over time. However, we also noted that the jump in usage around September was most likely caused by the additional bandwidth we received then: usage, at least peak usage, immediately rose to meet the new cap. Taking this into account, we also made a graph showing only the data since that additional bandwidth was purchased. We still see a gradual increase in average daily download usage, and under this projection we have until January 2017 at best before average daily usage reaches 85% of our bandwidth.
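The projection above amounts to fitting a straight line to the daily averages and solving for the day on which that line reaches 85% of capacity. A minimal sketch of that calculation in plain Python, assuming one average per consecutive day; the function names and any numbers passed in are hypothetical, and none of our measurements appear here:

```python
import math
from datetime import date, timedelta

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def projected_crossing(start, daily_avgs, capacity, fraction=0.85):
    """Estimate the first date on which the fitted trend reaches
    fraction * capacity. `start` is the date of the first average.
    Returns None when the fitted trend is flat or falling.
    """
    xs = list(range(len(daily_avgs)))  # day index since `start`
    slope, intercept = fit_line(xs, daily_avgs)
    if slope <= 0:
        return None
    days = (fraction * capacity - intercept) / slope
    return start + timedelta(days=math.ceil(days))
```

Fitting only the days after the bandwidth purchase, as in the corrected graph, simply means passing that later slice of the averages together with its own start date.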
We made many different animated graphs to show what an average day's usage looks like at Cornell and how it compares to other days in our data set. In particular, our graphs compare all the Tuesdays to each other and to the average day's Loess curve, all the Fridays to each other and to the average day's Loess curve, all the days from last week to each other and to the Loess curve, all the days since November to each other and to the average day's Loess curve, and all the weeks of fourth block to each other. Because we have three different types of log files, we made animated graphs for all the primary and secondary data, plus a few from the internal data to give the viewer some sense of it as well, even though that data can be heavily biased by the firewall it passes through and by internal backups.
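For readers unfamiliar with the Loess curve used as the reference line in these animations, it is a smoother that fits a small weighted regression around each point. A minimal sketch in plain Python, assuming local linear fits with the standard tricube weight; the `frac` parameter and function name are illustrative assumptions, and a plotting library's built-in smoother may differ in its details:

```python
import math

def loess(xs, ys, frac=0.3):
    """Smooth ys over xs with local linear fits and tricube weights.

    frac is the share of points used as neighbours for each local fit.
    Returns one smoothed value per input point.
    """
    n = len(xs)
    k = min(n, max(2, math.ceil(frac * n)))  # neighbours per local fit
    smoothed = []
    for x0 in xs:
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1e-12  # bandwidth: distance to k-th nearest point
        w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
        sw = sum(w)  # always > 0: the point itself has weight 1
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        smoothed.append(my + slope * (x0 - mx))  # local line evaluated at x0
    return smoothed
```

A larger `frac` gives a flatter curve, while a smaller one follows short-lived spikes such as the backup windows more closely.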
You can view and play with our interactive graphs directly in the frames below. If you would rather go to the sites where they are hosted, follow the links for the graph of network data over the past week and for the interactive graph that lets you upload a file.