This paper won the best paper award at UCSB’s 2009 Graduate Student Workshop on Computing.
Abstract: We observe that when network traffic behaviors are represented in vector spaces as relative frequency histograms of behavioral features, they exhibit low-rank linear structure. We hypothesize that this structure arises because the distribution of flow behaviors follows a finite mixture model. Aside from being of theoretical interest, this hypothesis has practical consequences: it allows us to predict the probabilities of a flow’s future behaviors from a handful of its initial packets. From observing five initial packets, we are able to predict the distribution of future packet sizes and inter-packet intervals with 70% to 90% accuracy across a variety of network traces. In pairwise comparisons, we can predict which of two flows will have more packets with 65% to 85% accuracy. These practical applications serve two purposes. They provide useful tools for network management, routing decisions, and quality of service schemes. They also provide evidence that the hypothesized model correctly explains the linear structure observed in real network traffic.
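The link between a finite mixture model and low-rank structure can be illustrated with a small sketch. It is not the paper's method or data: the two component histograms, the five bins, and the per-flow mixing weights below are invented for illustration, and `mix` and `rank` are hypothetical helper names. The sketch assumes each flow's expected relative frequency histogram is a convex combination of K component histograms, in which case the matrix of flow histograms has rank at most K.

```python
from fractions import Fraction as F

def mix(weights, components):
    """Expected histogram of a flow: a convex combination of component histograms."""
    bins = len(components[0])
    return [sum(w * c[b] for w, c in zip(weights, components)) for b in range(bins)]

def rank(rows):
    """Matrix rank by exact Gaussian elimination over Fractions."""
    m = [list(r) for r in rows]
    r = 0
    for col in range(len(m[0])):
        # Find a row at or below r with a nonzero entry in this column.
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        # Eliminate this column from all rows below the pivot row.
        for i in range(r + 1, len(m)):
            f = m[i][col] / m[r][col]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

# Assumed example: K = 2 component histograms over 5 packet-size bins.
components = [
    [F(7, 10), F(2, 10), F(1, 10), F(0), F(0)],   # mostly small packets
    [F(0), F(1, 10), F(2, 10), F(3, 10), F(4, 10)],  # mostly large packets
]

# Assumed mixing weights for six flows; each pair sums to 1.
flow_weights = [
    (F(1), F(0)), (F(3, 4), F(1, 4)), (F(1, 2), F(1, 2)),
    (F(1, 4), F(3, 4)), (F(1, 10), F(9, 10)), (F(0), F(1)),
]

histograms = [mix(w, components) for w in flow_weights]
print(rank(histograms))  # rank of the 6 x 5 histogram matrix
```

Exact fractions are used so that the rank is not affected by floating-point rounding. Histograms measured from real traces are noisy, so they would be only approximately low rank, and a numerical tool such as a singular value decomposition would be the more natural way to inspect them.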