Methods of analysis that incorporate self-similarity can be applied to data traffic within networks.
Example
Suppose that you are monitoring a 1-Mbps frame relay line and fixed-length frames of 4000 bits are being transmitted, so that the transmission time of each frame is 4 ms. The following arrival times are recorded at the receiver (the time at which the first bit of each frame arrives):
0 8 24
32 72 80 96
104 216 224 240
248 288 296 312
320
648 656 672
680 720 728 744
752 864 872 888
896 936 944 960
968
That is, the first frame arrives at t = 0 ms, the second at t = 8 ms, and so on.
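The analysis that follows can be checked by putting the recorded arrival times into a list and measuring their gaps. The minimal Python sketch below measures each gap from one arrival time to the next, which is the convention the text uses for the 328 ms gap from 320 to 648. The helper names are illustrative and do not come from the source.

```python
# Arrival times (ms) of the first bit of each frame, as recorded above.
ARRIVALS_MS = [
    0, 8, 24, 32, 72, 80, 96, 104, 216, 224, 240, 248, 288, 296, 312, 320,
    648, 656, 672, 680, 720, 728, 744, 752, 864, 872, 888, 896, 936, 944, 960, 968,
]

FRAME_BITS = 4000
LINE_RATE_BPS = 1_000_000
# Transmission time of one frame: 4000 bits at 1 Mbps = 4 ms.
FRAME_TIME_MS = FRAME_BITS * 1000 / LINE_RATE_BPS


def gaps(arrivals):
    """(start, end, gap) for each pair of consecutive arrival times."""
    return [(a, b, b - a) for a, b in zip(arrivals, arrivals[1:])]


def largest_gap(arrivals):
    """The (start, end, gap) triple with the biggest gap."""
    return max(gaps(arrivals), key=lambda g: g[2])


def count_gaps_at_least(arrivals, threshold_ms):
    """Number of gaps of threshold_ms or more."""
    return sum(1 for _, _, g in gaps(arrivals) if g >= threshold_ms)


if __name__ == "__main__":
    print(f"frame time: {FRAME_TIME_MS} ms, {len(ARRIVALS_MS)} arrivals")
    print("largest gap (start, end, ms):", largest_gap(ARRIVALS_MS))
    print("gaps of 10 frame times or more:",
          count_gaps_at_least(ARRIVALS_MS, 10 * FRAME_TIME_MS))
```

Of the 31 gaps, seven are 40 ms or longer, which matches the "numerous gaps of 40 ms or more" noted in the text.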
It is difficult at first to discern any pattern or statistical properties. However, the traffic does seem bursty, as you would expect for data traffic. Some of the arrival times are clustered together, and there are some gaps. The largest gap is 328 ms from time 320 to 648, but there are some smaller gaps as well, including numerous gaps of 40 ms or more, the equivalent of 10 frame times or more. Suppose that we aggregate the traffic and consider a cluster to be any group of frames in which there are no gaps greater than five frame times (20 ms), and we record the start time of each cluster. Then we have
0 72 216 288 648 720 864 936
The gaps between clusters are of uneven length, but it is still difficult to observe a pattern. Let's try a greater degree of aggregation. Define a cluster as any group of frames in which there are no gaps greater than 10 frame times (40 ms). Then we have arrival times as follows:
0 216 648 864
In this case the gaps between arrivals are 216, 432, 216 ms. The pattern is two clusters with a gap between, followed by a larger gap, followed again by two clusters with the smaller gap between.
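Both aggregation steps can be reproduced with one small function. In this sketch, which uses hypothetical function names, a gap is measured from one arrival time to the next. Measuring instead from the last bit of a frame (arrival + 4 ms) to the next arrival gives the same clusters for both thresholds used here.

```python
ARRIVALS_MS = [
    0, 8, 24, 32, 72, 80, 96, 104, 216, 224, 240, 248, 288, 296, 312, 320,
    648, 656, 672, 680, 720, 728, 744, 752, 864, 872, 888, 896, 936, 944, 960, 968,
]


def cluster_starts(arrivals, max_gap_ms):
    """Start times of clusters: a new cluster begins whenever the gap from the
    previous arrival is greater than max_gap_ms."""
    if not arrivals:
        return []
    starts = [arrivals[0]]
    for prev, cur in zip(arrivals, arrivals[1:]):
        if cur - prev > max_gap_ms:
            starts.append(cur)
    return starts


def inter_gaps(times):
    """Differences between consecutive times."""
    return [b - a for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    for frames in (5, 10):  # thresholds of 5 and 10 frame times (4 ms each)
        starts = cluster_starts(ARRIVALS_MS, frames * 4)
        print(f"max gap {frames * 4} ms: starts {starts}, gaps {inter_gaps(starts)}")
```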
If we now go back to the previous aggregation into eight clusters, we see this pattern repeated. The first four arrival times follow the pattern of (arrival, short gap, arrival, long gap, arrival, short gap, arrival), as do the last four arrival times. Looking back at the original data set of 32 arrivals, we see the same pattern repeated eight times. Thus, we have a pattern that appears in the raw data and again at different levels of aggregation. That is, the time sequence exhibits the same pattern regardless of the degree of resolution. This is the essence of self-similarity.
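The claim that the same pattern appears at all three levels can be checked mechanically. This sketch takes the three sequences exactly as listed in the text and tests each consecutive block of four times. The function names are hypothetical.

```python
# Arrival or cluster start times (ms) at three levels of resolution, from the text.
RAW_MS = [
    0, 8, 24, 32, 72, 80, 96, 104, 216, 224, 240, 248, 288, 296, 312, 320,
    648, 656, 672, 680, 720, 728, 744, 752, 864, 872, 888, 896, 936, 944, 960, 968,
]
EIGHT_CLUSTERS_MS = [0, 72, 216, 288, 648, 720, 864, 936]
FOUR_CLUSTERS_MS = [0, 216, 648, 864]


def is_short_long_short(group):
    """True if four times form (arrival, short gap, arrival, long gap,
    arrival, short gap, arrival) with the two short gaps equal."""
    a, b, c, d = group
    short1, long_gap, short2 = b - a, c - b, d - c
    return short1 == short2 and long_gap > short1


def pattern_repeats(times):
    """True if every consecutive block of four times shows the pattern."""
    if not times or len(times) % 4:
        return False
    return all(is_short_long_short(times[i:i + 4]) for i in range(0, len(times), 4))


if __name__ == "__main__":
    levels = [("raw", RAW_MS), ("8 clusters", EIGHT_CLUSTERS_MS),
              ("4 clusters", FOUR_CLUSTERS_MS)]
    for name, times in levels:
        print(f"{name}: {len(times) // 4} block(s), pattern holds: {pattern_repeats(times)}")
```

At every level the long gap is exactly twice the short gap: 16 and 8 ms, 144 and 72 ms, 432 and 216 ms.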
The Self-similarity Concept
Self-similarity is such an important concept that, in a way, it is surprising that only recently has it been applied to data communications traffic analysis. The ubiquity of self-similarity was emphasised in a memorable statement by Manfred Schroeder [SCHR91]:
The unifying concept underlying fractals, chaos, and power laws is self-similarity. Self-similarity, or invariance against changes in scale or size, is an attribute of many laws of nature and innumerable phenomena in the world around us. Self-similarity is, in fact, one of the decisive symmetries that shape our universe and our efforts to comprehend it.
A phenomenon that is self-similar looks the same or behaves the same when viewed at different degrees of "magnification" or at different scales on a dimension. The dimension can be space (length, width) or time. Here, we are concerned with time series and stochastic processes (families of random variables indexed by time; we consider only processes with finite variance) that exhibit self-similarity with respect to time. The next section makes the scaling operation with respect to time precise.
Frame Arrivals
The pattern discussed here is easier to see in a picture.
Figure 5.1a depicts the sequence of frame arrivals over time. Each vertical line represents one frame, with a width proportional to 4 ms, the time it takes a receiver to absorb an entire frame, from first bit to last. Figure 5.1b shows the data aggregated into four large clusters; the height and width of the vertical lines in the aggregated sequence are in proportion to the scale of the aggregation. In this figure, it is easy to see that the pattern of (arrival, short gap, arrival, long gap, arrival, short gap, arrival) appears at different resolutions of the data.
This made-up example is derived from the Cantor set, a famous construct that appears in virtually every book on chaos, fractals, and non-linear dynamics.
Figure 5.2 illustrates the construction of the Cantor set, which obeys the following rules:
1. Begin with the closed interval [0, 1], represented by a line segment.
2. Remove the open middle third (1/3, 2/3) of the line, leaving the two closed segments [0, 1/3] and [2/3, 1].
3. For each succeeding step, remove the open middle third of each line segment created by the preceding step.
Figure 5.2 A recursive Cantor set
This is essentially a recursive process that can be defined more precisely as follows. Let $S_i$ represent the Cantor set after $i$ levels of recursion. Then
$S_0 = [0, 1]$
$S_1 = [0, 1/3] \cup [2/3, 1]$
$S_2 = [0, 1/9] \cup [2/9, 1/3] \cup [2/3, 7/9] \cup [8/9, 1]$
and so on.
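The construction rules translate directly into a short iterative procedure. The following minimal Python sketch uses illustrative function names that are not from the source. It relies on exact fractions so that the endpoints come out exactly as in the sets listed above.

```python
from fractions import Fraction


def cantor_step(intervals):
    """Remove the open middle third of each closed interval (lo, hi)."""
    result = []
    for lo, hi in intervals:
        third = (hi - lo) / 3
        result.append((lo, lo + third))   # keep the left third
        result.append((hi - third, hi))   # keep the right third
    return result


def cantor_set(level):
    """S_level as a left-to-right list of closed intervals with exact endpoints."""
    intervals = [(Fraction(0), Fraction(1))]  # S_0 = [0, 1]
    for _ in range(level):
        intervals = cantor_step(intervals)
    return intervals


if __name__ == "__main__":
    for i in range(3):
        print(f"S_{i}:", " U ".join(f"[{lo}, {hi}]" for lo, hi in cantor_set(i)))
```

Exact `Fraction` arithmetic avoids the rounding error that repeated floating-point division by 3 would introduce.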
Cantor Sets
If we think of the Cantor line as being a time line, then each successive step magnifies the time scale by a factor of 3. Note that at every step, the left (and right) portion of the set is an exact replica of the full set in the preceding step. The Cantor set reveals two properties seen in all self-similar phenomena:
1. It has structure at arbitrarily small scales. If we magnify part of the set repeatedly, we continue to see a complex pattern of points separated by gaps of various sizes. Like a complicated spy thriller, with "wheels within wheels," the process seems unending. In contrast, when we look at a smooth, continuous curve under repeated magnification, it becomes more and more featureless.
2. The structures repeat. A self-similar structure contains smaller replicas of itself at all scales. For example, at every step, the left (and right) portion of the Cantor set is an exact replica of the full set in the preceding step.
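The replica property can be written as a recursion on the sets $S_i$ defined earlier. Here $\tfrac{1}{3} S_i$ denotes every point of $S_i$ multiplied by 1/3, and $\tfrac{2}{3} + \tfrac{1}{3} S_i$ denotes the same set shifted right by 2/3:

```latex
\begin{aligned}
S_0     &= [0, 1], \\
S_{i+1} &= \tfrac{1}{3} S_i \;\cup\; \bigl(\tfrac{2}{3} + \tfrac{1}{3} S_i\bigr), \\
S_2     &= \tfrac{1}{3} S_1 \cup \bigl(\tfrac{2}{3} + \tfrac{1}{3} S_1\bigr)
         = \bigl([0, \tfrac{1}{9}] \cup [\tfrac{2}{9}, \tfrac{1}{3}]\bigr)
           \cup \bigl([\tfrac{2}{3}, \tfrac{7}{9}] \cup [\tfrac{8}{9}, 1]\bigr).
\end{aligned}
```

Each step doubles the number of intervals and divides their length by 3, so $S_i$ consists of $2^i$ intervals of length $3^{-i}$. Magnifying the left part of $S_{i+1}$ (the part inside $[0, 1/3]$) by a factor of 3 gives back $S_i$ exactly.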
These properties do not hold indefinitely for real phenomena. At some point under magnification, the structure and the similarity break down. But over a large range of scales, many phenomena exhibit self-similarity.
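The frame-arrival example can be checked against the Cantor set directly. The 32 frames span 972 ms, because the last frame starts at 968 ms and lasts 4 ms, and 972 = 4 × 3^5. The arrival times are therefore exactly the left endpoints of the 2^5 = 32 intervals of $S_5$, scaled from [0, 1] to [0, 972] ms. The sketch below uses a hypothetical helper name. It builds those endpoints from ternary expansions that use only the digits 0 and 2.

```python
from itertools import product

# Arrival times (ms) recorded in the example.
ARRIVALS_MS = [
    0, 8, 24, 32, 72, 80, 96, 104, 216, 224, 240, 248, 288, 296, 312, 320,
    648, 656, 672, 680, 720, 728, 744, 752, 864, 872, 888, 896, 936, 944, 960, 968,
]
TOTAL_MS = 972  # 968 ms (last arrival) + 4 ms (one frame time) = 4 * 3**5


def cantor_left_endpoints(level, total):
    """Left endpoints of the 2**level intervals of S_level, scaled to [0, total].

    Each endpoint is sum(d_k * 2 * total / 3**k for k = 1..level) with d_k in {0, 1},
    i.e. a ternary fraction whose digits are only 0 and 2.
    """
    if total % 3**level:
        raise ValueError("total must be a multiple of 3**level for integer endpoints")
    steps = [2 * total // 3**k for k in range(1, level + 1)]
    return sorted(
        sum(d * step for d, step in zip(digits, steps))
        for digits in product((0, 1), repeat=level)
    )


if __name__ == "__main__":
    print(cantor_left_endpoints(5, TOTAL_MS) == ARRIVALS_MS)  # raw frames: S_5
    print(cantor_left_endpoints(3, TOTAL_MS))  # eight 20 ms clusters: S_3
    print(cantor_left_endpoints(2, TOTAL_MS))  # four 40 ms clusters: S_2
```

Coarser levels reproduce the aggregated sequences. Level 3 gives the eight cluster start times from the 20 ms aggregation, and level 2 gives the four from the 40 ms aggregation. The 20 ms threshold skips level 4 entirely. An arrival-to-arrival threshold of at least 8 ms but below 16 ms would recover its 16 clusters.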
Although ours is a simple made-up example, we can gain some insights into self-similar data traffic from its study. Perhaps the most striking feature, from the point of view of network performance, is the persistence of clustering. With Poisson traffic, clustering occurs in the short term (small time scale) but smooths out over the long term. (To be more specific, Poisson clustering occurs on a preferred time scale determined by the mean inter-arrival time of the Poisson events, so the notion of short term is relative.) We can design a system of servers and queues with buffers in the expectation of such long-term smoothness.
The implication is that, because things smooth out over the long run, only modest-sized buffers are needed; a queue may build up in the short run, but over a longer period the buffers are cleared out. However, if the bursty behaviour is itself bursty (that is, if the clusters are themselves clustered), then queue sizes may build up more than would be expected from a Poisson traffic stream. This leads to the observation that traditional queuing analysis, which assumes Poisson traffic, may not accurately predict the performance of a network carrying self-similar traffic. We will see that there is evidence to support this observation.
The figure below compares three views of Ethernet traffic. The left column is drawn from measured traffic data and is bursty at all levels of magnification. The centre column shows synthetic traffic from a Poisson model: it is bursty at fine time scales but gradually smooths out as the magnification decreases, approaching a nearly constant data flow that is nothing like real conditions. The right column depicts traffic generated from a synthetic self-similar model.
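For a Poisson process with rate λ, the number of arrivals N(T) in a window of length T has both mean and variance equal to λT. Its coefficient of variation (standard deviation divided by mean) is therefore 1/√(λT), so relative burstiness falls as the window grows. The following Python sketch simulates this. It assumes a made-up rate of one frame per 10 ms, which does not come from the measurements, and aggregates the counts into ever larger windows, much as the frame-arrival example was aggregated. The function names are illustrative.

```python
import math
import random
import statistics

RATE_PER_MS = 0.1        # assumed: one frame every 10 ms on average
BIN_MS = 10              # finest observation window
DURATION_MS = 1_000_000  # length of the simulated trace


def poisson_bin_counts(rate_per_ms, duration_ms, bin_ms, seed=1):
    """Simulate Poisson arrivals (exponential inter-arrival times) and
    count how many fall into each consecutive bin of width bin_ms."""
    rng = random.Random(seed)
    n_bins = int(duration_ms // bin_ms)
    counts = [0] * n_bins
    t = rng.expovariate(rate_per_ms)
    while t < n_bins * bin_ms:
        counts[min(int(t // bin_ms), n_bins - 1)] += 1
        t += rng.expovariate(rate_per_ms)
    return counts


def aggregate(counts, m):
    """Sum non-overlapping blocks of m consecutive bins (drop any remainder)."""
    return [sum(counts[i:i + m]) for i in range(0, len(counts) - m + 1, m)]


def cv(counts):
    """Coefficient of variation: standard deviation divided by mean."""
    return statistics.pstdev(counts) / statistics.mean(counts)


if __name__ == "__main__":
    base = poisson_bin_counts(RATE_PER_MS, DURATION_MS, BIN_MS)
    for m in (1, 10, 100):
        window = m * BIN_MS
        theory = 1 / math.sqrt(RATE_PER_MS * window)
        print(f"window {window:>5} ms: CV = {cv(aggregate(base, m)):.3f} "
              f"(Poisson theory {theory:.3f})")
```

With self-similar traffic, the variability of aggregated counts decays much more slowly than this 1/√(λT) law, which is why bursts persist at coarse time scales.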
Self-similarity, Real-world Traffic and the Poisson Model
The diagram below compares the measured traffic with the two synthetic models. The Poisson model fails at lower scales of resolution (coarser time scales), where it smooths out while the measured traffic remains bursty.
Figure: Actual Measurement (left); Synthetic Poisson Model (centre); Synthetic Self-Similar Model (right)
Inferences
Mathematical modelling based on the Poisson distribution does not scale well to bursty data traffic. It predicts that bursts smooth out over longer time scales, so queuing analysis built on it can underestimate how large queues grow and should be used with caution. A more realistic model is one based on self-similarity, which captures the real-life burstiness of data network traffic across a wide range of time scales.
Further reading:
High-Speed Networks: TCP/IP and ATM Design Principles,
W. Stallings, Prentice Hall