Last month’s “Bad Science” blog posting, called “Bad Science: Incessant Protocol Comparisons“, triggered some off-line discussions with various students, colleagues, and peers on bad science in general: on how many papers lack rigor in the statistical treatment of their results, for example, and on how many papers manage to get published despite an obvious (although, I hope, involuntary) bias.
These discussions made me think of a conversation I had with David B. Johnson of Rice University, many years back, in which Dave made the throw-away comment that he was:
…tired of seeing papers and presentations stating “protocol X scales to 50000”, but omitting that this scalability claim was only true under very narrow conditions.
I have honestly forgotten what “protocol X” we were discussing at the time, or what the unit of “50000” was, but that is not important – Dave’s comment is a particular instance of a more general observation, which I have been known to make on more than one occasion:
Show me a protocol, and I will come up with a scenario in which mine is better …
This is, of course, particularly common in papers proposing a “new protocol for solving a known problem, in a domain with existing protocols” or an “extension to an existing protocol” – where, of course, it is in the interest of the paper to exhibit situations where the “new protocol” or “protocol extension” excels.
Let me give an example of this:
The paper “Ad hoc On-Demand Distance-Vector Routing Scalability”, and the extended version “Scalability study of the ad hoc on-demand distance vector routing protocol“, both by Sung-Ju Lee, Elizabeth Belding-Royer, and Charles E. Perkins, present scalability studies of, and scalability extensions to, the AODV routing protocol. These papers show simulation studies for networks of up to 10000 routers in a single flat routing domain, and even “basic” AODV performs “reasonably” in the study presented.
AODV is a reactive routing protocol: a router with a packet to deliver to a destination floods a route request through the network. On receiving the route request, the destination responds with a route reply, unicast along the reverse path (the path taken by the route request) towards the source. Thus, the message complexity of discovering a route between one pair of routers is O(n), where n is the number of routers in the network … and if one router seeks to find routes to all other routers in the network, the message complexity becomes O(n^2) …
… and if all routers need to find routes to all destinations in the network, which is what typical routing protocols do, AODV yields a message complexity of O(n^3). “Typical routing protocols”, such as OSPF and OLSR, accomplish this with a message complexity of O(n^2) – or (for OLSR) possibly better.
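The arithmetic above can be sketched in a few lines of Python. The functions and the unit-cost flood model are my own illustrative simplifications, not measurements of any real implementation:

```python
# Back-of-the-envelope control-message counts, assuming the simplified
# model above: one AODV route discovery floods the whole network, so a
# single discovery costs roughly n messages (the unicast route reply
# adds only O(diameter) more, ignored here).

def aodv_messages(n, pairs):
    """Control messages for `pairs` independent route discoveries."""
    return pairs * n  # one O(n) flood per source-destination pair

def proactive_messages(n):
    """OSPF/OLSR-style: every router floods its own state once, O(n^2)."""
    return n * n

n = 100
print(aodv_messages(n, 1))      # one flow: O(n)
print(aodv_messages(n, n))      # one router to all others: O(n^2)
print(aodv_messages(n, n * n))  # all-to-all (order of magnitude): O(n^3)
print(proactive_messages(n))    # proactive all-to-all: O(n^2)
```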
This does not imply, of course, that AODV is a bad protocol. For network deployments with few concurrent traffic flows, it may even be an excellent choice. For other network deployments, where a majority of devices need to communicate with a majority of other devices, it may be a rather poor choice.
But what it does imply is that it is extremely easy to “bias” the outcome of a simulation study:
- if I want to show that AODV scales to 10000 (or, as a hat-tip to Dave, 50000) devices in a flat routing domain, then I will pick scenarios with very few (maybe just one) concurrent traffic flows;
- if, on the other hand, I want to show that AODV barely scales to a couple of dozen devices in a flat routing domain, then I will pick scenarios where every device needs a route to every other device.
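The two bullets above can be made concrete with a toy sketch; the function and the flood-per-discovery cost model are my own simplifications, chosen only to illustrate how the workload alone flips the verdict:

```python
# Toy scenario-bias illustration: the same protocol model, measured
# under two different workloads. Cost model: each route discovery
# floods the network, costing n messages in total.

def aodv_overhead_per_router(n, flows):
    total_messages = flows * n  # one O(n) flood per active flow
    return total_messages / n   # average load on each router

for n in (50, 500, 5000):
    single_flow = aodv_overhead_per_router(n, flows=1)
    all_to_all = aodv_overhead_per_router(n, flows=n * (n - 1))
    print(n, single_flow, all_to_all)
```

With a single flow, the per-router load stays constant no matter how large n grows (so “AODV scales to 50000!”), while with all-to-all flows it grows as n(n-1).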
Here is an example of this, from the paper “Performance Study of AODV with Variation in Simulation Time and Network Size” by D. Manjunatha, M. T. Somashekara, K. S. Archana, and S. Ravishankar, published in 2014 in the “International Journal of Advanced Research in Computer Science and Software Engineering (IJARCSSE)“.
This paper shows network simulations of AODV with up to 300 “nodes”, and looking at the two graphs above, it really seems that AODV is a stellar performer, with rather consistent throughput regardless of the number of “nodes” in the network. Great results for AODV, right? But there are a couple of pieces of information missing: the “simulation parameters” listed in the study say nothing about how many active concurrent routes are maintained (how many concurrent communicating pairs exist), nor how frequently these pairs change.
It is perfectly plausible that, with a single communicating pair of “nodes”, and especially if that pair doesn’t change, AODV will scale to 300 “nodes”, and with great performance.
The paper commits the ultimate simulation study sin in its conclusion: it draws a generalised conclusion based on having studied an extremely narrow scenario.
It is befuddling how something like this manages to get past peer review and editors, which casts considerable doubt on the seriousness of the “International Journal of Advanced Research in Computer Science and Software Engineering (IJARCSSE)“.
For the routing protocol OLSR, it is equally easy to “bias” the outcome of a simulation study. In OLSR, each router selects a (minimal) subset of its 1-hop neighbors as relays (termed MPRs, for multipoint relays, in OLSR) such that a message generated by the router, and relayed by all its selected MPRs, will reach all routers 2 hops away. In very dense networks, this means that the message complexity of a single flooding operation becomes much lower than O(n); in sparse networks, of course, the message complexity of a flooding operation degenerates to exactly O(n).
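The relay-selection idea can be sketched as a greedy set cover in a few lines of Python. This is a plain illustration, not the exact MPR selection heuristic of the OLSR specification (RFC 3626), and all the names are mine:

```python
def select_mprs(coverage):
    """Greedy MPR-style selection (illustrative sketch).

    coverage: dict mapping each 1-hop neighbor to the set of 2-hop
    neighbors reachable through it. Returns a subset of the 1-hop
    neighbors that together cover all 2-hop neighbors.
    """
    uncovered = set().union(*coverage.values())
    mprs = set()
    while uncovered:
        # Pick the neighbor covering the most not-yet-covered 2-hop nodes.
        best = max(coverage, key=lambda nb: len(coverage[nb] & uncovered))
        mprs.add(best)
        uncovered -= coverage[best]
    return mprs

# Dense neighborhood: one well-connected neighbor covers all 2-hop
# nodes, so only a single relay needs to rebroadcast a flood.
print(select_mprs({"A": {"x", "y", "z"}, "B": {"x"}, "C": {"y"}}))
```

In a dense network only a handful of relays rebroadcast each flood; in a sparse (e.g. chain-like) topology, essentially every router ends up as a relay, which is exactly the O(n) degeneration described above.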
- if I want to show that OLSR scales better than AODV in a flat routing domain, then I will pick scenarios with many concurrent traffic flows, a very high network density, and a very modest network diameter. My simulation studies will, in that case, trivially end up showing the difference between O(n^2) and O(n^3).
- if, on the other hand, I want to show that AODV scales better than OLSR in a flat routing domain, all I need to do is pick scenarios with few concurrent traffic flows and a low density, where the cost of a flooding operation in OLSR becomes O(n) and the message complexity of OLSR becomes O(n^2). My simulation studies will, in that case, trivially end up showing the difference between O(n^2) and O(n).
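These two bullets can also be made concrete with a toy comparison. The constants, the parameter values, and the use of √n as a stand-in for “much lower than O(n)” per-flood cost in dense networks are all my own illustrative assumptions:

```python
def olsr_cost(n, dense):
    # Per-flood cost: well below n with MPRs in a dense network
    # (sqrt(n) is an arbitrary stand-in), degenerating to n when sparse.
    flood = n ** 0.5 if dense else n
    return n * flood  # every router floods its own state: O(n * flood)

def aodv_cost(n, flows):
    return flows * n  # one O(n) route-discovery flood per flow

n = 1000
# Many flows (all-to-all), dense network: OLSR's O(n^2) beats AODV's O(n^3).
print(olsr_cost(n, dense=True) < aodv_cost(n, flows=n * n))
# A single flow, sparse network: AODV's O(n) beats OLSR's O(n^2).
print(aodv_cost(n, flows=1) < olsr_cost(n, dense=False))
```

Either inequality can be made to hold simply by dialing the scenario parameters, which is precisely the point: the scenario, not the protocol, decides the winner.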
In none of these cases are the results particularly interesting – but it makes it easy to “pick and choose the scenarios that I study” to prove the point I am trying to make. This is one of the great fallacies of simulation studies, committed so often that it is almost systematic:
- Scenarios are chosen not so as to “falsify a point”, but so as to “confirm a point“, and
- Based on results from studying a narrow scenario, a general conclusion is claimed.
Simulation studies are not doomed – but when reading a paper presenting simulation studies, it is always important to understand the context in which the results can be interpreted: are the scenarios constructed to “prove a point”, or are they trying to uncover the boundaries of where one or another routing protocol is more applicable? A good research paper documenting a study using network simulations is aware of this, openly and honestly calls attention to any (intentional or otherwise) bias present in the study, and thus carefully frames the context wherein its conclusions are valid. There is nothing inherently wrong with a paper claiming to be “studying if A or B is better in scenario X” – but there is a lot wrong with a paper claiming that “A is always better than B, because that is the case in simulations of scenario X“.