[isf-wifidog] Traffic shaping, part 2: The issues

Fri 2 Nov 22:11:56 EDT 2007

There are different types of problems that can be solved through traffic 
shaping, and they must not be confused with one another, because the 
solution for each is different.  However, solving them all at the same time 
in practice is difficult, because the solutions interact.

But first, here are practical definitions for the two main characteristics 
of network performance: 

Bandwidth:  The amount of data that the network can transfer in a given unit of 
time.  This is what determines how long it takes to transfer a large file.

Latency:  The amount of time required for a small packet of data to make a 
round trip.  This is what determines how long it takes before something 
happens when you click something on the Internet (for example, when you click 
on your friend's head in Counter-Strike...).  Note that except for large file 
transfers, high latency is the primary cause of perceived slowdown.

Of course, networks have other performance characteristics (jitter, packet 
loss, etc.) but we'll ignore them for now.
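
To make the two definitions concrete, here is a rough sketch in Python.  The 
model (one round trip plus the time to push the bytes through) is a 
simplification, and the two links and the payload sizes are made-up numbers, 
not measurements.

```python
def transfer_time_s(size_bytes, bandwidth_bps, rtt_s):
    """Rough time to fetch size_bytes: one round trip plus serialization time."""
    return rtt_s + (size_bytes * 8) / bandwidth_bps

# Hypothetical links: one with lots of bandwidth but high latency,
# one with little bandwidth but low latency.
fat_link = dict(bandwidth_bps=5_000_000, rtt_s=0.500)
thin_link = dict(bandwidth_bps=1_000_000, rtt_s=0.020)

small = 2_000          # a click, a game update, a small web request
large = 100_000_000    # a large file

for name, link in (("fat", fat_link), ("thin", thin_link)):
    print(name,
          "small:", round(transfer_time_s(small, **link), 3), "s,",
          "large:", round(transfer_time_s(large, **link), 1), "s")
```

With these numbers the low-latency link wins for the small request and the 
high-bandwidth link wins for the large file, which is the distinction the 
definitions are drawing.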

Issue 1:  Buffers in DSL/Cable/WiMax modems are way too large for good 
multi-user performance.
Problem caused:  Maxing out upload will lead to higher latency and lower 
download bandwidth, and vice-versa.  This is of course much more likely to 
happen if you have many users.
Historical reason for the problem's existence:  The ISPs designed their 
infrastructure to optimize performance for single users.  While large buffers 
can cause latency problems for even a single user (say you are downloading 
large files and playing a first person shooter game at the same time), they 
will lead to the largest peak bandwidth.  In other words, the ISP will fare 
better when the connection is benchmarked on bandwidth.
Typical solution(s):  Limiting total incoming and outgoing bandwidth to 
slightly less than the available bandwidth (90-95% of it), and prioritizing 
TCP ACK packets.
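
A back-of-the-envelope sketch of why the buffer size matters and what the 
90-95% limit works out to.  The buffer size and the upstream rate below are 
hypothetical; real modems vary a lot.

```python
def queue_delay_s(buffer_bytes, link_bps):
    """Time a newly arrived packet waits behind a full modem buffer."""
    return buffer_bytes * 8 / link_bps

def shaped_rate_bps(link_bps, percent=95):
    """Rate to shape at so the queue builds in the gateway, not in the modem."""
    if not 0 < percent < 100:
        raise ValueError("percent must be between 0 and 100")
    return link_bps * percent // 100

upload_bps = 512_000      # hypothetical upstream rate
modem_buffer = 128_000    # hypothetical buffer size in bytes

print("delay with a full buffer:", queue_delay_s(modem_buffer, upload_bps), "s")
print("shape upload at:", shaped_rate_bps(upload_bps, 90), "bps")
```

The point of shaping slightly below the real rate is that the modem's buffer 
never fills, so the gateway decides which packet goes next.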

Solution in the wifidog context:  The same solution is applicable.  
Challenges:  
1- It is impractical to know in advance how much bandwidth is available.  Not 
only can every hotspot have different Internet plans, but even if one knows 
the maximum bandwidth of the ISP's plan, that bandwidth may not be available.  
For instance, if you subscribe to a 5Mbps DSL plan but have long/poor 
phone lines, your modem might connect at only 1.8Mbps.  So that bandwidth has 
to be measured by wifidog.
2- It is not always practical for the wifidog gateway to be the very first 
thing plugged into the modem.  If you are plugged into a LAN that is served by 
a DSL modem, and someone uses bandwidth somewhere on the LAN, your 
measurement above will no longer be valid, and your shaping may actually 
make things worse if you are not very careful.  
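
One naive way the measurement in challenge 1 could be approached, sketched in 
Python.  This is only an illustration, not how wifidog does it:  it assumes 
you can periodically read a cumulative byte counter for the uplink, and the 
function names and sample values are made up.  It also suffers from exactly 
the weakness described in challenge 2.

```python
def interval_rates_bps(samples):
    """samples: list of (timestamp_s, cumulative_bytes), oldest first."""
    rates = []
    for (t0, b0), (t1, b1) in zip(samples, samples[1:]):
        if t1 > t0 and b1 >= b0:   # skip counter resets and clock glitches
            rates.append((b1 - b0) * 8 / (t1 - t0))
    return rates

def estimate_capacity_bps(samples):
    """Use the fastest interval seen so far as the capacity estimate."""
    rates = interval_rates_bps(samples)
    return max(rates) if rates else None

# Hypothetical counter readings taken every 10 seconds.
readings = [(0, 0), (10, 1_000_000), (20, 3_250_000), (30, 3_500_000)]
print(estimate_capacity_bps(readings))
```

The estimate is only meaningful if the link was actually saturated during at 
least one interval, and only for as long as nobody else on the LAN is using 
bandwidth.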

Issue 2:  Users not getting their fair share of bandwidth.
Problem caused:  When someone downloads large files using modern P2P 
applications, in addition to triggering Issue 1, he will cause additional 
problems.  Say 3 users are on the network, two downloading a mail 
attachment and one downloading a file over P2P.  Typically, they will NOT 
get 1/3 of the available bandwidth each.

Historical reason for the problem's existence:  In the beginning, shapers 
did "fair queuing" between IP/port pairs.  So in theory, if the P2P client in 
the scenario above opened 100 connections, that user would get ~98% of the 
bandwidth.  Modern shapers and default kernel configs aren't nearly that bad, 
but a frequent problem is that the oldest open connection will keep hogging 
most of the bandwidth.
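
The ~98% figure is just arithmetic, as this small Python sketch shows.  It 
assumes idealized fairness where every connection gets an equal slice, and 
compares that with splitting equally between users instead.  The user names 
are made up.

```python
def per_flow_shares(flows_per_user):
    """Every connection gets an equal slice; a user gets the sum of his slices."""
    total = sum(flows_per_user.values())
    return {u: n / total for u, n in flows_per_user.items()}

def per_user_shares(flows_per_user):
    """Every active user gets an equal slice, however many connections he has."""
    active = [u for u, n in flows_per_user.items() if n > 0]
    return {u: (1 / len(active) if u in active else 0.0) for u in flows_per_user}

flows = {"mail_1": 1, "mail_2": 1, "p2p": 100}
print(per_flow_shares(flows))   # the P2P user holds 100 of the 102 slices
print(per_user_shares(flows))   # one third each
```
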
Typical solution(s): Various misguided "solutions" are frequently applied to 
this problem:
	-Static user classes:  Say you have a 3Mbps uplink, and you only allow users 
to use up to 300Kbps.  This will fix the problem (assuming you have no more 
than 10 concurrent users), at the cost of making the connection suck for 
everyone, 100% of the time. 
	-Connection aging:  Make each connection fast at first, and then slow it 
down.  Yes, people really do this.  The rationale is (presumably) that web 
browsing will be fast (lots of small, short connections) and the user will 
only look at the download speed of a large file at the beginning.  Besides 
being stupid, this will actually give a BIG advantage to our P2P user 
compared to our two mail users:  the P2P client will snub peers that look 
like they are slowing down, and will open brand new, fast connections to new 
peers.
	-Trying to block/throttle P2P users.  Above and beyond the fact that this 
is ethically questionable for various reasons, it is an arms race that 
network administrators are unlikely to win.   See examples in my last email.  
It's also extremely shortsighted since it's very expensive both 
computationally and in manpower, and has to be revisited over and over.

Solution in the wifidog context:  ESFQ (Enhanced Stochastic Fair Queuing), 
which would allow each wireless client to get no more than its share of 
bandwidth, but allow the entire amount of bandwidth to be used.
Challenges:  
1- Issue 1 must be solved for ESFQ to have any chance of working at all.
2- It is not possible to instantly throttle downstream bandwidth.  The lag 
time in doing so can cause problems for ESFQ. 
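
The goal stated for Issue 2 (nobody gets more than his share, but the whole 
link can still be used) is what is usually called a max-min fair allocation.  
The Python sketch below computes that idealized result; it is NOT ESFQ 
itself, which works on queued packets rather than on known demands.  The 
demand figures are hypothetical.

```python
def max_min_fair(capacity, demands):
    """Split capacity so no client gets more than its demand or more than an
    equal share of what is left; unused shares are handed to the others."""
    alloc = {c: 0.0 for c in demands}
    remaining = float(capacity)
    unsatisfied = {c for c, d in demands.items() if d > 0}
    while unsatisfied and remaining > 1e-9:
        share = remaining / len(unsatisfied)
        # Clients whose remaining demand fits inside an equal share.
        done = {c for c in unsatisfied if demands[c] - alloc[c] <= share}
        if not done:
            # Everyone left wants more than an equal share: split evenly.
            for c in unsatisfied:
                alloc[c] += share
            remaining = 0.0
            break
        for c in done:
            remaining -= demands[c] - alloc[c]
            alloc[c] = float(demands[c])
        unsatisfied -= done
    return alloc

# 3000 Kbps uplink, two light users and one P2P user that wants everything.
print(max_min_fair(3000, {"mail_1": 200, "mail_2": 200, "p2p": 10000}))
```

Unlike static user classes, the P2P user here gets everything the two mail 
users leave on the table, and gets squeezed down as soon as they want more.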

Issue 3:  Applications that would need priority, such as VOIP
Problem caused:  Depending on the user, some applications should have 
priority over others for network latency.  VOIP > web browsing.  SSH > FTP.  
World of Warcraft > BitTorrent > Everything else.

Historical reason for the problem's existence:  Ever since IPv4 was 
standardised, there has been a QoS flag that you are supposed to set when an 
application needs priority.  Sadly, human nature being what it is, if 
users notice that an application goes faster when they set the flag, they 
start to set it for every application (not caring that it may slow down 
their neighbor).  Once the neighbor notices, he will very rationally set the 
flag as well to defend himself, leaving the whole network ... right back 
where it started.  So in practice no one obeys the QoS flag anyway.

Typical solution(s):  Trying to discriminate the type of service from the port 
range (or more sophisticated packet analysis), make a value judgement over 
which service is more important than some other, and give priority according 
to that grid.  The problems are once again the questionable ethics of it, and 
the simple fact that not only may what is a priority for one user not be for 
another, but if ISPs started to give priority to everything VOIP, you 
can be sure that P2P apps would offer an option to transfer data over VOIP 
protocols.
Solution in the wifidog context: Actually obey the QOS flag, but only up to a 
part (say 10%) of the slice the user would get in the solution to Issue 2.  
In other words, pass ACKs first, QOS traffic second (up to 10% of the user's 
slice), and pass the rest after.
Challenges:  
1- Issues 1 and 2 must be solved.
2- If your VOIP handset doesn't set the QOS flag, it doesn't help you (although 
you'll probably still get decent performance from the solution to issue 2)
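
The ordering proposed for Issue 3 (ACKs first, flagged traffic second but 
only up to a fraction of the user's slice, everything else last) can be 
sketched in Python like this.  The packet representation, the field names and 
the per-user slice expressed in bytes per scheduling round are all 
assumptions made for the example.

```python
from collections import defaultdict

ACK, QOS, BULK = 0, 1, 2   # lower number is sent first

def classify(packets, slice_bytes, qos_percent=10):
    """packets: dicts with 'user', 'size', 'is_ack', 'qos_flag'.
    slice_bytes: per-user byte budget for this round (from Issue 2).
    Returns (band, packet) pairs ordered by band, arrival order kept within."""
    qos_used = defaultdict(int)
    tagged = []
    for p in packets:
        budget = slice_bytes[p["user"]] * qos_percent // 100
        if p["is_ack"]:
            band = ACK
        elif p["qos_flag"] and qos_used[p["user"]] + p["size"] <= budget:
            qos_used[p["user"]] += p["size"]
            band = QOS
        else:
            band = BULK    # unflagged, or flagged but over the QoS budget
        tagged.append((band, p))
    return sorted(tagged, key=lambda bp: bp[0])   # sorted() is stable

pkts = [
    {"user": "a", "size": 500, "is_ack": False, "qos_flag": False},
    {"user": "a", "size": 60, "is_ack": False, "qos_flag": True},
    {"user": "a", "size": 60, "is_ack": False, "qos_flag": True},
    {"user": "a", "size": 40, "is_ack": True, "qos_flag": False},
]
print([band for band, _ in classify(pkts, {"a": 1000})])
```

Because the flagged traffic is capped per user, setting the flag on 
everything only hurts the user's own unflagged traffic, not his neighbor's.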

Issue 4:  Chronic bandwidth abuse over a long period/reducing bandwidth cost.
Bandwidth costs real money, and takes real resources to create.  Whether you run a 
free or for pay network, you may decide that there is a maximum amount of 
network resources that your users should be allowed to use per 
day/month/hotspot.

Typical solution(s):  
-Bandwidth capping:  Not allowing the user to use more than 1Mbps
-Data transfer capping:  Not allowing the user to transfer more than 40GB per 
month.

Solutions in the wifidog context:  
-Dynamic abuse control.  Allow defining criteria of maximum data transfer per 
unit of time, at a hotspot, over the entire network, etc. 
-Opening hours support.  For free networks designed to be used in public 
places, closing access when the public place is closed can drastically reduce 
monthly bandwidth consumption.
-Supporting the "password of the day" model.  Allows drastically reducing the 
bandwidth leeched by a hotspot's neighbors by forcing them to physically 
visit the place to get access.

Note that technically, none of the above require any kind of traffic shaping.  
Traffic shaping is involved if you want to implement policies that are a 
little less drastic than cutting off access once a user/machine reaches the 
threshold.  Let's say that your quota is 5GB per hotspot per month:  instead 
of cutting off the user once he reaches 5GB, at 4GB you would progressively 
slow down the user in such a way that he would never reach 5GB, or you would 
slow down the user to a low maximum bandwidth (say 128Kbps), or make him pass 
after all other users.
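
One possible shape for the progressive slowdown, sketched in Python:  full 
speed up to the 4GB mark, then a linear taper down to a 128Kbps floor at 5GB.  
The linear curve and the function name are my own choices for the example.  
Note that with a non-zero floor the user can still creep past 5GB, so this 
combines the first two options rather than strictly guaranteeing the quota.

```python
GB = 10**9

def allowed_kbps(used_bytes, full_kbps, soft=4 * GB, hard=5 * GB, floor_kbps=128):
    """Bandwidth cap for a user who has transferred used_bytes this period."""
    if used_bytes <= soft:
        return full_kbps
    if used_bytes >= hard:
        return floor_kbps
    progress = (used_bytes - soft) / (hard - soft)   # 0.0 at soft, 1.0 at hard
    return round(full_kbps - progress * (full_kbps - floor_kbps))

for used in (3 * GB, 4 * GB, 4_500_000_000, 5 * GB):
    print(used / GB, "GB ->", allowed_kbps(used, full_kbps=3000), "Kbps")
```
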

Challenges:
1-The wifidog protocol needs to be redesigned to allow the auth server to  
specify the maximum bandwidth for each user individually, and update that 
number periodically.
2-The user could open a new account/spoof the MAC address.  There are ways to 
make that very inconvenient, but that's another arms race (and another 
feature list altogether).

Issue 5:  Selling the user monthly access with fixed bandwidth (say 512Kbps).

Typical solution(s):  Client side user classes

Solutions in the wifidog context:  server side token architecture and per user 
bandwidth specification.  Basically, if we have per user bandwidth 
specifications in the gateway and protocol, selling fixed slices is just a 
degenerate case of the general problem.
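
To illustrate what enforcing a fixed per-user rate such as 512Kbps could look 
like on the gateway side, here is a sketch of a token bucket in Python.  The 
token bucket is a generic rate limiting technique that I am swapping in for 
the example; it is not something wifidog implements, and its "tokens" have 
nothing to do with the server side tokens mentioned above.  The burst size is 
an arbitrary assumption.

```python
class TokenBucket:
    """Allows traffic at rate_bps on average, with bursts up to burst_bytes."""

    def __init__(self, rate_bps, burst_bytes):
        self.rate_bytes_per_s = rate_bps / 8
        self.capacity = burst_bytes
        self.tokens = float(burst_bytes)   # start with a full bucket
        self.last = 0.0

    def allow(self, size_bytes, now):
        """Return True if a packet of size_bytes may be sent at time now (s)."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.rate_bytes_per_s)
        self.last = now
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return True
        return False

# A user sold a fixed 512Kbps slice, with a hypothetical 16KB burst allowance.
bucket = TokenBucket(rate_bps=512_000, burst_bytes=16_000)
print(bucket.allow(16_000, now=0.0))   # the initial burst fits
print(bucket.allow(1_500, now=0.0))    # bucket is empty, must wait
print(bucket.allow(1_500, now=0.1))    # 0.1 s later, about 6400 bytes refilled
```

If the rate can be updated per user by the auth server, the same mechanism 
covers both the fixed slice of Issue 5 and the quota-driven slowdown of 
Issue 4.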

Ok, that's long enough for one night.