iproute2/tc
Eric Dumazet 6987ecf083 sfq: add optional RED on top of SFQ
Adds an optional Random Early Detection on each SFQ flow queue.

Traditional SFQ limits the count of packets, while RED also permits
control of the number of bytes per flow, and adds ECN capability as well.

1) We don't handle idle-time management in this RED implementation,
since each 'new flow' begins with a null qavg. We really want to address
backlogged flows.

2) If headdrop is selected, we try to ECN-mark the first packet in the queue
instead of the packet currently being enqueued. This gives faster feedback
for TCP flows compared to traditional RED [ marking the last packet in queue ].
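The per-flow decision described in points 1) and 2) can be sketched in C as below. This is a simplified, floating-point illustration under stated assumptions, not the fixed-point code in q_sfq.c or the kernel: the names `red_flow`, `red_params`, `red_update_avg` and `red_decide` are hypothetical, the caller supplies the uniform random number, and the count-since-last-mark correction of classic RED is omitted. As in point 1), a new flow simply starts with `qavg == 0`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-flow RED state (doubles instead of fixed point). */
struct red_flow {
	double qavg;   /* EWMA of the flow backlog, in bytes */
};

/* Hypothetical parameter block mirroring the tc options. */
struct red_params {
	double min;    /* lower threshold in bytes ("min 8000") */
	double max;    /* upper threshold in bytes ("max 60000") */
	double max_p;  /* mark/drop probability at max ("probability 0.20") */
	int wlog;      /* EWMA weight is 2^-wlog ("ewma 6") */
	bool ecn;      /* mark instead of drop for ECN-capable packets */
};

enum red_action { RED_PASS, RED_MARK, RED_DROP };

/* Fold the current flow backlog into the average. A new flow starts
 * at qavg == 0; no idle-time correction is applied. */
static void red_update_avg(struct red_flow *f, const struct red_params *p,
			   double backlog)
{
	f->qavg += (backlog - f->qavg) / (double)(1u << p->wlog);
}

/* Decide the fate of a packet given a uniform random u in [0, 1).
 * With headdrop, the caller would apply the action to the packet at
 * the head of the flow queue rather than the one being enqueued. */
static enum red_action red_decide(const struct red_flow *f,
				  const struct red_params *p,
				  bool pkt_ecn_capable, double u)
{
	assert(p->max > p->min);
	if (f->qavg < p->min)
		return RED_PASS;
	if (f->qavg < p->max) {
		/* Linear ramp from 0 at min up to max_p at max. */
		double prob = p->max_p * (f->qavg - p->min) / (p->max - p->min);
		if (u >= prob)
			return RED_PASS;
	}
	/* Probabilistic hit, or "forced" region (qavg >= max). */
	return (p->ecn && pkt_ecn_capable) ? RED_MARK : RED_DROP;
}
```

With min 8000, max 60000 and probability 0.20, a flow whose average sits halfway between the thresholds (34000 bytes) is marked or dropped with probability 0.10.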

Example of use:

tc qdisc add dev $DEV parent 1:1 handle 10: est 1sec 4sec sfq \
	limit 3000 headdrop flows 512 divisor 16384 \
	redflowlimit 100000 min 8000 max 60000 probability 0.20 ecn

qdisc sfq 10: parent 1:1 limit 3000p quantum 1514b depth 127 headdrop flows 512/16384 divisor 16384
 ewma 6 min 8000b max 60000b probability 0.2 ecn
 prob_mark 0 prob_mark_head 4876 prob_drop 6131
 forced_mark 0 forced_mark_head 0 forced_drop 0
 Sent 1175211782 bytes 777537 pkt (dropped 6131, overlimits 11007 requeues 0)
 rate 99483Kbit 8219pps backlog 689392b 456p requeues 0
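The `ewma 6` in this output was not given on the command line, so tc derived it. The sketch below shows one way to do that, modeled on the rule of thumb used by tc's RED helpers (see tc_red.c for the actual code). It assumes an average packet size (`avpkt`) of 1000 bytes and a default burst of (2 * min + max) / (3 * avpkt); neither value appears in the commit message, and the function names are hypothetical. Under those assumptions, min 8000 and max 60000 give a burst of 25 packets and Wlog = 6, consistent with the reported value.

```c
#include <assert.h>

/* x^n for small non-negative n, avoiding a libm dependency. */
static double pow_u(double x, unsigned n)
{
	double r = 1.0;
	while (n--)
		r *= x;
	return r;
}

/* Pick the smallest Wlog (EWMA weight W = 2^-Wlog) such that a burst of
 * `burst` average-size packets arriving into an empty queue keeps the
 * average at or below qmin, roughly following the original RED paper.
 * Returns -1 if burst is too small or no Wlog in [1, 31] works. */
static int red_eval_ewma(unsigned qmin, unsigned burst, unsigned avpkt)
{
	assert(avpkt > 0);
	double W = 0.5;
	double a = (double)burst + 1 - (double)qmin / avpkt;

	if (a < 1.0)
		return -1;
	for (int wlog = 1; wlog < 32; wlog++, W /= 2) {
		if (a <= (1 - pow_u(1 - W, burst)) / W)
			return wlog;
	}
	return -1;
}

/* Assumed default burst (in packets) when none is given. */
static unsigned red_default_burst(unsigned qmin, unsigned qmax, unsigned avpkt)
{
	assert(avpkt > 0);
	return (2 * qmin + qmax) / (3 * avpkt);
}
```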

In this test, with 64 netperf TCP_STREAM sessions (50% of them ECN-enabled
flows), we can see that the number of CE-marked packets is smaller than the
number of drops (for non-ECN flows).

If the same test is run without RED, we can see that the backlog is much bigger.

qdisc sfq 10: parent 1:1 limit 3000p quantum 1514b depth 127 headdrop flows 512/16384 divisor 16384
 Sent 1148683617 bytes 795006 pkt (dropped 0, overlimits 0 requeues 0)
 rate 98429Kbit 8521pps backlog 1221290b 841p requeues 0
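To express the two backlogs in time units, a rough estimate of the standing queue delay is the backlog divided by the measured rate. The helper below is a hypothetical sketch; it assumes tc's `Kbit` means 1000 bit/s and that the queue drains at the reported rate.

```c
#include <assert.h>

/* Approximate time, in milliseconds, to drain backlog_bytes at
 * rate_kbit (1 Kbit = 1000 bit/s): bits / (kbit/s) = ms. */
static double drain_time_ms(unsigned long backlog_bytes, unsigned long rate_kbit)
{
	assert(rate_kbit > 0);
	return (double)backlog_bytes * 8.0 / (double)rate_kbit;
}
```

For the figures above this gives roughly 55 ms with RED (689392 bytes at 99483 Kbit/s) versus roughly 99 ms without it (1221290 bytes at 98429 Kbit/s).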

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
2012-01-20 08:12:22 -08:00
.gitignore Add ignore files to make using git easier 2006-08-08 12:04:38 -07:00
Makefile iproute2: proper detection of libxtables position and flags 2012-01-03 15:05:25 -08:00
README.last (Logical change 1.3) 2004-04-15 20:56:59 +00:00
em_cmp.c Fix wrong comparison in cmp_print_eopt() 2011-10-07 11:16:15 -07:00
em_meta.c tc: remove dlfcn.h from files that dont need it 2009-11-13 14:14:07 -08:00
em_nbyte.c tc: remove dlfcn.h from files that dont need it 2009-11-13 14:14:07 -08:00
em_u32.c tc: remove dlfcn.h from files that dont need it 2009-11-13 14:14:07 -08:00
emp_ematch.l fix build issues with flex ver 2.5 2010-04-22 15:27:42 -07:00
emp_ematch.y ematch: fix warning about yyerror and const 2012-01-03 13:55:00 -08:00
f_basic.c Update various classifiers' help output for expected CLASSID syntax 2008-02-13 12:36:38 -08:00
f_cgroup.c iproute2: Remove unreachable code 2011-07-11 10:13:51 -07:00
f_flow.c iproute2: tc: f_flow: add key rxhash 2010-11-30 09:57:36 -08:00
f_fw.c tc: remove stale code 2010-01-21 10:13:01 -08:00
f_route.c tc: remove stale code 2010-01-21 10:13:01 -08:00
f_rsvp.c tc: remove stale code 2010-01-21 10:13:01 -08:00
f_tcindex.c tc: remove stale code 2010-01-21 10:13:01 -08:00
f_u32.c tc filter: fix dport/sport in pretty print output 2011-05-19 09:19:17 -07:00
m_action.c libnetlink: remove unused junk callback 2011-12-28 10:37:12 -08:00
m_csum.c tc: add ACT_CSUM action support (csum) 2010-12-01 11:17:46 -08:00
m_ematch.c Fix NULL pointer reference when using basic match 2010-07-29 18:03:35 -07:00
m_ematch.h ematch related bugfix and cleanup 2008-05-29 11:54:19 -07:00
m_estimator.c Replace "usec" by "time" in function names 2007-03-13 14:42:17 -07:00
m_gact.c Remove trailing whitespace 2006-12-05 10:10:22 -08:00
m_ipt.c tc: Remove unused variable 'res'. 2011-11-23 14:46:21 -08:00
m_mirred.c Remove mirred debug message 2010-03-29 17:32:37 -07:00
m_nat.c tc: remove dlfcn.h from files that dont need it 2009-11-13 14:14:07 -08:00
m_pedit.c Remove trailing whitespace 2006-12-05 10:10:22 -08:00
m_pedit.h Remove trailing whitespace 2006-12-05 10:10:22 -08:00
m_police.c ATM cell alignment. 2008-04-17 10:04:31 -07:00
m_skbedit.c skbedit: fix set-never-used warning 2011-06-29 15:59:02 -07:00
m_xt.c iproute2: fix calling up the xt action 2012-01-03 15:07:38 -08:00
m_xt_old.c tc: Remove unused variable 'res'. 2011-11-23 14:46:21 -08:00
p_icmp.c Remove trailing whitespace 2006-12-05 10:10:22 -08:00
p_ip.c Remove trailing whitespace 2006-12-05 10:10:22 -08:00
p_tcp.c Remove trailing whitespace 2006-12-05 10:10:22 -08:00
p_udp.c Remove trailing whitespace 2006-12-05 10:10:22 -08:00
q_atm.c tc: remove stale code 2010-01-21 10:13:01 -08:00
q_cbq.c tc: remove stale code 2010-01-21 10:13:01 -08:00
q_choke.c red: make burst optional 2011-12-01 09:23:49 -08:00
q_drr.c iproute2: Remove unreachable code 2011-07-11 10:13:51 -07:00
q_dsmark.c tc: remove stale code 2010-01-21 10:13:01 -08:00
q_fifo.c tc: add new queue discipline: head drop fifo 2010-03-03 16:15:44 -08:00
q_gred.c red: make burst optional 2011-12-01 09:23:49 -08:00
q_hfsc.c HFSC (7) & (8) documentation + assorted changes 2011-11-02 16:33:50 -07:00
q_htb.c tc: remove stale code 2010-01-21 10:13:01 -08:00
q_ingress.c tc: remove stale code 2010-01-21 10:13:01 -08:00
q_mqprio.c iproute2: improve mqprio inputs for queue offsets and counts 2011-04-26 14:59:32 -07:00
q_multiq.c iproute2: Remove unreachable code 2011-07-11 10:13:51 -07:00
q_netem.c tc: netem rate shaping and cell extension 2012-01-19 14:28:27 -08:00
q_prio.c tc: remove stale code 2010-01-21 10:13:01 -08:00
q_qfq.c Add QFQ scheduler 2011-07-13 13:46:34 -07:00
q_red.c red: fix adaptive spelling 2012-01-20 08:12:21 -08:00
q_rr.c tc: remove stale code 2010-01-21 10:13:01 -08:00
q_sfb.c tc : SFB flow scheduler 2011-04-12 14:27:37 -07:00
q_sfq.c sfq: add optional RED on top of SFQ 2012-01-20 08:12:22 -08:00
q_tbf.c tc: remove stale code 2010-01-21 10:13:01 -08:00
static-syms.c support static-only systems 2009-11-10 10:44:20 -08:00
tc.c Changing commandline help text to be more uniform... 2009-03-27 11:05:44 -07:00
tc_cbq.c Replace "usec" by "time" in function names 2007-03-13 14:42:17 -07:00
tc_cbq.h (Logical change 1.3) 2004-04-15 20:56:59 +00:00
tc_class.c libnetlink: remove unused junk callback 2011-12-28 10:37:12 -08:00
tc_common.h add generic size table for qdiscs 2008-09-17 21:57:15 -07:00
tc_core.c add generic size table for qdiscs 2008-09-17 21:57:15 -07:00
tc_core.h add generic size table for qdiscs 2008-09-17 21:57:15 -07:00
tc_estimator.c Introduce TIME_UNITS_PER_SEC to represent internal clock resolution 2007-03-13 14:42:16 -07:00
tc_filter.c libnetlink: remove unused junk callback 2011-12-28 10:37:12 -08:00
tc_monitor.c update rest to use nl_mgrp 2007-03-13 14:39:05 -07:00
tc_qdisc.c libnetlink: remove unused junk callback 2011-12-28 10:37:12 -08:00
tc_red.c red: give a hint about burst value 2011-12-01 09:23:43 -08:00
tc_red.h (Logical change 1.3) 2004-04-15 20:56:59 +00:00
tc_stab.c add generic size table for qdiscs 2008-09-17 21:57:15 -07:00
tc_util.c netem: add support for 4 state and GE loss model 2011-12-22 17:08:11 -08:00
tc_util.h netem: add support for 4 state and GE loss model 2011-12-22 17:08:11 -08:00

README.last

Kernel code and interface.
--------------------------

* Compile time switches

There is only one, but very important, compile-time switch.
It is not settable by "make config", but should be selected
manually, after a bit of thinking, in <include/net/pkt_sched.h>.

PSCHED_CLOCK_SOURCE can take three values:

	PSCHED_GETTIMEOFDAY
	PSCHED_JIFFIES
	PSCHED_CPU
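A hand-edited compile-time switch of this kind typically looks like the sketch below. The numeric values and the `psched_clock_name()` helper are assumptions for illustration only; consult <include/net/pkt_sched.h> in your kernel tree for the real definitions.

```c
#include <assert.h>

/* Assumed values, for illustration only. */
#define PSCHED_GETTIMEOFDAY 1
#define PSCHED_JIFFIES      2
#define PSCHED_CPU          3

/* Edited by hand: there is no "make config" knob for this. */
#ifndef PSCHED_CLOCK_SOURCE
#define PSCHED_CLOCK_SOURCE PSCHED_JIFFIES
#endif

static_assert(PSCHED_CLOCK_SOURCE == PSCHED_GETTIMEOFDAY ||
	      PSCHED_CLOCK_SOURCE == PSCHED_JIFFIES ||
	      PSCHED_CLOCK_SOURCE == PSCHED_CPU,
	      "PSCHED_CLOCK_SOURCE must be one of the three values");

/* Hypothetical helper reporting which clock source was compiled in. */
static const char *psched_clock_name(void)
{
#if PSCHED_CLOCK_SOURCE == PSCHED_GETTIMEOFDAY
	return "gettimeofday";
#elif PSCHED_CLOCK_SOURCE == PSCHED_JIFFIES
	return "jiffies";
#else
	return "cpu";
#endif
}
```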


 PSCHED_GETTIMEOFDAY

The default setting is the most conservative one, PSCHED_GETTIMEOFDAY.
It is very slow, both because of the weird slowness of do_gettimeofday()
and because it forces code to use the unnatural "timeval" format,
where the microseconds and seconds fields are separate.
Besides that, it will misbehave when delays exceed 2 seconds
(e.g. very slow links or classes bound to a small slice of bandwidth).
To sum up: as soon as you get things working, select the correct clock
source and forget about PSCHED_GETTIMEOFDAY forever.
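The cost of the split "timeval" format shows up in every time difference: seconds and microseconds have to be recombined, whereas a single integer clock needs one subtraction. The two hypothetical helpers below contrast the two; they are an illustration, not the kernel's code, and the README does not say exactly how the 2-second limit arises.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/time.h>

/* t1 - t0 in microseconds with the split timeval format: a multiply
 * and two subtractions; the usec borrow falls out of signed math. */
static int64_t tv_diff_us(struct timeval t1, struct timeval t0)
{
	assert(t1.tv_usec >= 0 && t1.tv_usec < 1000000);
	assert(t0.tv_usec >= 0 && t0.tv_usec < 1000000);
	return (int64_t)(t1.tv_sec - t0.tv_sec) * 1000000
	       + (int64_t)(t1.tv_usec - t0.tv_usec);
}

/* The same operation on a single monotonically increasing counter. */
static int64_t ticks_diff(uint64_t t1, uint64_t t0)
{
	return (int64_t)(t1 - t0);
}
```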


 PSCHED_JIFFIES

The clock is derived from jiffies. On architectures with HZ=100,
the granularity of this clock is not fine enough to bind reasonably
to real time. However, taking into account Linux architecture
problems, which force us to use an artificial integrated clock in
any case, this switch is not so bad for scheduling, even on
high-speed networks, though policing is not reliable.
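Some arithmetic makes the granularity problem concrete. The hypothetical helpers below compute the length of one jiffy and the serialization time of a packet; the link rates used in the usage note are examples, not figures from this README.

```c
#include <assert.h>

/* Length of one jiffy in microseconds for a given HZ. */
static unsigned long jiffy_us(unsigned long hz)
{
	assert(hz > 0);
	return 1000000UL / hz;
}

/* Time to serialize `bytes` onto a link of rate_bps bit/s, in microseconds. */
static unsigned long tx_time_us(unsigned long bytes, unsigned long rate_bps)
{
	assert(rate_bps > 0);
	return (unsigned long)((unsigned long long)bytes * 8ULL * 1000000ULL / rate_bps);
}
```

At HZ=100 one jiffy is 10000 us, while a 1500-byte packet takes 1200 us at 10 Mbit/s and 120 us at 100 Mbit/s, so a single tick spans roughly 8 and 83 packet times respectively.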


 PSCHED_CPU

It is available only on Alpha and on Pentiums with a correct
CPU timestamp counter. It is the fastest option; use it when it is available,
but remember: not all Pentiums have this facility, and
many of them have a clock broken by APM etc.
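With PSCHED_CPU, elapsed time comes from the CPU timestamp counter, which has to be converted to time units using the CPU frequency. The sketch below shows that conversion; it assumes the counter runs at a fixed, known rate `cpu_hz` (a hypothetical parameter), which is exactly the assumption that breaks when APM changes the clock. Reading the counter itself (for example with the x86 `rdtsc` instruction) is architecture specific and left out.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a timestamp-counter delta to microseconds, assuming a fixed
 * counter frequency. Whole seconds and the remainder are converted
 * separately to avoid overflowing cycles * 1000000. */
static uint64_t cycles_to_us(uint64_t cycles, uint64_t cpu_hz)
{
	assert(cpu_hz > 0);
	return cycles / cpu_hz * 1000000u + cycles % cpu_hz * 1000000u / cpu_hz;
}
```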