501 Commits

Author SHA1 Message Date
David Ward 6c99695da2 tc: red, gred: Fix format specifier in burst size warning
burst is an unsigned value.

Signed-off-by: David Ward <david.ward@ll.mit.edu>
2015-05-21 14:16:03 -07:00
David Ward 9d9a67c756 tc: red, gred: Rename overloaded variable wlog
It is used when parsing three different parameters, only one of
which is Wlog. Change the name to make the code less confusing.

Signed-off-by: David Ward <david.ward@ll.mit.edu>
2015-05-21 14:16:03 -07:00
Daniel Borkmann ec6f5abcea tc: minor cleanup on ingress
Fix whitespacing and remove the unnecessary condition.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2015-05-11 09:18:10 -07:00
WANG Cong 285e7768e8 tc: fill in handle before checking argc
When deleting a specific basic filter with handle,
tc command always ignores the 'handle' option, so
tcm_handle is always 0 and kernel deletes all filters
in the selected group. This is wrong; we should respect
the 'handle' option given on the command line.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
2015-05-11 09:13:20 -07:00
Daniel Borkmann d937a74b6d tc: {m, f}_ebpf: add option for dumping verifier log
Currently, only on error we get a log dump, but I found it useful when
working with eBPF to have an option to also dump the log on success.
Also spotted a typo in a header comment, which is fixed here as well.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
2015-05-04 08:43:08 -07:00
Daniel Borkmann 4bd624467b tc: built-in eBPF exec proxy
This work follows upon commit 6256f8c9e4 ("tc, bpf: finalize eBPF
support for cls and act front-end") and takes up the idea proposed by
Hannes Frederic Sowa to spawn a shell (or any other command) that holds
generated eBPF map file descriptors.

File descriptors, based on their id, are being fetched from the same
unix domain socket as demonstrated in the bpf_agent, the shell spawned
via execvpe(2) and the map fds passed over the environment, and thus
are made available to applications in the fashion of std{in,out,err}
for read/write access, for example in case of iproute2's examples/bpf/:

  # env | grep BPF
  BPF_NUM_MAPS=3
  BPF_MAP1=6        <- BPF_MAP_ID_QUEUE (id 1)
  BPF_MAP0=5        <- BPF_MAP_ID_PROTO (id 0)
  BPF_MAP2=7        <- BPF_MAP_ID_DROPS (id 2)

  # ls -la /proc/self/fd
  [...]
  lrwx------. 1 root root 64 Apr 14 16:46 0 -> /dev/pts/4
  lrwx------. 1 root root 64 Apr 14 16:46 1 -> /dev/pts/4
  lrwx------. 1 root root 64 Apr 14 16:46 2 -> /dev/pts/4
  [...]
  lrwx------. 1 root root 64 Apr 14 16:46 5 -> anon_inode:bpf-map
  lrwx------. 1 root root 64 Apr 14 16:46 6 -> anon_inode:bpf-map
  lrwx------. 1 root root 64 Apr 14 16:46 7 -> anon_inode:bpf-map

The advantage (as opposed to the direct/native usage) is that now the
shell is map fd owner and applications can terminate and easily reattach
to descriptors w/o any kernel changes. Moreover, multiple applications
can easily read/write eBPF maps simultaneously.

To let users experiment with this further, the next step is to add
a small helper that can handle simple data types, so that shell
scripts can also make use of the bpf syscall, f.e. to read/write into maps.

Generally, this allows for prepopulating maps, or any runtime altering
which could influence eBPF program behaviour (f.e. different run-time
classifications, skb modifications, ...), dumping of statistics, etc.

Reference: http://thread.gmane.org/gmane.linux.network/357471/focus=357860
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
2015-04-27 16:39:23 -07:00
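The exec proxy commit above (4bd624467b) hands map file descriptors to the spawned command through BPF_MAP<id> environment variables. A minimal sketch of how a spawned C program might read one of them follows. The function names parse_fd and bpf_map_fd_from_env are hypothetical and are not part of iproute2; only the variable naming is taken from the "env | grep BPF" output in the commit message.

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse a decimal file descriptor string such as "6".
 * Returns the fd, or -1 if the string is missing or malformed. */
int parse_fd(const char *value)
{
	char *end;
	long fd;

	if (value == NULL || *value == '\0')
		return -1;

	errno = 0;
	fd = strtol(value, &end, 10);
	if (errno != 0 || *end != '\0' || fd < 0 || fd > INT_MAX)
		return -1;

	return (int)fd;
}

/* Look up BPF_MAP<id> in the environment (hypothetical helper) and
 * return the fd it holds, or -1 if the variable is unset or invalid. */
int bpf_map_fd_from_env(unsigned int id)
{
	char name[32];

	snprintf(name, sizeof(name), "BPF_MAP%u", id);
	return parse_fd(getenv(name));
}
```

With the environment shown in the commit message, bpf_map_fd_from_env(1) would yield 6. The returned descriptor could then be passed to bpf(2) map operations, which are outside the scope of this sketch.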
Nicolas Dichtel afa5158f02 tc: fix compilation warning on 32bits arch
The warning was:
m_simple.c: In function ‘parse_simple’:
m_simple.c:142:4: warning: format ‘%ld’ expects argument of type ‘long int’, but argument 3 has type ‘size_t’ [-Wformat]

Useful to be able to compile with -Werror.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2015-04-27 11:41:46 -07:00
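The warning quoted in commit afa5158f02 comes from printing a size_t with %ld. The commit text does not say which specifier was chosen for the fix; one common, portable option is %zu, sketched below. The function name and message text are made up for illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* Format a size limit message. %zu matches size_t on both 32-bit and
 * 64-bit targets, whereas %ld draws a -Wformat warning wherever size_t
 * is not the same width as long (as in the 32-bit warning quoted above). */
int format_size_limit(char *buf, size_t buflen, size_t limit)
{
	return snprintf(buf, buflen, "limit is %zu bytes", limit);
}
```

An alternative that also compiles cleanly is to cast the value to unsigned long and print it with %lu.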
Vadim Kochan 46679bbbe8 tc util: Fix possible buffer overflow when print class id
Use correct handle buffer length.

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-04-20 10:06:02 -07:00
Felix Fietkau b8d5c9a71b tc: add support for connmark action
Add ability to add the netfilter connmark support.

Typical usage:
...lets tag outgoing icmp with mark 0x10..
iptables -tmangle -A PREROUTING -p icmp -j CONNMARK --set-mark 0x10
..add on ingress of $ETH an extractor for connmark...
tc filter add dev $ETH parent ffff: prio 4 protocol ip \
u32 match ip protocol 1 0xff \
flowid 1:1 \
action connmark continue
...if the connmark was 0x11, we police to a ridic rate of 10Kbps
tc filter add dev $ETH parent ffff: prio 5 protocol ip \
handle 0x11 fw flowid 1:1 \
action police rate 10kbit burst 10k

Other ways to use connmark are to supply the zone, index and
branching choice. Refer to the help output.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2015-04-13 10:49:45 -07:00
Daniel Borkmann 6256f8c9e4 tc, bpf: finalize eBPF support for cls and act front-end
This work finalizes both eBPF front-ends for the classifier and action
part in tc, it allows for custom ELF section selection, a simplified tc
command frontend (while keeping compat), reusing of common maps between
classifier and actions residing in the same object file, and exporting
of all map fds to an eBPF agent for handing off further control in user
space.

It also adds an extensive example of how eBPF can be used, and a minimal
self-contained example agent that dumps map data. The example is well
documented and hopefully provides a good starting point into programming
cls_bpf and act_bpf.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
2015-04-10 13:31:19 -07:00
Stephen Hemminger bd733e4088 Merge branch 'master' into net-next
Conflicts:
	man/man8/ip-route.8.in
2015-04-07 08:56:14 -07:00
Vadim Kochan 8b90a9907e tc class: Ignore if default class name file does not exist
If '-nm' is specified, do not fail if there is no
default class names file in /etc/iproute2.

Changed default class name file cls_names -> tc_cls.

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-04-07 08:31:56 -07:00
Daniel Borkmann 11c39b5e98 tc: add eBPF support to f_bpf
This work adds the tc frontend for kernel commit e2e9b6541dd4 ("cls_bpf:
add initial eBPF support for programmable classifiers").

A C-like classifier program (f.e. see e2e9b6541dd4) is being compiled via
LLVM's eBPF backend into an ELF file, that is then being passed to tc. tc
then loads, if any, eBPF maps and eBPF opcodes (with fixed-up eBPF map file
descriptors) out of its dedicated sections, and via bpf(2) into the kernel
and then the resulting fd via netlink down to cls_bpf. cls_bpf allows for
annotations; currently, I've used the file name for that, so that users
can easily identify their filter when dumping configurations back.

Example usage:

  clang -O2 -emit-llvm -c cls.c -o - | llc -march=bpf -filetype=obj -o cls.o
  tc filter add dev em1 parent 1: bpf run object-file cls.o classid x:y

  tc filter show dev em1 [...]
  filter parent 1: protocol all pref 49152 bpf handle 0x1 flowid x:y cls.o

I placed the parser bits derived from Alexei's kernel sample, into tc_bpf.c
as my next step is to also add the same support for BPF action, so we can
have a fully fledged eBPF classifier and action in tc.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
2015-03-24 15:45:23 -07:00
Daniel Borkmann 51cf36756c tc: m_bpf: fix next arg selection after tc opcode
The argument after the tc opcode/verdict is optional. NEXT_ARG() requires
another argument to follow, otherwise tc will bail out. Therefore, we need
to advance to the next argument manually, as done elsewhere.

Fixes: 86ab59a666 ("tc: add support for BPF based actions")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Pirko <jiri@resnulli.us>
2015-03-24 15:39:53 -07:00
Vadim Kochan 4612d04d6b tc class: Show class names from file
It is possible to use class names from file /etc/iproute2/cls_names
which tc will use when showing class info:

    # tc/tc -nm class show dev lo
	class htb 1:10 parent 1:1 leaf 10: prio 0 rate 5Mbit ceil 5Mbit burst 15Kb cburst 1600b
	class htb 1:1 root rate 6Mbit ceil 6Mbit burst 15Kb cburst 1599b
	class htb web#1:20 parent 1:1 leaf 20: prio 0 rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b
	class htb 1:2 root rate 6Mbit ceil 6Mbit burst 15Kb cburst 1599b
	class htb 1:30 parent 1:1 leaf 30: prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
	class htb voip#1:40 parent 1:2 leaf 40: prio 0 rate 5Mbit ceil 5Mbit burst 15Kb cburst 1600b
	class htb 1:50 parent 1:2 leaf 50: prio 0 rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b
	class htb 1:60 parent 1:2 leaf 60: prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b

or to specify via file path:

    # tc/tc -nm -cf /tmp/cls_names class show dev lo

Class names file contains simple "maj:min  name" structure:

1:20    web
1:40    voip

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-03-15 12:27:40 -07:00
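Commit 4612d04d6b describes the class names file as a simple "maj:min  name" structure. A minimal sketch of parsing one such line is shown below. It assumes that major and minor are hexadecimal, as tc prints class ids, and that names contain no whitespace and fit in 63 characters. The function name is hypothetical and this is not the parser tc actually uses.

```c
#include <stdio.h>

/* Parse one "maj:min  name" line, e.g. "1:20    web".
 * Assumption: major and minor are hexadecimal numbers.
 * Returns 0 on success, -1 if the line does not match. */
int parse_class_name_line(const char *line, unsigned int *major_id,
			  unsigned int *minor_id, char name[64])
{
	if (sscanf(line, "%x:%x %63s", major_id, minor_id, name) != 3)
		return -1;
	return 0;
}
```

Lines that do not start with a class id, such as comments or blank lines, make the function return -1, so a caller can simply skip them.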
Daniel Borkmann 32caee9fc7 m_bpf: remove unrelevant help lines
Left-overs when copying this over from cls_bpf. ;) Let's remove them.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
2015-02-27 19:00:51 -08:00
Jiri Pirko 86ab59a666 tc: add support for BPF based actions
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
2015-02-05 10:38:13 -08:00
Jiri Pirko 1d129d191a tc: push bpf common code into separate file
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
2015-02-05 10:38:13 -08:00
Jamal Hadi Salim 564663b4ca actions: Get vlan action to work in pipeline
When specified in a graph such as:
action vlan ... action foobar
the vlan action chewed more than it can swallow

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2015-01-13 17:22:44 -08:00
Vadim Kochan 67e1d73be1 tc: Allow to easy change network namespace
Added new '-netns' option to simplify executing the following command:

    ip netns exec NETNS tc OPTIONS COMMAND OBJECT

    to

    tc -n[etns] NETNS OPTIONS COMMAND OBJECT

e.g.:

    tc -net vnet0 qdisc

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
2014-12-27 10:22:34 -08:00
Vadim Kochan d954b34a1f tc class: Show classes as ASCII graph
Added new '-g[raph]' option which shows classes in the graph view.

For now, only generic stats info output is supported.

e.g.:

$ tc/tc -g class show dev tap0
+---(1:2) htb rate 6Mbit ceil 6Mbit burst 15Kb cburst 1599b
|    +---(1:40) htb prio 0 rate 5Mbit ceil 5Mbit burst 15Kb cburst 1600b
|    +---(1:50) htb rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b
|    |    +---(1:51) htb prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
|    |
|    +---(1:60) htb prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
|
+---(1:1) htb rate 6Mbit ceil 6Mbit burst 15Kb cburst 1599b
     +---(1:10) htb prio 0 rate 5Mbit ceil 5Mbit burst 15Kb cburst 1600b
     +---(1:20) htb prio 0 rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b
     +---(1:30) htb prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b

$ tc/tc -g -s class show dev tap0
+---(1:2) htb rate 6Mbit ceil 6Mbit burst 15Kb cburst 1599b
|    |    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
|    |    rate 0bit 0pps backlog 0b 0p requeues 0
|    |
|    +---(1:40) htb prio 0 rate 5Mbit ceil 5Mbit burst 15Kb cburst 1600b
|    |          Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
|    |          rate 0bit 0pps backlog 0b 0p requeues 0
|    |
|    +---(1:50) htb rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b
|    |    |     Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
|    |    |     rate 0bit 0pps backlog 0b 0p requeues 0
|    |    |
|    |    +---(1:51) htb prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
|    |               Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
|    |               rate 0bit 0pps backlog 0b 0p requeues 0
|    |
|    +---(1:60) htb prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
|               Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
|               rate 0bit 0pps backlog 0b 0p requeues 0
|
+---(1:1) htb rate 6Mbit ceil 6Mbit burst 15Kb cburst 1599b
     |    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
     |    rate 0bit 0pps backlog 0b 0p requeues 0
     |
     +---(1:10) htb prio 0 rate 5Mbit ceil 5Mbit burst 15Kb cburst 1600b
     |          Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
     |          rate 0bit 0pps backlog 0b 0p requeues 0
     |
     +---(1:20) htb prio 0 rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b
     |          Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
     |          rate 0bit 0pps backlog 0b 0p requeues 0
     |
     +---(1:30) htb prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
                Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
                rate 0bit 0pps backlog 0b 0p requeues 0

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2014-12-27 10:16:51 -08:00
Stephen Hemminger 5c2c10b17e Merge branch 'net-next' 2014-12-24 12:23:00 -08:00
Stephen Hemminger 3d0b7439df whitespace cleanup
Remove all trailing whitespace and space before tabs.
2014-12-20 15:47:17 -08:00
Stephen Hemminger c9b8aef6ae Merge branch 'master' into net-next 2014-12-09 16:33:59 -08:00
Stephen Hemminger b2e116d6c3 tc: minor spelling fixes 2014-12-03 19:28:34 -08:00
Jiri Pirko 8b1c0216d8 tc: add support for vlan tc action
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Reviewed-by: Cong Wang <cwang@twopensource.com>
2014-12-03 09:29:21 -08:00
Stephen Hemminger edd3979272 emp: fix warning on deprecated bison directive
emp_ematch.y:12.1-13: warning: deprecated directive, use ‘%name-prefix’ [-Wdeprecated]
 %name-prefix="ematch_"
 ^^^^^^^^^^^^^
2014-10-09 08:31:10 -07:00
Jamal Hadi Salim 863ecb04b4 discourage use of direct policer interface
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2014-10-09 08:26:57 -07:00
Jamal Hadi Salim 287bf3a990 route classifier support for multiple actions
route can now use the action syntax

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2014-10-09 08:26:57 -07:00
Jamal Hadi Salim 08139c2ffb tcindex classifier support for multiple actions
tcindex can now use the action syntax

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2014-10-09 08:26:56 -07:00
Andy Furniss a07c6d6135 add missing underscore to man page and example nf_mark ematch
The man page and the "fail" example are missing an underscore in the
nf_mark ematch.

eg.

tc filter add dev eth0 parent ffff:  basic match 'meta(nfmark gt 24)'
classid 2:4

meta: unknown meta id

... >>meta(nfmark gt 24)<< ...
... meta(>>nfmark<< gt 24)...
Usage: meta(OBJECT { eq | lt | gt } OBJECT)
where: OBJECT  := { META_ID | VALUE }
        META_ID := id [ shift SHIFT ] [ mask MASK ]

Example: meta(nfmark gt 24)
          meta(indev shift 1 eq "ppp")
          meta(tcindex mask 0xf0 eq 0xf0)

For a list of meta identifiers, use meta(list).
Illegal "ematch"

meta(list) does correctly show nf_mark and the above test works with
nf_mark.

Signed-off-by: Andy Furniss adf.lists@gmail.com
2014-10-09 08:24:00 -07:00
Jamal Hadi Salim 10f5a375ea rsvp classifier support for multiple actions
Example setup:

sudo tc qdisc del dev eth0 root handle 1:0 prio
sudo tc qdisc add dev eth0 root handle 1:0 prio

sudo tc filter add dev eth0 pref 10 proto ip parent 1:0 \
rsvp session 10.0.0.1 ipproto icmp \
classid 1:1  \
action police rate 1kbit burst 90k pipe \
action ok

tc -s filter show dev eth0 parent 1:0

filter protocol ip pref 10 rsvp
filter protocol ip pref 10 rsvp fh 0x0001100a flowid 1:1 session
10.0.0.1 ipproto icmp
        action order 1:  police 0x5 rate 1Kbit burst 23440b mtu 2Kb
action pipe overhead 0b
ref 1 bind 1
        Action statistics:
        Sent 98000 bytes 1000 pkt (dropped 0, overlimits 761 requeues 0)
        backlog 0b 0p requeues 0

        action order 2: gact action pass
         random type none pass val 0
         index 2 ref 1 bind 1 installed 60 sec used 3 sec
        Action statistics:
        Sent 74578 bytes 761 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Tested-by: John Fastabend <john.r.fastabend@intel.com>
2014-09-29 08:47:33 -07:00
Jamal Hadi Salim 954de6c72b actions: BugFix action stats to display with -s
Was broken by commit 288abf513f
Let's not be too clever and have a separate call to print flushed
actions info.

Broken looks like:
root@moja-1:~# tc actions add  action drop index 4
root@moja-1:~# tc -s actions ls action gact

    action order 0: gact action drop
     random type none pass val 0
     index 4 ref 1 bind 0 installed 9 sec used 4 sec

The fixed version looks like:
    action order 0: gact action drop
     random type none pass val 0
     index 4 ref 1 bind 0 installed 9 sec used 4 sec
         Sent 108948 bytes 1297 pkts (dropped 1297, overlimits 0)

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2014-09-29 08:47:19 -07:00
Jay Vosburgh 3757185b29 tc/netem: loss gemodel options fixes
First, the default value for 1-k is documented as being 0, but is
currently being set to 1 (100%).  This causes all packets to be dropped
in the good state if 1-k is not explicitly specified.  Fix this by setting
the default to 0.

	Second, the 1-h option is parsed correctly, however, the kernel is
expecting "h", not 1-h.  Fix this by inverting the "1-h" percentage before
sending to and after receiving from the kernel.  This does change the
behavior, but makes it consistent with the netem documentation and the
literature on the Gilbert-Elliott model, which refer to "1-h" and "1-k,"
not "h" or "k" directly.

	Last, fix a minor formatting issue for the options reporting.

Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
2014-08-04 10:15:10 -07:00
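The second fix in commit 3757185b29 inverts the user-supplied "1-h" percentage before sending it to the kernel, and inverts it again when displaying what the kernel reports. The sketch below illustrates that round trip. It assumes probabilities are scaled so that 100% corresponds to UINT32_MAX; the function names are hypothetical and the rounding may differ from what tc does.

```c
#include <stdint.h>

/* Scale a percentage in [0, 100] to the full 32-bit range, so that
 * 100% maps to UINT32_MAX. Out-of-range or NaN input is clamped. */
uint32_t percent_to_u32(double percent)
{
	if (!(percent > 0.0))
		return 0;
	if (percent >= 100.0)
		return UINT32_MAX;
	return (uint32_t)(percent / 100.0 * (double)UINT32_MAX + 0.5);
}

/* The user supplies "1-h"; the kernel expects "h", which is the
 * complement within the same 32-bit scale. */
uint32_t one_minus_h_to_kernel_h(double one_minus_h_percent)
{
	return UINT32_MAX - percent_to_u32(one_minus_h_percent);
}

/* Inverse direction, for displaying the value the kernel reports. */
uint32_t kernel_h_to_one_minus_h(uint32_t h)
{
	return UINT32_MAX - h;
}
```

Because both directions subtract from the same maximum, a value written and then read back comes out unchanged.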
Yang Yingliang aeb199d5ce fq: allow options of fair queue set to ~0U
Some fair queue options cannot be set to ~0U. As a result, maxrate
cannot be reset to unlimited, because that requires setting it to ~0U.
Allow these options to be set to ~0U.

Tested by the following command:
 # tc qdisc add dev eth4 root handle 1: fq limit 2000 flow_limit 200 maxrate 100mbit quantum 2000 initial_quantum 1600
 # tc -s -d qdisc show
qdisc fq 1: dev eth4 root refcnt 2 limit 2000p flow_limit 200p buckets 1024 quantum 2000 initial_quantum 1600 maxrate 100Mbit
 Sent 1492 bytes 10 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  1 flows (0 inactive, 0 throttled)
  0 gc, 0 highprio, 0 throttled

 # tc qdisc change dev eth4 root handle 1: fq limit 4294967295 flow_limit 4294967295 maxrate 34359738360 quantum 4294967295 initial_quantum 4294967295
 # tc -s -d qdisc show
qdisc fq 1: dev eth4 root refcnt 2 limit 4294967295p flow_limit 4294967295p buckets 1024 quantum 4294967295 initial_quantum 4294967295
 Sent 38372 bytes 216 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  2 flows (1 inactive, 0 throttled)
  0 gc, 2 highprio, 7 throttled

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
2014-06-09 12:42:36 -07:00
Sergey V. Lobanov 3ff10e82c1 Fixed 'tc qdisc show' for tbf when latency<0
When limit<burst latency becomes <0, for example:
 # tc qdisc add dev eth0 root handle 1: tbf limit 100K burst 256K rate 256kbit
 # tc qdisc show
 qdisc tbf 1: dev eth0 root refcnt 2 rate 256Kbit burst 256Kb lat 4290.0s

If latency<0 there is no reason to show it. Limit will be printed instead of
latency when latency<0:
 # tc qdisc show
 qdisc tbf 1: dev eth0 root refcnt 2 rate 256Kbit burst 256Kb limit 100Kb

Signed-off-by: Sergey V. Lobanov <sergey@lobanov.in>
2014-05-28 17:08:16 -07:00
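The arithmetic behind commit 3ff10e82c1 can be sketched as follows, assuming the usual TBF relation limit = rate * latency + burst, with the rate in bytes per second. The "4290.0s" figure is consistent with a negative microsecond value being interpreted as an unsigned 32-bit number; that reading is an inference, not something the commit message states. Function names are hypothetical.

```c
#include <stdint.h>

/* Latency implied by a TBF configuration, in microseconds:
 * (limit - burst) / rate. Negative when limit < burst.
 * rate_bytes_per_sec must be non-zero. */
int64_t tbf_latency_us(int64_t limit_bytes, int64_t burst_bytes,
		       int64_t rate_bytes_per_sec)
{
	return (limit_bytes - burst_bytes) * 1000000 / rate_bytes_per_sec;
}

/* The same value if it ends up in an unsigned 32-bit microsecond
 * field (assumed cause of the odd display, see lead-in). */
uint32_t as_u32_us(int64_t latency_us)
{
	return (uint32_t)latency_us;
}
```

For the example in the commit (limit 100K, burst 256K, rate 256kbit, i.e. 32000 bytes per second), the latency is -4992000 microseconds, which wraps to 4289975296 microseconds, or about 4290.0 seconds.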
Jamal Hadi Salim 288abf513f actions: correctly report the number of actions flushed
This also fixes a long standing bug of not sanely reporting the
action chain ordering

Sample scenario test

on window 1(event window):
run "tc monitor" and observe events

on window 2:
sudo tc actions add action drop index 10
sudo tc actions add action ok index 12
sudo tc actions ls action gact
sudo tc actions flush action gact

See the event window reporting two entries
(doing another listing should show empty generic actions)

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2014-05-28 16:54:31 -07:00
Jamal Hadi Salim 9282d08d93 actions: keyword flowid or classid terminates action pipeline
scenario testcase:

TC="sudo ./tc/tc"
DEV="dev eth0"
$TC qdisc del $DEV ingress
$TC qdisc add $DEV ingress
$TC filter add $DEV parent ffff: protocol ip u32 match ip src 10.0.0.0/24 action police rate 6Mbit burst 6Mbit drop flowid :1
$TC filter add $DEV parent ffff: protocol ip u32 match ip dst 10.0.0.0/24 action police rate 1Gbit burst 1Gbit pass flowid :1
$TC -s filter ls $DEV parent ffff: protocol ip
$TC qdisc del $DEV ingress
$TC qdisc add $DEV ingress
$TC filter add $DEV parent ffff: protocol ip u32 match ip src 10.0.0.0/24 flowid 1:1 action police rate 6Mbit burst 6Mbit drop
$TC filter add $DEV parent ffff: protocol ip u32 match ip dst 10.0.0.0/24 flowid 1:2 action police rate 1Gbit burst 1Gbit pass

$TC -s filter ls $DEV parent ffff: protocol ip
$TC qdisc del $DEV ingress
$TC qdisc add $DEV ingress
$TC filter add $DEV parent ffff: protocol ip pref 10 \
u32 match ip protocol 1 0xff \
flowid 1:10 \
action skbedit mark 11 \
action police rate 10kbit burst 10k pipe index 1 \
action skbedit mark 12 \
action police rate 20kbit burst 20k pipe index 2 \
action mirred egress mirror dev dummy0

$TC -s filter ls $DEV parent ffff: protocol ip
$TC qdisc del $DEV ingress
$TC qdisc add $DEV ingress
$TC filter add $DEV parent ffff: protocol ip pref 10 \
u32 match ip protocol 1 0xff \
action skbedit mark 11 \
action police rate 10kbit burst 10k pipe index 1 \
action skbedit mark 12 \
action police rate 20kbit burst 20k pipe index 2 \
action mirred egress mirror dev dummy0 \
flowid 1:10

$TC -s filter ls $DEV parent ffff: protocol ip

Reported-by: Seann Herdejurgen <seann@herdejurgen.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2014-05-28 16:54:28 -07:00
Jamal Hadi Salim cacba03b10 Remove unnecessary debug statement
Reported-by: Seann Herdejurgen <seann@herdejurgen.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2014-05-28 16:54:26 -07:00
Natanael Copa dd9cc0ee81 iproute2: various header include fixes for compiling with musl libc
We need limits.h for LONG_MIN and LONG_MAX, sys/param.h for MIN and
sys/select.h for struct timeval.

This fixes the following compile errors with musl libc:

f_bpf.c: In function 'bpf_parse_opt':
f_bpf.c:181:12: error: 'LONG_MIN' undeclared (first use in this function)
   if (h == LONG_MIN || h == LONG_MAX) {
            ^
...

tc_util.o: In function `print_tcstats2_attr':
tc_util.c:(.text+0x13fe): undefined reference to `MIN'
tc_util.c:(.text+0x1465): undefined reference to `MIN'
tc_util.c:(.text+0x14ce): undefined reference to `MIN'
tc_util.c:(.text+0x154c): undefined reference to `MIN'
tc_util.c:(.text+0x160a): undefined reference to `MIN'
tc_util.o:tc_util.c:(.text+0x174e): more undefined references to `MIN' follow
...

tc_stab.o: In function `print_size_table':
tc_stab.c:(.text+0x40f): undefined reference to `MIN'
...

fdb.c:247:30: error: 'ULONG_MAX' undeclared (first use in this function)
        (vni >> 24) || vni == ULONG_MAX)
                              ^

lnstat.h:28:17: error: field 'last_read' has incomplete type
  struct timeval last_read;  /* last time of read */
                 ^

Signed-off-by: Natanael Copa <ncopa@alpinelinux.org>
2014-05-28 16:51:39 -07:00
Andreas Greve 6e2e5ec28b fix print_ipt: segfault if more then one filter with action -j MARK.
BUG: tc filter show ... produce a segmentation fault if more than one
filter rule with action -j MARK exists.

Reason: In print_ipt(...) xtables will be initialized with a
pointer to the static struct tcipt_globals at xtables_init_all().
Later on the fields .opts and .options_offset of tcipt_globals are
modified. The call of xtables_free_opts(1) at the end of print(...)
does not restore the original values of tcipt_globals for the
modified fields. It only frees some allocated memory and sets
.opts to NULL. This leads to a segmentation fault when print_ipt()
is called for the next filter rule with action -j MARK.

Fix: Clone tcipt_globals on the stack as tmp_tcipt_globals and
use it instead of tcipt_globals, so that tcipt_globals is not
modified.

Signed-off-by: Andreas Greve <andreas.greve@a-greve.de>
2014-05-13 13:10:31 -07:00
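The pattern behind the fix in commit 6e2e5ec28b can be illustrated in isolation: copy a shared static structure onto the stack before passing it to code that modifies it, so that every call starts from the same state. All names below (demo_globals, library_parse, print_one_rule, peek_template) are hypothetical stand-ins and do not reproduce the xtables API.

```c
#include <stddef.h>

/* Hypothetical stand-in for a library "globals" structure. */
struct demo_globals {
	const char *opts;
	unsigned int option_offset;
};

/* Shared template that every call is expected to start from. */
static struct demo_globals template_globals = { "base", 0 };

/* Hypothetical stand-in for library code that modifies the structure
 * it is given and does not restore it afterwards. */
static void library_parse(struct demo_globals *g)
{
	g->option_offset += 256;
	g->opts = NULL;
}

/* Work on a stack copy so the template is unchanged for the next call.
 * Returns the offset seen by this call. */
unsigned int print_one_rule(void)
{
	struct demo_globals tmp = template_globals;

	library_parse(&tmp);
	return tmp.option_offset;
}

/* Accessor used only to check that the template was left untouched. */
const struct demo_globals *peek_template(void)
{
	return &template_globals;
}
```

Note that a struct assignment is a shallow copy: pointers inside the copy still refer to the same data as the template, so this only protects the fields themselves.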
Terry Lam ac74bd2a71 support for Heavy Hitter Filter (HHF) qdisc
$tc qdisc add dev eth0 hhf help
Usage: ... hhf [ limit PACKETS ] [ quantum BYTES]
               [ hh_limit NUMBER ]
               [ reset_timeout TIME ]
               [ admit_bytes BYTES ]
               [ evict_timeout TIME ]
               [ non_hh_weight NUMBER ]

$tc -s -d qdisc show dev eth0
qdisc hhf 8005: root refcnt 32 limit 1000p quantum 1514 hh_limit 2048
reset_timeout 40.0ms admit_bytes 131072 evict_timeout 1.0s non_hh_weight 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
    drop_overlimit 0 hh_overlimit 0 tot_hh 0 cur_hh 0

HHF qdisc parameters:
- limit: max number of packets in qdisc (default 1000)
- quantum: max deficit per RR round (default 1 MTU)
- hh_limit: max number of HHs to keep states (default 2048)
- reset_timeout: time to reset HHF counters (default 40ms)
- admit_bytes: counter thresh to classify as HH (default 128KB)
- evict_timeout: threshold to evict idle HHs (default 1s)
- non_hh_weight:  DRR weight for mice (default 2)

Signed-off-by: Terry Lam <vtlam@google.com>
2014-05-09 12:10:47 -07:00
Jay Vosburgh 8f9672af7a tc/netem: fix loss state display and p14 parsing
The display of the entire netem loss state is shown as if it
were gemodel state, as the loss state information is assigned to the
wrong pointer.  Correct this by assigning the loss state to the correct
pointer.

	Additionally, attempting to set netem loss state will result in
random values in the p14 state probability because the option value
passed to the kernel by tc netem is not parsed or initialized.  Fix this
by supplying a default value of 0 for p14 and parsing the p14 value if
one is supplied.

Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
2014-05-09 12:06:58 -07:00
Hiroaki SHIMODA 4d4da09e00 htb: Move direct_qlen code part to htb_parse_opt().
The direct_qlen command option is used with qdisc operation.
It happened to be implemented in htb_parse_class_opt() which is called
with class operation.

Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
2014-03-21 14:20:06 -07:00
WANG Cong 1c9af05071 pedit: do not print debugging information by default
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
2014-02-10 14:43:52 -08:00
Yang Yingliang dad2f72bef netem: add 64bit rates support
netem supports 64bit rates starting from linux-3.13.
Add 64bit rates support in tc tools.

tc qdisc show dev eth0
qdisc netem 1: dev eth4 root refcnt 2 limit 1000 rate 35Gbit

Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Acked-by: Eric Dumazet <edumazet@google.com>
2014-01-20 12:32:15 -08:00
Yang Yingliang a01de0a336 tbf: support sending burst/mtu to kernel directly
To avoid loss when transforming burst to buffer in userspace, send
burst/mtu to kernel directly.

Kernel commit 2e04ad424b ("sch_tbf: add TBF_BURST/TBF_PBURST attribute")
makes the kernel able to handle burst/mtu.

Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
2014-01-20 12:32:14 -08:00
Vijay Subramanian 80dd880dd0 PIE: Proportional Integral controller Enhanced
Proportional Integral controller Enhanced (PIE) is a scheduler to address the
bufferbloat problem.

We present here a lightweight design, PIE (Proportional Integral controller
Enhanced) that can effectively control the average queueing latency to a target
value. Simulation results, theoretical analysis and Linux testbed results have
shown that PIE can ensure low latency and achieve high link utilization under
various congestion situations. The design does not require per-packet
timestamp, so it incurs very small overhead and is simple enough to implement
in both hardware and software.

For more information, please see technical paper about PIE in the IEEE
Conference on High Performance Switching and Routing 2013. A copy of the paper
can be found at ftp://ftpeng.cisco.com/pie/.

Please also refer to the IETF draft submission at
http://tools.ietf.org/html/draft-pan-tsvwg-pie-00

All relevant code, documents and test scripts and results can be found at
ftp://ftpeng.cisco.com/pie/.

For problems with the iproute2/tc or Linux kernel code, please contact Vijay
Subramanian (vijaynsu@cisco.com or subramanian.vijay@gmail.com) or Mythili Prabhu
(mysuryan@cisco.com).

Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Signed-off-by: Mythili Prabhu <mysuryan@cisco.com>
CC: Dave Taht <dave.taht@bufferbloat.net>
2014-01-09 22:50:47 -08:00
Stephen Hemminger ef056b2190 Merge branch 'master' into net-next-for-3.13 2014-01-09 22:44:17 -08:00
Jamal Hadi Salim f24a7e7205 dont skip action order
commit 58d78f9f6447df324cdeb99262442c5e3f1f924b
Author: Jamal Hadi Salim <jhs@mojatatu.com>
Date:   Sun Dec 22 10:34:18 2013 -0500

    dont skip displaying of action chains or lists by TCA_ACT_MAX_PRIO

    Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2013-12-28 10:57:34 -08:00
Jamal Hadi Salim b159a7f1ae allow batch gets of actions
commit c5f30cabef14c951596210b96bc9b423b0d39592
Author: Jamal Hadi Salim <hadi@mojatatu.com>
Date:   Sun Dec 22 10:24:17 2013 -0500

    Allow batching of action gets
    Example:
    ----
    tc actions get \
    action gact index 100 \
    action gact index 4
    ----

    Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2013-12-28 10:57:34 -08:00
Jamal Hadi Salim 352f6f97be simple print newline
commit d7869e6167c3553e93e254940b0647032b40fed8
Author: Jamal Hadi Salim <jhs@mojatatu.com>
Date:   Sun Dec 22 07:46:28 2013 -0500

    print new line at the end for aesthetics

    Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2013-12-28 10:57:34 -08:00
Jamal Hadi Salim 4bfb21ca20 policer - retire old syntax
commit b82057d9ec851a8aba8a295b959190ef5098f330
Author: Jamal Hadi Salim <jhs@mojatatu.com>
Date:   Sat Dec 21 17:00:11 2013 -0500

    After a decade of trying to deprecate the old policer syntax,
    I believe it is time to kill it. The kernel build option for old
    policer is gone for at least 5 years now (although backward
    compatibility is still there). Being backward compatible meant
    hijacking the keyword "action" and was obstructing policies like:

    tc filter add dev eth0 parent ffff: protocol ip pref 10 \
    u32 match ip protocol 1 0xff flowid 1:10 \
    action skbedit mark 1 \
    action police rate 10kbit burst 10k pipe \
    action skbedit mark 2 \
    action police rate 20kbit burst 20k pipe \
    action action mirred egress mirror dev dummy0

    Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2013-12-28 10:57:34 -08:00
Jamal Hadi Salim 02b1d345b7 skbedit print missing metadata
skbedit should print the index and other generic metadata info

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2013-12-28 10:57:34 -08:00
Jamal Hadi Salim 64b7db4db7 skbedit to default to pipe
Allow skbedit to be used as is in an action chain by default
without need to specify pipe

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2013-12-28 10:57:34 -08:00
Stephen Hemminger 4d98ab00de Fix FSF address in file headers 2013-12-06 15:05:07 -08:00
Eric Dumazet 8cecdc2837 tc: more user friendly rates
Display more user friendly rates.

10Mbit is more readable than 10000Kbit

Before :
class htb 1:2 root prio 0 rate 10000Kbit ceil 10000Kbit ...

After:
class htb 1:2 root prio 0 rate 10Mbit ceil 10Mbit ...
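
The before/after output above can be reproduced with a small sketch. This is not the tc implementation; the function name and the rule "step up a unit only while the value divides evenly by 1000" are assumptions chosen to match the example.

```python
def format_rate(bits_per_sec: int) -> str:
    """Render a rate using the largest decimal unit that divides it exactly."""
    units = ["", "K", "M", "G", "T"]
    value = bits_per_sec
    i = 0
    # Move to the next unit only if no precision would be lost.
    while i < len(units) - 1 and value >= 1000 and value % 1000 == 0:
        value //= 1000
        i += 1
    return f"{value}{units[i]}bit"


print(format_rate(10_000_000))  # 10Mbit
print(format_rate(1_500_000))   # 1500Kbit
```

Under this rule a value such as 1500Kbit stays in Kbit, because a whole number of Mbit cannot represent it.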

Signed-off-by: Eric Dumazet <edumazet@google.com>
2013-12-02 23:48:11 -08:00
Yang Yingliang ddc6243e9a tbf: add 64bit rates support
tbf supports 64bit rates starting from linux-3.13.
Add 64bit rates support to the tc tools.

tc qdisc show dev eth0
qdisc tbf 1: root refcnt 2 rate 40000Mbit burst 230000b peakrate 50000Mbit minburst 87500b lat 50.0ms

This is a followup to ("htb: support 64bit rates").

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Cc: Eric Dumazet <edumazet@google.com>
2013-12-02 23:46:56 -08:00
Eric Dumazet 8334bb325d htb: support 64bit rates
Starting from linux-3.13, we can break the 32bit limitation of
rates on HTB qdisc/classes.

Prior limit was 34.359.738.360 bits per second.
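
The prior limit follows from carrying the rate as an unsigned 32-bit count of bytes per second. A quick check of the arithmetic (the variable names are illustrative and not taken from the tc source):

```python
# Largest value an unsigned 32-bit bytes-per-second field can hold.
U32_MAX = 2**32 - 1

# The same limit expressed in bits per second.
prior_limit_bits = U32_MAX * 8

# A 50000Mbit class, as in the output below, expressed in bytes per second.
wanted_bytes_per_sec = 50_000 * 1_000_000 // 8

print(prior_limit_bits)                # 34359738360
print(wanted_bytes_per_sec > U32_MAX)  # True: does not fit in 32 bits
```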

lpq83:~# tc -s qdisc show dev lo ; tc -s class show dev lo
qdisc htb 1: root refcnt 2 r2q 2000 default 1 direct_packets_stat 0 direct_qlen 6000
 Sent 6591936144493 bytes 149549182 pkt (dropped 0, overlimits 213757419 requeues 0)
 rate 39464Mbit 114938pps backlog 0b 15p requeues 0
class htb 1:1 root prio 0 rate 50000Mbit ceil 50000Mbit burst 200000b cburst 0b
 Sent 6591942184547 bytes 149549310 pkt (dropped 0, overlimits 0 requeues 0)
 rate 39464Mbit 114938pps backlog 0b 15p requeues 0
 lended: 149549310 borrowed: 0 giants: 0
 tokens: 336 ctokens: -164

Signed-off-by: Eric Dumazet <edumazet@google.com>
2013-11-22 17:36:18 -08:00
Daniel Borkmann d05df6861f tc: add cls_bpf frontend
This is the iproute2 part of the kernel patch "net: sched:
add BPF-based traffic classifier".

[Will re-submit later again for iproute2 when window for
 -next submissions opens.]

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Thomas Graf <tgraf@suug.ch>
2013-10-30 16:45:05 -07:00
Nigel Kukard 9bea14ff6b Fix tc stats when using -batch mode
There are two global variables in tc/tc_class.c:
__u32 filter_qdisc;
__u32 filter_classid;

These are not re-initialized for each line received in -batch mode:
class show dev eth0 parent 1: classid 1:1
class show dev eth0 parent 1: classid 1:1
Error: duplicate "classid": "1:1" is the second value.

This patch fixes the issue by initializing the two globals when we
enter print_class().
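
A minimal sketch of the failure mode and the fix, assuming a simplified parser. Only the names filter_classid and print_class come from the message above; everything else (parse_classid, run_batch, the reset_globals switch) is hypothetical.

```python
filter_classid = 0  # module-level state, standing in for the global in tc/tc_class.c


def parse_classid(text: str) -> int:
    """Parse a 'major:minor' handle (hex) into a single 32-bit value."""
    major, minor = text.split(":")
    return (int(major, 16) << 16) | int(minor, 16)


def print_class(args, reset_globals: bool) -> None:
    global filter_classid
    if reset_globals:
        filter_classid = 0  # the fix: start every command with clean state
    words = iter(args)
    for word in words:
        if word == "classid":
            value = next(words)
            if filter_classid:
                raise ValueError(f'duplicate "classid": "{value}" is the second value.')
            filter_classid = parse_classid(value)


def run_batch(lines, reset_globals: bool) -> None:
    """Feed each batch line to print_class within one process."""
    global filter_classid
    filter_classid = 0
    for line in lines:
        print_class(line.split(), reset_globals)
```

With reset_globals=False the second identical line raises the duplicate error; with reset_globals=True both lines are accepted.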

Signed-off-by: Nigel Kukard <nkukard@lbsd.net>
2013-10-30 16:37:07 -07:00
Stephen Hemminger 734c0ca2ca htb: remove old unused duplicate qdisc name
Alexey had htb2 as name for version in ancient code.
2013-10-27 12:28:38 -07:00
Stephen Hemminger 0a502b21e3 Fix handling of qdisc without options
Some qdiscs, like htb, want parse_qopt to be called even if no options
are present. Fixes regression caused by:

e9e78b0db0 is the first bad commit
commit e9e78b0db0
Author: Stephen Hemminger <stephen@networkplumber.org>
Date:   Mon Aug 26 08:41:19 2013 -0700

    tc: allow qdisc without options
2013-10-27 12:26:47 -07:00
Jamal Hadi Salim e26520e5c1 action: typo nat fix
If you taketh you giveth.
I Went the LinuxWay and copied this for m_simple.c and noticed
this one typo (I wonder where it came from?;->).

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2013-09-30 21:31:40 -07:00
Jamal Hadi Salim 087f46ee4e tc: introduce simple action
Simple action has been in the kernel for years now as an
example. This complements it with user space control.

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2013-09-30 21:29:34 -07:00
Stephen Hemminger af60cf40c9 Merge branch 'net-next-3.11' 2013-09-23 13:16:48 -07:00
Eric Dumazet b43f331828 htb: add support for direct_qlen attribute
TCA_HTB_DIRECT_QLEN attribute is supported since linux-3.10

HTB classes use an internal pfifo queue whose limit was not reported
by tc; its value was inherited from the device tx_queue_len at setup time.

With this patch, tc displays the value and can change it.

Signed-off-by: Eric Dumazet <edumazet@google.com>
2013-09-20 09:48:13 -07:00
Eric Dumazet 8f7574edd8 tc: support TCA_STATS_RATE_EST64
Since linux-3.11, the rate estimator can provide TCA_STATS_RATE_EST64
when the rate (bytes per second) is above 2^32 (~34 Gbit)

Change tc to use this attribute for high rates.

Signed-off-by: Eric Dumazet <edumazet@google.com>
2013-09-20 09:46:33 -07:00
Eric Dumazet bc113e46a3 pkt_sched: fq: Fair Queue packet scheduler
Support for FQ packet scheduler

$ tc qd add dev eth0 root fq help
Usage: ... fq [ limit PACKETS ] [ flow_limit PACKETS ]
              [ quantum BYTES ] [ initial_quantum BYTES ]
              [ maxrate RATE  ] [ buckets NUMBER ]
              [ [no]pacing ]

$ tc -s -d qd
qdisc fq 8002: dev eth0 root refcnt 32 limit 10000p flow_limit 100p
buckets 256 quantum 3028 initial_quantum 15140
 Sent 216532416 bytes 148395 pkt (dropped 0, overlimits 0 requeues 14)
 backlog 0b 0p requeues 14
  511 flows (511 inactive, 0 throttled)
  110 gc, 0 highprio, 0 retrans, 1143 throttled, 0 flows_plimit

limit	: max number of packets on whole Qdisc (default 10000)

flow_limit : max number of packets per flow (default 100)

quantum : the max deficit per RR round (default is 2 MTU)

initial_quantum : initial credit for new flows (default is 10 MTU)

maxrate : max per flow rate (default : unlimited)

buckets : number of RB trees (default : 1024) in hash table.
               (consumes 8 bytes per bucket)

[no]pacing : disable/enable pacing (default is enable)

Usage :

tc qdisc add dev $ETH root fq

tc qdisc del dev $ETH root 2>/dev/null
tc qdisc add dev $ETH root handle 1: mq
for i in `seq 1 4`
do
  tc qdisc add dev $ETH parent 1:$i est 1sec 4sec fq
done

Signed-off-by: Eric Dumazet <edumazet@google.com>
2013-09-20 09:43:40 -07:00
Jesper Dangaard Brouer 3e92ff522a linklayer interface between kernel and tc/userspace
This iproute2 tc patch is connected to the kernel
 - commit 8a8e3d84b17 (net_sched: restore "linklayer atm" handling)

The rate table calculated by tc has been replaced in the kernel
and is no longer used for lookups.

This happened in kernel release v3.8 caused by kernel
 - commit 56b765b79 ("htb: improved accuracy at high rates").
This change unfortunately caused breakage of tc overhead and
linklayer parameters.

 Kernel overhead handling got fixed in kernel v3.10 by
 - commit 01cb71d2d47 (net_sched: restore "overhead xxx" handling)

 Kernel linklayer handling got fixed in kernel v3.11 by
 - commit 8a8e3d84b17 (net_sched: restore "linklayer atm" handling)

The linklayer fix introduced a struct change that allows the linklayer
attribute to be transferred between tc and kernel. This patch makes use
of this linklayer attribute.

The linklayer setting is transferred to the kernel, and the linklayer
setting received from the kernel is printed with a "linklayer" prefix
when listing the current configuration.  The default
TC_LINKLAYER_ETHERNET is only printed in detailed output mode.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
2013-09-03 08:21:24 -07:00
Stephen Hemminger e9e78b0db0 tc: allow qdisc without options
Pfifo_fast needs no options. So don't force it to have parsing code.
2013-08-26 08:41:19 -07:00
Stephen Hemminger b8a45897b9 More minor spelling fixes 2013-08-04 15:10:05 -07:00
Stephen Hemminger a3aa47a559 Make tc and ip batch mode consistent
Change the code for tc and ip so that batch mode is handled
the same.
2013-07-16 10:04:05 -07:00
Eric Dumazet a303853e84 get_rate: detect 32bit overflows
On Mon, 2013-06-03 at 16:36 +0100, Ben Hutchings wrote:

> Oops, I read this as being strtol() currently, not strtod().  Currently
> '1.5gbit' will work, but this change will break that.  So I think you
> need to keep bps as a double.

Arg

> Then here I think the check should be *rate != floor(bps), i.e. accept
> rounding down of a non-integer number of bytes but any other change is
> assumed to be overflow.

Thanks Ben, here is v4 then ;)

[PATCH v4] get_rate: detect 32bit overflows

Current rate limit is 34.359.738.360 bit per second, and
unfortunately 40Gbps links are above it.

overflows in get_rate() are currently not detected, and some
users are confused. Let's detect this and complain.
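
The check discussed in the review above (keep bps as a double, accept rounding down, treat any other change as overflow) can be sketched as follows. The function name is hypothetical and the 32-bit store is emulated with a mask; the input is assumed to be a finite, non-negative number of bytes per second.

```python
import math

U32_MAX = 2**32 - 1


def to_u32_rate(bps: float) -> int:
    """Store a bytes-per-second value in 32 bits, rejecting overflow."""
    rate = int(bps) & U32_MAX  # emulates assignment to an unsigned 32-bit field
    # Rounding down a fractional byte count is fine; any other change is overflow.
    if rate != math.floor(bps):
        raise OverflowError(f"rate {bps} bytes/s does not fit in 32 bits")
    return rate


# '1.5gbit' is 187500000 bytes/s and still fits.
print(to_u32_rate(1.5e9 / 8))  # 187500000
```

A 40Gbit request (5e9 bytes/s) wraps to a different value when masked to 32 bits, so the comparison fails and the error is raised instead of silently configuring a wrong rate.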

Note that some qdiscs are ready to get an extended range, but this will
need additional attributes and a new iproute2

With help from Ben Hutchings

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>
2013-06-07 09:24:56 -07:00
Stephen Hemminger 22fa92e367 htb: fix indentation
iproute2 uses kernel style indenting
2013-06-07 08:54:45 -07:00
Eric Dumazet 44f1ff0afc htb: report overhead attribute
"tc class show dev ..." omits the overhead attribute for HTB.

After patch I have :

tc class add dev $DEV parent 1: classid 1:1 est 1sec 4sec htb \
    rate 12Mbit mtu 1500 quantum 1514 overhead 20

tc class show dev $DEV
class htb 1:1 root prio 0 rate 12000Kbit overhead 20 ceil 12000Kbit
burst 1500b cburst 1500b

Signed-off-by: Eric Dumazet <edumazet@google.com>
2013-06-07 08:53:53 -07:00
Alexander Duyck cfa292defa iproute2: act_ipt fix xtables breakage on older versions.
In trying to build on a RHEL6.3 I ran into several build issues that are
addressed in this patch.

The first is that xtables_merge_options only has 3 parameters.  It appears
this is how this code was originally.  As such for the case where the version
is less than 6 I am assuming it would be correct to maintain the original
setup that only had 3 parameters being passed instead of 4.

I also ran into an issue with the define for __ALIGN_KERNEL not being present.
I believe this may be due to the fact that __ALIGN_KERNEL was moved into a
separate header from ALIGN after the UAPI changes.  In order to just cover all
of the bases I have moved the main definition for the macros into
__ALIGN_KERNEL_MASK and __ALIGN_KERNEL and if ALIGN is also needed then it is
just a direct redefine to __ALIGN_KERNEL.
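
For reference, the rounding these macros perform can be sketched as below. This mirrors the usual kernel definition, in which the mask form adds the mask and clears the low bits; the Python function names are illustrative, and the alignment is assumed to be a power of two.

```python
def align_kernel_mask(x: int, mask: int) -> int:
    """Round x up to the next multiple of (mask + 1)."""
    return (x + mask) & ~mask


def align_kernel(x: int, a: int) -> int:
    """Round x up to the next multiple of a, where a is a power of two."""
    return align_kernel_mask(x, a - 1)


print(align_kernel(13, 8))  # 16
print(align_kernel(16, 8))  # 16
```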

Cc: Hasan Chowdhury <shemonc@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2013-05-01 08:01:47 -07:00
Stephen Hemminger e7b24b67db Fix build when shared libraries are disabled
On some platforms, shared libraries are not used. The stub code
needs some updating to not generate errors.
2013-03-13 08:29:59 -07:00
Kees van Reeuwijk 3bed7bb7e7 iproute2: clearer error messages for fifo and tbf qdiscs
Clearer error messages for fifo and tbf qdiscs:
- Say who is complaining
- Don't just say a parameter is bad, show the offending parameter
- Be clearer about duplicate parameters vs illegal pairs of parameters
- Try to give multiple error messages rather than let the user discover the errors one by one
- When there are parameter aliases, try to use the variant that was used, or at least mention them all

Note that in the old version an empty parameter list to tbf would just cause an explain() message
without a specific error message. By simply removing the relevant error check, the code now
handles this error more gracefully by printing an error message for all mandatory parameters.
It still prints the explain() message.

Signed-off-by: Kees van Reeuwijk <reeuwijk@few.vu.nl>
2013-02-21 08:34:34 -08:00
Stephen Hemminger d1f28cf181 ip: make local functions static 2013-02-12 11:38:35 -08:00
Benjamin Poirier 5ab3a4de5e Use pkg-config to obtain xtables.h path
On openSUSE 12.2 (at least) xtables.h is not installed in the system-wide
include dir but in /usr/include/iptables-1.4.16.3/. This results in the
following build failure:
em_ipset.c:26:21: fatal error: xtables.h: No such file or directory

Other includers of xtables.h already call out to pkg-config
2013-02-11 09:19:54 -08:00
Johannes Naab e72ca3fbb0 iproute2: tc netem rate: allow negative packet/cell overhead
by fixing the parsing of command-line tokens

Signed-off-by: Johannes Naab <jn@stusta.de>
2013-02-04 09:06:50 -08:00
Jamal Hadi Salim 852d51222d iproute2: act_ipt fix xtables breakage
Fixes breakage with xtables API starting with version 1.4.10

Signed-off-by: Hasan Chowdhury <shemonc@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2013-01-16 08:14:48 -08:00
Strake 5bd9dd49ae include needed files
Needed to build iproute2 with musl
2012-12-23 11:49:06 -08:00
Mike Frysinger e4fc4ada33 allow pkg-config to be customized
Rather than hard coding `pkg-config`, use ${PKG_CONFIG} so people can
override it to their specific version (like when cross-compiling).

This is the same way the upstream pkg-config code works.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
2012-11-11 16:21:34 -08:00
Matt Burgess 92905c6e0d iproute2-3.6.0 assumes presence of iptables
Hi,

When compiling iproute2-3.6.0 on a host that doesn't have iptables available, I get the following error:

gcc -Wall -Wstrict-prototypes -O2 -I../include -DRESOLVE_HOSTNAMES
-DLIBDIR=\"/usr/lib\" -DCONFDIR=\"/etc/iproute2\" -D_GNU_SOURCE
-DCONFIG_GACT -DCONFIG_GACT_PROB -DYY_NO_INPUT   -c -o em_ipset.o
em_ipset.c
em_ipset.c:26:21: fatal error: xtables.h: No such file or directory

Fixed by the following patch, which guards the building of em_ipset.o on
the presence of suitable headers.

Thanks,

Matt.
2012-10-03 08:51:29 -07:00
Rostislav Lisovy 7b5f30e14f Ematch used to classify CAN frames according to their identifiers
This ematch enables effective filtering of CAN frames (AF_CAN) based
on CAN identifiers with masking of compared bits. Implementation
utilizes bitmap based classification for standard frame format (SFF)
which is optimized for minimal overhead.

Signed-off-by: Rostislav Lisovy <lisovy@gmail.com>
2012-08-20 13:11:55 -07:00
Dan Kenigsberg f1675d615b utils: invarg: msg precedes the faulty arg
Fix all calls which reversed the arg order.

Signed-off-by: Dan Kenigsberg <danken@redhat.com>
2012-08-17 13:35:36 -07:00
Florian Westphal 8194411a42 tc: add ipset ematch
example usage:
tc filter add dev $dev parent $id: basic match not ipset'(foobar src)' ..

also updates iproute2/ematch_map, else tc complains:
Error: Unable to find ematch "ipset" in /etc/iproute2/ematch_map
Please assign a unique ID to the ematch kind the suggested entry is:
        8       ipset

when trying to use this ematch.

(text ematch (5) only exists in kernel, a vlan ematch (6) exists neither in
 kernel nor userspace, but kernel headers define TCF_EM_VLAN == 6).
2012-08-13 08:33:50 -07:00
Li Wei 6cef544b96 tc: man: change man page and comment to conform to code's behavior.
Since the get_rate() code incorrectly interpreted bare numbers, its
behavior is not what the man page and comment describe.

Change the man page and comment to match the code, for compatibility
with existing usage by scripts.
2012-07-12 09:05:28 -07:00
Li Wei 424adc19bf tc: filter: validate filter priority in userspace.
Because we use the high 16 bits of tcm_info to pass the prio value to
the kernel, its range is [0, 0xffff]. Without validation in tc, when
the user passes a larger (>65535) priority, the actual priority
set in the kernel would confuse the user.

So, add validation to ensure prio is within that range.
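
A sketch of why an unchecked priority is confusing, assuming prio sits in the high 16 bits of a 32-bit tcm_info as described above. The helper names and the low-half protocol value are illustrative only.

```python
def pack_tcm_info(prio: int, protocol: int) -> int:
    """Place prio in the high 16 bits and protocol in the low 16 bits."""
    return ((prio << 16) & 0xFFFFFFFF) | (protocol & 0xFFFF)


def unpack_prio(tcm_info: int) -> int:
    return tcm_info >> 16


def validate_prio(prio: int) -> int:
    """Reject priorities that cannot be represented in 16 bits."""
    if not 0 <= prio <= 0xFFFF:
        raise ValueError(f"invalid priority {prio}: must be in [0, 65535]")
    return prio


# Without validation, 65546 silently becomes priority 10.
print(unpack_prio(pack_tcm_info(65546, 0x0800)))  # 10
```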
2012-07-10 15:39:30 -07:00
Hiroaki SHIMODA 690b11f4a6 tc: u32: Fix firstfrag filter.
With the current firstfrag filter, all non-fragmented packets are matched.
firstfrag should also check the MF bit.
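
The difference can be sketched on the 16-bit flags/fragment-offset field of the IPv4 header. The constants follow the IPv4 header layout; the two function names are hypothetical and do not reproduce the u32 key/mask code.

```python
IP_MF = 0x2000       # "more fragments" flag
IP_OFFMASK = 0x1FFF  # 13-bit fragment offset


def old_firstfrag(frag_field: int) -> bool:
    """Offset-only test: also true for packets that are not fragmented."""
    return (frag_field & IP_OFFMASK) == 0


def fixed_firstfrag(frag_field: int) -> bool:
    """First fragment: offset is zero and MF is set."""
    return (frag_field & (IP_MF | IP_OFFMASK)) == IP_MF


unfragmented = 0x4000  # DF set, MF clear, offset 0
first = IP_MF          # MF set, offset 0
print(old_firstfrag(unfragmented), fixed_firstfrag(unfragmented))  # True False
print(old_firstfrag(first), fixed_firstfrag(first))                # True True
```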

Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
2012-07-10 15:39:02 -07:00
Hiroaki SHIMODA 1d62f99fe2 tc: u32: Fix icmp_code off.
The offset of icmp_code is 21, not 20. Also, offmask should be 0 unless
nexthdr+ is specified.
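
The offsets can be checked against a raw packet, assuming a 20-byte IPv4 header without options (which is what a fixed offset presumes). The names below are illustrative.

```python
IP_HEADER_LEN = 20                 # assumes IHL == 5, i.e. no IP options
ICMP_TYPE_OFF = IP_HEADER_LEN      # 20: first byte of the ICMP header
ICMP_CODE_OFF = IP_HEADER_LEN + 1  # 21: second byte of the ICMP header


def icmp_type_and_code(packet: bytes):
    """Return (type, code) from an IPv4 packet carrying ICMP."""
    return packet[ICMP_TYPE_OFF], packet[ICMP_CODE_OFF]


# Zeroed IP header followed by ICMP type 3 (destination unreachable), code 1.
packet = bytes(IP_HEADER_LEN) + bytes([3, 1, 0, 0])
print(icmp_type_and_code(packet))  # (3, 1)
```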

Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
2012-07-10 15:39:02 -07:00
Li Wei 3c4f545633 tc: prio: Perform more strict check on priomap.
Since band numbers count from zero, band must be less than
opt.bands.
2012-06-18 12:25:08 -07:00
Vijay Subramanian 50a3ec3c46 tc-codel: Update usage text
codel can take 'noecn' as an option. This also makes it consistent with the
manpage.

Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
2012-05-24 15:02:05 -07:00
Eric Dumazet c3524efc14 fq_codel: Fair Queue Codel AQM
Fair Queue Codel packet scheduler

Principles :

- Packets are classified (internal classifier or external) on flows.
- This is a Stochastic model (as we use a hash, several flows might
                              be hashed on same slot)
- Each flow has a CoDel managed queue.
- Flows are linked onto two (Round Robin) lists,
  so that new flows have priority on old ones.

- For a given flow, packets are not reordered (CoDel uses a FIFO)
- head drops only.
- ECN capability is on by default.
- Very low memory footprint (64 bytes per flow)

tc qdisc ... fq_codel [ limit PACKETS ] [ flows number ]
                      [ target TIME ] [ interval TIME ] [ noecn ]
                      [ quantum BYTES ]

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Dave Taht <dave.taht@bufferbloat.net>
Cc: Kathleen Nichols <nichols@pollere.com>
Cc: Van Jacobson <van@pollere.net>
Cc: Tom Herbert <therbert@google.com>
Cc: Matt Mathis <mattmathis@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Changli Gao <xiaosuo@gmail.com>
2012-05-22 14:17:49 -07:00
Eric Dumazet 185d88f99b tc_codel: Controlled Delay AQM
An implementation of CoDel AQM, from Kathleen Nichols and Van Jacobson.

http://queue.acm.org/detail.cfm?id=2209336

The main input of this AQM is no longer the queue size in bytes or
packets, but the delay packets spend in the (FIFO) queue.

As we don't have infinite memory, we can still drop packets in enqueue()
in case of massive load, but the aim of CoDel is to drop packets in
dequeue(), using a control law based on two simple parameters :

target : target sojourn time (default 5ms)
interval : width of moving time window (default 100ms)
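
As a rough illustration of the control law, the CoDel article linked above schedules the next drop at the current time plus interval divided by the square root of the number of drops so far. The sketch below only shows that spacing; the function name is hypothetical and it is not the kernel code.

```python
import math


def control_law(now_ms: float, interval_ms: float, count: int) -> float:
    """Time of the next drop while the sojourn time stays above target."""
    return now_ms + interval_ms / math.sqrt(count)


# With the default 100ms interval, drops get closer together as count grows.
print(control_law(0.0, 100.0, 1))  # 100.0
print(control_law(0.0, 100.0, 4))  # 50.0
```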

Selected packets are dropped, unless ECN is enabled and packets can get
ECN mark instead.

Usage: tc qdisc ... codel [ limit PACKETS ] [ target TIME ]
                          [ interval TIME ] [ ecn ]

qdisc codel 10: parent 1:1 limit 2000p target 3.0ms interval 60.0ms ecn
 Sent 13347099587 bytes 8815805 pkt (dropped 0, overlimits 0 requeues 0)
 rate 202365Kbit 16708pps backlog 113550b 75p requeues 0
  count 116 lastcount 98 ldelay 4.3ms dropping drop_next 816us
  maxpacket 1514 ecn_mark 84399 drop_overlimit 0

CoDel must be seen as a base module, and should be used keeping in mind
there is still a FIFO queue. So a typical setup will probably need a
hierarchy of several qdiscs and packet classifiers to be able to meet
whatever constraints a user might have.

One possible example would be to use fq_codel, which combines Fair
Queueing and CoDel, in replacement of sfq / sfq_red.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Dave Taht <dave.taht@bufferbloat.net>
2012-05-22 14:13:52 -07:00
Vijay Subramanian 1070205dc0 tc-netem: Add support for ECN packet marking
This patch provides support for marking packets with ECN instead of
dropping them with netem. This makes it possible to make use of the
netem ECN marking feature that was added recently to the kernel.

Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
2012-05-22 14:10:21 -07:00
Christoph J. Thompson 5c434a9e5a iproute2 - Fix up and simplify variables pointing to install directories
Define where the iproute2 config files are located.
Get rid of trailing slashes for paths in several files.

Signed-off-by: Christoph J. Thompson <cjsthompson@gmail.com>
2012-04-12 09:49:10 -07:00
Stephen Hemminger ff24746cca Convert to use rta_getattr_ functions
Use new functions (inspired by libmnl) to do type safe access
of routing attributes
2012-04-10 08:47:55 -07:00