Commit Graph

98 Commits

Author SHA1 Message Date
Stephen Hemminger 5f1df307b4 config: put CFLAGS/LDLIBS in config.mk
This renames Config to config.mk and includes more Make input.
Now configure generates all the required CFLAGS and LDLIBS for
the optional libraries.

Also, use pkg-config to test for libelf, rather than using a test
program. This makes it consistent with other libraries.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-23 10:03:09 -07:00
Daniel Borkmann 8cc360fe48 bpf: unbreak libelf linkage for bpf obj loader
Commit 69fed534a5 ("change how Config is used in Makefile's") moved
HAVE_MNL specific CFLAGS/LDLIBS for building with libmnl out of the
top level Makefile into sub-Makefiles. However, it also removed the
HAVE_ELF specific CFLAGS/LDLIBS entirely, which breaks the BPF object
loader for tc and ip with "No ELF library support compiled in." despite
having libelf detected in configure script. Fix it similarly as in
69fed534a5 for HAVE_ELF.

Fixes: 69fed534a5 ("change how Config is used in Makefile's")
Reported-by: Jeffrey Panneman <jeffrey.panneman@tno.nl>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-08-10 16:40:02 -07:00
Stephen Hemminger 6ff66acc60 tc, ip: more Makefile updates for LIBMNL
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-09 08:38:51 -07:00
Roman Mashak cba134ae70 tc: fix Makefile to build skbmod
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
2017-05-22 13:33:51 -07:00
Amir Vadai f3e1b2448a pedit: Introduce ipv6 support
Add support for modifying IPv6 headers using pedit.

Signed-off-by: Amir Vadai <amir@vadai.me>
2017-05-15 15:05:20 -07:00
Amir Vadai 3cd5149ecd tc/pedit: p_eth: ETH header editor
For example, forward tcp traffic to veth0 and set
destination mac address to 11:22:33:44:55:66 :
$ tc filter add dev enp0s9 protocol ip parent ffff: \
    flower \
      ip_proto tcp \
    action pedit ex munge \
      eth dst set 11:22:33:44:55:66 \
    action mirred egress \
      redirect dev veth0

Signed-off-by: Amir Vadai <amir@vadai.me>
2017-05-01 09:22:16 -07:00
Jiri Kosina be67f81297 iproute2: tc: introduce build dependency on libnetlink
Rebuilding libnetlink doesn't trigger rebuild of tc, which is wrong
(especially so for builds where libnetlink.a gets statically linked into
tc). Fix that by introducing an explicit dependency.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2017-02-24 15:11:32 -08:00
Yotam Gigi 0b1abd84fb tc: Add support for the sample tc action
The sample tc action allows sampling packets matching a classifier. It
peeks randomly packets, and samples them using the psample netlink
channel. The user can specify the psample group, which the packet will be
sampled to, the sampling rate and the packet truncation (to save
kernel-user traffic).

The sampled packets contain informative metadata, for example, the input
interface and the original packet length.

The action syntax:
tc filter add [...] \
	action sample rate <RATE> group <GROUP> [trunc <SIZE>]
	[...]

Where:
  RATE := The sampling rate which is the ratio of packets observed at the
	  data source to the samples generated
  GROUP := the psample module sampling group
  SIZE := optional truncation size

An example for a common usecase of the sample tc action: to sample ingress
traffic from interface eth1, one may use the commands:

tc qdisc add dev eth1 handle ffff: ingress

tc filter add dev eth1 parent ffff: \
       matchall action sample rate 12 group 4

Where the first command adds an ingress qdisc and the second starts
sampling randomly with an average of one sampled packet per 12 packets
on dev eth1 to psample group 4.

Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
2017-02-06 14:24:52 -08:00
David Michael bb18c98198 tc: make tc linking depend on libtc.a
There was a race condition where the command to link the tc binary
could (rarely) run before the libtc.a archive existed.
2017-01-09 12:06:58 -08:00
Amir Vadai d57639a475 tc/act_tunnel: Introduce ip tunnel action
This action could be used before redirecting packets to a shared tunnel
device, or when redirecting packets arriving from a such a device.

The 'unset' action is optional. It is used to explicitly unset the
metadata created by the tunnel device during decap. If not used, the
metadata will be released automatically by the kernel.
The 'set' operation, will set the metadata with the specified values for
the encap.

For example, the following flower filter will forward all ICMP packets
destined to 11.11.11.2 through the shared vxlan device 'vxlan0'. Before
redirecting, a metadata for the vxlan tunnel is created using the
tunnel_key action and it's arguments:

$ tc filter add dev net0 protocol ip parent ffff: \
    flower \
      ip_proto 1 \
      dst_ip 11.11.11.2 \
    action tunnel_key set \
      src_ip 11.11.0.1 \
      dst_ip 11.11.0.2 \
      id 11 \
    action mirred egress redirect dev vxlan0

Signed-off-by: Amir Vadai <amir@vadai.me>
2016-12-02 14:12:09 -08:00
Daniel Borkmann e42256699c bpf: make tc's bpf loader generic and move into lib
This work moves the bpf loader into the iproute2 library and reworks
the tc specific parts into generic code. It's useful as we can then
more easily support new program types by just having the same ELF
loader backend. Joint work with Thomas Graf. I hacked a rough start
of a test suite to make sure nothing breaks [1] and looks all good.

  [1] https://github.com/borkmann/clsact/blob/master/test_bpf.sh

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
2016-11-29 12:35:32 -08:00
Daniel Borkmann 4710e46ec3 tc, ipt: don't enforce iproute2 dependency on iptables-devel
Since 5cd1adba79 ("Update to current iptables headers") compilation
of iproute2 broke for systems without iptables-devel package [1].
Reason is that even though we fall back to build m_ipt.c, the include
depends on a xtables-version.h header, which only ships with
iptables-devel. Machines not having this package fail compilation with:

    [...]
    CC       m_ipt.o
In file included from ../include/iptables.h:5:0,
                 from m_ipt.c:17:
../include/xtables.h:34:29: fatal error: xtables-version.h: No such file or directory
compilation terminated.
../Config:31: recipe for target 'm_ipt.o' failed
make[1]: *** [m_ipt.o] Error 1

The configure script only barks that package xtables was not found in
the pkg-config search path. The generated Config then only contains f.e.
TC_CONFIG_IPSET. In tc's Makefile we thus fall back to adding m_ipt.o
to TCMODULES. m_ipt.c then includes the local include/iptables.h header
copy, which includes the include/xtables.h copy. Latter then includes
xtables-version.h, which only ships with iptables-devel.

One way to resolve this is to skip this whole mess when pkg-config has
no xtables config available. I've carried something along these lines
locally for a while now, but it's just too annyoing. :/ Build works fine
now also when xtables.pc is not available.

  [1] http://www.spinics.net/lists/netdev/msg366162.html

Fixes: 5cd1adba79 ("Update to current iptables headers")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2016-10-26 10:58:22 -07:00
Yotam Gigi d5cbf3ff05 tc: Add support for the matchall traffic classifier.
The matchall classifier matches every packet and allows the user to apply
actions on it. In addition, it supports the skip_sw and skip_hw (as can
be found on u32 and flower filter) that direct the kernel to skip the
software/hardware processing of the actions.

This filter is very useful in usecases where every packet should be
matched. For example, packet mirroring (SPAN) can be setup very easily
using that filter.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2016-09-01 08:37:01 -07:00
David Ahern 57bdf8b764 Make builds default to quiet mode
Similar to the Linux kernel and perf add infrastructure to reduce the
amount of output tossed to a user during a build. Full build output
can be obtained with 'make V=1'

Builds go from:

make[1]: Leaving directory `/home/dsa/iproute2.git/lib'
make[1]: Entering directory `/home/dsa/iproute2.git/ip'
gcc -Wall -Wstrict-prototypes  -Wmissing-prototypes -Wmissing-declarations -Wold-style-definition -Wformat=2 -O2 -I../include -DRESOLVE_HOSTNAMES -DLIBDIR=\"/usr/lib\" -DCONFDIR=\"/etc/iproute2\" -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE    -c -o ip.o ip.c
gcc -Wall -Wstrict-prototypes  -Wmissing-prototypes -Wmissing-declarations -Wold-style-definition -Wformat=2 -O2 -I../include -DRESOLVE_HOSTNAMES -DLIBDIR=\"/usr/lib\" -DCONFDIR=\"/etc/iproute2\" -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE    -c -o ipaddress.o ipaddress.c

to:

...
    AR       libutil.a

ip
    CC       ip.o
    CC       ipaddress.o
...

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2016-05-31 12:13:07 -07:00
Jamal Hadi Salim d3e511223f tc: introduce IFE action
This action allows for a sending side to encapsulate arbitrary metadata
which is decapsulated by the receiving end.
The sender runs in encoding mode and the receiver in decode mode.
Both sender and receiver must specify the same ethertype.
At some point we hope to have a registered ethertype and we'll
then provide a default so the user doesnt have to specify it.
For now we enforce the user specify it.

Described in netdev01 paper:
   "Distributing Linux Traffic Control Classifier-Action Subsystem"
    Authors: Jamal Hadi Salim and Damascene M. Joachimpillai

Also refer to IETF draft-ietf-forces-interfelfb-04.txt

Lets show example usage where we encode icmp from a sender towards
a receiver with an skbmark of 17; both sender and receiver use
ethertype of 0xdead to interop.

YYYY: Lets start with Receiver-side policy config:
xxx: add an ingress qdisc
sudo tc qdisc add dev $ETH ingress

xxx: any packets with ethertype 0xdead will be subjected to ife decoding
xxx: we then restart the classification so we can match on icmp at prio 3
sudo $TC filter add dev $ETH parent ffff: prio 2 protocol 0xdead \
u32 match u32 0 0 flowid 1:1 \
action ife decode reclassify

xxx: on restarting the classification from above if it was an icmp
xxx: packet, then match it here and continue to the next rule at prio 4
xxx: which will match based on skb mark of 17
sudo tc filter add dev $ETH parent ffff: prio 3 protocol ip \
u32 match ip protocol 1 0xff flowid 1:1 \
action continue

xxx: match on skbmark of 0x11 (decimal 17) and accept
sudo tc filter add dev $ETH parent ffff: prio 4 protocol ip \
handle 0x11 fw flowid 1:1 \
action ok

xxx: Lets show the decoding policy
sudo tc -s filter ls dev $ETH parent ffff: protocol 0xdead
xxx:
filter pref 2 u32
filter pref 2 u32 fh 800: ht divisor 1
filter pref 2 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:1  (rule hit 0 success 0)
  match 00000000/00000000 at 0 (success 0 )
	action order 1: ife decode action reclassify type 0x0
	 allow mark allow prio
	 index 11 ref 1 bind 1 installed 45 sec used 45 sec
	Action statistics:
	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0

xxx:
Observe that above lists all metadatum it can decode. Typically these
submodules will already be compiled into a monolithic kernel or
loaded as modules

YYYY: Lets show the sender side now ..
xxx: Add an egress qdisc on the sender netdev
sudo tc qdisc add dev $ETH root handle 1: prio
xxx:
xxx: Match all icmp packets to 192.168.122.237/24, then
xxx: tag the packet with skb mark of decimal 17, then
xxx: Encode it with:
xxx:    ethertype 0xdead
xxx:    add skb->mark to whitelist of metadatum to send
xxx:    rewrite target dst MAC address to 02:15:15:15:15:15
xxx:
sudo $TC filter add dev $ETH parent 1: protocol ip prio 10  u32 \
match ip dst 192.168.122.237/24 \
match ip protocol 1 0xff \
flowid 1:2 \
action skbedit mark 17 \
action ife encode \
type 0xDEAD \
allow mark \
dst 02:15:15:15:15:15

xxx: Lets show the encoding policy
filter pref 10 u32
filter pref 10 u32 fh 800: ht divisor 1
filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:2  (rule hit 118 success 0)
  match c0a87a00/ffffff00 at 16 (success 0 )
  match 00010000/00ff0000 at 8 (success 0 )
	action order 1:  skbedit mark 17
	 index 11 ref 1 bind 1 installed 3 sec used 3 sec
 	Action statistics:
	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0

	action order 2: ife encode action pipe type 0xDEAD
	 allow mark dst 02:15:15:15:15:15
	 index 12 ref 1 bind 1 installed 3 sec used 3 sec
	Action statistics:
	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0
xxx:

Now test by sending ping from sender to destination

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-05-16 11:13:26 -07:00
Daniel Borkmann 8f9afdd531 tc, clsact: add clsact frontend
Add the tc part for the kernel commit 1f211a1b929c ("net, sched: add
clsact qdisc"). Quoting example usage from that commit description:

  Example, adding qdisc:

  # tc qdisc add dev foo clsact
  # tc qdisc show dev foo
  qdisc mq 0: root
  qdisc pfifo_fast 0: parent :1 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
  qdisc pfifo_fast 0: parent :2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
  qdisc pfifo_fast 0: parent :3 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
  qdisc pfifo_fast 0: parent :4 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
  qdisc clsact ffff: parent ffff:fff1

  Adding filters (deleting, etc works analogous by specifying ingress/egress):

  # tc filter add dev foo ingress bpf da obj bar.o sec ingress
  # tc filter add dev foo egress  bpf da obj bar.o sec egress
  # tc filter show dev foo ingress
  filter protocol all pref 49152 bpf
  filter protocol all pref 49152 bpf handle 0x1 bar.o:[ingress] direct-action
  # tc filter show dev foo egress
  filter protocol all pref 49152 bpf
  filter protocol all pref 49152 bpf handle 0x1 bar.o:[egress] direct-action

The ingress parent alias can also be used with ingress qdisc.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2016-01-18 11:41:27 -08:00
Jiri Pirko 30eb304ecd tc: add support for Flower classifier
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
2015-05-21 15:22:49 -07:00
Daniel Borkmann 4bd624467b tc: built-in eBPF exec proxy
This work follows upon commit 6256f8c9e4 ("tc, bpf: finalize eBPF
support for cls and act front-end") and takes up the idea proposed by
Hannes Frederic Sowa to spawn a shell (or any other command) that holds
generated eBPF map file descriptors.

File descriptors, based on their id, are being fetched from the same
unix domain socket as demonstrated in the bpf_agent, the shell spawned
via execvpe(2) and the map fds passed over the environment, and thus
are made available to applications in the fashion of std{in,out,err}
for read/write access, for example in case of iproute2's examples/bpf/:

  # env | grep BPF
  BPF_NUM_MAPS=3
  BPF_MAP1=6        <- BPF_MAP_ID_QUEUE (id 1)
  BPF_MAP0=5        <- BPF_MAP_ID_PROTO (id 0)
  BPF_MAP2=7        <- BPF_MAP_ID_DROPS (id 2)

  # ls -la /proc/self/fd
  [...]
  lrwx------. 1 root root 64 Apr 14 16:46 0 -> /dev/pts/4
  lrwx------. 1 root root 64 Apr 14 16:46 1 -> /dev/pts/4
  lrwx------. 1 root root 64 Apr 14 16:46 2 -> /dev/pts/4
  [...]
  lrwx------. 1 root root 64 Apr 14 16:46 5 -> anon_inode:bpf-map
  lrwx------. 1 root root 64 Apr 14 16:46 6 -> anon_inode:bpf-map
  lrwx------. 1 root root 64 Apr 14 16:46 7 -> anon_inode:bpf-map

The advantage (as opposed to the direct/native usage) is that now the
shell is map fd owner and applications can terminate and easily reattach
to descriptors w/o any kernel changes. Moreover, multiple applications
can easily read/write eBPF maps simultaneously.

To further allow users for experimenting with that, next step is to add
a small helper that can get along with simple data types, so that also
shell scripts can make use of bpf syscall, f.e to read/write into maps.

Generally, this allows for prepopulating maps, or any runtime altering
which could influence eBPF program behaviour (f.e. different run-time
classifications, skb modifications, ...), dumping of statistics, etc.

Reference: http://thread.gmane.org/gmane.linux.network/357471/focus=357860
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
2015-04-27 16:39:23 -07:00
Felix Fietkau b8d5c9a71b tc: add support for connmark action
Add ability to add the netfilter connmark support.

Typical usage:
...lets tag outgoing icmp with mark 0x10..
iptables -tmangle -A PREROUTING -p icmp -j CONNMARK --set-mark 0x10
..add on ingress of $ETH an extractor for connmark...
tc filter add dev $ETH parent ffff: prio 4 protocol ip \
u32 match ip protocol 1 0xff \
flowid 1:1 \
action connmark continue
...if the connmark was 0x11, we police to a ridic rate of 10Kbps
tc filter add dev $ETH parent ffff: prio 5 protocol ip \
handle 0x11 fw flowid 1:1 \
action police rate 10kbit burst 10k

Other ways to use the connmark is to supply the zone, index and
branching choice. Refer to help.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2015-04-13 10:49:45 -07:00
Daniel Borkmann 11c39b5e98 tc: add eBPF support to f_bpf
This work adds the tc frontend for kernel commit e2e9b6541dd4 ("cls_bpf:
add initial eBPF support for programmable classifiers").

A C-like classifier program (f.e. see e2e9b6541dd4) is being compiled via
LLVM's eBPF backend into an ELF file, that is then being passed to tc. tc
then loads, if any, eBPF maps and eBPF opcodes (with fixed-up eBPF map file
descriptors) out of its dedicated sections, and via bpf(2) into the kernel
and then the resulting fd via netlink down to cls_bpf. cls_bpf allows for
annotations, currently, I've used the file name for that, so that the user
can easily identify his filter when dumping configurations back.

Example usage:

  clang -O2 -emit-llvm -c cls.c -o - | llc -march=bpf -filetype=obj -o cls.o
  tc filter add dev em1 parent 1: bpf run object-file cls.o classid x:y

  tc filter show dev em1 [...]
  filter parent 1: protocol all pref 49152 bpf handle 0x1 flowid x:y cls.o

I placed the parser bits derived from Alexei's kernel sample, into tc_bpf.c
as my next step is to also add the same support for BPF action, so we can
have a fully fledged eBPF classifier and action in tc.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
2015-03-24 15:45:23 -07:00
Jiri Pirko 86ab59a666 tc: add support for BPF based actions
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
2015-02-05 10:38:13 -08:00
Jiri Pirko 1d129d191a tc: push bpf common code into separate file
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
2015-02-05 10:38:13 -08:00
Vadim Kochan 67e1d73be1 tc: Allow to easy change network namespace
Added new '-netns' option to simplify executing following cmd:

    ip netns exec NETNS tc OPTIONS COMMAND OBJECT

    to

    tc -n[etns] NETNS OPTIONS COMMAND OBJECT

e.g.:

    tc -net vnet0 qdisc

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
2014-12-27 10:22:34 -08:00
Jiri Pirko 8b1c0216d8 tc: add support for vlan tc action
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Reviewed-by: Cong Wang <cwang@twopensource.com>
2014-12-03 09:29:21 -08:00
Terry Lam ac74bd2a71 support for Heavy Hitter Filter (HHF) qdisc
$tc qdisc add dev eth0 hhf help
Usage: ... hhf [ limit PACKETS ] [ quantum BYTES]
               [ hh_limit NUMBER ]
               [ reset_timeout TIME ]
               [ admit_bytes BYTES ]
               [ evict_timeout TIME ]
               [ non_hh_weight NUMBER ]

$tc -s -d qdisc show dev eth0
qdisc hhf 8005: root refcnt 32 limit 1000p quantum 1514 hh_limit 2048
reset_timeout 40.0ms admit_bytes 131072 evict_timeout 1.0s non_hh_weight 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
    drop_overlimit 0 hh_overlimit 0 tot_hh 0 cur_hh 0

HHF qdisc parameters:
- limit: max number of packets in qdisc (default 1000)
- quantum: max deficit per RR round (default 1 MTU)
- hh_limit: max number of HHs to keep states (default 2048)
- reset_timeout: time to reset HHF counters (default 40ms)
- admit_bytes: counter thresh to classify as HH (default 128KB)
- evict_timeout: threshold to evict idle HHs (default 1s)
- non_hh_weight:  DRR weight for mice (default 2)

Signed-off-by: Terry Lam <vtlam@google.com>
2014-05-09 12:10:47 -07:00
Vijay Subramanian 80dd880dd0 PIE: Proportional Integral controller Enhanced
Proportional Integral controller Enhanced (PIE) is a scheduler to address the
bufferbloat problem.

We present here a lightweight design, PIE(Proportional Integral controller
Enhanced) that can effectively control the average queueing latency to a target
value. Simulation results, theoretical analysis and Linux testbed results have
shown that PIE can ensure low latency and achieve high link utilization under
various congestion situations. The design does not require per-packet
timestamp, so it incurs very small overhead and is simple enough to implement
in both hardware and software.  "

For more information, please see technical paper about PIE in the IEEE
Conference on High Performance Switching and Routing 2013. A copy of the paper
can be found at ftp://ftpeng.cisco.com/pie/.

Please also refer to the IETF draft submission at
http://tools.ietf.org/html/draft-pan-tsvwg-pie-00

All relevant code, documents and test scripts and results can be found at
ftp://ftpeng.cisco.com/pie/.

For problems with the iproute2/tc or Linux kernel code, please contact Vijay
Subramanian (vijaynsu@cisco.com or subramanian.vijay@gmail.com) Mythili Prabhu
(mysuryan@cisco.com)

Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Signed-off-by: Mythili Prabhu <mysuryan@cisco.com>
CC: Dave Taht <dave.taht@bufferbloat.net>
2014-01-09 22:50:47 -08:00
Daniel Borkmann d05df6861f tc: add cls_bpf frontend
This is the iproute2 part of the kernel patch "net: sched:
add BPF-based traffic classifier".

[Will re-submit later again for iproute2 when window for
 -next submissions opens.]

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Thomas Graf <tgraf@suug.ch>
2013-10-30 16:45:05 -07:00
Jamal Hadi Salim 087f46ee4e tc: introduce simple action
Simple action is already in the kernel for years now as an
example. This complements it with user space control.

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2013-09-30 21:29:34 -07:00
Eric Dumazet bc113e46a3 pkt_sched: fq: Fair Queue packet scheduler
Support for FQ packet scheduler

$ tc qd add dev eth0 root fq help
Usage: ... fq [ limit PACKETS ] [ flow_limit PACKETS ]
              [ quantum BYTES ] [ initial_quantum BYTES ]
              [ maxrate RATE  ] [ buckets NUMBER ]
              [ [no]pacing ]

$ tc -s -d qd
qdisc fq 8002: dev eth0 root refcnt 32 limit 10000p flow_limit 100p
buckets 256 quantum 3028 initial_quantum 15140
 Sent 216532416 bytes 148395 pkt (dropped 0, overlimits 0 requeues 14)
 backlog 0b 0p requeues 14
  511 flows (511 inactive, 0 throttled)
  110 gc, 0 highprio, 0 retrans, 1143 throttled, 0 flows_plimit

limit	: max number of packets on whole Qdisc (default 10000)

flow_limit : max number of packets per flow (default 100)

quantum : the max deficit per RR round (default is 2 MTU)

initial_quantum : initial credit for new flows (default is 10 MTU)

maxrate : max per flow rate (default : unlimited)

buckets : number of RB trees (default : 1024) in hash table.
               (consumes 8 bytes per bucket)

[no]pacing : disable/enable pacing (default is enable)

Usage :

tc qdisc add dev $ETH root fq

tc qdisc del dev $ETH root 2>/dev/null
tc qdisc add dev $ETH root handle 1: mq
for i in `seq 1 4`
do
  tc qdisc add dev $ETH parent 1:$i est 1sec 4sec fq
done

Signed-off-by: Eric Dumazet <edumazet@google.com>
2013-09-20 09:43:40 -07:00
Benjamin Poirier 5ab3a4de5e Use pkg-config to obtain xtables.h path
On openSUSE 12.2 (at least) xtables.h is not installed in the system-wide
include dir but in /usr/include/iptables-1.4.16.3/. This results in the
following build failure:
em_ipset.c:26:21: fatal error: xtables.h: No such file or directory

Other includers of xtables.h already call out to pkg-config
2013-02-11 09:19:54 -08:00
Mike Frysinger e4fc4ada33 allow pkg-config to be customized
Rather than hard coding `pkg-config`, use ${PKG_CONFIG} so people can
override it to their specific version (like when cross-compiling).

This is the same way the upstream pkg-config code works.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
2012-11-11 16:21:34 -08:00
Matt Burgess 92905c6e0d iproute2-3.6.0 assumes presence of iptables
Hi,

When compiling iproute2-3.6.0 on a host that doesn't have iptables available, I get the following error:

gcc -Wall -Wstrict-prototypes -O2 -I../include -DRESOLVE_HOSTNAMES
-DLIBDIR=\"/usr/lib\" -DCONFDIR=\"/etc/iproute2\" -D_GNU_SOURCE
-DCONFIG_GACT -DCONFIG_GACT_PROB -DYY_NO_INPUT   -c -o em_ipset.o
em_ipset.c
em_ipset.c:26:21: fatal error: xtables.h: No such file or directory

Fixed by the following patch, which guards the building of em_ipset.o on
the presence of suitable headers.

Thanks,

Matt.
2012-10-03 08:51:29 -07:00
Rostislav Lisovy 7b5f30e14f Ematch used to classify CAN frames according to their identifiers
This ematch enables effective filtering of CAN frames (AF_CAN) based
on CAN identifiers with masking of compared bits. Implementation
utilizes bitmap based classification for standard frame format (SFF)
which is optimized for minimal overhead.

Signed-off-by: Rostislav Lisovy <lisovy@gmail.com>
2012-08-20 13:11:55 -07:00
Florian Westphal 8194411a42 tc: add ipset ematch
example usage:
tc filter add dev $dev parent $id: basic match not ipset'(foobar src)' ..

also updates iproute2/ematch_map, else tc complains:
Error: Unable to find ematch "ipset" in /etc/iproute2/ematch_map
Please assign a unique ID to the ematch kind the suggested entry is:
        8       ipset

when trying to use this ematch.

(text ematch (5) only exists in kernel, a vlan ematch (6) exists neither in
 kernel nor userspace, but kernel headers define TCF_EM_VLAN == 6).
2012-08-13 08:33:50 -07:00
Eric Dumazet c3524efc14 fq_codel: Fair Queue Codel AQM
Fair Queue Codel packet scheduler

Principles :

- Packets are classified (internal classifier or external) on flows.
- This is a Stochastic model (as we use a hash, several flows might
                              be hashed on same slot)
- Each flow has a CoDel managed queue.
- Flows are linked onto two (Round Robin) lists,
  so that new flows have priority on old ones.

- For a given flow, packets are not reordered (CoDel uses a FIFO)
- head drops only.
- ECN capability is on by default.
- Very low memory footprint (64 bytes per flow)

tc qdisc ... fq_codel [ limit PACKETS ] [ flows number ]
                      [ target TIME ] [ interval TIME ] [ noecn ]
                      [ quantum BYTES ]

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Dave Taht <dave.taht@bufferbloat.net>
Cc: Kathleen Nichols <nichols@pollere.com>
Cc: Van Jacobson <van@pollere.net>
Cc: Tom Herbert <therbert@google.com>
Cc: Matt Mathis <mattmathis@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Changli Gao <xiaosuo@gmail.com>
2012-05-22 14:17:49 -07:00
Eric Dumazet 185d88f99b tc_codel: Controlled Delay AQM
An implementation of CoDel AQM, from Kathleen Nichols and Van Jacobson.

http://queue.acm.org/detail.cfm?id=2209336

This AQM main input is no longer queue size in bytes or packets, but the
delay packets stay in (FIFO) queue.

As we don't have infinite memory, we still can drop packets in enqueue()
in case of massive load, but mean of CoDel is to drop packets in
dequeue(), using a control law based on two simple parameters :

target : target sojourn time (default 5ms)
interval : width of moving time window (default 100ms)

Selected packets are dropped, unless ECN is enabled and packets can get
ECN mark instead.

Usage: tc qdisc ... codel [ limit PACKETS ] [ target TIME ]
                          [ interval TIME ] [ ecn ]

qdisc codel 10: parent 1:1 limit 2000p target 3.0ms interval 60.0ms ecn
 Sent 13347099587 bytes 8815805 pkt (dropped 0, overlimits 0 requeues 0)
 rate 202365Kbit 16708pps backlog 113550b 75p requeues 0
  count 116 lastcount 98 ldelay 4.3ms dropping drop_next 816us
  maxpacket 1514 ecn_mark 84399 drop_overlimit 0

CoDel must be seen as a base module, and should be used keeping in mind
there is still a FIFO queue. So a typical setup will probably need a
hierarchy of several qdiscs and packet classifiers to be able to meet
whatever constraints a user might have.

One possible example would be to use fq_codel, which combines Fair
Queueing and CoDel, in replacement of sfq / sfq_red.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Dave Taht <dave.taht@bufferbloat.net>
2012-05-22 14:13:52 -07:00
Christoph J. Thompson 5c434a9e5a iproute2 - Fix up and simplify variables pointing to install directories
Define where is the are located the iproute2 config files.
Get rid of trailing slashes for paths in several file.

Signed-off-by: Christoph J. Thompson <cjsthompson@gmail.com>
2012-04-12 09:49:10 -07:00
Yegor Yefremov 8ced4fcd50 iproute2: cleanup dependencies
LIBNETLINK will be defined in the main Makefile, so
both ../lib/libnetlink.a ../lib/libutil.a will be
automatically appended during linking. Otherwise
../lib/libnetlink.a ../lib/libutil.a will appear
twice during linking.

Signed-off-by: Yegor Yefremov <yegorslists@googlemail.com>
2012-02-27 08:27:54 -08:00
Jan Engelhardt d7aa57d450 iproute2: proper detection of libxtables position and flags
Upstream: not sent yet

Any tests involving iptables _MUST_ utilize pkg-config to find the
proper locations of the installation.
2012-01-03 15:05:25 -08:00
Stephen Hemminger 155ad8023b ematch: fix warning about unused input()
Use existing compile flag to indicate that input() is not used
by tc ematch, fixes compiler warning.
2012-01-03 13:55:59 -08:00
Stephen Hemminger 93ba481acb cleanup ematch yacc files
make clean needs to remove all the yacc output files for ematch.
2011-11-02 16:39:36 -07:00
Mike Frysinger aa48b5931a tc: fix parallel build file with lex/yacc
Building iproute2 in parallel might hit the race failure:
	emp_ematch.l:2:30: fatal error: emp_ematch.yacc.h:
		No such file or directory
	make[1]: *** [emp_ematch.lex.o] Error 1

This is because we currently allow the yacc/lex files to generate and
compile in parallel.  So add a simple dependency to make sure yacc has
finished before we attempt to compile the lex output.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
2011-10-18 15:02:21 -07:00
Stephen Hemminger c441bd4c1b Add QFQ scheduler
Basic configuration support for QFQ.
Still need to add manual page.
2011-07-13 13:46:34 -07:00
John Fastabend 914953046a iproute2: tc add mqprio qdisc support
Add mqprio qdisc support. Output matches the following,

qdisc mq 0: dev eth1 root
qdisc mq 0: dev eth2 root
qdisc mqprio 8001: dev eth3 root  tc 8 map 0 1 2 3 4 5 6 7 1 1 1 1 1 1 1 1
             queues:(0:7) (8:15) (16:23) (24:31) (32:39) (40:47) (48:55) (56:63)

And usage is,

Usage: ... mclass [num_tc NUMBER] [map P0 P1...]
                  [offset txq0 txq1 ...] [count cnt0 cnt1 ...] [hw 1|0]

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
2011-04-12 14:28:19 -07:00
Juliusz Chroboczek d7f3299d59 tc : SFB flow scheduler
Supports SFB qdisc (included in linux-2.6.39)

1) Setup phase : accept non default parameters

2) dump information

qdisc sfb 11: parent 1:11 limit 1 max 25 target 20
  increment 0.00050 decrement 0.00005 penalty rate 10 burst 20 (600000ms 60000ms)
 Sent 47991616 bytes 521648 pkt (dropped 549245, overlimits 549245 requeues 0)
 rate 7193Kbit 9774pps backlog 0b 0p requeues 0
  earlydrop 0 penaltydrop 0 bucketdrop 0 queuedrop 549245 childdrop 0 marked 0
  maxqlen 0 maxprob 0.00000 avgprob 0.00000

Signed-off-by: Juliusz Chroboczek <Juliusz.Chroboczek@pps.jussieu.fr>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
2011-04-12 14:27:37 -07:00
Stephen Hemminger a4eca97cff CHOKe scheduler
TC commands for CHOKe qdisc
2011-01-31 09:09:50 -08:00
Gregoire Baron 3822cc986c tc: add ACT_CSUM action support (csum)
Add the iproute2 support for the ACT_CSUM action. Can be used as
following, certainly in conjunction with the ACT_PEDIT action (pedit):

 # In order to DNAT (stateless) IPv4 packet from 192.168.1.100 to
 #  0x12345678 (18.52.86.120), and update the IPv4 header checksum and
 #  the UDP checksum (the last one, only if the packet is UDP).
tc filter add eth0 prio 1 protocol ip parent ffff: \
  u32 match ip src 192.168.1.100/32 flowid :1 \
    action pedit munge offset 16 u32 set 0x12345678 \
      pipe csum ip and udp

 # In order to alter destination address of IPv6 TCP packets from fc00::1
 #  and correct the TCP checksum (nothing happened? except maybe for
 #  checksums in the TCP payload ...).
tc filter add eth0 prio 1 protocol ipv6 parent ffff: \
  u32 match ip6 src fc00::1/128 match ip6 protocol 0x06 0xff flowid :1 \
    action pedit munge offset 24 u32 set 0x12345678 \
      pipe csum tcp
2010-12-01 11:17:46 -08:00
Mike Frysinger bf512683e0 tc: revert "echo" in install target
The recent commit "iproute2: add option to build m_xt as a tc module"
(ab814d6355) looks like it wrongly included debug changes in the
install target.  So drop the `echo` so the tc binary actually gets
installed again.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
2010-07-23 12:28:25 -07:00
Andreas Henriksson ab814d6355 iproute2: add option to build m_xt as a tc module (v3)
This will build the xt module (action ipt) of tc as a
shared object that is linked at runtime by tc if used,
rather then built into tc.

This is similar to how the atm qdisc support
is handled (q_atm.so).

Signed-off-by: Andreas Henriksson <andreas@xxxxxxxx>
2010-04-12 11:40:29 -07:00
Andreas Henriksson 12ddfff76c iproute2: detect iptables modules dir in configure.
Try to automatically detect iptables modules directory.

Make the configure script look for iptables modules.
This also makes it possible to specify it on the
command line while building via "make IPT_LIB_DIR=/foo/bar".

Signed-off-by: Andreas Henriksson <andreas@fatal.se>
2010-03-29 15:10:20 -07:00