since all tc classifiers are required to specify ethertype as part of grammar
By not allowing eth_type to be specified we remove contradiction for
example when a user specifies:
tc filter add ... priority xxx protocol ip flower eth_type ipv6
This patch removes that contradiction
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
gcc < 4.6 does not handle C11 syntax for the static initialization of
anonymous struct/union, hence the following error:
tc_bpf.c:260: error: unknown field map_type specified in initializer
Signed-off-by: Julien Floret <julien.floret@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Add missing spaces around operators to increase readability. Aside from
that, make "preference" match a real synonym for "tos" and "dsfield" as
it's effect was identical to them.
Signed-off-by: Phil Sutter <phil@nwl.cc>
This fixes a few syntax errors and changes route filter help text to use
classid instead of flowid to be consistent with other filters' help
texts.
Signed-off-by: Phil Sutter <phil@nwl.cc>
After the patch, the most minimal command to load an eBPF action
for late binding with auto index selection through tc is:
tc actions add action bpf obj prog.o
We already set TC_ACT_PIPE in tc as default opcode, so if nothing
further has been specified, just use it. Also, allow "ok" next to
"pass" for matching cmdline on TC_ACT_OK.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
When having optional classid, most minimal command can be sth
like:
tc filter add dev foo parent X: bpf obj prog.o
Therefore, adapt the code so that a next argument will not be
enforced as the case currently.
Also, minor cleanup on the classid, where we should rather
have used addattr32(), and add flags for exec configuration,
for example (using short notation):
tc filter add dev foo parent X: bpf da obj prog.o
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
linux-3.19 fq packet scheduler got a new attribute, controlling
number of 'flows' holding packets not attached to a socket
(forwarding usage)
kernel commit is 06eb395fa9856b5a87cf7d80baee2a0ed3cdb9d7
("pkt_sched: fq: better control of DDOS traffic")
This patch adds corresponding code to tc command.
tc qd replace dev eth0 root fq orphan_mask 511
Signed-off-by: Eric Dumazet <edumazet@google.com>
Code to parse and export this tuneable via netlink is already present in
sched_fq.c of the kernel, so not making it accessible for users would be
a waste of resources.
Signed-off-by: Phil Sutter <phil@nwl.cc>
This patch follows the changes of commit 4d98ab0 ("Fix FSF address in
file headers"), fixing file headers added after it.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Frontend support for kernel commit a5c90b29e5cc ("act_bpf: properly
support late binding of bpf action to a classifier").
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Error was:
f_bpf.o: In function `bpf_parse_opt':
f_bpf.c:(.text+0x88f): undefined reference to `secure_getenv'
m_bpf.o: In function `parse_bpf':
m_bpf.c:(.text+0x587): undefined reference to `secure_getenv'
collect2: error: ld returned 1 exit status
There is no special reason to use the secure version of getenv, thus let's
simply use getenv().
CC: Daniel Borkmann <daniel@iogearbox.net>
Fixes: 88eea53954 ("tc: {f,m}_bpf: allow to retrieve uds path from env")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Tested-by: Yegor Yefremov <yegorslists@googlemail.com>
Kernel commit 04fd61ab36ec ("bpf: allow bpf programs to tail-call other
bpf programs") added support for tail calls, this patch here adds tc
front end parts for the object parser to prepopulate a given eBPF prog
array before the root prog is pushed down for classifier creation. The
prepopulation works with any number of prog arrays in any dependencies,
e.g. prog or normal maps could also be used from progs that are
tail-called themself, etc.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The initializers are simply not needed.
These if-blocks are outright dead code, because '0 > unsigned' is always
false, so only else clause triggers and regardless of which clause triggers
it only updates 'ind' which is later unconditionally written to before
being used anyway.
Otherwise we get errors from clang:
m_pedit.c:166:8: error: comparison of 0 > unsigned expression is always false [-Werror,-Wtautological-compare]
if (0 > tkey->off) {
~ ^ ~~~~~~~~~
m_pedit.c:209:8: error: comparison of 0 > unsigned expression is always false [-Werror,-Wtautological-compare]
if (0 > tkey->off) {
~ ^ ~~~~~~~~~
2 errors generated.
Change-Id: I3c9e9092915088fc56f992e5df736851541a4458
The for loop should only probe up to G[i]bit rates, so that we
end up with T[i]bit as the last max units[] slot for snprintf(3),
and not possibly an invalid pointer in case rate is multiple of
kilo.
Fixes: 8cecdc2837 ("tc: more user friendly rates")
Reported-by: Jose R. Guzman Mosqueda <jose.r.guzman.mosqueda@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
There have been several instances where response from kernel
has overrun the stack buffer from the caller. Avoid future problems
by passing a size argument.
Also drop the unused peer and group arguments to rtnl_talk.
Allow the qdisc limit to be set, which is particularly useful when
the default VQ is not configured with RED parameters.
Signed-off-by: David Ward <david.ward@ll.mit.edu>
codel & fq_codel packet schedulers are now able to have a threshold
for CE marking packets, regardless of the drop/nodrop decision taken by
CoDel.
This is particularly useful for dctcp and variants, that do not use
traditional ECN.
Note that fq_codel users would have to specify noecn if ce_threshold is
used, otherwise results would be not very interesting, as ecn is default
on for fq_codel.
$ tc -s qdisc show dev eth1
qdisc codel 8002: root refcnt 45 limit 1000p target 5.0ms ce_threshold
1.0ms interval 100.0ms
Sent 4908469888317 bytes 3351813967 pkt (dropped 0, overlimits 0
requeues 21624365)
rate 37671Mbit 3231836pps backlog 4904740b 250p requeues 21624365
count 0 lastcount 0 ldelay 1.1ms drop_next 0us
maxpacket 68130 ecn_mark 0 drop_overlimit 0 ce_mark 410861803
Signed-off-by: Eric Dumazet <edumazet@google.com>
In the GRED kernel source code, both of the terms "drop parameters"
(DP) and "virtual queue" (VQ) are used to refer to the same thing.
Each "DP" is better understood as a "set of drop parameters", since
it has values for limit, min, max, avpkt, etc. This terminology can
result in confusion when creating a GRED qdisc having multiple DPs.
Netlink attributes and struct members with the DP name seem to have
been left intact for compatibility, while the term VQ was otherwise
adopted in the code, which is more intuitive.
Use the VQ term in the tc command syntax and output (but maintain
compatibility with the old syntax).
Rewrite the usage text to be concise and similar to other qdiscs.
Signed-off-by: David Ward <david.ward@ll.mit.edu>
DPs, def_DP, and DP are unsigned values that are sent and received
in TCA_GRED_* netlink attributes; handle them properly when they
are parsed or printed. Use MAX_DPs as the initial value for def_DP
and DP, and fix the operator used for bounds checking them.
Signed-off-by: David Ward <david.ward@ll.mit.edu>
Make the output more consistent with the RED qdisc, and only show
details/statistics if the appropriate flag is set when calling tc.
Show the parameters used with "gred setup". Add missing statistics
"pdrop" and "other". Fix format specifiers for unsigned values.
Signed-off-by: David Ward <david.ward@ll.mit.edu>
This is more helpful to the user, since the command takes two forms,
and the message that would otherwise appear about missing parameters
assumes one of those forms.
Signed-off-by: David Ward <david.ward@ll.mit.edu>
The "bandwidth" parameter is optional, but ensure the user is aware
of its default value, to proactively avoid configuration problems.
Signed-off-by: David Ward <david.ward@ll.mit.edu>
It is used when parsing three different parameters, only one of
which is Wlog. Change the name to make the code less confusing.
Signed-off-by: David Ward <david.ward@ll.mit.edu>
When deleting a specific basic filter with handle,
tc command always ignores the 'handle' option, so
tcm_handle is always 0 and kernel deletes all filters
in the selected group. This is wrong, we should respect
'handle' in cmdline.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Currently, only on error we get a log dump, but I found it useful when
working with eBPF to have an option to also dump the log on success.
Also spotted a typo in a header comment, which is fixed here as well.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
This work follows upon commit 6256f8c9e4 ("tc, bpf: finalize eBPF
support for cls and act front-end") and takes up the idea proposed by
Hannes Frederic Sowa to spawn a shell (or any other command) that holds
generated eBPF map file descriptors.
File descriptors, based on their id, are being fetched from the same
unix domain socket as demonstrated in the bpf_agent, the shell spawned
via execvpe(2) and the map fds passed over the environment, and thus
are made available to applications in the fashion of std{in,out,err}
for read/write access, for example in case of iproute2's examples/bpf/:
# env | grep BPF
BPF_NUM_MAPS=3
BPF_MAP1=6 <- BPF_MAP_ID_QUEUE (id 1)
BPF_MAP0=5 <- BPF_MAP_ID_PROTO (id 0)
BPF_MAP2=7 <- BPF_MAP_ID_DROPS (id 2)
# ls -la /proc/self/fd
[...]
lrwx------. 1 root root 64 Apr 14 16:46 0 -> /dev/pts/4
lrwx------. 1 root root 64 Apr 14 16:46 1 -> /dev/pts/4
lrwx------. 1 root root 64 Apr 14 16:46 2 -> /dev/pts/4
[...]
lrwx------. 1 root root 64 Apr 14 16:46 5 -> anon_inode:bpf-map
lrwx------. 1 root root 64 Apr 14 16:46 6 -> anon_inode:bpf-map
lrwx------. 1 root root 64 Apr 14 16:46 7 -> anon_inode:bpf-map
The advantage (as opposed to the direct/native usage) is that now the
shell is map fd owner and applications can terminate and easily reattach
to descriptors w/o any kernel changes. Moreover, multiple applications
can easily read/write eBPF maps simultaneously.
To further allow users for experimenting with that, next step is to add
a small helper that can get along with simple data types, so that also
shell scripts can make use of bpf syscall, f.e to read/write into maps.
Generally, this allows for prepopulating maps, or any runtime altering
which could influence eBPF program behaviour (f.e. different run-time
classifications, skb modifications, ...), dumping of statistics, etc.
Reference: http://thread.gmane.org/gmane.linux.network/357471/focus=357860
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
The warning was:
m_simple.c: In function ‘parse_simple’:
m_simple.c:142:4: warning: format ‘%ld’ expects argument of type ‘long int’, but argument 3 has type ‘size_t’ [-Wformat]
Useful to be able to compile with -Werror.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Add ability to add the netfilter connmark support.
Typical usage:
...lets tag outgoing icmp with mark 0x10..
iptables -tmangle -A PREROUTING -p icmp -j CONNMARK --set-mark 0x10
..add on ingress of $ETH an extractor for connmark...
tc filter add dev $ETH parent ffff: prio 4 protocol ip \
u32 match ip protocol 1 0xff \
flowid 1:1 \
action connmark continue
...if the connmark was 0x11, we police to a ridic rate of 10Kbps
tc filter add dev $ETH parent ffff: prio 5 protocol ip \
handle 0x11 fw flowid 1:1 \
action police rate 10kbit burst 10k
Other ways to use the connmark is to supply the zone, index and
branching choice. Refer to help.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
This work finalizes both eBPF front-ends for the classifier and action
part in tc, it allows for custom ELF section selection, a simplified tc
command frontend (while keeping compat), reusing of common maps between
classifier and actions residing in the same object file, and exporting
of all map fds to an eBPF agent for handing off further control in user
space.
It also adds an extensive example of how eBPF can be used, and a minimal
self-contained example agent that dumps map data. The example is well
documented and hopefully provides a good starting point into programming
cls_bpf and act_bpf.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
If '-nm' specified that do not fail if there is no
default class names file in /etc/iproute2.
Changed default class name file cls_names -> tc_cls.
Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
This work adds the tc frontend for kernel commit e2e9b6541dd4 ("cls_bpf:
add initial eBPF support for programmable classifiers").
A C-like classifier program (f.e. see e2e9b6541dd4) is being compiled via
LLVM's eBPF backend into an ELF file, that is then being passed to tc. tc
then loads, if any, eBPF maps and eBPF opcodes (with fixed-up eBPF map file
descriptors) out of its dedicated sections, and via bpf(2) into the kernel
and then the resulting fd via netlink down to cls_bpf. cls_bpf allows for
annotations, currently, I've used the file name for that, so that the user
can easily identify his filter when dumping configurations back.
Example usage:
clang -O2 -emit-llvm -c cls.c -o - | llc -march=bpf -filetype=obj -o cls.o
tc filter add dev em1 parent 1: bpf run object-file cls.o classid x:y
tc filter show dev em1 [...]
filter parent 1: protocol all pref 49152 bpf handle 0x1 flowid x:y cls.o
I placed the parser bits derived from Alexei's kernel sample, into tc_bpf.c
as my next step is to also add the same support for BPF action, so we can
have a fully fledged eBPF classifier and action in tc.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>