Any iproute utility that uses any function from lib/utils.c needs
to declare its own resolve_hosts variable instance although it does
not need/use hostname resolving functionality (currently only 'ip'
and 'ss' commands uses this).
The patch declares single common instance of resolve_hosts directly
in utils.c so the existing ones can be removed (the same approach
that is used for timestamp_short).
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
This patch was previously submitted as RFC. Submitting this as
non-RFC now that the classid reservation scheme for hardware
traffic classes and offloads to route packets to a hardware
traffic class are accepted in net-next.
HW traffic classes 0 through 15 are represented using the
reserved classid values :ffe0 - :ffef.
Example:
Match Dst IPv4,Dst Port and route to TC1:
# tc filter add dev eth0 protocol ip parent ffff:\
prio 1 flower dst_ip 192.168.1.1/32\
ip_proto udp dst_port 12000 skip_sw\
hw_tc 1
# tc filter show dev eth0 parent ffff:
filter pref 1 flower chain 0
filter pref 1 flower chain 0 handle 0x1 hw_tc 1
eth_type ipv4
ip_proto udp
dst_ip 192.168.1.1
dst_port 12000
skip_sw
in_hw
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
The Credit Based Shaper (CBS) queueing discipline allows bandwidth
reservation with sub-milisecond precision. It is defined by the
802.1Q-2014 specification (section 8.6.8.2 and Annex L).
The syntax is:
tc qdisc add dev DEV parent NODE cbs locredit <LOCREDIT>
hicredit <HICREDIT> sendslope <SENDSLOPE>
idleslope <IDLESLOPE>
(The order is not important)
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch was previously submitted as RFC. Submitting this as
non-RFC now that the tc/mqprio changes are accepted in net-next.
Adds new mqprio options for 'mode' and 'shaper'. The mode
option can take values for offload modes such as 'dcb' (default),
'channel' with the 'hw' option set to 1. The new 'channel' mode
supports offloading TCs and other queue configurations. The
'shaper' option is to support HW shapers ('dcb' default) and
takes the value 'bw_rlimit' for bandwidth rate limiting. The
parameters to the bw_rlimit shaper are minimum and maximum
bandwidth rates. New HW shapers in future can be supported
through the shaper attribute.
# tc qdisc add dev eth0 root mqprio num_tc 2 map 0 0 0 0 1 1 1 1\
queues 4@0 4@4 hw 1 mode channel shaper bw_rlimit\
min_rate 1Gbit 2Gbit max_rate 4Gbit 5Gbit
# tc qdisc show dev eth0
qdisc mqprio 804a: root tc 2 map 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0
queues:(0:3) (4:7)
mode:channel
shaper:bw_rlimit min_rate:1Gbit 2Gbit max_rate:4Gbit 5Gbit
v2: Avoid buffer overrun and minor cleanup.
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
This patch changes ife_prio to ife_tcindex which is right variable to
assign in the argument in this case.
Signed-off-by: Alexander Aring <aring@mojatatu.com>
This is an update for 460c03f3f3 ("iplink: double the buffer size also in
iplink_get()"). After update, we will not need to double the buffer size
every time when VFs number increased.
With call like rtnl_talk(&rth, &req.n, NULL, 0), we can simply remove the
length parameter.
With call like rtnl_talk(&rth, nlh, nlh, sizeof(req), I add a new variable
answer to avoid overwrite data in nlh, because it may has more info after
nlh. also this will avoid nlh buffer not enough issue.
We need to free answer after using.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
Sample use case:
... add ingress qdisc
sudo $TC qdisc add dev $ETH ingress
... if we exceed rate of 1kbps (burst of 90K), do an absolute jump of 2 actions
sudo $TC actions add action police rate 1kbit burst 90k conform-exceed jump 2 / pipe
sudo $TC -s actions ls action police
action order 0: police 0x4 rate 1Kbit burst 23440b mtu 2Kb action jump 2/pipe overhead 0b
ref 1 bind 0 installed 41 sec used 41 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
... lets add a couple of marks so we can use them to mark exceed/not exceed
sudo $TC actions add action skbedit mark 11 ok index 11
sudo $TC actions add action skbedit mark 12 ok index 12
... if we dont exceed our rate we get a mark of 11, else mark of 12
sudo $TC filter add dev $ETH parent ffff: protocol ip prio 8 u32 \
match ip dst 127.0.0.8/32 flowid 1:10 \
action police index 4 \
action skbedit index 11 \
action skbedit index 12
Ok, lets keep this thing a little busy..
sudo ping -f -c 10000 127.0.0.8
... now lets see the filters..
sudo $TC -s filter ls dev $ETH parent ffff: protocol ip
filter pref 8 u32 chain 0
filter pref 8 u32 chain 0 fh 800: ht divisor 1
filter pref 8 u32 chain 0 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:10 not_in_hw (rule hit 20000 success 10000)
match 7f000008/ffffffff at 16 (success 10000 )
action order 1: police 0x4 rate 1Kbit burst 23440b mtu 2Kb action jump 2/pipe overhead 0b
ref 2 bind 1 installed 198 sec used 2 sec
Action statistics:
Sent 840000 bytes 10000 pkt (dropped 0, overlimits 9721 requeues 0)
backlog 0b 0p requeues 0
action order 2: skbedit mark 11 pass
index 11 ref 2 bind 1 installed 127 sec used 2 sec
Action statistics:
Sent 23436 bytes 279 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
action order 3: skbedit mark 12 pass
index 12 ref 2 bind 1 installed 127 sec used 2 sec
Action statistics:
Sent 816564 bytes 9721 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
As can be seen 97.21% of the packets were marked as exceeding the allocated
rate; you could do something clever with the skb mark after this.
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The original problem was that something like:
| strncpy(ifr.ifr_name, *argv, IFNAMSIZ);
might leave ifr.ifr_name unterminated if length of *argv exceeds
IFNAMSIZ. In order to fix this, I thought about replacing all those
cases with (equivalent) calls to snprintf() or even introducing
strlcpy(). But as Ulrich Drepper correctly pointed out when rejecting
the latter from being added to glibc, truncating a string without
notifying the user is not to be considered good practice. So let's
excercise what he suggested and reject empty, overlong or otherwise
invalid interface names right from the start - this way calls to
strncpy() like shown above become safe and the user has a chance to
reconsider what he was trying to do.
Note that this doesn't add calls to check_ifname() to all places where
user supplied interface name is parsed. In many cases, the interface
must exist already and is therefore looked up using ll_name_to_index(),
so if_nametoindex() will perform the necessary checks already.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Since addattrstrz() will copy the provided string into the attribute
payload, there is no need to cache the data.
Signed-off-by: Phil Sutter <phil@nwl.cc>
This patch adds support to the iproute2 tc filter command for matching MPLS
labels in the flower classifier. The ability to match the Time To Live,
Bottom Of Stack, Traffic Control and Label fields are added as options to
the flower filter.
e.g.:
tc filter add dev eth0 protocol 0x8847 parent ffff: \
flower mpls_label 1 mpls_tc 2 mpls_ttl 3 mpls_bos 0 \
action drop
Signed-off-by: Benjamin LaHaise <benjamin.lahaise@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Consolidate dump of prog info to use bpf_dump_prog_info() when possible.
Moving forward, we want to have a consistent output for BPF progs when
being dumped. E.g. in cls/act case we used to dump tag as a separate
netlink attribute before we had BPF_OBJ_GET_INFO_BY_FD bpf(2) command.
Move dumping tag into bpf_dump_prog_info() as well, and only dump the
netlink attribute for older kernels. Also, reuse bpf_dump_prog_info()
for XDP case, so we can dump tag and whether program was jited, which
we currently don't show.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Correct two errors which cancel each other out:
* Do not send twice the length of the actual provided by the user to the kernel
* Do not dump half the length of the cookie provided by the kernel
As the cookie is now stored in the kernel at its correct length rather
than double the that length cookies of up to the maximum size of 16 bytes
may now be stored rather than a maximum of half that length.
Output of dump is the same before and after this change,
but the data stored in the kernel is now exactly the cookie
rather than the cookie + as many trailing zeros.
Before:
# tc filter add dev eth0 protocol ip parent ffff: \
flower ip_proto udp action drop \
cookie 0123456789abcdef0123456789abcdef
RTNETLINK answers: Invalid argument
After:
# tc filter add dev eth0 protocol ip parent ffff: \
flower ip_proto udp action drop \
cookie 0123456789abcdef0123456789abcdef
# tc filter show dev eth0 ingress
eth_type ipv4
ip_proto udp
not_in_hw
action order 1: gact action drop
random type none pass val 0
index 1 ref 1 bind 1 installed 1 sec used 1 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
cookie len 16 0123456789abcdef0123456789abcdef
Fixes: fd8b3d2c1b ("actions: Add support for user cookies")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
This patch will report about if the ethertype for IFE is not specified
that the default IFE type is used.
Signed-off-by: Alexander Aring <aring@mojatatu.com>
This patch uses the usually IEEE format to display an ethertype which is
4-digits and every digit in upper case.
Signed-off-by: Alexander Aring <aring@mojatatu.com>
This patch allows to set an ethertype for IFE which is zero. There is no
kernel side validation which forbids a type to zero.
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
- Use strncpy() when writing to target->t->u.user.name and make sure the
final byte remains untouched (xtables_calloc() set it to zero).
- 'tname' length sanitization was completely wrong: If it's length
exceeded the 16 bytes available in 'k', passing a length value of 16
to strncpy() would overwrite the previously NULL'ed 'k[15]'. Also, the
sanitization has to happen if 'tname' is exactly 16 bytes long as
well.
Signed-off-by: Phil Sutter <phil@nwl.cc>
The later check for 'k[0] != 0' requires a non-empty filter name,
otherwise NULL pointer dereference in 'q' might happen.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Assuming 'opt' might be NULL, move the call to RTA_PAYLOAD to after the
check since it dereferences its parameter.
Signed-off-by: Phil Sutter <phil@nwl.cc>
This renames Config to config.mk and includes more Make input.
Now configure generates all the required CFLAGS and LDLIBS for
the optional libraries.
Also, use pkg-config to test for libelf, rather than using a test
program. This makes it consistent with other libraries.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
multiq_parse_opt() doesn't change 'opt' at all. So at least make sure
it doesn't fill TCA_OPTIONS attribute with garbage from stack.
Signed-off-by: Phil Sutter <phil@nwl.cc>
The use of 'ok' variable in parse_gact() is ineffective: The second
conditional increments it either if *argv is 'gact' or if
parse_action_control() doesn't fail (in which case exit() is called).
So this is effectively an unconditional increment and since no decrement
happens anywhere, all remaining checks for 'ok != 0' can be dropped.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Commit 69fed534a5 ("change how Config is used in Makefile's") moved
HAVE_MNL specific CFLAGS/LDLIBS for building with libmnl out of the
top level Makefile into sub-Makefiles. However, it also removed the
HAVE_ELF specific CFLAGS/LDLIBS entirely, which breaks the BPF object
loader for tc and ip with "No ELF library support compiled in." despite
having libelf detected in configure script. Fix it similarly as in
69fed534a5 for HAVE_ELF.
Fixes: 69fed534a5 ("change how Config is used in Makefile's")
Reported-by: Jeffrey Panneman <jeffrey.panneman@tno.nl>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
dump more than TCA_ACT_MAX_PRIO actions per batch when the kernel
supports it.
Introduced keyword "since" for time based filtering of actions.
Some example (we have 400 actions bound to 400 filters); at
installation time. Using updated when tc setting the time of
interest to 120 seconds earlier (we see 400 actions):
prompt$ hackedtc actions ls action gact since 120000| grep index | wc -l
400
go get some coffee and wait for > 120 seconds and try again:
prompt$ hackedtc actions ls action gact since 120000 | grep index | wc -l
0
Lets see a filter bound to one of these actions:
....
filter pref 10 u32
filter pref 10 u32 fh 800: ht divisor 1
filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:10 (rule hit 2 success 1)
match 7f000002/ffffffff at 12 (success 1 )
action order 1: gact action pass
random type none pass val 0
index 23 ref 2 bind 1 installed 1145 sec used 802 sec
Action statistics:
Sent 84 bytes 1 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
...
that coffee took long, no? It was good.
Now lets ping -c 1 127.0.0.2, then run the actions again:
prompt$ hackedtc actions ls action gact since 120 | grep index | wc -l
1
More details please:
prompt$ hackedtc -s actions ls action gact since 120000
action order 0: gact action pass
random type none pass val 0
index 23 ref 2 bind 1 installed 1270 sec used 30 sec
Action statistics:
Sent 168 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
And the filter?
filter pref 10 u32
filter pref 10 u32 fh 800: ht divisor 1
filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:10 (rule hit 4 success 2)
match 7f000002/ffffffff at 12 (success 2 )
action order 1: gact action pass
random type none pass val 0
index 23 ref 2 bind 1 installed 1324 sec used 84 sec
Action statistics:
Sent 168 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
- CONTROL has to come last, otherwise 'index' applies to gact and not
simple itself.
- Man page wasn't updated to reflect syntax changes.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Make use of TCA_BPF_ID/TCA_ACT_BPF_ID that we exposed and print the ID
of the programs loaded and use the new BPF_OBJ_GET_INFO_BY_FD command
for dumping further information about the program, currently whether
the attached program is jited.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Adding new tunnel key fields would cause the usage line overflow 80 chars.
Make the usage text similar to other commands.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
In case default control action parsing takes place, it is ok to miss.
So don't print error message.
Fixes: e67aba5595 ("tc: actions: add helpers to parse and print control actions")
Reported-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Tested-by: Jiri Benc <jbenc@redhat.com>
parse_action_control helper does advancing of the arg inside. So don't
do it outside.
Fixes: e67aba5595 ("tc: actions: add helpers to parse and print control actions")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Allow users to set flower classifier filter rules which
include matches for ip tos and ttl.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
This happens with NAT targets, such as SNAT, DNAT and MASQUERADE. These
are still not usable with this patch, but at least tc doesn't crash
anymore when one tries to use them.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Each tc action is terminated by a control action. Each action parses and
prints then intividually. Introduce set of helpers and allow to share
this code.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Fixes
| tc_core.c:190:29: error: 'UINT16_MAX' undeclared (first use in this function); did you mean '__INT16_MAX__'?
| if ((sz >> s->size_log) > UINT16_MAX) {
| ^~~~~~~~~~
Signed-off-by: Khem Raj <raj.khem@gmail.com>
Do not allow using eth and udp header types if non-extended pedit kABI
is being used. Other protocol parsers already have this check.
Signed-off-by: Amir Vadai <amir@vadai.me>
Currently there is no way of querying whether a filter is
offloaded to HW or not when using "both" policy (where none
of skip_sw or skip_hw flags are set by user-space).
Add two new flags, "in hw" and "not in hw" such that user
space can determine if a filter is actually offloaded to
hw or not. The "in hw" UAPI semantics was chosen so it's
similar to the "skip hw" flag logic.
If none of these two flags are set, this signals running
over older kernel.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
For example, forward udp traffic destined to port 999 to veth0 and set
tcp port to 888:
$ tc filter add dev enp0s9 protocol ip parent ffff: \
flower \
ip_proto udp \
dst_port 999 \
action pedit ex munge \
udp dport set 888 \
action mirred egress \
redirect dev veth0
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Amir Vadai <amir@vadai.me>
For example, forward tcp traffic destined to port 80 to veth0 and set
tcp port to 8080:
$ tc filter add dev enp0s9 protocol ip parent ffff: \
flower \
ip_proto tcp \
dst_port 80 \
action pedit ex munge \
tcp dport set 8080 \
action mirred egress \
redirect dev veth0
Signed-off-by: Amir Vadai <amir@vadai.me>
For example, forward tcp traffic to veth0 and set
destination mac address to 11:22:33:44:55:66 :
$ tc filter add dev enp0s9 protocol ip parent ffff: \
flower \
ip_proto tcp \
action pedit ex munge \
eth dst set 11:22:33:44:55:66 \
action mirred egress \
redirect dev veth0
Signed-off-by: Amir Vadai <amir@vadai.me>
Make parse_val() accept fields up to 128 bits long, this should be
enough for current use cases and involves a minimal change to code.
Signed-off-by: Amir Vadai <amir@vadai.me>
Enable user to edit IP header ttl field.
For example, to forward any TCP packet and decrease its TTL by one:
$ tc filter add dev enp0s9 protocol ip parent ffff: \
flower \
ip_proto tcp \
action pedit ex munge \
ip ttl add 0xff pipe \
action mirred egress \
redirect dev veth0
Signed-off-by: Amir Vadai <amir@vadai.me>
Utilize the extended pedit netlink to set an offset relative to a
specific header type. Old netlink only enabled the user to set
approximated offset relative to the IPv4 header.
To use this extended functionality need to use the 'ex' keyword after
'pedit' and before any 'munge'.
e.g:
$ tc filter add dev ens9 protocol ip parent ffff: \
flower \
ip_proto udp \
dst_port 80 \
action pedit ex munge \
ip dst set 1.1.1.1 \
pipe \
action mirred egress redirect dev veth0
Signed-off-by: Amir Vadai <amir@vadai.me>
Make use of 128b user cookies
Introduce optional 128-bit action cookie.
Like all other cookie schemes in the networking world (eg in protocols
like http or existing kernel fib protocol field, etc) the idea is to
save user state that when retrieved serves as a correlator. The kernel
_should not_ intepret it. The user can store whatever they wish in the
128 bits.
Sample exercise(showing variable length use of cookie)
.. create an accept action with cookie a1b2c3d4
sudo $TC actions add action ok index 1 cookie a1b2c3d4
.. dump all gact actions..
sudo $TC -s actions ls action gact
action order 0: gact action pass
random type none pass val 0
index 1 ref 1 bind 0 installed 5 sec used 5 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
cookie a1b2c3d4
.. bind the accept action to a filter..
sudo $TC filter add dev lo parent ffff: protocol ip prio 1 \
u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 1
... send some traffic..
$ ping 127.0.0.1 -c 3
PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.020 ms
64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.027 ms
64 bytes from 127.0.0.1: icmp_seq=3 ttl=64 time=0.038 ms
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Support the new TCA_DUMP_INVISIBLE netlink attribute that allows asking
kernel to perform 'full qdisc dump', as for historical reasons some of the
default qdiscs are being hidden by the kernel.
The command syntax is being extended by voluntary 'invisible' argument to
'tc qdisc show'.
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
When built with GCC warnings enabled:
q_pie.c: In function ‘pie_parse_opt’:
q_pie.c:78:38: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits]
(alpha > ALPHA_MAX) || (alpha < ALPHA_MIN)) {
^
q_pie.c:85:35: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits]
(beta > BETA_MAX) || (beta < BETA_MIN)) {
^
This is because MIN is 0 and unsigned number can never be less than 0.
Therefore just remove the _MIN values.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Rebuilding libnetlink doesn't trigger rebuild of tc, which is wrong
(especially so for builds where libnetlink.a gets statically linked into
tc). Fix that by introducing an explicit dependency.
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Use the new helper functions rta_getattr_u* instead of direct
cast of RTA_DATA(). Where RTA_DATA() is a structure, then remove
the unnecessary cast since RTA_DATA() is void *
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
We already export TCA_BPF_TAG resp. TCA_ACT_BPF_TAG from kernel commit
f1f7714ea51c ("bpf: rework prog_digest into prog_tag"), thus also dump
it when filter/actions are shown.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Fix order of arguments when passed to __flower_parse_ip_addr.
Fixes: ("f888f4e20534 tc: flower: Support matching ARP")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Extend ICMP code and type match to support masks.
Also add missing documentation to synopsis in manpage.
tc qdisc add dev eth0 ingress
tc filter add dev eth0 protocol ipv6 parent ffff: flower \
indev eth0 ip_proto icmpv6 type 128/240 code 0 action drop
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Provide generic masked u8 print helper and use it to print arp operations.
Also:
* Make name parameter of arp op print helper const.
* Consistently use __u8 rather than uint8_t, in keeping with the
pervasive style in the file.
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Provide generic masked u8 paser helper and use it to parse arp operations.
Also consistently use __u8 rather than uint8_t, in keeping with the
pervasive style in the file.
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Print the skip flags when we dump a filter.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Acked by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Unlike other PREFIXes documented in the usage for tc flower, which accept
both IPv4 and IPv6 prefixes, arp_sip and arp_tip only accepts IPv4
prefixes.
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Use enum flower_icmp_field rather than bool as type of third parameter
when calling flower_icmp_attr_type.
Fixes: eb3b5696f1 ("tc: flower: support matching on ICMP type and code")
Signed-off-by: Simon Horman <simon.horman@netronome.com>
The sample tc action allows sampling packets matching a classifier. It
peeks randomly packets, and samples them using the psample netlink
channel. The user can specify the psample group, which the packet will be
sampled to, the sampling rate and the packet truncation (to save
kernel-user traffic).
The sampled packets contain informative metadata, for example, the input
interface and the original packet length.
The action syntax:
tc filter add [...] \
action sample rate <RATE> group <GROUP> [trunc <SIZE>]
[...]
Where:
RATE := The sampling rate which is the ratio of packets observed at the
data source to the samples generated
GROUP := the psample module sampling group
SIZE := optional truncation size
An example for a common usecase of the sample tc action: to sample ingress
traffic from interface eth1, one may use the commands:
tc qdisc add dev eth1 handle ffff: ingress
tc filter add dev eth1 parent ffff: \
matchall action sample rate 12 group 4
Where the first command adds an ingress qdisc and the second starts
sampling randomly with an average of one sampled packet per 12 packets
on dev eth1 to psample group 4.
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
v2 - update to address changes in 00697ca19a.
When using the tc flower filter, rules marked with "protocol all" do not
actually match all packets. This is due to a bug in f_flower.c that passes
in ETH_P_ALL in the TCA_FLOWER_KEY_ETH_TYPE attribute when adding a rule.
Fix this by omitting TCA_FLOWER_KEY_ETH_TYPE if the protocol is set to
ETH_P_ALL.
Fixes: 488b41d020 ("tc: flower no need to specify the ethertype")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Benjamin LaHaise <benjamin.lahaise@netronome.com>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Instead of "magic numbers" we can now specify each flag
by name. Prefix of "no" (e.g nofrag) unsets the flag,
otherwise it wil be set.
Example:
# add a flower filter that will drop fragmented packets
tc filter add dev ens4f0 protocol ip parent ffff: \
flower \
src_mac e4:1d:2d:fd:8b:01 \
dst_mac e4:1d:2d:fd:8b:02 \
indev ens4f0 \
ip_flags frag \
action drop
# add a flower filter that will drop non-fragmented packets
tc filter add dev ens4f0 protocol ip parent ffff: \
flower \
src_mac e4:1d:2d:fd:8b:01 \
dst_mac e4:1d:2d:fd:8b:02 \
indev ens4f0 \
ip_flags nofrag \
action drop
Fixes: 22a8f01989 ('tc: flower: support matching flags')
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
addattr16 may return an error about the nl msg size
but not about incorrect eth type.
Fixes: 488b41d020 ("tc: flower no need to specify the ethertype")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
addattr32 may return an error.
Fixes: cfcabf18d8 ("tc: flower: Add skip_{hw|sw} support")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
This fix a missing use case after the introduction of enum flower_endpoint.
Fixes: 6910d65661 ("tc: flower: introduce enum flower_endpoint")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Said iptables version introduced struct xtables_globals field
'compat_rev', a function pointer. Initializing it is mandatory as
libxtables calls it without existence check.
Without this, tc segfaults when using the xt action like so:
| tc filter add dev d0 parent ffff: u32 match u32 0 0 \
| action xt -j MARK --set-mark 20
Signed-off-by: Phil Sutter <phil@nwl.cc>
Since 41aa17ff46 ("tc/cls_flower: Add dest UDP port to tunnel params")
tc flower supports setting the dest UDP port.
* Use "port_number" to be consistent with other man-page text
* Re-add "enc_dst_port" documentation to manpage which was
accidently removed by b2a1f740aa ("tc: flower: document that *_ip
parameters take a PREFIX as an argument.")
Cc: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Enhance flower to support matching on flags.
The 1st flag allows to match on whether the packet is
an IP fragment.
Example:
# add a flower filter that will drop fragmented packets
# (bit 0 of control flags)
tc filter add dev ens4f0 protocol ip parent ffff: \
flower \
src_mac e4:1d:2d:fd:8b:01 \
dst_mac e4:1d:2d:fd:8b:02 \
indev ens4f0 \
matching_flags 0x1/0x1 \
action drop
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
This fixes under musl build issues like:
f_matchall.c: In function ‘matchall_parse_opt’:
f_matchall.c:48:12: error: ‘LONG_MIN’ undeclared (first use in this function)
if (h == LONG_MIN || h == LONG_MAX) {
^
f_matchall.c:48:12: note: each undeclared identifier is reported only once for each function it appears in
f_matchall.c:48:29: error: ‘LONG_MAX’ undeclared (first use in this function)
if (h == LONG_MIN || h == LONG_MAX) {
^
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
tunnel key set parameters includes also dest UDP port, add it to the
usage.
Fixes: 449c709c38 ("tc/m_tunnel_key: Add dest UDP port to tunnel key action")
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Reported-by: Simon Horman <simon.horman@netronome.com>
Encapsulation dest UDP port is part of the classifier matching
parameters, add it to the usage.
Fixes: 41aa17ff46 ("tc/cls_flower: Add dest UDP port to tunnel params")
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Reported-by: Simon Horman <simon.horman@netronome.com>
* The argument to src_mac and dst_mac may now take an optional mask
to limit the scope of matching.
* This address is is documented as a LLADDR in keeping with ip-link(8).
* The formats accepted match those already output when dumping flower
filters from the kernel.
Example of use of LLADDR with and without a mask:
tc qdisc add dev eth0 ingress
tc filter add dev eth0 protocol ip parent ffff: flower indev eth0 \
src_mac 52:54:01:00:00:00/ff:ff:00:00:00:01 action drop
tc filter add dev eth0 protocol ip parent ffff: flower indev eth0 \
src_mac 52:54:00:00:00:00/23 action drop
tc filter add dev eth0 protocol ip parent ffff: flower indev eth0 \
src_mac 52:54:00:00:00:00 action drop
Signed-off-by: Simon Horman <simon.horman@netronome.com>
* The argument to src_ip, dst_ip, enc_src_ip and enc_dst_ip take an
optional prefix length which is used to provide a mask to limit the scope
of matching.
* This is documented as a PREFIX in keeping with ip-route(8).
Example of uses of IPv4 and IPv6 prefixes
tc qdisc add dev eth0 ingress
tc filter add dev eth0 protocol ip parent ffff: flower \
indev eth0 dst_ip 192.168.1.1 action drop
tc filter add dev eth0 protocol ip parent ffff: flower \
indev eth0 src_ip 10.0.0.0/8 action drop
tc filter add dev eth0 protocol ipv6 parent ffff: flower \
indev eth0 src_ip 2001:DB8:1::/48 action drop
tc filter add dev eth0 protocol ipv6 parent ffff: flower \
indev eth0 dst_ip 2001:DB8::1 action drop
Signed-off-by: Simon Horman <simon.horman@netronome.com>
* The argument to src_mac and dst_mac may now take an optional mask
to limit the scope of matching.
* This address is is documented as a LLADDR in keeping with ip-link(8).
* The formats accepted match those already output when dumping flower
filters from the kernel.
Example of use of LLADDR with and without a mask:
tc qdisc add dev eth0 ingress
tc filter add dev eth0 protocol ip parent ffff: flower indev eth0 \
src_mac 52:54:01:00:00:00/ff:ff:00:00:00:01 action drop
tc filter add dev eth0 protocol ip parent ffff: flower indev eth0 \
src_mac 52:54:00:00:00:00/23 action drop
tc filter add dev eth0 protocol ip parent ffff: flower indev eth0 \
src_mac 52:54:00:00:00:00 action drop
Signed-off-by: Simon Horman <simon.horman@netronome.com>
* The argument to src_ip, dst_ip, enc_src_ip and enc_dst_ip take an
optional prefix length which is used to provide a mask to limit the scope
of matching.
* This is documented as a PREFIX in keeping with ip-route(8).
Example of uses of IPv4 and IPv6 prefixes
tc qdisc add dev eth0 ingress
tc filter add dev eth0 protocol ip parent ffff: flower \
indev eth0 dst_ip 192.168.1.1 action drop
tc filter add dev eth0 protocol ip parent ffff: flower \
indev eth0 src_ip 10.0.0.0/8 action drop
tc filter add dev eth0 protocol ipv6 parent ffff: flower \
indev eth0 src_ip 2001:DB8:1::/48 action drop
tc filter add dev eth0 protocol ipv6 parent ffff: flower \
indev eth0 dst_ip 2001:DB8::1 action drop
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Enhance tunnel key action parameters by adding destination UDP port.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Enhance IP tunnel parameters by adding destination UDP port.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Support matching on ICMP type and code.
Example usage:
tc qdisc add dev eth0 ingress
tc filter add dev eth0 protocol ip parent ffff: flower \
indev eth0 ip_proto icmp type 8 code 0 action drop
tc filter add dev eth0 protocol ipv6 parent ffff: flower \
indev eth0 ip_proto icmpv6 type 128 code 0 action drop
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Introduce enum flower_endpoint and use it instead of a bool
as the type for paramatising source and destination.
This is intended to improve read-ability and provide some type
checking of endpoint parameters.
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Make use of flower_port_attr_type() safe:
* flower_port_attr_type() may return a valid index into tb[] or -1.
Only access tb[] in the case of the former.
* Do not access null entries in tb[]
Also make usage silent - it is valid for ip_proto to be invalid,
for example if it is not specified as part of the filter.
Fixes: a1fb0d4842 ("tc: flower: Support matching on SCTP ports")
Signed-off-by: Simon Horman <simon.horman@netronome.com>
This action could be used before redirecting packets to a shared tunnel
device, or when redirecting packets arriving from a such a device.
The 'unset' action is optional. It is used to explicitly unset the
metadata created by the tunnel device during decap. If not used, the
metadata will be released automatically by the kernel.
The 'set' operation, will set the metadata with the specified values for
the encap.
For example, the following flower filter will forward all ICMP packets
destined to 11.11.11.2 through the shared vxlan device 'vxlan0'. Before
redirecting, a metadata for the vxlan tunnel is created using the
tunnel_key action and it's arguments:
$ tc filter add dev net0 protocol ip parent ffff: \
flower \
ip_proto 1 \
dst_ip 11.11.11.2 \
action tunnel_key set \
src_ip 11.11.0.1 \
dst_ip 11.11.0.2 \
id 11 \
action mirred egress redirect dev vxlan0
Signed-off-by: Amir Vadai <amir@vadai.me>
Introduce classifying by metadata extracted by the tunnel device.
Outer header fields - source/dest ip and tunnel id, are extracted from
the metadata when classifying.
For example, the following will add a filter on the ingress Qdisc of shared
vxlan device named 'vxlan0'. To forward packets with outer src ip
11.11.0.2, dst ip 11.11.0.1 and tunnel id 11. The packets will be
forwarded to tap device 'vnet0':
$ tc filter add dev vxlan0 protocol ip parent ffff: \
flower \
enc_src_ip 11.11.0.2 \
enc_dst_ip 11.11.0.1 \
enc_key_id 11 \
dst_ip 11.11.11.1 \
action mirred egress redirect dev vnet0
Signed-off-by: Amir Vadai <amir@vadai.me>
This work moves the bpf loader into the iproute2 library and reworks
the tc specific parts into generic code. It's useful as we can then
more easily support new program types by just having the same ELF
loader backend. Joint work with Thomas Graf. I hacked a rough start
of a test suite to make sure nothing breaks [1] and looks all good.
[1] https://github.com/borkmann/clsact/blob/master/test_bpf.sh
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Support matching on SCTP ports in the same way that matching
on TCP and UDP ports is already supported.
Example usage:
tc qdisc add dev eth0 ingress
tc filter add dev eth0 protocol ip parent ffff: \
flower indev eth0 ip_proto sctp dst_port 80 \
action drop
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Remove left over usage from removal of eth_type argument.
Fixes: 488b41d020 ('tc: flower no need to specify the ethertype')
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
So far, only the 'egress' direction was implemented.
Allow specifying 'ingress' as the direction packet appears on the target
interface.
For example, this takes incoming 802.1q frames on veth0 and redirects
them for input on dummy0:
# tc filter add dev veth0 parent ffff: pref 1 protocol 802.1q basic \
action mirred ingress redirect dev dummy0
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Since 5cd1adba79 ("Update to current iptables headers") compilation
of iproute2 broke for systems without iptables-devel package [1].
Reason is that even though we fall back to build m_ipt.c, the include
depends on a xtables-version.h header, which only ships with
iptables-devel. Machines not having this package fail compilation with:
[...]
CC m_ipt.o
In file included from ../include/iptables.h:5:0,
from m_ipt.c:17:
../include/xtables.h:34:29: fatal error: xtables-version.h: No such file or directory
compilation terminated.
../Config:31: recipe for target 'm_ipt.o' failed
make[1]: *** [m_ipt.o] Error 1
The configure script only barks that package xtables was not found in
the pkg-config search path. The generated Config then only contains f.e.
TC_CONFIG_IPSET. In tc's Makefile we thus fall back to adding m_ipt.o
to TCMODULES. m_ipt.c then includes the local include/iptables.h header
copy, which includes the include/xtables.h copy. Latter then includes
xtables-version.h, which only ships with iptables-devel.
One way to resolve this is to skip this whole mess when pkg-config has
no xtables config available. I've carried something along these lines
locally for a while now, but it's just too annyoing. :/ Build works fine
now also when xtables.pc is not available.
[1] http://www.spinics.net/lists/netdev/msg366162.html
Fixes: 5cd1adba79 ("Update to current iptables headers")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add support for controling hardware offload using (now standard)
skip_sw and skip_hw flags in cls_bpf.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
sudo $TC filter add dev $ETH parent ffff: prio 2 protocol ip \
u32 match u32 0 0 flowid 1:1 \
action ok
sudo $TC filter add dev $ETH parent ffff: prio 1 protocol ip \
u32 match ip protocol 1 0xff flowid 1:10 \
action ok
now dump to see all rules..
$TC -s filter ls dev $ETH parent ffff: protocol ip
....
filter pref 1 u32
filter pref 1 u32 fh 801: ht divisor 1
filter pref 1 u32 fh 801::800 order 2048 key ht 801 bkt 0 flowid 1:10 (rule hit 0 success 0)
match 00010000/00ff0000 at 8 (success 0 )
action order 1: gact action drop
random type none pass val 0
index 6 ref 1 bind 1 installed 4 sec used 4 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
filter pref 2 u32
filter pref 2 u32 fh 800: ht divisor 1
filter pref 2 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:1 (rule hit 336 success 336)
match 00000000/00000000 at 0 (success 336 )
action order 1: gact action pass
random type none pass val 0
index 5 ref 1 bind 1 installed 38 sec used 4 sec
Action statistics:
Sent 24864 bytes 336 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
....
..get filter 801::800
$TC -s filter get dev $ETH parent ffff: protocol ip \
handle 801:0:800 prio 2 u32
....
filter parent ffff: protocol ip pref 1 u32 fh 801::800 order 2048 key ht 801 bkt 0 flowid 1:10 (rule hit 260 success 130)
match 00010000/00ff0000 at 8 (success 130 )
action order 1: gact action drop
random type none pass val 0
index 6 ref 1 bind 1 installed 348 sec used 0 sec
Action statistics:
Sent 11440 bytes 130 pkt (dropped 130, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
....
..get other one
$TC -s filter get dev $ETH parent ffff: protocol ip \
handle 800:0:800 prio 2 u32
....
filter parent ffff: protocol ip pref 2 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:1 (rule hit 514 success 514)
match 00000000/00000000 at 0 (success 514 )
action order 1: gact action pass
random type none pass val 0
index 5 ref 1 bind 1 installed 506 sec used 4 sec
Action statistics:
Sent 35544 bytes 514 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
....
..try something that doesnt exist
$TC -s filter get dev $ETH parent ffff: protocol ip handle 800:0:803 prio 2 u32
.....
RTNETLINK answers: No such file or directory
We have an error talking to the kernel
.....
Note, added NLM_F_ECHO is for backward compatibility. old kernels never
before Eric's patch will not respond without it and newer kernels (after Erics patch)
will ignore it.
In old kernels there is a side effect:
In addition to a response to the GET you will receive an event (if you do tc mon).
But this is still better than what it was before (not working at all).
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
This action is intended to be an upgrade from a usability perspective
from pedit (as well as operational debugability).
Compare this:
sudo tc filter add dev $ETH parent 1: protocol ip prio 10 \
u32 match ip protocol 1 0xff flowid 1:2 \
action pedit munge offset -14 u8 set 0x02 \
munge offset -13 u8 set 0x15 \
munge offset -12 u8 set 0x15 \
munge offset -11 u8 set 0x15 \
munge offset -10 u16 set 0x1515 \
pipe
to:
sudo tc filter add dev $ETH parent 1: protocol ip prio 10 \
u32 match ip protocol 1 0xff flowid 1:2 \
action skbmod dmac 02:15:15:15:15:15
Or worse, try to debug a policy with destination mac, source mac and
etherype. Then make that a hundred rules and you'll get my point.
The most important ethernet use case at the moment is when redirecting or
mirroring packets to a remote machine. The dst mac address needs a re-write
so that it doesn't get dropped or confuse an interconnecting (learning) switch
or dropped by a target machine (which looks at the dst mac).
In the future common use cases on pedit can be migrated to this action
(as an example different fields in ip v4/6, transports like tcp/udp/sctp
etc). For this first cut, this allows modifying basic ethernet header.
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
In linux-4.9 fq packet scheduler got a new stat :
unthrottle_latency in nano second units.
Gives a good indication of system load or timer implementation
latencies.
Signed-off-by: Eric Dumazet <edumazet@google.com>
The 'vlan modify' action allows to replace an existing 802.1q tag
according to user provided settings.
It accepts same arguments as the 'vlan push' action.
For example, this replaces vid 6 with vid 5:
# tc filter add dev veth0 parent ffff: pref 1 protocol 802.1q \
basic match 'meta(vlan mask 0xfff eq 6)' \
action vlan modify id 5 continue
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Currently, 'linkid' input by the user is parsed but 'handle' is appended to the netlink message.
# tc filter add dev enp1s0f1 protocol ip parent ffff: prio 99 u32 ht 800: \
order 1 link 1: offset at 0 mask 0f00 shift 6 plus 0 eat match ip \
protocol 6 ff
resulted in:
filter protocol ip pref 99 u32 fh 800::1 order 1 key ht 800 bkt 0
match 00060000/00ff0000 at 8
offset 0f00>>6 at 0 eat
This patch results in:
filter protocol ip pref 99 u32 fh 800::1 order 1 key ht 800 bkt 0 link 1:
match 00060000/00ff0000 at 8
offset 0f00>>6 at 0 eat
Signed-off-by Sushma Sitaram: Sushma Sitaram <sushma.sitaram@intel.com>
since get_qdisc_handle() truncates the input value to 16 bit, return an
error and prompt "invalid qdisc ID" in case input 'handle' parameter needs
more than 16 bit to be stored.
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Phil Sutter <phil@nwl.cc>
The current vlan push action supports only vid and protocol options.
Add priority option.
Example script that adds vlan push action with vid and priority:
tc filter add dev veth0 protocol ip parent ffff: \
flower \
indev veth0 \
action vlan push id 100 priority 5
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Classification according to vlan id and vlan priority.
Example script that adds vlan filter:
# add ingress qdisc
tc qdisc add dev ens4f0 ingress
# add a flower filter with vlan id and priority classification
tc filter add dev ens4f0 protocol 802.1Q parent ffff: \
flower \
indev ens4f0 \
vlan_ethtype ipv4 \
vlan_id 100 \
vlan_prio 3 \
action vlan pop
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
The matchall classifier matches every packet and allows the user to apply
actions on it. In addition, it supports the skip_sw and skip_hw (as can
be found on u32 and flower filter) that direct the kernel to skip the
software/hardware processing of the actions.
This filter is very useful in usecases where every packet should be
matched. For example, packet mirroring (SPAN) can be setup very easily
using that filter.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Before this patch:
# ./tc/tc actions add action drop index 11
RTNETLINK answers: File exists
We have an error talking to the kernel
Command "(null)" is unknown, try "tc actions help".
After this patch:
# ./tc/tc actions add action drop index 11
RTNETLINK answers: File exists
We have an error talking to the kernel
Cc: Stephen Hemminger <shemming@brocade.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
The function returns zero on success.
Reported-by: Mark Bloch <markb@mellanox.com>
Fixes: 69f5aff63c ("tc: use action_a2n() everywhere")
Signed-off-by: Phil Sutter <phil@nwl.cc>
When switching to C99 initializers, I forgot to add this one. This means
that when trying to set an estimator value, tc would complain about
spurious duplicate estimator parameter. But much worse, the random
variable content is sent to the kernel regardless of whether an
estimator was given or not.
Fixes: d17b136f7d ("Use C99 style initializers everywhere")
Reported-by: Stas Nichiporovich <stasn77@gmail.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
It's a pitty this function is used nowhere, so let's polish it for use:
* Loop over branch names, makes it clear that every former conditional
was exactly identical.
* Support 'pipe' branch name, too.
* Make number parsing optional.
Signed-off-by: Phil Sutter <phil@nwl.cc>
* Drop 'extern' keyword before function declarations.
* Add parameter names where they were missing for matters of
consistency.
* Drop fancy indenting (e.g. tab between type and name).
* Break long lines to not exceed 80 columns.
Signed-off-by: Phil Sutter <phil@nwl.cc>
The optional mask which may be added to int values is considered by the
kernel only if it is non-zero, therefore tc should only then also print
it.
Without this, not passing a mask value like so:
| # tc filter add dev d0 parent 8001: \
| basic match meta\(vlan eq 1\) \
| classid 8001:1
Would lead to tc printing an all-zero mask later:
| # tc filter show dev d0
| filter parent 8001: protocol all pref 49151 basic
| filter parent 8001: protocol all pref 49151 basic handle 0x1 flowid 8001:1
| meta(vlan mask 0x00000000 eq 1)
This is obviously confusing as an all-zero mask strictly means to
eliminate all bits from the value, but the opposite is the case.
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Phil Sutter <phil@nwl.cc>
Since parse_rtattr_flags() calls memset already, there is no need for
callers to do so themselves.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
This only replaces occurrences where the newly allocated memory is
cleared completely afterwards, as in other cases it is a theoretical
performance hit although code would be cleaner this way.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
This big patch was compiled by vimgrepping for memset calls and changing
to C99 initializer if applicable. One notable exception is the
initialization of union bpf_attr in tc/tc_bpf.c: changing it would break
for older gcc versions (at least <=3.4.6).
Calls to memset for struct rtattr pointer fields for parse_rtattr*()
were just dropped since they are not needed.
The changes here allowed the compiler to discover some unused variables,
so get rid of them, too.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
This improves my initial change in the following points:
- Flatten embedded struct's initializers.
- No need to initialize variables to zero as the key feature of C99
initializers is to do this implicitly.
- By relocating the declaration of struct rtattr *tail, it can be
initialized at the same time.
Fixes: a0a73b298a ("tc: m_action: Use C99 style initializers for struct req")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Use the official BPF ELF e_machine value that was assigned recently [1]
and will be propagated to glibc, libelf et al. LLVM will switch to it
in 3.9 release, therefore we need to prepare tc to check for EM_ELF as
well, older version still have the EM_NONE.
[1] 36b9c09330
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
On devices that support TC flower offloads, these flags enable a filter to be
added only to HW or only to SW. skip_sw and skip_hw are mutually exclusive
flags. By default without any flags, the filter is added to both HW and SW,
but no error checks are done in case of failure to add to HW.
With skip-sw, failure to add to HW is treated as an error.
Here is a sample script that adds 2 filters, one with skip_sw and the other
with skip_hw flag.
# add ingress qdisc
tc qdisc add dev enp0s9 ingress
# enable hw tc offload.
ethtool -K enp0s9 hw-tc-offload on
# add a flower filter with skip-sw flag.
tc filter add dev enp0s9 protocol ip parent ffff: flower \
ip_proto 1 indev enp0s9 skip_sw \
action drop
# add a flower filter with skip-hw flag.
tc filter add dev enp0s9 protocol ip parent ffff: flower \
ip_proto 3 indev enp0s9 skip_hw \
action drop
Signed-off-by: Amir Vadai <amirva@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
I'll make a formal submission sans the header when the kernel patches
makes it in. This version is for someone who wants to play around with
the net-next kernel patches i sent
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Instead of initializing fields after (or sometimes even before) zeroing
the whole struct via memset(), initialize the whole thing at declaration
time.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Since commit 5cd1adb ("Update to current iptables headers") the build
with m_ipt.o and the following config will fail:
TC_CONFIG_XT:=n
TC_CONFIG_XT_OLD:=n
TC_CONFIG_XT_OLD_H:=n
This patch renames "iptables_target" to "xtables_target" and some other
things which gets renamed and I noticed while reading iptables git log.
Functions which are not used in m_ipt.c and not exported by the header
are removed, if they still used in m_ipt.c I added a static to the function.
Reported-by: Clemens Gruber <clemens.gruber@pqgruber.com>
Signed-off-by: Alexander Aring <aar@pengutronix.de>
This pulls common code from parse_ipt() and print_ipt() functions
together.
While here, also fix for incorrect use of the global 'optarg' variable
in print_ipt().
Signed-off-by: Phil Sutter <phil@nwl.cc>