Use the newly added TCA_HW_OFFLOAD indication from kernel
to print a consistent 'offloaded' message to user when listing qdiscs.
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
If command is RTM_DELACTION, a non-NULL pointer is passed to rtnl_talk().
Then flag NLM_F_ACK is not set on n->nlmsg_flags and netlink_ack() will
not be called. Command tc will wait for the reply for ever.
Fixes: 86bf43c7c2 ("lib/libnetlink: update rtnl_talk to support malloc buff at run time")
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Chris Mi <chrism@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Not all callers want parse_action_control*() to advance the
arguments. For instance act_parse_police() does the argument
advancing itself.
Fixes: e67aba5595 ("tc: actions: add helpers to parse and print control actions")
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Make the output same as input and avoid printout of unnecessary len.
Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
Fixes: fd8b3d2c1b ("actions: Add support for user cookies")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Cookie print was made dependent on show_stats for no good reason. Fix
this bu pushing cookie print ot of the stats if.
Fixes: fd8b3d2c1b ("actions: Add support for user cookies")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Split parsing and loading of the eBPF program and if skip_sw is set
load the program for ifindex, to which the qdisc is attached.
Note that the ifindex will be ignored for programs which are already
loaded (e.g. when using pinned programs), but in that case we just
trust the user knows what he's doing. Hopefully we will get extack
soon in the driver to help debugging this case.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Move resolving device name into an ifindex before calling filter
specific callbacks. This way if filters need the ifindex, they
can read it from the request.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Both BPF filter and action will allow users to specify run
multiple times, and only the last one will be considered by
the kernel. Explicitly refuse such command lines.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
bpf_parse_common() parses and loads the program. Rename it
accordingly.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Program type is needed both for parsing and loading of
the program. Parsing may also induce the type based on
signatures from __bpf_prog_meta. Instead of passing
the type around keep it in struct bpf_cfg_in.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
For all files in iproute2 which do not have an obvious license
identification, mark them with SPDK GPL-2
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
This patch adapts the tc command line interface to allow bandwidth limits
to be specified as a percentage of the interface's capacity.
Adding this functionality requires passing the specified device string to
each class/qdisc which changes the prototype for a couple of functions: the
.parse_qopt and .parse_copt interfaces. The device string is a required
parameter for tc-qdisc and tc-class, and when not specified, the kernel
returns ENODEV. In this patch, if the user tries to specify a bandwidth
percentage without naming the device, we return an error from userspace.
Signed-off-by: Nishanth Devarajan<ndev2021@gmail.com>
For places where tc is expecting device name use IFNAMSIZ.
For others where it is a filter name, introduce a new constant.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
GCC version 7.2.1 complains that 'result1' may be used uninitialized in
parse_action_control_slash_spaces(). This should not be possible in
practice, so the actual value 'result1' is initialized with does not
matter.
Signed-off-by: Phil Sutter <phil@nwl.cc>
The function parse_action_control_slash() returns early if 'p' is NULL,
so after the first call to action_a2n(), 'p' is guaranteed not to be
NULL. Otherwise, the assignment '*p = 0' above would dereference the
NULL pointer already anyway, so just drop this check here.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Any iproute utility that uses any function from lib/utils.c needs
to declare its own resolve_hosts variable instance although it does
not need/use hostname resolving functionality (currently only 'ip'
and 'ss' commands uses this).
The patch declares single common instance of resolve_hosts directly
in utils.c so the existing ones can be removed (the same approach
that is used for timestamp_short).
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
This patch was previously submitted as RFC. Submitting this as
non-RFC now that the classid reservation scheme for hardware
traffic classes and offloads to route packets to a hardware
traffic class are accepted in net-next.
HW traffic classes 0 through 15 are represented using the
reserved classid values :ffe0 - :ffef.
Example:
Match Dst IPv4,Dst Port and route to TC1:
# tc filter add dev eth0 protocol ip parent ffff:\
prio 1 flower dst_ip 192.168.1.1/32\
ip_proto udp dst_port 12000 skip_sw\
hw_tc 1
# tc filter show dev eth0 parent ffff:
filter pref 1 flower chain 0
filter pref 1 flower chain 0 handle 0x1 hw_tc 1
eth_type ipv4
ip_proto udp
dst_ip 192.168.1.1
dst_port 12000
skip_sw
in_hw
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
The Credit Based Shaper (CBS) queueing discipline allows bandwidth
reservation with sub-milisecond precision. It is defined by the
802.1Q-2014 specification (section 8.6.8.2 and Annex L).
The syntax is:
tc qdisc add dev DEV parent NODE cbs locredit <LOCREDIT>
hicredit <HICREDIT> sendslope <SENDSLOPE>
idleslope <IDLESLOPE>
(The order is not important)
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch was previously submitted as RFC. Submitting this as
non-RFC now that the tc/mqprio changes are accepted in net-next.
Adds new mqprio options for 'mode' and 'shaper'. The mode
option can take values for offload modes such as 'dcb' (default),
'channel' with the 'hw' option set to 1. The new 'channel' mode
supports offloading TCs and other queue configurations. The
'shaper' option is to support HW shapers ('dcb' default) and
takes the value 'bw_rlimit' for bandwidth rate limiting. The
parameters to the bw_rlimit shaper are minimum and maximum
bandwidth rates. New HW shapers in future can be supported
through the shaper attribute.
# tc qdisc add dev eth0 root mqprio num_tc 2 map 0 0 0 0 1 1 1 1\
queues 4@0 4@4 hw 1 mode channel shaper bw_rlimit\
min_rate 1Gbit 2Gbit max_rate 4Gbit 5Gbit
# tc qdisc show dev eth0
qdisc mqprio 804a: root tc 2 map 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0
queues:(0:3) (4:7)
mode:channel
shaper:bw_rlimit min_rate:1Gbit 2Gbit max_rate:4Gbit 5Gbit
v2: Avoid buffer overrun and minor cleanup.
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
This patch changes ife_prio to ife_tcindex which is right variable to
assign in the argument in this case.
Signed-off-by: Alexander Aring <aring@mojatatu.com>
This is an update for 460c03f3f3 ("iplink: double the buffer size also in
iplink_get()"). After update, we will not need to double the buffer size
every time when VFs number increased.
With call like rtnl_talk(&rth, &req.n, NULL, 0), we can simply remove the
length parameter.
With call like rtnl_talk(&rth, nlh, nlh, sizeof(req), I add a new variable
answer to avoid overwrite data in nlh, because it may has more info after
nlh. also this will avoid nlh buffer not enough issue.
We need to free answer after using.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
Sample use case:
... add ingress qdisc
sudo $TC qdisc add dev $ETH ingress
... if we exceed rate of 1kbps (burst of 90K), do an absolute jump of 2 actions
sudo $TC actions add action police rate 1kbit burst 90k conform-exceed jump 2 / pipe
sudo $TC -s actions ls action police
action order 0: police 0x4 rate 1Kbit burst 23440b mtu 2Kb action jump 2/pipe overhead 0b
ref 1 bind 0 installed 41 sec used 41 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
... lets add a couple of marks so we can use them to mark exceed/not exceed
sudo $TC actions add action skbedit mark 11 ok index 11
sudo $TC actions add action skbedit mark 12 ok index 12
... if we dont exceed our rate we get a mark of 11, else mark of 12
sudo $TC filter add dev $ETH parent ffff: protocol ip prio 8 u32 \
match ip dst 127.0.0.8/32 flowid 1:10 \
action police index 4 \
action skbedit index 11 \
action skbedit index 12
Ok, lets keep this thing a little busy..
sudo ping -f -c 10000 127.0.0.8
... now lets see the filters..
sudo $TC -s filter ls dev $ETH parent ffff: protocol ip
filter pref 8 u32 chain 0
filter pref 8 u32 chain 0 fh 800: ht divisor 1
filter pref 8 u32 chain 0 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:10 not_in_hw (rule hit 20000 success 10000)
match 7f000008/ffffffff at 16 (success 10000 )
action order 1: police 0x4 rate 1Kbit burst 23440b mtu 2Kb action jump 2/pipe overhead 0b
ref 2 bind 1 installed 198 sec used 2 sec
Action statistics:
Sent 840000 bytes 10000 pkt (dropped 0, overlimits 9721 requeues 0)
backlog 0b 0p requeues 0
action order 2: skbedit mark 11 pass
index 11 ref 2 bind 1 installed 127 sec used 2 sec
Action statistics:
Sent 23436 bytes 279 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
action order 3: skbedit mark 12 pass
index 12 ref 2 bind 1 installed 127 sec used 2 sec
Action statistics:
Sent 816564 bytes 9721 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
As can be seen 97.21% of the packets were marked as exceeding the allocated
rate; you could do something clever with the skb mark after this.
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The original problem was that something like:
| strncpy(ifr.ifr_name, *argv, IFNAMSIZ);
might leave ifr.ifr_name unterminated if length of *argv exceeds
IFNAMSIZ. In order to fix this, I thought about replacing all those
cases with (equivalent) calls to snprintf() or even introducing
strlcpy(). But as Ulrich Drepper correctly pointed out when rejecting
the latter from being added to glibc, truncating a string without
notifying the user is not to be considered good practice. So let's
excercise what he suggested and reject empty, overlong or otherwise
invalid interface names right from the start - this way calls to
strncpy() like shown above become safe and the user has a chance to
reconsider what he was trying to do.
Note that this doesn't add calls to check_ifname() to all places where
user supplied interface name is parsed. In many cases, the interface
must exist already and is therefore looked up using ll_name_to_index(),
so if_nametoindex() will perform the necessary checks already.
Signed-off-by: Phil Sutter <phil@nwl.cc>