Commit Graph

870 Commits

Author SHA1 Message Date
Roman Mashak bf7d148803 tc: use get_u32() in psample action to match types
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Acked-by: Yotam Gigi <yotam.gi@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-03-16 13:38:50 -07:00
Roman Mashak e9fa16583a tc: print actual action for sample action
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-03-16 13:38:38 -07:00
Toke Høiland-Jørgensen 997f2dc193 tc: Add JSON output of fq_codel stats
Enable proper JSON output support for fq_codel in `tc -s qdisc` output.

Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-13 18:05:40 -07:00
Toke Høiland-Jørgensen d7d044ff53 tc: Add missing documentation for codel and fq_codel parameters
Add missing documentation of the memory_limit fq_codel parameter and the
ce_threshold codel and fq_codel parameters.

Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-13 18:05:35 -07:00
Pieter Jansen van Vuuren fb4e6abfca tc: f_flower: Add support for matching first frag packets
Add matching support for distinguishing between first and later fragmented
packets.

 # tc filter add dev eth0 protocol ip parent ffff: \
     flower indev eth0 \
	ip_flags firstfrag \
        ip_proto udp \
    action mirred egress redirect dev eth1

 # tc filter add dev eth0 protocol ip parent ffff: \
     flower indev eth0 \
	ip_flags nofirstfrag \
        ip_proto udp \
    action mirred egress redirect dev eth1

Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-13 18:03:21 -07:00
David Ahern e9625d6aea Merge branch 'iproute2-master' into iproute2-next
Conflicts:
	bridge/mdb.c

Updated bridge/bridge.c per removal of check_if_color_enabled by commit
1ca4341d2c ("color: disable color when json output is requested")

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-03-13 17:48:10 -07:00
Serhey Popovych fe99adbca4 utils: Introduce and use nodev() helper routine
There is a couple of places where we report error in case of no network
device is found. In all of them we output message in the same format to
stderr and either return -1 or 1 to the caller or exit with -1.

Introduce new helper function nodev() that takes name of the network
device caused error and returns -1 to it's caller. Either call exit()
or return to the caller to preserve behaviour before change.

Use -nodev() in traffic control (tc) code to return 1.

Simplify expression for checking for argument being 0/NULL in @if
statement.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
2018-03-11 17:58:36 -07:00
Davide Caratti 75ef7b18d2 tc: fix parsing of the control action
If the user didn't specify any control action, don't pop the command line
arguments: otherwise, parsing of the next argument (tipically the 'index'
keyword) results in an error, causing the following 'tc-testing' failures:

 Test a6d6: Add skbedit action with index
 Test 38f3: Delete skbedit action
 Test a568: Add action with ife type
 Test b983: Add action without ife type
 Test 7d50: Add skbmod action to set destination mac
 Test 9b29: Add skbmod action to set source mac
 Test e93a: Delete an skbmod action

Also, add missing parse for 'ok' control action to m_police, to fix the
following 'tc-testing' failure:

 Test 8dd5: Add police action with control ok

tested with:
 # ./tdc.py

test results:
 all tests ok using kernel 4.16-rc2, except 9aa8 "Get a single skbmod
 action from a list" (which is failing also before this commit)

Fixes: 3572e01a09 ("tc: util: Don't call NEXT_ARG_FWD() in __parse_action_control()")
Cc: Michal Privoznik <mprivozn@redhat.com>
Cc: Wolfgang Bumiller <w.bumiller@proxmox.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-03-04 09:01:38 -08:00
Eyal Birger dd29621578 tc: add em_ipt ematch for calling xtables matches from tc matching context
The commit calls a new tc ematch for using netfilter xtable matches.

This allows early classification as well as mirroning/redirecting traffic
based on logic implemented in netfilter extensions.

Current supported use case is classification based on the incoming IPSec
state used during decpsulation using the 'policy' iptables extension
(xt_policy).

The matcher uses libxtables for parsing the input parameters.

Example use for matching an IPSec state with reqid 1:

tc qdisc add dev eth0 ingress
tc filter add dev eth0 protocol ip parent ffff: \
    basic match 'ipt(-m policy --dir in --pol ipsec --reqid 1)' \
    action drop

This is the user-space counter part of kernel commit ccc007e4a746
("net: sched: add em_ipt ematch for calling xtables matches")

Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-27 09:43:16 -08:00
Eyal Birger 526862038e tc: ematch: add parse_eopt_argv() method for providing ematches with argv parameters
ematche uses YACC to parse ematch arguments and places them in struct bstr
linked lists.

It is useful to be able to receive parameters as argc,argv in order to use
getopt (and alike) argument parsers.

Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-27 09:43:06 -08:00
Adam Vyskovsky 2fb854d07c tc: fix an off-by-one error while printing tc actions
The tc_print_action() function did not print all tc actions
when e.g. TCA_ACT_MAX_PRIO actions were defined for a single
tc filter.

Signed-off-by: Adam Vyskovsky <adamvyskovsky@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-02-23 08:18:29 -08:00
Stephen Hemminger 2d165c0811 tc: implement color output
Implement the -color option; in this case -co is ambiguous
since it was already used for -conf.
For now this just means putting device name in color.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-21 09:12:28 -08:00
Serhey Popovych 5433656705 ip: Use single variable to represent -pretty
After commit a233caa0aa ("json: make pretty printing optional") I get
following build failure:

    LINK     rtmon
    ../lib/libutil.a(json_print.o): In function `new_json_obj':
    json_print.c:(.text+0x35): undefined reference to `show_pretty'
    collect2: error: ld returned 1 exit status
    make[1]: *** [rtmon] Error 1
    make: *** [all] Error 2

It is caused by missing show_pretty variable in rtmon.

On the other hand tc/tc.c there are two distinct variables and single
matches() call that handles -pretty option thus setting show_pretty
will never happen. Note that since commit 44dcfe8201 ("Change
formatting of u32 back to default") show_pretty is used in tc/f_u32.c
so this is first place where -pretty introduced.

Furthermore other utilities like misc/ifstat.c and misc/nstat.c define
pretty variable, however only for their own purposes. They both support
JSON output and thus depend show_pretty in new_json_obj().

Assuming above use common variable to represent -pretty option, define
it in utils.c and declare in utils.h that is commonly used. Replace
show_pretty with pretty.

Fixes: a233caa0aa ("json: make pretty printing optional")
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-16 08:13:36 -08:00
Stephen Hemminger a233caa0aa json: make pretty printing optional
Since JSON is intended for programmatic consumption, it makes
sense for the default output format to be concise as possible.

For programmer and other uses, it is helpful to keep the pretty
whitespace format; therefore enable it with -p flag.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-10 08:15:08 -08:00
Serhey Popovych c14f9d92ee treewide: Use addattr_nest()/addattr_nest_end() to handle nested attributes
We have helper routines to support nested attribute addition into
netlink buffer: use them instead of open coding.

Use addattr_nest_compat()/addattr_nest_compat_end() where appropriate.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-02-02 15:01:09 -08:00
David Ahern 1e24e773f1 Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-29 08:24:57 -08:00
Jakub Kicinski 44c7655186 tc: fix second printing of requeues
Non-JSON tc qdisc output used to print the "requeues" statistic
twice.  Commit 4fcec7f366 ("tc: jsonify stats2") tried to preserve
this behaviour for both standard output and JSON, but used the wrong
statistic (q.qlen).  Also duplicating keys in JSON is not allowed,
so the second occurrence should be completely skipped with JSON.

Fixes: 4fcec7f366 ("tc: jsonify stats2")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-27 16:06:54 -08:00
Jakub Kicinski c061b75895 tc: prio: JSON-ify prio output
Make JSON output work with prio Qdiscs.  This will also make
other qdiscs which reuse the print_qopt work, like mqprio or
pfifo_fast.

Note that there is a double space between "priomap" and first
prio number.  Keep this original behaviour.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-26 13:00:18 -08:00
Jakub Kicinski 097415d510 tc: red: JSON-ify RED output
Make JSON output work with RED Qdiscs.  Float/double printing
helpers have to be added/uncommented to print the probability.
Since TC stats in general are not split out to a separate object
the xstats printed by this patch are not separated either.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-26 12:59:55 -08:00
David Ahern 6517b5c0ac Merge branch 'iproute2-master' into iproute2-next
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-24 09:59:03 -08:00
Wolfgang Bumiller 7ac29190db tc/lexer: let quotes actually start strings
The lexer will go with the longest match, so previously
the starting double quotes of a string would be swallowed by
the [^ \t\r\n()]+ pattern leaving the user no way to
actually use strings with escape sequences.
Fix this by not allowing this case to start with double
quotes.

Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2018-01-24 08:49:10 -08:00
Jiri Pirko 063463efd7 tc: implement ingress/egress block index attributes for qdiscs
During qdisc creation it is possible to specify shared block for bot
ingress and egress. Pass this values to kernel according to the command
line options.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-21 10:42:57 -08:00
Jiri Pirko 0c7cef9669 tc: introduce support for block-handle for filter operations
So far, qdisc was the only handle that could be used to manipulate
filters. Kernel added support for using block to manipulate it. So add
the support to use block index to manipulate filters. The magic
TCM_IFINDEX_MAGIC_BLOCK indicates the block index is in use.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-21 10:42:53 -08:00
Jiri Pirko d0bcedd549 tc: introduce tc_qdisc_block_exists helper
This hepler used qdisc dump to list all qdisc and find if block index in
question is used by any of them. That means the block with specified
index exists.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-21 10:42:35 -08:00
David Ahern 8c75f69411 Merge branch 'master' into net-next
Conflicts:
	ip/link_gre.c
	ip/link_gre6.c

Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-21 09:37:39 -08:00
Jakub Kicinski e0850bdedc tc: red: allow setting th_min and th_max to the same value
Setting th_min and th_max to the same value may be useful for DCTCP
deployments.  The original DCTCP paper describes it as a simplest way
of achieving simple ECN threshold marking.  Indeed, there doesn't seem
to be any simpler qdisc in Linux which would allow such a setup today.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-19 12:35:23 -08:00
Phil Sutter 6f7df6b2a1 tc: Optimize gact action lookup
When adding a filter with a gact action such as 'drop', tc first tries
to open a shared object with equivalent name (m_drop.so in this case)
before trying gact. Avoid this by matching the action name against those
handled by gact prior to calling get_action_kind().

Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
2018-01-17 10:27:47 -08:00
Chris Mi 485d0c6001 tc: Add batchsize feature for filter and actions
Currently in tc batch mode, only one command is read from the batch
file and sent to kernel to process. With this support, at most 128
commands can be accumulated before sending to kernel.

Now it only works for the following successive commands:
1. filter add/delete/change/replace
2. actions add/change/replace

Signed-off-by: Chris Mi <chrism@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-01-14 09:03:35 -08:00
Stephen Hemminger 7d63671030 tc: remove no longer relevant README
This document described how kernel and tc used to handle
timing. In last two years, kernel has switched over to using
ktime. Nothing to see here, move along.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-01-10 08:21:22 -08:00
Jamal Hadi Salim 24a5a48e27 tc: Fix filter protocol output
Fixes: 249284ff5a ("tc: jsonify filter core")
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
2018-01-09 08:09:10 -08:00
Yuval Mintz b97c6fa71d qdisc: print offload indication
Use the newly added TCA_HW_OFFLOAD indication from kernel
to print a consistent 'offloaded' message to user when listing qdiscs.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-12-27 13:55:16 -08:00
Chris Mi 83cf5bc73b tc: fix command "tc actions del" hang issue
If command is RTM_DELACTION, a non-NULL pointer is passed to rtnl_talk().
Then flag NLM_F_ACK is not set on n->nlmsg_flags and netlink_ack() will
not be called. Command tc will wait for the reply for ever.

Fixes: 86bf43c7c2 ("lib/libnetlink: update rtnl_talk to support malloc buff at run time")
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Chris Mi <chrism@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-12-14 21:17:04 -08:00
Jiri Pirko 1876ab0779 tc: fix json array closing
Fixes: 2704bd6255 ("tc: jsonify actions core")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-12-13 18:16:27 -08:00
Michal Privoznik 3572e01a09 tc: util: Don't call NEXT_ARG_FWD() in __parse_action_control()
Not all callers want parse_action_control*() to advance the
arguments. For instance act_parse_police() does the argument
advancing itself.

Fixes: e67aba5595 ("tc: actions: add helpers to parse and print control actions")
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2017-12-08 10:29:01 -08:00
Stephen Hemminger c6a656f4f9 m_mirred: style cleanups
Fix whitespace and long lines.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-11-26 12:42:17 -08:00
Stephen Hemminger 5c235ac27e m_gact: whitespace cleanup
Fix whitespace errors reported by checkpatch

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-11-26 12:38:21 -08:00
Stephen Hemminger ed4856919f m_action: style cleanup
Break long lines, and use bool where possible.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-11-26 12:36:15 -08:00
Stephen Hemminger eb4bccf12b m_vlan: style cleanups
Break long lines and make duplicated code into function.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-11-26 12:28:55 -08:00
Jiri Pirko b021ee40f6 tc: jsonify vlan action
Add json output to vlan action.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-11-26 12:20:51 -08:00
Jiri Pirko 502c4adf19 tc: jsonify mirred action
Add json output to mirred action.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-11-26 12:20:51 -08:00
Jiri Pirko 66fedb6df0 tc: jsonify gact action
Add json output to gact action.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-11-26 12:20:51 -08:00
Jiri Pirko 2704bd6255 tc: jsonify actions core
Add json output to actions core.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-11-26 12:20:51 -08:00
Jiri Pirko 619ca351e3 tc: jsonify matchall filter
Add json output to matchall filter.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-11-26 12:20:51 -08:00
Jiri Pirko e28b88a464 tc: jsonify flower filter
Add json output to flower filter.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-11-26 12:20:51 -08:00
Jiri Pirko 249284ff5a tc: jsonify filter core
Add json output to filter core.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-11-26 12:20:51 -08:00
Jiri Pirko f354fa6aa5 tc: jsonify htb qdisc
Add json output to htb qdisc.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-11-26 12:20:51 -08:00
Jiri Pirko 378ac491f5 tc: jsonify fq_codel qdisc
Add json output to fq_codel qdisc.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-11-26 12:20:51 -08:00
Jiri Pirko 4fcec7f366 tc: jsonify stats2
Add json output to stats2.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-11-26 12:20:51 -08:00
Jiri Pirko c91d262f41 tc: jsonify qdisc core
Add json output to qdisc core.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-11-26 12:20:51 -08:00
Jiri Pirko 81051c60c2 tc: remove action cookie len from printout
Make the output same as input and avoid printout of unnecessary len.

Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
Fixes: fd8b3d2c1b ("actions: Add support for user cookies")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-11-26 12:18:38 -08:00
Jiri Pirko abff45b802 tc: move action cookie print out of the stats if
Cookie print was made dependent on show_stats for no good reason. Fix
this bu pushing cookie print ot of the stats if.

Fixes: fd8b3d2c1b ("actions: Add support for user cookies")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2017-11-26 12:18:38 -08:00
Jakub Kicinski eb91c55731 f_bpf: communicate ifindex for eBPF offload
Split parsing and loading of the eBPF program and if skip_sw is set
load the program for ifindex, to which the qdisc is attached.

Note that the ifindex will be ignored for programs which are already
loaded (e.g. when using pinned programs), but in that case we just
trust the user knows what he's doing.  Hopefully we will get extack
soon in the driver to help debugging this case.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-26 11:57:57 -08:00
Jakub Kicinski 01ea76b1cf tc_filter: resolve device name before parsing filter
Move resolving device name into an ifindex before calling filter
specific callbacks.  This way if filters need the ifindex, they
can read it from the request.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-26 11:57:57 -08:00
Jakub Kicinski 67c857df80 {f, m}_bpf: don't allow specifying multiple bpf programs
Both BPF filter and action will allow users to specify run
multiple times, and only the last one will be considered by
the kernel.  Explicitly refuse such command lines.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-26 11:57:57 -08:00
Jakub Kicinski 399db8392b bpf: rename bpf_parse_common() to bpf_parse_and_load_common()
bpf_parse_common() parses and loads the program.  Rename it
accordingly.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-26 11:57:57 -08:00
Jakub Kicinski 658cfebc27 bpf: pass program type in struct bpf_cfg_in
Program type is needed both for parsing and loading of
the program.  Parsing may also induce the type based on
signatures from __bpf_prog_meta.  Instead of passing
the type around keep it in struct bpf_cfg_in.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-26 11:57:57 -08:00
Stephen Hemminger 6054c1ebf7 SPDX license identifiers
For all files in iproute2 which do not have an obvious license
identification, mark them with SPDK GPL-2

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-11-24 12:21:35 -08:00
Stephen Hemminger 859af0a5dc tc: break long lines
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-11-24 11:31:36 -08:00
Nishanth Devarajan 927e3cfb52 tc: B.W limits can now be specified in %.
This patch adapts the tc command line interface to allow bandwidth limits
to be specified as a percentage of the interface's capacity.

Adding this functionality requires passing the specified device string to
each class/qdisc which changes the prototype for a couple of functions: the
.parse_qopt and .parse_copt interfaces. The device string is a required
parameter for tc-qdisc and tc-class, and when not specified, the kernel
returns ENODEV. In this patch, if the user tries to specify a bandwidth
percentage without naming the device, we return an error from userspace.

Signed-off-by: Nishanth Devarajan<ndev2021@gmail.com>
2017-11-24 11:22:13 -08:00
Stephen Hemminger b317557f58 tc: replace magic constant 16 with #define
For places where tc is expecting device name use IFNAMSIZ.
For others where it is a filter name, introduce a new constant.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-11-24 11:19:18 -08:00
Phil Sutter 66942e522e tc_util: Silence spurious compiler warning
GCC version 7.2.1 complains that 'result1' may be used uninitialized in
parse_action_control_slash_spaces(). This should not be possible in
practice, so the actual value 'result1' is initialized with does not
matter.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-11-16 16:01:48 -08:00
Phil Sutter b7c61286de tc_util: Drop needless pointer check
The function parse_action_control_slash() returns early if 'p' is NULL,
so after the first call to action_a2n(), 'p' is guaranteed not to be
NULL. Otherwise, the assignment '*p = 0' above would dereference the
NULL pointer already anyway, so just drop this check here.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-11-16 16:01:48 -08:00
Stephen Hemminger 913352fe54 drop unneeded include of syslog.h
Only arpd uses syslog

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-11-12 16:22:36 -08:00
Stephen Hemminger d72ac5a17b Merge branch 'master' into net-next 2017-11-12 16:17:37 -08:00
Ivan Vecera 6648853975 lib: make resolve_hosts variable common
Any iproute utility that uses any function from lib/utils.c needs
to declare its own resolve_hosts variable instance although it does
not need/use hostname resolving functionality (currently only 'ip'
and 'ss' commands uses this).
The patch declares single common instance of resolve_hosts directly
in utils.c so the existing ones can be removed (the same approach
that is used for timestamp_short).

Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2017-11-12 16:15:23 -08:00
Roman Mashak 274b63ae21 tc: distinguish Add/Replace qdisc operations
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
2017-11-12 15:57:08 -08:00
Stephen Hemminger b158c1790f Merge branch 'master' into net-next 2017-11-09 09:45:17 +09:00
Stephen Hemminger e4beb52787 netem: use fixed rather than floating point for scaling
Don't need to do floating point math to compute scaled random.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-11-07 11:15:34 +09:00
Amritha Nambiar 0d575c4dac flower: Represent HW traffic classes as classid values
This patch was previously submitted as RFC. Submitting this as
non-RFC now that the classid reservation scheme for hardware
traffic classes and offloads to route packets to a hardware
traffic class are accepted in net-next.

HW traffic classes 0 through 15 are represented using the
reserved classid values :ffe0 - :ffef.

Example:
Match Dst IPv4,Dst Port and route to TC1:
# tc filter add dev eth0 protocol ip parent ffff:\
  prio 1 flower dst_ip 192.168.1.1/32\
  ip_proto udp dst_port 12000 skip_sw\
  hw_tc 1

# tc filter show dev eth0 parent ffff:
filter pref 1 flower chain 0
filter pref 1 flower chain 0 handle 0x1 hw_tc 1
  eth_type ipv4
  ip_proto udp
  dst_ip 192.168.1.1
  dst_port 12000
  skip_sw
  in_hw

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
2017-11-07 11:04:54 +09:00
Vinicius Costa Gomes c9681ac1b3 tc: Add support for the CBS qdisc
The Credit Based Shaper (CBS) queueing discipline allows bandwidth
reservation with sub-milisecond precision. It is defined by the
802.1Q-2014 specification (section 8.6.8.2 and Annex L).

The syntax is:

tc qdisc add dev DEV parent NODE cbs locredit <LOCREDIT>
   		hicredit <HICREDIT> sendslope <SENDSLOPE>
		idleslope <IDLESLOPE>

(The order is not important)

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-11-01 22:22:48 +01:00
Amritha Nambiar e1ac5b06f2 tc/mqprio: Offload mode and shaper options in mqprio
This patch was previously submitted as RFC. Submitting this as
non-RFC now that the tc/mqprio changes are accepted in net-next.

Adds new mqprio options for 'mode' and 'shaper'. The mode
option can take values for offload modes such as 'dcb' (default),
'channel' with the 'hw' option set to 1. The new 'channel' mode
supports offloading TCs and other queue configurations. The
'shaper' option is to support HW shapers ('dcb' default) and
takes the value 'bw_rlimit' for bandwidth rate limiting. The
parameters to the bw_rlimit shaper are minimum and maximum
bandwidth rates. New HW shapers in future can be supported
through the shaper attribute.

# tc qdisc add dev eth0 root mqprio num_tc 2  map 0 0 0 0 1 1 1 1\
  queues 4@0 4@4 hw 1 mode channel shaper bw_rlimit\
  min_rate 1Gbit 2Gbit max_rate 4Gbit 5Gbit

# tc qdisc show dev eth0

qdisc mqprio 804a: root  tc 2 map 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0
             queues:(0:3) (4:7)
             mode:channel
             shaper:bw_rlimit   min_rate:1Gbit 2Gbit   max_rate:4Gbit 5Gbit

v2: Avoid buffer overrun and minor cleanup.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
2017-11-01 22:20:06 +01:00
Stephen Hemminger c1606c44b3 Merge branch 'master' into net-next 2017-10-31 18:03:12 +01:00
Alexander Aring 25a24934ab tc: m_ife: fix match tcindex parsing
This patch changes ife_prio to ife_tcindex which is right variable to
assign in the argument in this case.

Signed-off-by: Alexander Aring <aring@mojatatu.com>
2017-10-31 17:56:58 +01:00
Hangbin Liu 86bf43c7c2 lib/libnetlink: update rtnl_talk to support malloc buff at run time
This is an update for 460c03f3f3 ("iplink: double the buffer size also in
iplink_get()"). After update, we will not need to double the buffer size
every time when VFs number increased.

With call like rtnl_talk(&rth, &req.n, NULL, 0), we can simply remove the
length parameter.

With call like rtnl_talk(&rth, nlh, nlh, sizeof(req), I add a new variable
answer to avoid overwrite data in nlh, because it may has more info after
nlh. also this will avoid nlh buffer not enough issue.

We need to free answer after using.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-10-26 12:29:29 +02:00
Stephen Hemminger 2ac0c6c2c1 Merge branch 'master' into net-next 2017-10-25 12:39:18 +02:00
Jamal Hadi Salim 35f2a7639d tc/actions: introduce support for jump action
Sample use case:

... add ingress qdisc
sudo $TC qdisc add dev $ETH ingress

 ... if we exceed rate of 1kbps (burst of 90K), do an absolute jump of 2 actions
sudo $TC actions add action police rate 1kbit burst 90k conform-exceed jump 2 / pipe

sudo $TC -s actions ls action police
 action order 0:  police 0x4 rate 1Kbit burst 23440b mtu 2Kb action jump 2/pipe overhead 0b
 ref 1 bind 0 installed 41 sec used 41 sec
 Action statistics:
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0

... lets add a couple of marks so we can use them to mark exceed/not exceed
sudo $TC actions add action skbedit mark 11 ok index 11
sudo $TC actions add action skbedit mark 12 ok index 12

... if we dont exceed our rate we get a mark of 11, else mark of 12
sudo $TC filter add dev $ETH parent ffff: protocol ip prio 8 u32 \
match ip dst 127.0.0.8/32 flowid 1:10 \
action police index 4 \
action skbedit index 11 \
action skbedit index 12

Ok, lets keep this thing a little busy..
sudo ping -f -c 10000 127.0.0.8

... now lets see the filters..
sudo $TC -s filter ls dev $ETH parent ffff: protocol ip
filter pref 8 u32 chain 0
filter pref 8 u32 chain 0 fh 800: ht divisor 1
filter pref 8 u32 chain 0 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:10 not_in_hw  (rule hit 20000 success 10000)
  match 7f000008/ffffffff at 16 (success 10000 )
	action order 1:  police 0x4 rate 1Kbit burst 23440b mtu 2Kb action jump 2/pipe overhead 0b
	ref 2 bind 1 installed 198 sec used 2 sec
	Action statistics:
	Sent 840000 bytes 10000 pkt (dropped 0, overlimits 9721 requeues 0)
	backlog 0b 0p requeues 0

	action order 2:  skbedit mark 11 pass
	 index 11 ref 2 bind 1 installed 127 sec used 2 sec
 	Action statistics:
	Sent 23436 bytes 279 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0

	action order 3:  skbedit mark 12 pass
	 index 12 ref 2 bind 1 installed 127 sec used 2 sec
 	Action statistics:
	Sent 816564 bytes 9721 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0

As can be seen 97.21% of the packets were marked as exceeding the allocated
rate; you could do something clever with the skb mark after this.

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-10-25 12:33:46 +02:00
Stephen Hemminger 4c6080b5c4 Merge branch 'master' into net-next 2017-10-12 09:06:10 -07:00
Stephen Hemminger 268a9eee98 netem: fix code indentation
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-10-11 18:08:15 -07:00
Stephen Hemminger 60509b997d Merge branch 'master' into net-next 2017-10-02 08:04:13 -07:00
Phil Sutter 625df645b7 Check user supplied interface name lengths
The original problem was that something like:

| strncpy(ifr.ifr_name, *argv, IFNAMSIZ);

might leave ifr.ifr_name unterminated if length of *argv exceeds
IFNAMSIZ. In order to fix this, I thought about replacing all those
cases with (equivalent) calls to snprintf() or even introducing
strlcpy(). But as Ulrich Drepper correctly pointed out when rejecting
the latter from being added to glibc, truncating a string without
notifying the user is not to be considered good practice. So let's
excercise what he suggested and reject empty, overlong or otherwise
invalid interface names right from the start - this way calls to
strncpy() like shown above become safe and the user has a chance to
reconsider what he was trying to do.

Note that this doesn't add calls to check_ifname() to all places where
user supplied interface name is parsed. In many cases, the interface
must exist already and is therefore looked up using ll_name_to_index(),
so if_nametoindex() will perform the necessary checks already.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-10-02 08:01:21 -07:00
Phil Sutter ee474849c8 tc: flower: No need to cache indev arg
Since addattrstrz() will copy the provided string into the attribute
payload, there is no need to cache the data.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-10-02 08:01:21 -07:00
Yulia Kartseva 73451259da tc: fix ipv6 filter selector attribute for some prefix lengths
Wrong TCA_U32_SEL attribute packing if prefixLen AND 0x1f equals 0x1f.
These are  /31, /63, /95 and /127 prefix lengths.

Example:
ip6 dst face:b00f::/31
filter parent b: protocol ipv6 pref 2307 u32
filter parent b: protocol ipv6 pref 2307 u32 fh 800: ht divisor 1
filter parent b: protocol ipv6 pref 2307 u32 fh 800::800 order 2048
key ht 800 bkt 0
  match faceb00f/ffffffff at 24

v2: previous patch was made with a wrong repo

Signed-off-by: Yulia Kartseva <hex@fb.com>
2017-10-01 13:41:29 -07:00
Stephen Hemminger 58677cc2d3 tc: flower remove unused variable
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-09-20 18:08:43 -07:00
Benjamin LaHaise 7638ee13c1 tc: flower: support for matching MPLS labels
This patch adds support to the iproute2 tc filter command for matching MPLS
labels in the flower classifier.  The ability to match the Time To Live,
Bottom Of Stack, Traffic Control and Label fields are added as options to
the flower filter.

e.g.:
  tc filter add dev eth0 protocol 0x8847 parent ffff: \
    flower mpls_label 1 mpls_tc 2 mpls_ttl 3 mpls_bos 0 \
    action drop

Signed-off-by: Benjamin LaHaise <benjamin.lahaise@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2017-09-20 18:07:21 -07:00
Eric Dumazet ff28b7519d tc: fq: support low_rate_threshold attribute
TCA_FQ_LOW_RATE_THRESHOLD sch_fq attribute was added in linux-4.9

Tested:

lpaa5:/tmp# tc -qd add dev eth1 root fq
lpaa5:/tmp# tc -s qd sh dev eth1
qdisc fq 8003: root refcnt 5 limit 10000p flow_limit 1000p buckets 4096 \
 orphan_mask 4095 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 quantum 3648 \
 initial_quantum 18240 low_rate_threshold 550Kbit refill_delay 40.0ms
 Sent 62139 bytes 395 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  116 flows (114 inactive, 0 throttled)
  1 gc, 0 highprio, 0 throttled

lpaa5:/tmp# ./netperf -H lpaa6 -t TCP_RR -l10 -- -q 500000 -r 300,300 -o P99_LATENCY
99th Percentile Latency Microseconds
7081

lpaa5:/tmp# tc qd replace dev eth1 root fq low_rate_threshold 10Mbit
lpaa5:/tmp# ./netperf -H lpaa6 -t TCP_RR -l10 -- -q 500000 -r 300,300 -o P99_LATENCY
99th Percentile Latency Microseconds
858

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
2017-09-12 21:33:31 -07:00
Stephen Hemminger a17a01145f Merge branch 'master' into net-next 2017-09-05 09:33:29 -07:00
Daniel Borkmann a0b5b7cf5c bpf: consolidate dumps to use bpf_dump_prog_info
Consolidate dump of prog info to use bpf_dump_prog_info() when possible.
Moving forward, we want to have a consistent output for BPF progs when
being dumped. E.g. in cls/act case we used to dump tag as a separate
netlink attribute before we had BPF_OBJ_GET_INFO_BY_FD bpf(2) command.

Move dumping tag into bpf_dump_prog_info() as well, and only dump the
netlink attribute for older kernels. Also, reuse bpf_dump_prog_info()
for XDP case, so we can dump tag and whether program was jited, which
we currently don't show.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-09-05 09:26:34 -07:00
Simon Horman b75e0f6f4b tc actions: store and dump correct length of user cookies
Correct two errors which cancel each other out:
* Do not send twice the length of the actual provided by the user to the kernel
* Do not dump half the length of the cookie provided by the kernel

As the cookie is now stored in the kernel at its correct length rather
than double the that length cookies of up to the maximum size of 16 bytes
may now be stored rather than a maximum of half that length.

Output of dump is the same before and after this change,
but the data stored in the kernel is now exactly the cookie
rather than the cookie + as many trailing zeros.

Before:
 # tc filter add dev eth0 protocol ip parent ffff: \
       flower ip_proto udp action drop \
       cookie 0123456789abcdef0123456789abcdef
 RTNETLINK answers: Invalid argument

After:
 # tc filter add dev eth0 protocol ip parent ffff: \
       flower ip_proto udp action drop \
       cookie 0123456789abcdef0123456789abcdef
 # tc filter show dev eth0 ingress
   eth_type ipv4
   ip_proto udp
   not_in_hw
	 action order 1: gact action drop
	  random type none pass val 0
	  index 1 ref 1 bind 1 installed 1 sec used 1 sec
	 Action statistics:
	 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
	 backlog 0b 0p requeues 0
	 cookie len 16 0123456789abcdef0123456789abcdef

Fixes: fd8b3d2c1b ("actions: Add support for user cookies")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2017-09-05 09:25:46 -07:00
Stephen Hemminger 2e706e12d9 Merge branch 'master' into net-next
Needed to add JSON support to tclass.
2017-09-01 12:17:48 -07:00
Phil Sutter 9376314b49 tc_util: No need to terminate an snprintf'ed buffer
snprintf() won't leave the buffer unterminated, so manually terminating
is not necessary here.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-09-01 12:10:54 -07:00
Phil Sutter 18f156bfec Convert the obvious cases to strlcpy()
This converts the typical idiom of manually terminating the buffer after
a call to strncpy().

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-09-01 12:10:54 -07:00
Alexander Aring 38060de1eb tc: m_ife: report about kernels default type
This patch will report about if the ethertype for IFE is not specified
that the default IFE type is used.

Signed-off-by: Alexander Aring <aring@mojatatu.com>
2017-08-30 08:26:46 -07:00
Alexander Aring 664f35aa7c tc: m_ife: print IEEE ethertype format
This patch uses the usually IEEE format to display an ethertype which is
4-digits and every digit in upper case.

Signed-off-by: Alexander Aring <aring@mojatatu.com>
2017-08-30 08:26:46 -07:00
Alexander Aring bf338b60d4 tc: m_ife: allow ife type to zero
This patch allows to set an ethertype for IFE which is zero. There is no
kernel side validation which forbids a type to zero.

Signed-off-by: Alexander Aring <aring@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
2017-08-30 08:26:46 -07:00
Stephen Hemminger f474588028 Merge branch 'master' into net-next 2017-08-24 15:30:32 -07:00
Stephen Hemminger c4fc474b88 tc: use named initializer for default mqprio options
Use C99 initializer

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-24 15:28:15 -07:00
Phil Sutter 56270e5466 tc/m_xt: Fix for potential string buffer overflows
- Use strncpy() when writing to target->t->u.user.name and make sure the
  final byte remains untouched (xtables_calloc() set it to zero).
- 'tname' length sanitization was completely wrong: If it's length
  exceeded the 16 bytes available in 'k', passing a length value of 16
  to strncpy() would overwrite the previously NULL'ed 'k[15]'. Also, the
  sanitization has to happen if 'tname' is exactly 16 bytes long as
  well.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-24 14:53:14 -07:00
Phil Sutter 75716932a0 tc/tc_filter: Make sure filter name is not empty
The later check for 'k[0] != 0' requires a non-empty filter name,
otherwise NULL pointer dereference in 'q' might happen.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-24 14:49:44 -07:00
Phil Sutter a754de3ccd tc/q_netem: Don't dereference possibly NULL pointer
Assuming 'opt' might be NULL, move the call to RTA_PAYLOAD to after the
check since it dereferences its parameter.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2017-08-24 14:49:44 -07:00
Stephen Hemminger 5f1df307b4 config: put CFLAGS/LDLIBS in config.mk
This renames Config to config.mk and includes more Make input.
Now configure generates all the required CFLAGS and LDLIBS for
the optional libraries.

Also, use pkg-config to test for libelf, rather than using a test
program. This makes it consistent with other libraries.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-08-23 10:03:09 -07:00