The buffer is 64 bytes, but printing the label in hex can take 66 bytes,
so it overflows when the string terminator ('\0') is written.
Fix that by increasing the print buffer size.
Example of overflowing ct_label:
ct_label 11111111111111111111111111111111/11111111111111111111111111111111
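To make the size arithmetic concrete, here is a minimal C sketch. It assumes a 128-bit (16-byte) label, as implied by the 32 hex digits on each side of the example, and uses a hypothetical format_ct_label() helper rather than the actual tc code:

```c
#include <stdio.h>
#include <string.h>

/* Assumption: a conntrack label is 128 bits (16 bytes), i.e. the 32 hex
 * digits shown on each side of the example above. */
#define CT_LABEL_BYTES 16

/* "<label>/<mask>" in hex: two digits per byte on each side, one '/'
 * separator and the terminating '\0': 32 + 1 + 32 + 1 = 66 bytes. */
#define CT_LABEL_STR_LEN (2 * CT_LABEL_BYTES + 1 + 2 * CT_LABEL_BYTES + 1)

/* Hypothetical helper, not the actual iproute2 code: formats label/mask
 * into out and returns the number of characters written (excluding the
 * '\0'), or -1 if out is too small to hold the result. */
static int format_ct_label(char *out, size_t out_len,
			   const unsigned char *label,
			   const unsigned char *mask)
{
	size_t pos = 0;
	int i;

	if (out_len < CT_LABEL_STR_LEN)
		return -1;

	for (i = 0; i < CT_LABEL_BYTES; i++)
		pos += sprintf(out + pos, "%02x", label[i]);
	out[pos++] = '/';
	for (i = 0; i < CT_LABEL_BYTES; i++)
		pos += sprintf(out + pos, "%02x", mask[i]);

	/* the last sprintf() call stored the '\0' at out[65] */
	return (int)pos;
}
```

With a 64-byte buffer, the size check rejects the call instead of writing past the end.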
Fixes: 2fffb1c030 ("tc: flower: Add matching on conntrack info")
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
'~0UL' has type 'unsigned long', which is likely to be 64 bits on modern machines. At
the same time, the '{idle,unbalanced}_timer' variables are declared as u32, so
these variables can never be greater than '~0UL / 100' when 'unsigned long' is 64
bits. In that case, a value can still pass the check and then overflow later
when the timers are multiplied by 100 in 'addattr32'.
Fix the possible overflow by changing '~0UL' to 'UINT32_MAX'.
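A minimal C sketch of the check, assuming a hypothetical parser where the timer is held in a uint32_t and later multiplied by 100 before being sent; the function names are illustrative only:

```c
#include <stdbool.h>
#include <stdint.h>

/* The timer is kept in a 32-bit variable and later sent multiplied by
 * 100, so the largest safe input is UINT32_MAX / 100. */
static bool timer_in_range(uint32_t timer)
{
	return timer <= UINT32_MAX / 100;
}

/* The old check: with a 64-bit unsigned long, ~0UL / 100 is larger than
 * any uint32_t value, so this always returns true. */
static bool timer_in_range_old(uint32_t timer)
{
	return timer <= ~0UL / 100;
}

/* The value that ends up in the attribute; unsigned arithmetic silently
 * wraps if the range check let a too-large value through. */
static uint32_t timer_attr_value(uint32_t timer)
{
	return timer * 100;
}
```

For example, 42949673 passes the old check on a 64-bit machine but wraps to 4 once multiplied by 100.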
Fixes: 9167671822 ("nexthop: Add support for resilient nexthop groups")
Signed-off-by: Maxim Petrov <mmrmaximuzz@gmail.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The 'name' field of 'struct bpf_prog_info' is a plain C array. Thus, the
condition in bpf_dump_prog_info() is useless: the address of an array
always evaluates to true. Just remove it.
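A short C sketch of why the condition is always true. It uses a hypothetical struct shaped like the relevant part of struct bpf_prog_info (a 16-byte name array, matching BPF_OBJ_NAME_LEN in the UAPI header); compilers typically warn about such conditions when the array is tested directly:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the relevant part of struct bpf_prog_info:
 * name is an array stored inside the struct, not a pointer. */
struct prog_info_sketch {
	unsigned int id;
	char name[16];
};

/* Mirrors what "if (info->name)" tests: the array decays to a pointer
 * into *info, which cannot be NULL for a valid info, so the result is 1
 * even when no name was reported. */
static int name_check_result(const struct prog_info_sketch *info)
{
	const char *p = info->name;

	return p != NULL;
}
```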
Signed-off-by: Maxim Petrov <mmrmaximuzz@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Running lnstat causes a core dump from reading past the end of an array:
Segmentation fault (core dumped)
The maximum value of th.num_lines is HDR_LINES (10), so h must not be equal to th.num_lines; otherwise the access to the th.hdr array goes out of bounds.
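The bound can be sketched in C as follows; struct table_hdr_sketch and hdr_line() are hypothetical and only mirror the HDR_LINES-sized array described above:

```c
#include <stddef.h>

#define HDR_LINES 10

/* Hypothetical header table: a fixed array of HDR_LINES entries, of which
 * the first num_lines (at most HDR_LINES) are in use. */
struct table_hdr_sketch {
	int num_lines;
	const char *hdr[HDR_LINES];
};

/* Returns header line h, or NULL if h is out of range. Valid indices are
 * 0 .. num_lines - 1, so h == num_lines (10 when the table is full) must
 * be rejected rather than used to index hdr[]. */
static const char *hdr_line(const struct table_hdr_sketch *th, int h)
{
	if (h < 0 || h >= th->num_lines || h >= HDR_LINES)
		return NULL;
	return th->hdr[h];
}
```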
Signed-off-by: jiangheng <jiangheng12@huawei.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Fix a misplaced bracket in the if clause that made the condition evaluate incorrectly.
Fixes: d61167dd88 ("m_vlan: add pop_eth and push_eth actions")
Signed-off-by: Maxim Petrov <mmrmaximuzz@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
iproute2 ignores the '-j' command-line argument when dumping endpoints by id:
[dcaratti@dcaratti iproute2]$ ./ip/ip -j mptcp endpoint show
[{"address":"1.2.3.4","id":42,"signal":true,"backup":true}]
[dcaratti@dcaratti iproute2]$ ./ip/ip -j mptcp endpoint show id 42
1.2.3.4 id 42 signal backup
Fix mptcp_addr_show() to use the proper JSON helpers.
Fixes: 7e0767cd86 ("add support for mptcp netlink interface")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Commit 690b11f4a6 ("tc: u32: Fix firstfrag filter.") applied in 2012
changed the "ip firstfrag" selector to not match non-fragmented packets
anymore.
However, the documentation added in f15a23966f ("tc: add a man page
for u32 filter") in 2015 includes an example that relies on the previous
behavior (non-fragmented packet counted as first fragment).
Due to this, the example does not work correctly and does not actually
classify regular SSH packets.
Modify the example to use a raw u16 selector on the fragment offset to
make it work, and also make the firstfrag description clearer about
the current behavior.
Fixes: f15a23966f ("tc: add a man page for u32 filter")
Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi>
Cc: Phil Sutter <phil@nwl.cc>
Cc: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Vincent Mailhol says:
====================
The main purpose is to add commandline support for Transmitter Delay
Compensation (TDC) in iproute. Other issues found during the
development of this feature also get addressed.
This patch series contains five patches, which respectively:
1. Correct the bittiming ranges in the print_usage function and add
the units for clarity: some parameters are in milliseconds,
some in nanoseconds, some in time quanta, and the new TDC
parameters introduced in this series are in clock periods.
2. Do some code refactoring on function print_ctrlmode().
3. Factorize the many print_*(PRINT_JSON, ...) and fprintf
occurrences into a single print_*(PRINT_ANY, ...) call and fix the
signedness while doing so.
4. Report the value of the bitrate prescalers (brp and dbrp).
5. Add command-line support for TDC in iproute2; this goes together
with the following kernel series:
https://lore.kernel.org/linux-can/20210814091750.73931-1-mailhol.vincent@wanadoo.fr/T/#t
** Changelog **
From RFC v5 to v6:
* Dropped the RFC tag because the related patch series on the kernel
side was pulled into net-next.
* Remove the changes in include/uapi/linux/can/netlink.h because
these should be pulled separately.
* Add another patch (the second of this series) to do some cleanup
on function print_ctrlmode().
* Minor fixes in the patch comments (grammar, rephrasing).
From RFC v4 to RFC v5:
* Add the unit (bps, tq, ns or ms) in print_usage()
* Rewrote can_print_timing_min_max() to better factorize the
code.
* Rewrote the commit message of the last two patches (those related
to TDC) to either add clarifications or fix inaccuracies.
From v3 to RFC v4:
* Reflect the changes made on the kernel side.
From RFC v2 to v3:
* Dropped the RFC tag. Now that the kernel patch has reached the testing
branch, I am finally ready.
* Regression fix: configuring a link with only nominal bittiming
returned -EOPNOTSUPP
* Added two more patches to the series:
- iplink_can: fix configuration ranges in print_usage()
- iplink_can: print brp and dbrp bittiming variables
* Other small fixes on formatting.
From RFC v1 to RFC v2:
* Add an additional patch to the series to fix the issues reported
by Stephen Hemminger
Ref: https://lore.kernel.org/linux-can/20210506112007.1666738-1-mailhol.vincent@wanadoo.fr/T/#t
====================
Signed-off-by: David Ahern <dsahern@kernel.org>
At high bit rates, the propagation delay from the TX pin to the RX pin
of the transceiver causes measurement errors: the sample point on the
RX pin might occur on the previous bit.
This issue is addressed in ISO 11898-1 section 11.3.3 "Transmitter
delay compensation" (TDC).
This patch adds command-line support for nine TDC parameters which
were recently added to the kernel's CAN netlink interface in order to
implement TDC:
- IFLA_CAN_TDC_TDCV_MIN: Transmitter Delay Compensation Value
minimum value
- IFLA_CAN_TDC_TDCV_MAX: Transmitter Delay Compensation Value
maximum value
- IFLA_CAN_TDC_TDCO_MIN: Transmitter Delay Compensation Offset
minimum value
- IFLA_CAN_TDC_TDCO_MAX: Transmitter Delay Compensation Offset
maximum value
- IFLA_CAN_TDC_TDCF_MIN: Transmitter Delay Compensation Filter
window minimum value
- IFLA_CAN_TDC_TDCF_MAX: Transmitter Delay Compensation Filter
window maximum value
- IFLA_CAN_TDC_TDCV: Transmitter Delay Compensation Value
- IFLA_CAN_TDC_TDCO: Transmitter Delay Compensation Offset
- IFLA_CAN_TDC_TDCF: Transmitter Delay Compensation Filter window
All those new parameters are nested together into the attribute
IFLA_CAN_TDC.
The TDC parameters extend the FD parameters. As such, the TDC
parameters must be specified together with the "fd on" flag.
When the "fd on" flag is provided, the tdc-mode parameter specifies
how TDC operates. Valid options for tdc-mode are:
* auto: the transmitter dynamically measures TDCV for each of the
transmitted frames. As such, TDCV cannot be provided manually. In
this mode, the user must specify TDCO and may also specify TDCF if
supported.
* manual: use a static TDCV provided by the user. In this mode, the
user must specify both TDCV and TDCO and may also specify TDCF if
supported.
* off: TDC is explicitly disabled.
* tdc-mode parameter omitted (default mode): the kernel decides
whether TDC should be enabled or not and, if so, it calculates the
TDC values. TDC parameters are an expert option and the average
user is not expected to provide them, hence this "default
mode".
If the fd flag is omitted, all the FD values (including TDC values)
remain unchanged.
If "fd off" flag is specified, all FD values (including TDC values)
are zeroed.
TDCV is always reported in manual mode. In auto mode, TDCV is reported
only if the value is available. In particular, TDCV might not be
available if the controller has no feature to report it or if the
value is not yet available (e.g. no data has been sent yet, so no
measurement has occurred).
TDCF is reported only if tdcf_max is not zero (i.e. if supported by
the controller).
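The mode rules above can be summarized as a small C validation sketch. The types and function are hypothetical, TDCF support by the controller is not modeled, and rejecting explicit TDC values in default and off modes is an assumption not stated above:

```c
#include <stdbool.h>

/* Hypothetical types modeling the rules above, not the iplink_can.c parser. */
enum tdc_mode_sketch {
	TDC_MODE_DEFAULT,	/* tdc-mode omitted: the kernel decides */
	TDC_MODE_AUTO,
	TDC_MODE_MANUAL,
	TDC_MODE_OFF,
};

struct tdc_request {
	bool fd_on;		/* "fd on" was given */
	enum tdc_mode_sketch mode;
	bool has_tdcv, has_tdco, has_tdcf;
};

/* Returns 0 if the combination is acceptable, -1 otherwise. */
static int tdc_request_valid(const struct tdc_request *r)
{
	bool any_value = r->has_tdcv || r->has_tdco || r->has_tdcf;

	/* TDC extends FD: no TDC setting without "fd on" */
	if (!r->fd_on)
		return (any_value || r->mode != TDC_MODE_DEFAULT) ? -1 : 0;

	switch (r->mode) {
	case TDC_MODE_AUTO:
		/* TDCV is measured by the controller; TDCO is mandatory */
		return (!r->has_tdcv && r->has_tdco) ? 0 : -1;
	case TDC_MODE_MANUAL:
		/* static TDCV: both TDCV and TDCO are mandatory */
		return (r->has_tdcv && r->has_tdco) ? 0 : -1;
	case TDC_MODE_OFF:
	case TDC_MODE_DEFAULT:
		/* assumption: explicit values are rejected in these modes */
		return any_value ? -1 : 0;
	}
	return -1;
}
```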
For reference, here are a few samples of what the output looks like:
| $ ip link set can0 type can bitrate 1000000 dbitrate 8000000 fd on tdco 7 tdcf 8 tdc-mode auto
| $ ip --details link show can0
| 1: can0: <NOARP,ECHO> mtu 72 qdisc noop state DOWN mode DEFAULT group default qlen 10
| link/can promiscuity 0 minmtu 0 maxmtu 0
| can <FD,TDC-AUTO> state STOPPED (berr-counter tx 0 rx 0) restart-ms 0
| bitrate 1000000 sample-point 0.750
| tq 12 prop-seg 29 phase-seg1 30 phase-seg2 20 sjw 1 brp 1
| ES582.1/ES584.1: tseg1 2..256 tseg2 2..128 sjw 1..128 brp 1..512 brp_inc 1
| dbitrate 8000000 dsample-point 0.700
| dtq 12 dprop-seg 3 dphase-seg1 3 dphase-seg2 3 dsjw 1 dbrp 1
| tdco 7 tdcf 8
| ES582.1/ES584.1: dtseg1 2..32 dtseg2 1..16 dsjw 1..8 dbrp 1..32 dbrp_inc 1
| tdco 0..127 tdcf 0..127
| clock 80000000 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
| $ ip --details --json --pretty link show can0
| [ {
| "ifindex": 1,
| "ifname": "can0",
| "flags": [ "NOARP","ECHO" ],
| "mtu": 72,
| "qdisc": "noop",
| "operstate": "DOWN",
| "linkmode": "DEFAULT",
| "group": "default",
| "txqlen": 10,
| "link_type": "can",
| "promiscuity": 0,
| "min_mtu": 0,
| "max_mtu": 0,
| "linkinfo": {
| "info_kind": "can",
| "info_data": {
| "ctrlmode": [ "FD","TDC-AUTO" ],
| "state": "STOPPED",
| "berr_counter": {
| "tx": 0,
| "rx": 0
| },
| "restart_ms": 0,
| "bittiming": {
| "bitrate": 1000000,
| "sample_point": "0.750",
| "tq": 12,
| "prop_seg": 29,
| "phase_seg1": 30,
| "phase_seg2": 20,
| "sjw": 1,
| "brp": 1
| },
| "bittiming_const": {
| "name": "ES582.1/ES584.1",
| "tseg1": {
| "min": 2,
| "max": 256
| },
| "tseg2": {
| "min": 2,
| "max": 128
| },
| "sjw": {
| "min": 1,
| "max": 128
| },
| "brp": {
| "min": 1,
| "max": 512
| },
| "brp_inc": 1
| },
| "data_bittiming": {
| "bitrate": 8000000,
| "sample_point": "0.700",
| "tq": 12,
| "prop_seg": 3,
| "phase_seg1": 3,
| "phase_seg2": 3,
| "sjw": 1,
| "brp": 1,
| "tdc": {
| "tdco": 7,
| "tdcf": 8
| }
| },
| "data_bittiming_const": {
| "name": "ES582.1/ES584.1",
| "tseg1": {
| "min": 2,
| "max": 32
| },
| "tseg2": {
| "min": 1,
| "max": 16
| },
| "sjw": {
| "min": 1,
| "max": 8
| },
| "brp": {
| "min": 1,
| "max": 32
| },
| "brp_inc": 1,
| "tdc": {
| "tdco": {
| "min": 0,
| "max": 127
| },
| "tdcf": {
| "min": 0,
| "max": 127
| }
| }
| },
| "clock": 80000000
| }
| },
| "num_tx_queues": 1,
| "num_rx_queues": 1,
| "gso_max_size": 65536,
| "gso_max_segs": 65535
| } ]
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: David Ahern <dsahern@kernel.org>
Report the value of the bit-rate prescaler (brp) for both the nominal
and the data bittiming.
Currently, only the constant brp values (brp_{min,max,inc}) are being
reported. Also, brp is the only member of struct can_bittiming not
being reported.
Note that brp could be calculated by hand from the other bittiming
parameters with the formula below:
brp = clock * tq / 1000000000
with clock in hertz and tq in nanoseconds (hence the factor of one
billion to convert it back to seconds).
But because the above formula is not trivial to remember and is
subject to rounding errors, it makes sense to output {d,}brp
directly.
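A C sketch of the formula, using 64-bit intermediates; the helper names are hypothetical. With clock 80000000 and tq 12 (the true time quantum is 1 / 80 MHz = 12.5 ns, shown as 12), plain integer division yields 0 while rounding yields 1, which illustrates the rounding problem:

```c
#include <stdint.h>

/* brp = clock * tq / 1000000000, with clock in Hz and tq in ns.
 * 64-bit intermediates avoid overflowing clock * tq. */
static uint32_t brp_from_tq_trunc(uint32_t clock_hz, uint32_t tq_ns)
{
	return (uint32_t)((uint64_t)clock_hz * tq_ns / 1000000000ULL);
}

/* Same formula, rounded to the nearest integer instead of truncated. */
static uint32_t brp_from_tq_round(uint32_t clock_hz, uint32_t tq_ns)
{
	return (uint32_t)(((uint64_t)clock_hz * tq_ns + 500000000ULL) /
			  1000000000ULL);
}
```

Even the rounded variant only recovers brp because tq lost less than half a nanosecond; reporting brp directly avoids the guesswork.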
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: David Ahern <dsahern@kernel.org>
The current implementation heavily relies on some "if (is_json_context())"
switches to decide the context, and then does some print_*(PRINT_JSON,
...) in JSON context and some fprintf(...) otherwise.
Furthermore, the current implementation uses either print_int() or the
conversion specifier %d to print unsigned integers.
This patch factorizes each pair of print_*(PRINT_JSON, ...) and
fprintf() into a single print_*(PRINT_ANY, ...) call. While doing this
replacement, it uses the proper unsigned function print_uint() as well as
the conversion specifier %u when the parameter is an unsigned integer.
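A C sketch of the resulting pattern. print_uint_sketch() is a simplified, hypothetical stand-in for print_uint() that writes into a buffer; the output selectors are modeled on iproute2's PRINT_FP/PRINT_JSON/PRINT_ANY, and the JSON formatting is deliberately minimal:

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Output selectors modeled on iproute2's (values here are illustrative). */
enum output_type { PRINT_FP = 1, PRINT_JSON = 2, PRINT_ANY = 4 };

static bool json_ctx;	/* stand-in for is_json_context() */
static char out[128];	/* captured output instead of stdout */

/* Hypothetical stand-in for print_uint(): PRINT_ANY emits "key":value in
 * JSON context and fmt otherwise, so callers need no if/else. */
static void print_uint_sketch(enum output_type t, const char *key,
			      const char *fmt, unsigned int value)
{
	size_t len = strlen(out);

	if (json_ctx && (t & (PRINT_JSON | PRINT_ANY)))
		snprintf(out + len, sizeof(out) - len, "\"%s\":%u", key, value);
	else if (!json_ctx && (t & (PRINT_FP | PRINT_ANY)))
		snprintf(out + len, sizeof(out) - len, fmt, value);
}

/* After the change: one call per field, with %u for an unsigned value. */
static void print_brp(unsigned int brp)
{
	print_uint_sketch(PRINT_ANY, "brp", "brp %u ", brp);
}
```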
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: David Ahern <dsahern@kernel.org>
This patch only does cleanup and does not introduce any functional
changes.
We do some code refactoring of print_ctrlmode() in preparation for the
upcoming patch:
- remove the first argument of print_ctrlmode(). It is a pointer to
FILE and is never used.
- add a new function argument: enum output_type t in order to
specify the output type (i.e. PRINT_{FP,JSON,ANY}).
- add a new function argument: const char *key in order to specify
the name of the json array (e.g. "ctrlmode").
- replace the _PF() macro with the print_flag() function to increase
readability.
- directly return if none of the flags are set (previously, this
check was done before calling the function).
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: David Ahern <dsahern@kernel.org>
The configuration ranges in print_usage() are taken from "Table 8 -
Time segments' minimum configuration ranges" in section 11.3.1.2
"Configuration of the bit time parameters" of ISO 11898-1.
The standard clearly specifies that "implementations may allow time
segments that exceed the minimum required configuration ranges
specified in Table 8".
Because no maximum ranges are given in the standard, all given ranges
{ a..b } are simply replaced with { NUMBER }.
The actual ranges are specific to each device and can be checked
with:
$ ip --details link show can0
1: can0: <NOARP,ECHO> mtu 16 qdisc noop state DOWN mode DEFAULT group default qlen 10
link/can promiscuity 0 minmtu 0 maxmtu 0
can state STOPPED restart-ms 0
ES582.1/ES584.1: tseg1 2..256 tseg2 2..128 sjw 1..128 brp 1..512 brp-inc 1
ES582.1/ES584.1: dtseg1 2..32 dtseg2 1..16 dsjw 1..8 dbrp 1..32 dbrp-inc 1
clock 80000000 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
Finally, the units (bps, tq, ns or ms) are given. The rationale for adding
the units is that the TDC parameters (that will be introduced in the
upcoming patches) are measured in a different unit than the other
bittiming parameters: clock period (a.k.a. minimum time quantum)
instead of time quantum. Adding the units disambiguates things.
For reference, before the change:
$ ip link set can0 type can help
Usage: ip link set DEVICE type can
[ bitrate BITRATE [ sample-point SAMPLE-POINT] ] |
[ tq TQ prop-seg PROP_SEG phase-seg1 PHASE-SEG1
phase-seg2 PHASE-SEG2 [ sjw SJW ] ]
[ dbitrate BITRATE [ dsample-point SAMPLE-POINT] ] |
[ dtq TQ dprop-seg PROP_SEG dphase-seg1 PHASE-SEG1
dphase-seg2 PHASE-SEG2 [ dsjw SJW ] ]
[ loopback { on | off } ]
[ listen-only { on | off } ]
[ triple-sampling { on | off } ]
[ one-shot { on | off } ]
[ berr-reporting { on | off } ]
[ fd { on | off } ]
[ fd-non-iso { on | off } ]
[ presume-ack { on | off } ]
[ restart-ms TIME-MS ]
[ restart ]
[ termination { 0..65535 } ]
Where: BITRATE := { 1..1000000 }
SAMPLE-POINT := { 0.000..0.999 }
TQ := { NUMBER }
PROP-SEG := { 1..8 }
PHASE-SEG1 := { 1..8 }
PHASE-SEG2 := { 1..8 }
SJW := { 1..4 }
RESTART-MS := { 0 | NUMBER }
...and after it:
$ ip link set can0 type can help
Usage: ip link set DEVICE type can
[ bitrate BITRATE [ sample-point SAMPLE-POINT] ] |
[ tq TQ prop-seg PROP_SEG phase-seg1 PHASE-SEG1
phase-seg2 PHASE-SEG2 [ sjw SJW ] ]
[ dbitrate BITRATE [ dsample-point SAMPLE-POINT] ] |
[ dtq TQ dprop-seg PROP_SEG dphase-seg1 PHASE-SEG1
dphase-seg2 PHASE-SEG2 [ dsjw SJW ] ]
[ loopback { on | off } ]
[ listen-only { on | off } ]
[ triple-sampling { on | off } ]
[ one-shot { on | off } ]
[ berr-reporting { on | off } ]
[ fd { on | off } ]
[ fd-non-iso { on | off } ]
[ presume-ack { on | off } ]
[ cc-len8-dlc { on | off } ]
[ restart-ms TIME-MS ]
[ restart ]
[ termination { 0..65535 } ]
Where: BITRATE := { NUMBER in bps }
SAMPLE-POINT := { 0.000..0.999 }
TQ := { NUMBER in ns }
PROP-SEG := { NUMBER in tq }
PHASE-SEG1 := { NUMBER in tq }
PHASE-SEG2 := { NUMBER in tq }
SJW := { NUMBER in tq }
RESTART-MS := { 0 | NUMBER in ms }
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: David Ahern <dsahern@kernel.org>
Update kernel headers to commit:
cc0356d6a02e ("Merge tag 'x86_core_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")
Signed-off-by: David Ahern <dsahern@kernel.org>
This patch fixes a bug: when a param set command specifies a
configuration mode that is not supported, the tool may not respond
with an error if the requested value is 0. In that case,
cmd_dev_param_set_cb() does not find the requested configuration mode and
leaves ctx->value at its initial value (0). Then cmd_dev_param_set()
may find that the requested value equals the current value and return success.
Fix the bug by adding a flag, cmode_found, which is set only if
cmd_dev_param_set_cb() finds the requested configuration mode.
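A minimal C model of the fix; the structures and return values are hypothetical (only cmd_dev_param_set_cb(), cmd_dev_param_set() and cmode_found come from the description above), and -EOPNOTSUPP stands in for whatever error the real tool reports:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

struct param_mode_value {
	int cmode;		/* configuration mode reported by the kernel */
	uint32_t value;
};

struct param_set_ctx {
	int cmode;		/* mode requested by the user */
	bool cmode_found;	/* the fix: set only if the dump has cmode */
	uint32_t value;		/* current value, valid if cmode_found */
};

/* Plays the role of cmd_dev_param_set_cb(). */
static void param_set_cb(struct param_set_ctx *ctx,
			 const struct param_mode_value *vals, int n)
{
	for (int i = 0; i < n; i++) {
		if (vals[i].cmode == ctx->cmode) {
			ctx->cmode_found = true;
			ctx->value = vals[i].value;
		}
	}
}

/* Plays the role of cmd_dev_param_set(): 0 if the requested value is
 * already set, 1 if a set request would be sent, an error otherwise. */
static int param_set(int cmode, uint32_t requested,
		     const struct param_mode_value *vals, int n)
{
	struct param_set_ctx ctx = { .cmode = cmode };

	param_set_cb(&ctx, vals, n);
	if (!ctx.cmode_found)
		return -EOPNOTSUPP;	/* without the flag, 0 == 0 "succeeded" */
	if (ctx.value == requested)
		return 0;
	return 1;
}
```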
Fixes: 13925ae9eb ("devlink: Add param command support")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
When configuring a devlink PCI port, the pfnumber can be specified
using 'pfnum' and not 'pcipf' as stated in the man page. Fix this.
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Daniel Borkmann says:
====================
iproute2 patches to add support for managed neighbor entries as per recent
net-next commits:
2ed08b5ead3c ("Merge branch 'Managed-Neighbor-Entries'")
c47fedba94bc ("Merge branch 'minor-managed-neighbor-follow-ups'")
====================
Signed-off-by: David Ahern <dsahern@kernel.org>
Currently, ip neigh does not support the NTF_EXT_MANAGED flag. Add cmdline
support.
Usage example:
# ./ip/ip n replace 192.168.178.30 dev enp5s0 managed extern_learn
# ./ip/ip n
192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a managed extern_learn REACHABLE
[...]
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Ahern <dsahern@kernel.org>
Currently, ip neigh does not support the NTF_USE flag. Similar to other flags
such as extern_learn, add cmdline support. Dump support for the flag is intentionally
omitted here, since the kernel does not propagate the flag back to user space.
Usage example:
# ./ip/ip n replace 192.168.178.30 dev enp5s0 use extern_learn
# ./ip/ip n
192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a extern_learn REACHABLE
[...]
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Ahern <dsahern@kernel.org>
Fix up spacing to consistently add a single ' ' after an attribute has
been printed. Currently, spaces are added sometimes before and sometimes
after an attribute, which can lead to double spaces being printed.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Ahern <dsahern@kernel.org>
Two new commands to manage default policies:
- ip xfrm policy setdefault
- ip xfrm policy getdefault
And the corresponding part in 'ip xfrm monitor'.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Mark Zhang says:
====================
This is the supplementary part of kernel series [1], which provides an
extension to the rdma statistics tool that allows setting or listing
optional counters dynamically, using netlink.
Thanks
[1] https://www.spinics.net/lists/linux-rdma/msg106283.html
====================
Signed-off-by: David Ahern <dsahern@kernel.org>
See kernel commit 844f7eaaed9 ("include/uapi/linux/xfrm.h: Fix XFRM_MSG_MAPPING ABI breakage")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Andrea Claudi says:
====================
This series adds support for the libdir parameter in the iproute2 configure
script. The idea is to take advantage of the fact that packaging systems may
assume that 'configure' comes from autotools: accepting a syntax similar
to autotools' lets the distro tell iproute2 where it expects to find its
lib files.
Patches 1-2 fix a parsing issue on current configure options, that may
trigger an endless loop when no value is provided with some options;
Patch 3 fixes a parsing issue bailing out when more than one value is
provided for a single option;
Patch 4 simplifies options parsing, moving semantic checks out of the
while loop processing options;
Patch 5 introduces support for the --opt=value style on current options,
for uniformity;
Patch 6 adds the --prefix option, that may be used by some packaging
systems when calling the configure script;
Patch 7 finally adds the --libdir option, and also drops the static
LIBDIR var from the Makefile.
Changelog:
----------
v4 -> v5
- bail out when multiple values are provided with a single option
- simplify option parsing and reduce code duplication, as suggested
by Phil Sutter
- remove a nasty eval on libdir option processing
v3 -> v4
- fix parsing issue on '--include_dir' and '--libbpf_dir'
- split '--opt value' and '--opt=value' use cases, avoid code
duplication moving semantic checks on value to dedicated functions
v2 -> v3
- fix parsing error on prefix and libdir options.
v1 -> v2
- consolidate '--opt value' and '--opt=value' use cases, as suggested
by David Ahern.
- added patch 2 to manage the --prefix option, used by the Debian
packaging system, as reported by Luca Boccassi, and use it when
setting lib directory.
====================
Signed-off-by: David Ahern <dsahern@kernel.org>
This commit allows users/packagers to choose a lib directory to store
iproute2 lib files.
At the moment, iproute2 ships lib files in /usr/lib and offers no way to
modify this setting. However, according to the FHS, distros may choose
"one or more variants of the /lib directory on systems which support
more than one binary format" (e.g. /usr/lib64 on Fedora).
As Luca states in commit a3272b9372 ("configure: restore backward
compatibility"), packaging systems may assume that 'configure' is from
autotools, and try to pass it some parameters.
By allowing the '--libdir=/path/to/libdir' syntax, we can use this to our
advantage and let the distro packaging system choose the lib
directory.
Note that LIBDIR uses "\${prefix}/lib" as default value because autoconf
allows this to be expanded to the --prefix value at configure runtime.
"\${prefix}" is replaced with the PREFIX value in check_lib_dir().
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David Ahern <dsahern@kernel.org>
This commit adds the '--prefix' option to the iproute2 configure script.
This mimics the '--prefix' option that autotools configure provides, and
will be used later to allow users or packagers to set the lib directory.
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David Ahern <dsahern@kernel.org>
This commit makes it possible to specify values for configure params
using the common autotools configure syntax '--param=value'.
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David Ahern <dsahern@kernel.org>
This commit simplifies options parsing by moving all the code not related to
parsing out of the case statement.
- The conditional shift after the assignments is moved right after the
case, reducing code duplication.
- The semantic checks on the LIBBPF_FORCE value are moved after the loop
like we already did for INCLUDE and LIBBPF_DIR.
- Finally, the loop condition is changed to check remaining arguments, thus
making it possible to get rid of the null string case break.
As a bonus, the help message now states that on or off should follow
--libbpf_force.
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David Ahern <dsahern@kernel.org>
With commit a9c3d70d90 ("configure: add options ability"), users are no
longer able to provide wrong command lines like:
$ ./configure --include_dir foo bar
The script simply bails out when the user provides more than one value for a
single option. However, in doing so, it breaks backward compatibility with
some packaging systems, which expect unknown options to be ignored.
Commit a3272b9372 ("configure: restore backward compatibility") fixed this
issue, but made it possible again for users to provide wrong command lines
such as the one above.
This fixes the issue by simply ignoring autoconf-like options such as
'--opt=value'.
Fixes: a3272b9372 ("configure: restore backward compatibility")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David Ahern <dsahern@kernel.org>
configure is stuck in an endless loop if '--libbpf_dir' option is used
without a value:
$ ./configure --libbpf_dir
./configure: line 515: shift: 2: shift count out of range
./configure: line 515: shift: 2: shift count out of range
[...]
Fix it by splitting 'shift 2' into two consecutive shifts, and making the
second one conditional on the number of remaining arguments.
A check is also added after the while loop to verify that the libbpf dir
exists; also, as LIBBPF_DIR does not have a default value, configure bails
out if the user does not specify a value after --libbpf_dir, thus avoiding
an erroneous configuration.
Fixes: 7ae2585b86 ("configure: convert LIBBPF environment variables to command-line options")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David Ahern <dsahern@kernel.org>
configure is stuck in an endless loop if '--include_dir' option is used
without a value:
$ ./configure --include_dir
./configure: line 506: shift: 2: shift count out of range
./configure: line 506: shift: 2: shift count out of range
[...]
Fix it by splitting 'shift 2' into two consecutive shifts, and making the
second one conditional on the number of remaining arguments.
A check is also added after the while loop to verify that the include dir
exists; this avoids producing an erroneous configuration.
Fixes: a9c3d70d90 ("configure: add options ability")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David Ahern <dsahern@kernel.org>
This patch provides an extension to the rdma statistics tool
that allows setting and unsetting optional counters dynamically,
using new netlink commands.
Note that the optional counter statistic implementation is
driver-specific and may impact the performance.
Examples:
To enable a set of optional counters on link rocep8s0f0/1:
$ sudo rdma statistic set link rocep8s0f0/1 optional-counters cc_rx_ce_pkts,cc_rx_cnp_pkts
To disable all optional counters on link rocep8s0f0/1:
$ sudo rdma statistic unset link rocep8s0f0/1 optional-counters
Signed-off-by: Neta Ostrovsky <netao@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
This patch introduces the "mode" command, which presents the enabled or
supported (when the "supported" argument is given) optional
counters.
An optional counter is a vendor-specific counter that may be
dynamically enabled/disabled. This enhancement of hwcounters allows
exposing counters which are, for example, mutually exclusive and cannot
be enabled at the same time, counters that might degrade performance,
optional debug counters, etc.
Examples:
To present currently enabled optional counters on link rocep8s0f0/1:
$ rdma statistic mode link rocep8s0f0/1
link rocep8s0f0/1 optional-counters cc_rx_ce_pkts
To present supported optional counters on link rocep8s0f0/1:
$ rdma statistic mode supported link rocep8s0f0/1
link rocep8s0f0/1 supported optional-counters cc_rx_ce_pkts,cc_rx_cnp_pkts,cc_tx_cnp_pkts
Signed-off-by: Neta Ostrovsky <netao@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Update rdma_netlink.h file up to kernel commit 7301d0a9834c
("RDMA/nldev: Add support to get status of all counters")
Signed-off-by: Neta Ostrovsky <netao@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
David reported that ipmptcp badly breaks the build when the
relevant kernel headers are updated.
We should be more careful in the header section, explicitly
including all the required dependencies and respecting the usual order
between system and local headers.
Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
When creating map-in-maps, the outer map can be prepopulated using the
inner_idx field of inner maps. That field defines the index of the inner
map in the outer map. It is ignored if set to -1.
Commit 6d61a2b557 ("lib: add libbpf support") however started using
that field to identify inner maps. While iterating over all maps looking
for inner maps, maps with inner_idx set to -1 are erroneously skipped.
As a result, trying to create a map-in-map with prepopulation disabled
fails because the inner_id of the outer map is not correctly set.
This bug can be observed with strace -ebpf (notice the zero inner_map_fd
for the outer map creation):
bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_ARRAY, key_size=4, value_size=130996, max_entries=1, map_flags=0, inner_map_fd=0, map_name="maglev_inner", map_ifindex=0, btf_fd=0, btf_key_type_id=0, btf_value_type_id=0}, 128) = 32
bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_HASH_OF_MAPS, key_size=2, value_size=4, max_entries=65536, map_flags=BPF_F_NO_PREALLOC, inner_map_fd=0, map_name="maglev_outer", map_ifindex=0, btf_fd=0, btf_key_type_id=0, btf_value_type_id=0}, 128) = -1 EINVAL (Invalid argument)
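A C sketch of the lookup; struct map_sketch and inner_map_fd() are hypothetical, and only inner_idx and inner_id come from the description above. The key point is that the inner map is matched by identity, not by whether inner_idx is -1:

```c
#include <stddef.h>

/* Hypothetical map descriptor; only inner_idx and inner_id follow the
 * description above, the rest is illustrative. */
struct map_sketch {
	const char *name;
	unsigned int id;	/* identifier of this map */
	unsigned int inner_id;	/* outer maps: id of their inner map */
	int inner_idx;		/* slot in the outer map, -1: don't prepopulate */
	int fd;
};

/* Returns the fd of outer's inner map, or -1 if none is found.
 * inner_idx only controls prepopulation, so it must not decide whether a
 * map is an inner map: skipping every map with inner_idx == -1 means the
 * inner map is never found. */
static int inner_map_fd(const struct map_sketch *maps, size_t n,
			const struct map_sketch *outer)
{
	for (size_t i = 0; i < n; i++) {
		if (&maps[i] == outer)
			continue;
		if (maps[i].id == outer->inner_id)
			return maps[i].fd;
	}
	return -1;
}
```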
Fixes: 6d61a2b557 ("lib: add libbpf support")
Signed-off-by: Paul Chaignon <paul@isovalent.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
.br requests were added between options of the same command. They are not needed
and make the output span 3 lines for no particular reason.
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Values should be in italics (.I), square brackets should be used for optional values,
and curly brackets for lists. Follow this in the devlink-port man page.
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
When configuring a devlink PCI SF port, the sfnumber can be specified
using 'sfnum' and not 'pcisf' as stated in the man page. Fix this.
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Justin Iurman says:
====================
Following the series applied to net-next (see [1]), here are the corresponding
changes to iproute2.
In the current implementation, IOAM can only be inserted directly (i.e., only
inside packets generated locally) by default, to be compliant with RFC8200.
This patch adds support for in-transit packets and provides the ip6ip6
encapsulation of IOAM (RFC8200 compliant). Therefore, three ioam6 encap modes
are defined:
- inline: directly inserts IOAM inside packets (by default).
- encap: ip6ip6 encapsulation of IOAM inside packets.
- auto: either inline mode for packets generated locally or encap mode for
in-transit packets.
With the current iproute2 implementation, it is configured this way:
$ ip -6 r [...] encap ioam6 trace prealloc [...]
The old syntax does not change (for backwards compatibility) and implicitly uses
the inline mode. With the new syntax, an encap mode can be specified:
(inline mode)
$ ip -6 r [...] encap ioam6 mode inline trace prealloc [...]
(encap mode)
$ ip -6 r [...] encap ioam6 mode encap tundst fc00::2 trace prealloc [...]
(auto mode)
$ ip -6 r [...] encap ioam6 mode auto tundst fc00::2 trace prealloc [...]
A tunnel destination address must be configured when using the encap mode or the
auto mode.
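The mode rules above can be sketched in C as follows. This is a minimal illustration, not the iproute2 or kernel code: the enum, parse_mode(), mode_config_valid() and use_encap() are hypothetical names. It assumes a missing "mode" keyword means inline (the backwards-compatible default), that encap and auto both require "tundst", and that auto encapsulates only in-transit packets.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mode values for illustration only. */
enum ioam6_mode_sketch {
	MODE_INVALID = 0,
	MODE_INLINE,
	MODE_ENCAP,
	MODE_AUTO,
};

/* Map the "mode" keyword to a value; no keyword means inline (old syntax). */
static enum ioam6_mode_sketch parse_mode(const char *arg)
{
	if (arg == NULL || strcmp(arg, "inline") == 0)
		return MODE_INLINE;
	if (strcmp(arg, "encap") == 0)
		return MODE_ENCAP;
	if (strcmp(arg, "auto") == 0)
		return MODE_AUTO;
	return MODE_INVALID;
}

/* encap and auto both need a tunnel destination ("tundst"). */
static bool mode_config_valid(enum ioam6_mode_sketch mode, const char *tundst)
{
	if (mode == MODE_INVALID)
		return false;
	if (mode == MODE_ENCAP || mode == MODE_AUTO)
		return tundst != NULL;
	return true;
}

/* Per-packet decision: add IOAM through ip6ip6 encapsulation or inline? */
static bool use_encap(enum ioam6_mode_sketch mode, bool locally_generated)
{
	switch (mode) {
	case MODE_ENCAP:
		return true;
	case MODE_AUTO:
		/* inline for local packets, encap for in-transit ones */
		return !locally_generated;
	default:
		return false;
	}
}
```

Keeping the config check separate from the per-packet decision mirrors the split between parsing the route and applying it to traffic.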
[1] https://lore.kernel.org/netdev/163335001045.30570.12527451523558030753.git-patchwork-notify@kernel.org/T/#m3b428d4142ee3a414ec803466c211dfdec6e0c09
====================
Signed-off-by: David Ahern <dsahern@kernel.org>
This patch updates the IOAM documentation (ip-route man page) to reflect the
three encap modes that were introduced.
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David Ahern <dsahern@kernel.org>
This patch adds support for the three IOAM encap modes that were introduced:
inline, encap and auto.
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David Ahern <dsahern@kernel.org>
Fix rogue "tab after spaces" used for indentation of the documentation.
This causes rendering issues on terminals using a non-standard tab width.
Signed-off-by: Frank Villaro-Dixon <frank.villaro@infomaniak.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Since we use the cache netlink socket for each nexthop, we can keep it open
instead of opening and closing it on every add call. The socket is opened
once, on the first add call, and then reused for the rest.
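The open-once-and-reuse pattern can be sketched as below. This is a hypothetical illustration: struct nl_handle_sketch, sketch_nl_open(), cache_handle() and cache_add_nexthop() stand in for the real rtnl handle and helpers, and the open counter exists only to show that repeated adds share one socket.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an rtnl handle. */
struct nl_handle_sketch {
	bool open;
};

static int open_count; /* how many times the socket was opened */

static int sketch_nl_open(struct nl_handle_sketch *h)
{
	h->open = true;
	open_count++;
	return 0;
}

/* Return the shared cache socket, opening it on the first call only. */
static struct nl_handle_sketch *cache_handle(void)
{
	static struct nl_handle_sketch h;

	if (!h.open && sketch_nl_open(&h) < 0)
		return NULL;
	return &h;
}

/* Every add reuses the same handle instead of an open/close per call. */
static int cache_add_nexthop(unsigned int id)
{
	struct nl_handle_sketch *h = cache_handle();

	if (h == NULL)
		return -1;
	(void)id; /* a real implementation would query nexthop `id` here */
	return 0;
}
```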
Suggested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Recently the kernel gained ability to report the maximum number of
snapshots a region can have. Print this value out if it was reported.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Update kernel headers to commit:
49ed8dde3715 ("net: usb: use eth_hw_addr_set() for dev->addr_len cases")
Update to linux/mptcp.h is removed because it breaks compilation
of ipmptcp.c in a nontrivial way.
Signed-off-by: David Ahern <dsahern@kernel.org>
The following command:
# ip -j mptcp endpoint show
prints a JSON array that misses the terminating bracket. Fix this by calling
delete_json_obj() to balance the call to new_json_obj().
Fixes: 7e0767cd86 ("add support for mptcp netlink interface")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Andrea Claudi <aclaudi@redhat.com>
Nikolay Aleksandrov says:
====================
This set tries to help with an old ask that we've had for some time
which is to print nexthop information while monitoring or dumping routes.
The core problem is that people cannot follow nexthop changes while
monitoring route changes: by the time they check the nexthop, it could be
deleted or updated to something else. In order to help them out I've
added a nexthop cache which is populated (only used if -d / show_details
is specified) while decoding routes and kept up to date while monitoring.
The nexthop information is printed on its own line starting with the
"nh_info" attribute, and it is embedded inside that attribute when printing JSON.
To cache the nexthop entries I parse them into structures. In order to
reuse most of the code, the print helpers have been altered so they rely
on prepared structures. Nexthops are now always parsed into a structure,
even if they won't be cached, that structure is later used to print the
nexthop and destroyed if not going to be cached. New nexthops (not found
in the cache) are retrieved from the kernel using a private netlink
socket so they don't disrupt an ongoing dump, similar to how interfaces
are retrieved and cached.
I have tested the set with the kernel forwarding selftests and also by
stressing it with nexthop create/update/delete in loops while monitoring.
Comments are very welcome as usual. :)
Changes since RFC:
- reordered the parse/print splits; to do that I had to parse
resilient groups first and then add nh entry parsing, so the code has been
reordered as well and the patch order has changed, but there have been
no functional changes (as before, refactoring of old code is done in
the first 8 patches and then patches 9-12 add the new cache and use it)
- re-run all tests above
Patch breakdown:
Patches 1-2: update current route helpers to take parsed arguments so we
can directly pass them from the nh_entry structure later
Patch 3: adds new nha_res_grp structure which describes a resilient
nexthop group
Patch 4: splits print_nh_res_group into parse and print parts
which use the new nha_res_grp structure
Patch 5: adds new nh_entry structure which describes a nexthop
Patch 6: factors out print_nexthop's attribute parsing into nh_entry
structure used before printing
Patch 7: factors out print_nexthop's nh_entry structure printing
Patch 8: factors out ipnh_get's rtnl talk part and allows to use a
different rt handle for the communication
Patch 9: adds nexthop cache and helpers to manage it, it uses the
new __ipnh_get to retrieve nexthops
Patch 10: adds a new helper print_cache_nexthop_id that prints nexthop
information from its id, if the nexthop is not found in the
cache it fetches it
Patch 11: the new print_cache_nexthop_id helper is used when printing
routes with show_details (-d) to output detailed nexthop
information, the format after nh_info is the same as
ip nexthop show
Patch 12: changes print_nexthop into print_cache_nexthop which always
outputs the nexthop information and can also update the cache
(based on process_cache argument), it's used to keep the
cache up to date while monitoring
Example outputs (monitor):
[NEXTHOP]id 101 via 169.254.2.22 dev veth2 scope link proto unspec
[NEXTHOP]id 102 via 169.254.3.23 dev veth4 scope link proto unspec
[NEXTHOP]id 103 group 101/102 type resilient buckets 512 idle_timer 0 unbalanced_timer 0 unbalanced_time 0 scope global proto unspec
[ROUTE]unicast 192.0.2.0/24 nhid 203 table 4 proto boot scope global
    nh_info id 203 group 201/202 type resilient buckets 512 idle_timer 0 unbalanced_timer 0 unbalanced_time 0 scope global proto unspec
    nexthop via 169.254.2.12 dev veth3 weight 1
    nexthop via 169.254.3.13 dev veth5 weight 1
[NEXTHOP]id 204 via fe80:2::12 dev veth3 scope link proto unspec
[NEXTHOP]id 205 via fe80:3::13 dev veth5 scope link proto unspec
[NEXTHOP]id 206 group 204/205 type resilient buckets 512 idle_timer 0 unbalanced_timer 0 unbalanced_time 0 scope global proto unspec
[ROUTE]unicast 2001:db8:1::/64 nhid 206 table 4 proto boot scope global metric 1024 pref medium
    nh_info id 206 group 204/205 type resilient buckets 512 idle_timer 0 unbalanced_timer 0 unbalanced_time 0 scope global proto unspec
    nexthop via fe80:2::12 dev veth3 weight 1
    nexthop via fe80:3::13 dev veth5 weight 1
[NEXTHOP]id 2 encap mpls 200/300 via 10.1.1.1 dev ens20 scope link proto unspec onlink
[ROUTE]unicast 2.3.4.10 nhid 2 table main proto boot scope global
    nh_info id 2 encap mpls 200/300 via 10.1.1.1 dev ens20 scope link proto unspec onlink
JSON:
{
    "type": "unicast",
    "dst": "198.51.100.0/24",
    "nhid": 103,
    "table": "3",
    "protocol": "boot",
    "scope": "global",
    "flags": [ ],
    "nh_info": {
        "id": 103,
        "group": [ {
                "id": 101,
                "weight": 11
            },{
                "id": 102,
                "weight": 45
            } ],
        "type": "resilient",
        "resilient_args": {
            "buckets": 512,
            "idle_timer": 0,
            "unbalanced_timer": 0,
            "unbalanced_time": 0
        },
        "scope": "global",
        "protocol": "unspec",
        "flags": [ ]
    },
    "nexthops": [ {
            "gateway": "169.254.2.22",
            "dev": "veth2",
            "weight": 11,
            "flags": [ ]
        },{
            "gateway": "169.254.3.23",
            "dev": "veth4",
            "weight": 45,
            "flags": [ ]
        } ]
}
====================
Signed-off-by: David Ahern <dsahern@kernel.org>
Add a new helper print_cache_nexthop, replacing print_nexthop, which can
update the nexthop cache if the process_cache argument is true. It is
used when monitoring netlink messages to keep the nexthop cache up to
date with nexthop changes. The old callers, and anyone who is just
dumping nexthops, use its _nocache version, which is a wrapper around
print_cache_nexthop.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
If -d (show_details) is used when printing/monitoring routes then print
detailed nexthop information in the field "nh_info". The nexthop is also
cached for future searches.
Output looks like:
unicast 198.51.100.0/24 nhid 103 table 3 proto boot scope global
    nh_info id 103 group 101/102 type resilient buckets 512 idle_timer 0 unbalanced_timer 0 unbalanced_time 0 scope global proto unspec
    nexthop via 169.254.2.22 dev veth2 weight 1
    nexthop via 169.254.3.23 dev veth4 weight 1
The nh_info field has the same format that ip -d nexthop show would produce
for the same nexthop id.
For completeness the JSON version looks like:
{
    "type": "unicast",
    "dst": "198.51.100.0/24",
    "nhid": 103,
    "table": "3",
    "protocol": "boot",
    "scope": "global",
    "flags": [ ],
    "nh_info": {
        "id": 103,
        "group": [ {
                "id": 101
            },{
                "id": 102
            } ],
        "type": "resilient",
        "resilient_args": {
            "buckets": 512,
            "idle_timer": 0,
            "unbalanced_timer": 0,
            "unbalanced_time": 0
        },
        "scope": "global",
        "protocol": "unspec",
        "flags": [ ]
    },
    "nexthops": [ {
            "gateway": "169.254.2.22",
            "dev": "veth2",
            "weight": 1,
            "flags": [ ]
        },{
            "gateway": "169.254.3.23",
            "dev": "veth4",
            "weight": 1,
            "flags": [ ]
        } ]
}
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Add a helper which looks for a nexthop in the cache and if not found
reads the entry from the kernel and caches it. Finally the entry is
printed.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Add a static nexthop cache in a hash with 1024 buckets and helpers to
manage it (link, unlink, find, add nexthop, del nexthop). Adding new
nexthops is done by creating a new rtnl handle and using it to retrieve
the nexthop so the helper is safe to use while already reading a
response (i.e. using the global rth).
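A chained hash cache with 1024 buckets, plus the look-up-then-fetch flow described for print_cache_nexthop_id, might look roughly like this. It is a simplified sketch: struct nh_entry_sketch, the nh_cache_* helpers and fetch_from_kernel() are hypothetical, and the "kernel fetch" only allocates an entry and counts calls instead of talking netlink over a private rtnl handle.

```c
#include <stdint.h>
#include <stdlib.h>

#define NH_CACHE_SIZE 1024 /* bucket count from the commit message */

/* Hypothetical, simplified entry; the real nh_entry holds parsed attributes. */
struct nh_entry_sketch {
	uint32_t id;
	struct nh_entry_sketch *next; /* bucket chain */
};

static struct nh_entry_sketch *nh_cache[NH_CACHE_SIZE];
static int kernel_fetches; /* counts simulated kernel requests */

static unsigned int nh_bucket(uint32_t id)
{
	return id % NH_CACHE_SIZE;
}

static struct nh_entry_sketch *nh_cache_find(uint32_t id)
{
	struct nh_entry_sketch *e;

	for (e = nh_cache[nh_bucket(id)]; e; e = e->next)
		if (e->id == id)
			return e;
	return NULL;
}

static void nh_cache_link(struct nh_entry_sketch *e)
{
	unsigned int b = nh_bucket(e->id);

	e->next = nh_cache[b];
	nh_cache[b] = e;
}

static void nh_cache_unlink(struct nh_entry_sketch *e)
{
	struct nh_entry_sketch **pp = &nh_cache[nh_bucket(e->id)];

	for (; *pp; pp = &(*pp)->next) {
		if (*pp == e) {
			*pp = e->next;
			e->next = NULL;
			return;
		}
	}
}

/* Stand-in for retrieving one nexthop from the kernel. */
static struct nh_entry_sketch *fetch_from_kernel(uint32_t id)
{
	struct nh_entry_sketch *e = calloc(1, sizeof(*e));

	if (e) {
		e->id = id;
		kernel_fetches++;
	}
	return e;
}

/* Look in the cache first; on a miss, fetch the entry and cache it. */
static struct nh_entry_sketch *nh_cache_get(uint32_t id)
{
	struct nh_entry_sketch *e = nh_cache_find(id);

	if (!e) {
		e = fetch_from_kernel(id);
		if (e)
			nh_cache_link(e);
	}
	return e;
}

static void nh_cache_del(uint32_t id)
{
	struct nh_entry_sketch *e = nh_cache_find(id);

	if (e) {
		nh_cache_unlink(e);
		free(e);
	}
}
```

Only the first lookup of an id triggers a fetch; later lookups, including those made during an ongoing dump, are served from the cache.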
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Factor out ipnh_get_id's rtnl talk portion into a separate helper which
will be reused later to retrieve nexthops for caching.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Factor out nexthop entry structure printing from print_nexthop,
effectively splitting it into parse and print parts.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Factor out the nexthop attribute parsing and parse attributes into a
nexthop entry structure which is then used to print.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Add a structure which describes a nexthop; it will later be used to
parse, print and cache nexthops.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Now that we have a resilient group structure, split print_nh_res_group into
parse and print functions. print_nexthop calls the parse function
first to parse the attributes into the structure and then uses the print
function to print the parsed structure.
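The parse/print split can be sketched as below, assuming a simplified attribute list instead of netlink rtattrs. The attribute ids, struct attr_sketch, struct res_grp_sketch, parse_res_grp() and print_res_grp() are hypothetical; only the field names come from the output shown in this series, and the values are printed raw rather than converted.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical attribute types; real code walks netlink rtattrs. */
enum { RES_BUCKETS, RES_IDLE_TIMER, RES_UNBALANCED_TIMER, RES_UNBALANCED_TIME };

struct attr_sketch {
	int type;
	uint32_t value;
};

/* Simplified stand-in for struct nha_res_grp. */
struct res_grp_sketch {
	uint32_t buckets;
	uint32_t idle_timer;
	uint32_t unbalanced_timer;
	uint32_t unbalanced_time;
};

/* Parse step: only fills the structure, prints nothing. */
static void parse_res_grp(const struct attr_sketch *attrs, size_t n,
			  struct res_grp_sketch *g)
{
	size_t i;

	memset(g, 0, sizeof(*g));
	for (i = 0; i < n; i++) {
		switch (attrs[i].type) {
		case RES_BUCKETS:
			g->buckets = attrs[i].value;
			break;
		case RES_IDLE_TIMER:
			g->idle_timer = attrs[i].value;
			break;
		case RES_UNBALANCED_TIMER:
			g->unbalanced_timer = attrs[i].value;
			break;
		case RES_UNBALANCED_TIME:
			g->unbalanced_time = attrs[i].value;
			break;
		default:
			break; /* ignore unknown attributes */
		}
	}
}

/* Print step: formats only from the parsed structure, so a cached
 * structure can be printed later without the original attributes. */
static int print_res_grp(char *buf, size_t len, const struct res_grp_sketch *g)
{
	return snprintf(buf, len,
			"buckets %" PRIu32 " idle_timer %" PRIu32
			" unbalanced_timer %" PRIu32 " unbalanced_time %" PRIu32,
			g->buckets, g->idle_timer, g->unbalanced_timer,
			g->unbalanced_time);
}
```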
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Add a structure which describes a resilient nexthop group. It will be
later used for parsing.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Export a new __print_rta_gateway that takes a prepared gateway string to
print; print_rta_gateway also uses it so the format stays consistent.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
We need print_rta_if() to take ifindex directly so later we can use it
with cached converted nexthop objects.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Ralf Baechle says:
====================
net-tools contain support for these three protocols but are deprecated and
no longer installed by default by many distributions. Iproute2, on the other
hand, has no support at all and dumps the addresses of these protocols,
which are actually pretty human-readable, as hex numbers:
# ip link show dev bpq0
3: bpq0: <UP,LOWER_UP> mtu 256 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/ax25 88:98:60:a0:92:40:02 brd a2:a6:a8:40:40:40:00
# ip link show dev nr0
4: nr0: <NOARP,UP,LOWER_UP> mtu 236 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/netrom 88:98:60:a0:92:40:0a brd 00:00:00:00:00:00:00
# ip link show dev rose0
8: rose0: <NOARP,UP,LOWER_UP> mtu 249 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/rose 65:09:33:30:00 brd 00:00:00:00:00
This series adds basic support for the three protocols to print addresses:
# ip link show dev bpq0
3: bpq0: <UP,LOWER_UP> mtu 256 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/ax25 DL0PI-1 brd QST-0
# ip link show dev nr0
4: nr0: <NOARP,UP,LOWER_UP> mtu 236 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/netrom DL0PI-5 brd *
# ip link show dev rose0
8: rose0: <NOARP,UP,LOWER_UP> mtu 249 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/rose 6509333000 brd 0000000000
====================
Signed-off-by: David Ahern <dsahern@kernel.org>
ROSE is an OSI layer 3 protocol sitting on top of AX.25. It uses BCD-encoded
10 digit telephone numbers as addresses. Without this, ip prints
ROSE addresses like
link/rose 12:34:56:78:90 brd 00:00:00:00:00
which is readable but ugly. With this applied, ROSE addresses are
printed as
link/rose 1234567890 brd 0000000000
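Since each octet packs two BCD digits, decoding a 5-octet ROSE address is a nibble-by-nibble conversion. The following is a hypothetical sketch (rose_ntop_sketch() is not the iproute2 function); rejecting nibbles above 9 is an assumption made here, not something the commit states.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ROSE_ADDR_LEN 5 /* 5 octets = 10 BCD digits */

/* Decode a BCD ROSE address into a 10-digit string.
 * buf needs at least 11 bytes; returns NULL on a non-decimal nibble. */
static char *rose_ntop_sketch(const uint8_t addr[ROSE_ADDR_LEN], char *buf,
			      size_t len)
{
	size_t i;

	if (len < 2 * ROSE_ADDR_LEN + 1)
		return NULL;
	for (i = 0; i < ROSE_ADDR_LEN; i++) {
		uint8_t hi = addr[i] >> 4;   /* first digit: high nibble */
		uint8_t lo = addr[i] & 0x0f; /* second digit: low nibble */

		if (hi > 9 || lo > 9)
			return NULL;
		buf[2 * i] = (char)('0' + hi);
		buf[2 * i + 1] = (char)('0' + lo);
	}
	buf[2 * ROSE_ADDR_LEN] = '\0';
	return buf;
}
```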
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
ROSE addresses are ten digit numbers, basically like North American
telephone numbers.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
NETROM is an OSI layer 3 protocol sitting on top of AX.25. It also uses
AX.25 addresses. Without this commit, ip prints NETROM addresses like
link/generic 98:92:9c:aa:b0:40:02 brd 00:00:00:00:00:00:00
while with this commit the decoded result
link/generic LINUX-1 brd *
is much more eye-friendly.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
NETROM uses AX.25 addresses so this is a simple wrapper around ax25_ntop1.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
Before this commit, ip printed the default AX.25 addresses configured for an
AX.25 interface as:
link/ax25 98:92:9c:aa:b0:40:02 brd a2:a6:a8:40:40:40:00
which is pretty unreadable. With this commit, ip decodes AX.25
addresses like
link/ax25 LINUX-1 brd QST-0
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
AX.25 addresses are based on Amateur radio callsigns followed by an SSID
like XXXXXX-SS where the callsign is up to 6 characters which are either
letters or digits and the SSID is a decimal number in the range 0..15.
Amateur radio callsigns are assigned by a country's relevant authorities
and are 3..6 characters, though a few countries have assigned callsigns
longer than that. AX.25 is not able to handle such longer callsigns.
Being based on HDLC, AX.25 encodes addresses by shifting them one bit left,
thus zeroing bit 0, the HDLC extension bit, for all but the last octet of
a packet's address field. For our purposes here we ignore the HDLC
extension bit, that is, it will always be zero.
Linux's internal representation of AX.25 addresses is very similar
to the on-air or on-the-wire format: the callsign is padded to
6 octets by adding spaces, followed by the SSID octet, then all 7 octets
are left-shifted by one bit.
This for example turns "LINUX-1" where the callsign is LINUX and SSID is 1
into 98:92:9c:aa:b0:40:02.
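Reversing that encoding means shifting each octet right by one bit, stopping at the space padding, and reading the SSID from the seventh octet. The sketch below is hypothetical (ax25_ntop_sketch() is not iproute2's ax25_ntop1), and masking the SSID to 4 bits assumes the other bits of that octet can be ignored.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define AX25_ADDR_LEN 7 /* 6 callsign octets + 1 SSID octet */

/* Decode a shifted AX.25 address such as 98:92:9c:aa:b0:40:02 into
 * "CALL-SSID". Returns snprintf()'s result. */
static int ax25_ntop_sketch(const uint8_t addr[AX25_ADDR_LEN], char *buf,
			    size_t len)
{
	char call[7];
	int n = 0;

	for (int i = 0; i < 6; i++) {
		char c = (char)(addr[i] >> 1); /* undo the 1-bit left shift */

		if (c == ' ')
			break; /* space padding ends the callsign */
		call[n++] = c;
	}
	call[n] = '\0';
	/* SSID 0..15 sits in bits 1..4 of the last octet (assumed mask) */
	return snprintf(buf, len, "%s-%u", call,
			(unsigned int)((addr[6] >> 1) & 0x0f));
}
```

For example, 98:92:9c:aa:b0:40:02 decodes to LINUX-1 and a2:a6:a8:40:40:40:00 to QST-0, matching the outputs quoted in this series.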
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
bpf selftests using iproute2 fail with:
$ ip link set dev veth0 xdp object ../bpf/xdp_dummy.o section xdp_dummy
Continuing without mounted eBPF fs. Too old kernel?
mkdir (null)/globals failed: No such file or directory
Unable to load program
This happens when the /sys/fs/bpf directory exists. In this case, mkdir
in bpf_mnt_check_target() fails with errno == EEXIST, and the function
returns -1. Thus bpf_get_work_dir() does not call bpf_mnt_fs() and the
bpffs is not mounted.
Fix this in bpf_mnt_check_target(), returning 0 when the mountpoint
exists.
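The fix boils down to treating EEXIST from mkdir() as success. Below is a simplified sketch of that check; mnt_target_sketch() is a hypothetical name, not the lib/bpf function, and the error message format is illustrative.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Create the mount target; an already existing path counts as success. */
static int mnt_target_sketch(const char *target)
{
	if (mkdir(target, S_IRWXU) && errno != EEXIST) {
		fprintf(stderr, "mkdir %s failed: %s\n", target,
			strerror(errno));
		return -1;
	}
	return 0;
}
```

Note that EEXIST is also returned when the path exists but is not a directory; a stricter variant could follow up with stat() and S_ISDIR().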
Fixes: d4fcdbbec9 ("lib/bpf: Fix and simplify bpf_mnt_check_target()")
Reported-by: Mingyu Shi <mshi@redhat.com>
Reported-by: Jiri Benc <jbenc@redhat.com>
Suggested-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Port ranges provided in a tc rule are parsed incorrectly: even though the
range is passed as min-max, tc throws an error:
$ tc filter add dev eth0 ingress handle 100 priority 10000 protocol ipv4 flower ip_proto tcp dst_port 10368-61000 action pass
max value should be greater than min value
Illegal "dst_port"
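For reference, a correct "PORT" or "MIN-MAX" parse can be sketched as follows. This is not tc's actual parser and the commit does not say what the original bug was; parse_port_range_sketch() is hypothetical, and the strict max > min check simply mirrors the error message quoted above.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse "PORT" or "MIN-MAX". Returns 0 on success, -1 on malformed
 * input, out-of-range values, or when max is not greater than min. */
static int parse_port_range_sketch(const char *arg, uint16_t *min,
				   uint16_t *max)
{
	char *end;
	unsigned long lo, hi;

	errno = 0;
	lo = strtoul(arg, &end, 10);
	if (errno || end == arg || lo > UINT16_MAX)
		return -1;
	if (*end == '\0') { /* single port */
		*min = *max = (uint16_t)lo;
		return 0;
	}
	if (*end != '-')
		return -1;
	arg = end + 1;
	hi = strtoul(arg, &end, 10);
	if (errno || end == arg || *end != '\0' || hi > UINT16_MAX)
		return -1;
	if (hi <= lo) /* "max value should be greater than min value" */
		return -1;
	*min = (uint16_t)lo;
	*max = (uint16_t)hi;
	return 0;
}
```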
Fixes: 8930840e67 ("tc: flower: Classify packets based port ranges")
Signed-off-by: Puneet Sharma <pusharma@akamai.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The BPF program name is included when dumping the BPF program info and the
kernel only stores the first (BPF_PROG_NAME_LEN - 1) bytes for the program
name.
$ sudo ip link show dev docker0
4: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdpgeneric qdisc noqueue state UP mode DEFAULT group default
link/ether 02:42:4c:df:a4:54 brd ff:ff:ff:ff:ff:ff
prog/xdp id 789 name xdp_drop_func tag 57cd311f2e27366b jited
The BPF program load time (ns since boottime), UID of the user who loaded
the program and the BTF ID are also included when dumping the BPF program
information when the user expects a detailed ip link info output.
$ sudo ip -details link show dev docker0
4: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdpgeneric qdisc noqueue state UP mode DEFAULT group default
link/ether 02:42:4c:df:a4:54 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
bridge forward_delay 1500 hello_time 200 max_age 2000 ageing_time 30000 stp_state 0 priority 32768 vlan_filtering 0 vlan_protocol 802.1Q bridge_id 8000.2:42:4c:df:a4:54 designated_root 8000.2:42:4c:df:a4:54 root_port 0 root_path_cost 0 topology_change 0 topology_change_detected 0 hello_timer 0.00 tcn_timer 0.00 topology_change_timer 0.00 gc_timer 265.36 vlan_default_pvid 1 vlan_stats_enabled 0 vlan_stats_per_port 0 group_fwd_mask 0 group_address 01:80:c2:00:00:00 mcast_snooping 1 mcast_router 1 mcast_query_use_ifaddr 0 mcast_querier 0 mcast_hash_elasticity 16 mcast_hash_max 4096 mcast_last_member_count 2 mcast_startup_query_count 2 mcast_last_member_interval 100 mcast_membership_interval 26000 mcast_querier_interval 25500 mcast_query_interval 12500 mcast_query_response_interval 1000 mcast_startup_query_interval 3124 mcast_stats_enabled 0 mcast_igmp_version 2 mcast_mld_version 1 nf_call_iptables 0 nf_call_ip6tables 0 nf_call_arptables 0 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
prog/xdp id 789 name xdp_drop_func tag 57cd311f2e27366b jited load_time 2676682607316255 created_by_uid 0 btf_id 708
Signed-off-by: Gokul Sivakumar <gokulkumar792@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Commit d3432bf10f17 ("net: Support filtering interfaces on no master")
in the kernel added support for filtering interfaces/neighbours that
have no master interface.
This patch completes it and adds this support to iproute2:
1. ip link show nomaster
2. ip address show nomaster
3. ip neighbour {show | flush} nomaster
Signed-off-by: Lahav Schlesinger <lschlesinger@drivenets.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
The 'ip link add' invocation template at the top of the ip-macsec man
page is rendered with a pair of extra double quotes:
ip link add link DEVICE name NAME type macsec [ [ address <lladdr> ]
port PORT | sci <u64> ] [ cipher { default | gcm-aes-128 | gcm-
aes-256"}][" icvlen ICVLEN ] [ encrypt { on | off } ] [ send_sci { on |
This is due to missing whitespace around the gcm-aes-256 identifier
in the source file.
Fixes: b16f525323 ("Add support for configuring MACsec gcm-aes-256 cipher type.")
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
Nikolay Aleksandrov says:
====================
This set adds support for vlan port/bridge multicast router option. It is
similar to the already existing bridge-wide mcast_router control. Patch 01
moves attribute adding and parsing together for vlan option setting,
similar to global vlan option setting. It simplifies adding new options
because we can avoid reserved values and additional checks. Patch 02
adds the new mcast_router option and updates the related man page.
Example:
# mark port ens16 as a permanent mcast router for vlan 100
$ bridge vlan set dev ens16 vid 100 mcast_router 2
# disable mcast router for port ens16 and vlan 200
$ bridge vlan set dev ens16 vid 200 mcast_router 0
$ bridge -d vlan show
port              vlan-id
ens16             1 PVID Egress Untagged
                    state forwarding mcast_router 1
                  100
                    state forwarding mcast_router 2
                  200
                    state forwarding mcast_router 0
Note that this set depends on the latest kernel uapi headers.
====================
Signed-off-by: David Ahern <dsahern@kernel.org>
Add support for setting and dumping per-vlan/interface mcast_router
option. It controls the mcast router mode of a vlan/interface pair.
For bridge devices only modes 0 - 2 are allowed. The possible modes
are:
0 - disabled
1 - automatic router presence detection (default)
2 - permanent router
3 - temporary router (available only for ports)
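The mode ranges listed above can be expressed as a small check. This sketch is illustrative only: the enum names and mcast_router_mode_valid() are hypothetical, and the commit does not say whether such validation happens in iproute2 or only in the kernel.

```c
#include <stdbool.h>

/* Mode values as listed in the commit message. */
enum mcast_router_mode_sketch {
	MCAST_ROUTER_DISABLED = 0,
	MCAST_ROUTER_AUTO = 1,      /* default: automatic detection */
	MCAST_ROUTER_PERMANENT = 2,
	MCAST_ROUTER_TEMPORARY = 3, /* ports only */
};

/* Bridge devices accept modes 0-2; ports accept 0-3. */
static bool mcast_router_mode_valid(unsigned int mode, bool is_bridge)
{
	unsigned int max = is_bridge ? MCAST_ROUTER_PERMANENT
				     : MCAST_ROUTER_TEMPORARY;

	return mode <= max;
}
```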
Example:
# mark port ens16 as a permanent mcast router for vlan 100
$ bridge vlan set dev ens16 vid 100 mcast_router 2
# disable mcast router for port ens16 and vlan 200
$ bridge vlan set dev ens16 vid 200 mcast_router 0
$ bridge -d vlan show
port              vlan-id
ens16             1 PVID Egress Untagged
                    state forwarding mcast_router 1
                  100
                    state forwarding mcast_router 2
                  200
                    state forwarding mcast_router 0
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Set vlan option attributes immediately while parsing to simplify the
checks, avoid having reserved values (e.g. -1 for unset var) and have
more limited scope for the variables. This is also similar to how global
vlan options are set. The attribute setting and checks are moved with
option parsing, no functional changes intended.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Update kernel headers to commit:
27151f177827 ("Merge tag 'perf-tools-for-v5.15-2021-09-04' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux")
Signed-off-by: David Ahern <dsahern@kernel.org>
Not sure if anyone uses the routel script. The script was
a combination of ip route, shell and awk doing command scraping.
It is now possible to do this much better using the JSON
output formats and python.
Rewriting also fixes the bug where the old script could not parse
the current output format; it ended with:
/usr/bin/routel: 48: shift: can't shift that many
The new script also supports IPv6 as an option.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
This script is old and limited to IPv4.
Using the ip route command directly is a better option.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
This script was from olden days of ifcfg.
I don't see any distribution using it and it is time to put
it out to pasture.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
This script was a one off hack for a special case.
Now that ip commands have better formatting, there is no
real reason for it.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
When a tap is created with the multi_queue flag, this flag is not displayed
when dumping:
$ ip tuntap add tap23 mode tap multi_queue
$ ip tuntap
tap23: tap persist0x100
While at it, add a space between known flags and hexdump of unknown
ones.
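The intended output shape (known flag names first, then unknown bits in hex, all space-separated) can be sketched as below. The flag values mirror those in linux/if_tun.h (IFF_TAP 0x0002, IFF_MULTI_QUEUE 0x0100, IFF_PERSIST 0x0800) but are redefined locally under hypothetical SK_ names, and format_tun_flags() is not the iptuntap code; the exact iproute2 output may differ.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Values as in <linux/if_tun.h>, redefined to keep the sketch standalone. */
#define SK_IFF_TAP         0x0002
#define SK_IFF_MULTI_QUEUE 0x0100
#define SK_IFF_PERSIST     0x0800

static const struct {
	unsigned int flag;
	const char *name;
} known_flags[] = {
	{ SK_IFF_TAP, "tap" },
	{ SK_IFF_PERSIST, "persist" },
	{ SK_IFF_MULTI_QUEUE, "multi_queue" },
};

/* Known names first, then leftover bits as hex, separated by spaces. */
static void format_tun_flags(unsigned int flags, char *buf, size_t len)
{
	size_t off = 0, i;
	int n;

	if (len == 0)
		return;
	buf[0] = '\0';
	for (i = 0; i < sizeof(known_flags) / sizeof(known_flags[0]); i++) {
		if (!(flags & known_flags[i].flag))
			continue;
		n = snprintf(buf + off, len - off, "%s%s", off ? " " : "",
			     known_flags[i].name);
		if (n < 0 || (size_t)n >= len - off)
			return; /* output truncated */
		off += (size_t)n;
		flags &= ~known_flags[i].flag;
	}
	if (flags) /* the space here is the fix for "persist0x100" */
		snprintf(buf + off, len - off, "%s%#x", off ? " " : "", flags);
}
```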
Fixes: c41e038f48 ("iptuntap: allow creation of multi-queue tun/tap device")
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Commit a9c3d70d90 broke backward compatibility
by making 'configure' error out if parameters are passed, instead of
ignoring them.
Sometimes packaging systems detect 'configure' and assume it's from
autotools, and pass a bunch of options, e.g.:
dh_auto_configure
./configure --build=x86_64-linux-gnu --prefix=/usr --includedir=${prefix}/include --mandir=${prefix}/share/man --infodir=${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --disable-option-checking --disable-silent-rules --libdir=${prefix}/lib/x86_64-linux-gnu --runstatedir=/run --disable-maintainer-mode --disable-dependency-tracking
Ignore unknown options again instead of erroring out.
Fixes: a9c3d70d90 ("configure: add options ability")
Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Iproute2 has not supported DECnet or IPX since version 5.0.
There was some leftover support in the ip option flags
and parsing; remove it.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
lacp_active specifies whether to send LACPDU frames periodically.
If set to on, LACPDU frames are sent along with the configured lacp_rate
setting. If set to off, LACPDU frames act as "speak when spoken to".
v2: use strcmp instead of match for new options.
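One plausible reason for preferring strcmp() is that iproute2's matches() accepts any prefix of a keyword, so an abbreviation such as "o" could be taken for "on" even though it is also a prefix of "off". The commit does not state its motivation; the exact-match parser below is a hypothetical sketch (parse_on_off_sketch() is not an iproute2 function).

```c
#include <stdbool.h>
#include <string.h>

/* Exact-match parser for an on/off option value. */
static int parse_on_off_sketch(const char *arg, bool *val)
{
	if (strcmp(arg, "on") == 0) {
		*val = true;
		return 0;
	}
	if (strcmp(arg, "off") == 0) {
		*val = false;
		return 0;
	}
	return -1; /* abbreviations such as "o" are rejected */
}
```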
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Presently, if a Geneve or VXLAN interface was created with 'external',
it's not possible for a user to determine e.g. the value of 'dstport'
after creation. This change fixes that by avoiding early returns.
This change partly reverts commit 00ff4b8e31 ("ip/tunnel: Be consistent
when printing tunnel collect metadata").
Signed-off-by: Ilya Dmitrichenko <errordeveloper@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Ahern <dsahern@kernel.org>
This patch addresses Stephen's comment:
"""
> + print_null(PRINT_ANY, "", "\n", NULL);
Use print_nl() since it handles the case of oneline output.
Plus in JSON the newline is meaningless.
"""
It also removes two useless print_null's.
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David Ahern <dsahern@kernel.org>
Recently we added SKBMOD_F_ECN option support to the kernel; support it in
the tc-skbmod(8) front end, and update its man page accordingly.
The 2 least significant bits of the TOS field (IPv4) and the Traffic Class
field (IPv6) are used to represent different ECN states [1]:
0b00: "Non ECN-Capable Transport", Non-ECT
0b10: "ECN Capable Transport", ECT(0)
0b01: "ECN Capable Transport", ECT(1)
0b11: "Congestion Encountered", CE
This new option, "ecn", marks ECT(0) and ECT(1) IPv{4,6} packets as CE,
which is useful for ECN-based rate limiting. For example:
$ tc filter add dev eth0 parent 1: protocol ip prio 10 \
u32 match ip protocol 1 0xff flowid 1:2 \
action skbmod \
ecn
The updated tc-skbmod SYNOPSIS looks like the following:
tc ... action skbmod { set SETTABLE | swap SWAPPABLE | ecn } ...
Only one of "set", "swap" or "ecn" shall be used in a single tc-skbmod
command. Trying to use more than one of them at a time is considered
undefined behavior; pipe multiple tc-skbmod commands together instead.
"set" and "swap" only affect Ethernet packets, while "ecn" only affects
IP packets.
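The "ecn" rewrite rule itself is small: only ECT(0) and ECT(1) become CE, while Non-ECT and CE are left alone. The sketch below applies it to the TOS/Traffic Class byte; ecn_mark_ce() and the ECN_* macros are hypothetical names. A real IPv4 implementation must also update the header checksum, which this sketch does not do.

```c
#include <stdint.h>

#define ECN_MASK    0x03
#define ECN_NOT_ECT 0x00
#define ECN_ECT1    0x01
#define ECN_ECT0    0x02
#define ECN_CE      0x03

/* Return the TOS / Traffic Class byte with ECT(0)/ECT(1) marked as CE;
 * the DSCP bits are preserved and other ECN states are unchanged. */
static uint8_t ecn_mark_ce(uint8_t ds_field)
{
	uint8_t ecn = ds_field & ECN_MASK;

	if (ecn == ECN_ECT0 || ecn == ECN_ECT1)
		return ds_field | ECN_CE;
	return ds_field;
}
```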
Depends on kernel patch "net/sched: act_skbmod: Add SKBMOD_F_ECN option
support", as well as iproute2 patch "tc/skbmod: Remove misinformation
about the swap action".
[1] https://en.wikipedia.org/wiki/Explicit_Congestion_Notification
Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
This patch provides man8 documentation for IOAM inside ip, ip-ioam and ip-route.
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David Ahern <dsahern@kernel.org>
This patch provides a new encap type for routes to insert an IOAM pre-allocated
trace:
$ ip -6 ro ad fc00::1/128 encap ioam6 trace prealloc type 0x800000 ns 1 size 12 dev eth0
where:
- "trace" and "prealloc" may look useless but anticipate future
  implementations of other IOAM option types.
- "type" is a bitfield (=u32) defining the IOAM pre-allocated trace type (see
the corresponding uapi).
- "ns" is an IOAM namespace ID attached to the pre-allocated trace.
- "size" is the pre-allocated trace size in bytes; it must be a multiple of
  4 octets and is limited (see IOAM6_TRACE_DATA_SIZE_MAX).
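The "size" constraint can be checked as below. This is a hypothetical helper (ioam6_trace_size_valid() is not an iproute2 function); the upper bound is passed in rather than hard-coded, and real code should use IOAM6_TRACE_DATA_SIZE_MAX from the kernel uapi header. Whether a size of 0 is accepted is not stated in the commit.

```c
#include <stdbool.h>
#include <stdint.h>

/* "size" must be a multiple of 4 octets and no larger than max. */
static bool ioam6_trace_size_valid(uint32_t size, uint32_t max)
{
	return size % 4 == 0 && size <= max;
}
```

In the test, 244 is only an example bound, not a claim about the uapi value.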
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David Ahern <dsahern@kernel.org>
This patch provides support for adding, listing and removing IOAM namespaces
and schemas with iproute2. When adding an IOAM namespace, both "data" (=u32)
and "wide" (=u64) are optional. Therefore, you can either have none, one of
them, or both at the same time. When adding an IOAM schema, there is no
restriction on "DATA" except its size (see IOAM6_MAX_SCHEMA_DATA_LEN). By
default, an IOAM namespace has no active IOAM schema (meaning an IOAM namespace
is not linked to an IOAM schema), and an IOAM schema is not considered
as "active" (meaning an IOAM schema is not linked to an IOAM namespace). It is
possible to link an IOAM namespace with an IOAM schema using the last
command below (meaning the IOAM schema will be considered "active" for the
specific IOAM namespace).
$ ip ioam
Usage: ip ioam { COMMAND | help }
ip ioam namespace show
ip ioam namespace add ID [ data DATA32 ] [ wide DATA64 ]
ip ioam namespace del ID
ip ioam schema show
ip ioam schema add ID DATA
ip ioam schema del ID
ip ioam namespace set ID schema { ID | none }
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David Ahern <dsahern@kernel.org>
Update kernel headers to commit:
1187c8c4642d ("net: phy: mscc: make some arrays static const, makes object smaller")
Signed-off-by: David Ahern <dsahern@kernel.org>
Make use of the already available brief flag and print the basic details of
the IPv4 or IPv6 neighbour cache in a tabular format for better readability
when the brief output is expected.
$ ip -br neigh
172.16.12.100 bridge0 b0:fc:36:2f:07:43
172.16.12.174 bridge0 8c:16:45:2f:bc:1c
172.16.12.250 bridge0 04:d9:f5:c1:0c:74
fe80::267b:9f70:745e:d54d bridge0 b0:fc:36:2f:07:43
fd16:a115:6a62:0:8744:efa1:9933:2c4c bridge0 8c:16:45:2f:bc:1c
fe80::6d9:f5ff:fec1:c74 bridge0 04:d9:f5:c1:0c:74
And add "ip neigh show" to the list of ip sub commands mentioned in the man
page that support the brief output in tabular format.
Signed-off-by: Gokul Sivakumar <gokulkumar792@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Nikolay Aleksandrov says:
====================
This set adds support for vlan multicast options. The feature is
globally controlled by a new bridge option called mcast_vlan_snooping
which is added by patch 01. Then patches 2-5 add support for dumping
global vlan options and filtering on vlan id. Patch 06 adds support for
setting global vlan options and then patches 07-18 add all the new
global vlan options, finally patch 19 adds support for dumping vlan
multicast router ports. These options are identical in meaning, names and
functionality as the bridge-wide ones.
All the new vlan global commands are under the global keyword:
$ bridge vlan global show [ vid VID dev DEVICE ]
$ bridge vlan global set vid VID dev DEVICE ...
I've added command examples in each commit message. The patch-set is a
bit bigger but the global options follow the same pattern so I don't see
a point in breaking them. All man page descriptions have been taken from
the same current bridge-wide mcast options. The only additional iproute2
change which is left to do is the per-vlan mcast router control which
I'll send separately. Note that to properly use this set you'll need the
updated kernel headers where mcast router was moved from a global option
to per-vlan/per-device one (changed uapi enum which was in net-next).
Example:
# enable vlan mcast snooping globally
$ ip link set dev bridge type bridge mcast_vlan_snooping 1
# enable mcast querier on vlan 100
$ bridge vlan global set dev bridge vid 100 mcast_querier 1
# show vlan 100's global options
$ bridge -s vlan global show vid 100
port vlan-id
bridge 100
mcast_snooping 1 mcast_querier 1 mcast_igmp_version 2 mcast_mld_version 1 mcast_last_member_count 2 mcast_last_member_interval 100 mcast_startup_query_count 2 mcast_startup_query_interval 3125 mcast_membership_interval 26000 mcast_querier_interval 25500 mcast_query_interval 12500 mcast_query_response_interval 1000
A following kernel patch-set will add selftests which use these commands.
====================
Signed-off-by: David Ahern <dsahern@kernel.org>
Add dump support for vlan multicast router ports and their details if
requested. If details are requested we print 1 entry per line, otherwise
we print all router ports on a single line similar to how mdb prints
them.
Looks like:
$ bridge vlan global show vid 100
port vlan-id
bridge 100
mcast_snooping 1 mcast_querier 0 mcast_igmp_version 2 mcast_mld_version 1 mcast_last_member_count 2 mcast_last_member_interval 100 mcast_startup_query_count 2 mcast_startup_query_interval 3125 mcast_membership_interval 26000 mcast_querier_interval 25500 mcast_query_interval 12500 mcast_query_response_interval 1000
router ports: ens20 ens16
Looks like (with -s):
$ bridge -s vlan global show vid 100
port vlan-id
bridge 100
mcast_snooping 1 mcast_querier 0 mcast_igmp_version 2 mcast_mld_version 1 mcast_last_member_count 2 mcast_last_member_interval 100 mcast_startup_query_count 2 mcast_startup_query_interval 3125 mcast_membership_interval 26000 mcast_querier_interval 25500 mcast_query_interval 12500 mcast_query_response_interval 1000
router ports: ens20 187.57 temp
ens16 118.27 temp
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Add control and dump support for the global mcast_querier option which
controls if the bridge will act as a multicast querier for that vlan.
Syntax: $ bridge vlan global set dev bridge vid 1 mcast_querier 1
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Add control and dump support for the global mcast_startup_query_interval
option which controls the interval between queries in the startup phase.
To be consistent with the same bridge-wide option the value is reported
with USER_HZ granularity and the same granularity is expected when setting
it.
Syntax:
$ bridge vlan global set dev bridge vid 1 mcast_startup_query_interval 15000
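A minimal sketch of the USER_HZ conversion, assuming USER_HZ is 100 (one tick is 10 ms), which matches the "26000 (260 seconds)" default quoted for mcast_membership_interval in this series. The helper names are hypothetical; on a real system, sysconf(_SC_CLK_TCK) reports the tick rate instead of a hard-coded constant.

```c
#include <stdint.h>

/* Assumption: USER_HZ == 100, i.e. one tick is 10 ms. Query the real
 * value with sysconf(_SC_CLK_TCK) rather than hard-coding it. */
#define ASSUMED_USER_HZ 100u

/* Convert a bridge multicast interval in USER_HZ ticks to milliseconds. */
static uint64_t user_hz_to_ms(uint64_t ticks)
{
	return ticks * 1000u / ASSUMED_USER_HZ;
}

/* Convert milliseconds to the USER_HZ ticks expected on the command line. */
static uint64_t ms_to_user_hz(uint64_t ms)
{
	return ms * ASSUMED_USER_HZ / 1000u;
}
```

Under that assumption, the 15000 in the syntax line above is 150 seconds, and the 3125 shown in the global show output is 31.25 seconds.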
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Add control and dump support for the global mcast_query_response_interval
option which sets the Max Response Time/Maximum Response Delay for IGMP/MLD
queries sent by the bridge. To be consistent with the same bridge-wide
option the value is reported with USER_HZ granularity and the same
granularity is expected when setting it.
Syntax:
$ bridge vlan global set dev bridge vid 1 mcast_query_response_interval 13000
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Add control and dump support for the global mcast_query_interval
option which controls the interval between queries sent by the bridge
after the end of the startup phase. To be consistent with the same
bridge-wide option the value is reported with USER_HZ granularity and
the same granularity is expected when setting it.
Syntax:
$ bridge vlan global set dev bridge vid 1 mcast_query_interval 13000
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Add control and dump support for the global mcast_querier_interval
option which controls the interval after which, if no other router
queries are seen, the bridge will start sending its own queries.
To be consistent with the same bridge-wide option the value is reported
with USER_HZ granularity and the same granularity is expected when
setting it.
Syntax:
$ bridge vlan global set dev bridge vid 1 mcast_querier_interval 13000
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Add control and dump support for the global mcast_membership_interval
option which controls the interval after which the bridge will leave a
group if no reports have been received for it. To be consistent with the
same bridge-wide option the value is reported with USER_HZ granularity and
the same granularity is expected when setting it.
The default is 26000 (260 seconds).
Syntax:
$ bridge vlan global set dev bridge vid 1 mcast_membership_interval 13000
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Add control and dump support for the global mcast_last_member_interval
option which controls the interval between queries to find remaining
members of a group after a leave message. To be consistent with the same
bridge-wide option the value is reported with USER_HZ granularity and
the same granularity is expected when setting it.
The default is 100 (1 second).
Syntax:
$ bridge vlan global set dev bridge vid 1 mcast_last_member_interval 200
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Add control and dump support for the global mcast_startup_query_count
option which controls the number of queries the bridge will send on the
vlan during the startup phase (default 2).
Syntax:
$ bridge vlan global set dev bridge vid 1 mcast_startup_query_count 5
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Add control and dump support for the global mcast_last_member_count option
which controls the number of queries the bridge will send on the vlan after
a leave is received (default 2).
Syntax:
$ bridge vlan global set dev bridge vid 1 mcast_last_member_count 10
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Add control and dump support for the global mcast_mld_version option
which controls the MLD version on the vlan (default 1).
Syntax: $ bridge vlan global set dev bridge vid 1 mcast_mld_version 2
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Add control and dump support for the global mcast_igmp_version option
which controls the IGMP version on the vlan (default 2).
Syntax: $ bridge vlan global set dev bridge vid 1 mcast_igmp_version 3
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Add control and dump support for the global mcast_snooping option which
controls if multicast snooping is enabled or disabled for a single vlan.
Syntax: $ bridge vlan global set dev bridge vid 1 mcast_snooping 1
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Add support to change global vlan options via a new vlan global
set subcommand similar to the current vlan set subcommand. The man page
and help are updated accordingly. The command works only with bridge
devices. It doesn't support any options yet.
Syntax: $ bridge vlan global set vid VID dev DEV
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
In order to allow vlan filtering when dumping options we need to move
all print operations into the option dumping functions and add the
filtering after we've parsed the nested attributes so we can extract the
start and end vlan ids.
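A sketch of the filtering step, assuming each parsed options entry yields a start and end vid (end equal to start for a single vlan) and that a filter vid of 0 means no filter was given. The helper is hypothetical; it only shows why the check must wait until both ids have been extracted from the nested attributes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical filter: an entry covers vids [start, end]; filter_vid == 0
 * is taken to mean "no vid filter requested". */
static bool vlan_entry_matches(uint16_t start, uint16_t end, uint16_t filter_vid)
{
	if (filter_vid == 0)
		return true;
	return filter_vid >= start && filter_vid <= end;
}
```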
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Add support for new bridge vlan command grouping called global which
operates on global options. The first command it supports is "show".
To do that we update print_vlan_rtm to recognize the global vlan options
attribute and parse it properly.
Man page and help are also updated with the new command.
Syntax is: $ bridge vlan global show [ vid VID ] [ dev DEV ]
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Skip unknown attributes when printing vlan options in print_vlan_rtm.
Make sure print_vlan_opts doesn't accept attributes it doesn't understand.
Currently we print only one type; global vlan options support will
be added later.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Factor out the code which prints current per-vlan options from
print_vlan_rtm without any changes; later we'll filter based on the vlan
attribute and add support for global vlan option printing.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Add support for mcast_vlan_snooping option which controls per-vlan
multicast snooping, also update the man page.
Syntax: $ ip link set dev bridge type bridge mcast_vlan_snooping 0/1
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Jonas reports that ss -awp does not display any RAW sockets
on a Knoppix 4.4 kernel.
sockdiag_send() diverts to tcpdiag_send() to try the older
netlink interface. tcpdiag_send() works for TCP and DCCP
but not other protocols. Instead of rejecting unsupported
protocols (a list which missed RAW and SCTP), match on supported ones.
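A sketch of the allowlist approach, using a hypothetical helper name: only the protocols the legacy interface is known to handle are accepted, so protocols such as RAW and SCTP fall through to the default case instead of being missed by a denylist.

```c
#include <netinet/in.h>
#include <stdbool.h>

/* Hypothetical check: the legacy tcpdiag interface handles only TCP and
 * DCCP, so match on those and reject everything else by default. */
static bool tcpdiag_supports(int protocol)
{
	switch (protocol) {
	case IPPROTO_TCP:
	case IPPROTO_DCCP:
		return true;
	default:
		return false;	/* RAW, SCTP, UDP, ... use sockdiag instead */
	}
}
```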
Link: https://lore.kernel.org/netdev/20210815231738.7b42bad4@mmluhan/
Reported-and-tested-by: Jonas Bechtel <post@jbechtel.de>
Fixes: 41fe6c34de ("ss: Add inet raw sockets information gathering via netlink diag interface")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Fixes: 3a1ca9a5b ("bridge: update man page for new color and json changes")
Signed-off-by: Gokul Sivakumar <gokulkumar792@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
To be consistent with the colorized output of "ip" command and to increase
readability, stop highlighting the "dev" & "dst" keywords in the colorized
output of "bridge -c fdb" cmd.
Example: in the following "bridge -c fdb" entry, only "00:00:00:00:00:00",
"vxlan100" and "2001:db8:2::1" fields should be highlighted in color.
00:00:00:00:00:00 dev vxlan100 dst 2001:db8:2::1 self permanent
Signed-off-by: Gokul Sivakumar <gokulkumar792@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
As per the man/man8/bridge.8 page, the shorthand cmd line arg "-c" can be
used to colorize the bridge cmd output. But while parsing the args in while
loop, matches() detects "-c" as "-compressedvlans" instead of "-color", so
fix this by doing the check for "-color" option first before checking for
"-compressedvlans".
Signed-off-by: Gokul Sivakumar <gokulkumar792@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
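For the "-c" fix above, a sketch of why the order of the checks matters. The local prefix_of() helper follows the convention of iproute2's matches(), returning 0 when the argument is a prefix of the full option name; it and the enum are illustrative stand-ins, not the bridge source.

```c
#include <string.h>

/* Returns 0 when `prefix` is a non-empty prefix of `full`, else 1. */
static int prefix_of(const char *prefix, const char *full)
{
	size_t n = strlen(prefix);

	return n == 0 || strncmp(prefix, full, n) != 0;
}

enum opt { OPT_UNKNOWN, OPT_COLOR, OPT_COMPRESSED };

static enum opt classify(const char *arg)
{
	/* "-c" is a prefix of both options and the first matching test
	 * wins, so "-color" has to be checked first. */
	if (prefix_of(arg, "-color") == 0)
		return OPT_COLOR;
	if (prefix_of(arg, "-compressedvlans") == 0)
		return OPT_COMPRESSED;
	return OPT_UNKNOWN;
}
```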
Add arp_validate filter support based on kernel commit 896149ff1b2c
("bonding: extend arp_validate to be able to receive unvalidated arp-only traffic")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Port function state can have either of the two values - active or
inactive. Update the documentation and help command for these two
values to tell the user about them.
With the introduction of state, hw_addr and state are optional.
Hence mark them as optional in man page that also aligns with the help
command output.
Fixes: bdfb9f1bd6 ("devlink: Support set of port function state")
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
lacp_active specifies whether to send LACPDU frames periodically.
If on, LACPDU frames are sent at the configured lacp_rate.
If off, LACPDU frames act as "speak when spoken to".
v2: use strcmp instead of match for new options.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Presently, if a Geneve or VXLAN interface was created with 'external',
it's not possible for a user to determine e.g. the value of 'dstport'
after creation. This change fixes that by avoiding early returns.
This change partly reverts commit 00ff4b8e31 ("ip/tunnel: Be consistent
when printing tunnel collect metadata").
Signed-off-by: Ilya Dmitrichenko <errordeveloper@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Ahern <dsahern@kernel.org>
This patch addresses Stephen's comment:
"""
> + print_null(PRINT_ANY, "", "\n", NULL);
Use print_nl() since it handles the case of oneline output.
Plus in JSON the newline is meaningless.
"""
It also removes two useless print_null's.
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David Ahern <dsahern@kernel.org>
In between Linux kernel 2.4 and 2.6, key folding for hash tables changed
in kernel space. When iproute2 dropped support for the older algorithm,
the wrong code was removed and kernel 2.4 folding method remained in
place. To get things functional for recent kernels again, restoring the
old code alone was not sufficient - additional byteorder fixes were
needed.
While at it, make use of ffs() and thereby align the code with how
kernel determines the shift width.
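A sketch of the recent-kernel folding with ffs(), modeled on our reading of the kernel's u32 classifier (verify against net/sched/cls_u32.c): the key is masked in network byte order, converted to host order, and shifted right by the index of the mask's lowest set bit. The caller still has to limit the result to the hash table size; that step is omitted here, and the helper name is hypothetical.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <strings.h>

/* key_be and hmask_be are both in network byte order. */
static uint32_t u32_hash_fold(uint32_t key_be, uint32_t hmask_be)
{
	uint32_t hmask = ntohl(hmask_be);
	/* ffs() is 1-based, so subtract 1 to get the shift width. */
	int fshift = hmask ? ffs((int)hmask) - 1 : 0;

	return ntohl(key_be & hmask_be) >> fshift;
}
```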
Fixes: 267480f553 ("Backout the 2.4 utsname hash patch.")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The value of s used inside the loop is the result of strstr(), so this
assignment is useless.
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
If bpf_map_fetch_name() returns NULL, strlen() hits a NULL-pointer
dereference on outer_map_name.
Fix this by checking the outer_map_name value and returning false when it
is NULL, as is already done for inner_map_name.
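A minimal sketch of the guard, with a hypothetical function name: both names can be NULL when the lookup that produced them fails, so they are checked before any string function dereferences them.

```c
#include <stdbool.h>
#include <string.h>

static bool map_names_match(const char *outer_map_name,
			    const char *inner_map_name)
{
	/* strlen()/strcmp() must never be handed a NULL pointer. */
	if (!outer_map_name || !inner_map_name)
		return false;
	return strcmp(outer_map_name, inner_map_name) == 0;
}
```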
Fixes: 6d61a2b557 ("lib: add libbpf support")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
When processing device flash update, cmd_dev_flash function waits until
the flash process has completed. This requires the following two
conditions to both be true:
a) we've received an exit status from the child process
b) we've received the DEVLINK_CMD_FLASH_UPDATE_END *or*
we haven't received any status notifications from the driver.
The original devlink flash status monitoring code in 9b13cddfe2
("devlink: implement flash status monitoring") was written assuming that
a driver will either send no status updates, or it will send at least
one DEVLINK_CMD_FLASH_UPDATE_STATUS before DEVLINK_CMD_FLASH_UPDATE_END.
Newer versions of the kernel since commit 52cc5f3a166a ("devlink: move flash
end and begin to core devlink") in v5.10 moved handling of the
DEVLINK_CMD_FLASH_UPDATE_END into the core stack, and will send this
regardless of whether or not the driver sends any of its own status
notifications.
The handling of DEVLINK_CMD_FLASH_UPDATE_END in cmd_dev_flash_status_cb
has an additional condition that it must not be the first message.
Otherwise, it falls back to treating it like
a DEVLINK_CMD_FLASH_UPDATE_STATUS.
This is wrong because it can lead to an infinite loop if a driver does
not send any status updates.
In this case, the kernel will send DEVLINK_CMD_FLASH_UPDATE_END without
any DEVLINK_CMD_FLASH_UPDATE_STATUS. The devlink application will see
that ctx->not_first is false, and will treat this like any other status
message. Thus, ctx->not_first will be set to 1.
The loop condition to exit the flash update will thus never be true:
ctx->not_first is true and ctx->received_end is false, so we wait
forever.
This leads to the application appearing to process the flash update, but
it will never exit.
Fix this by simply always treating DEVLINK_CMD_FLASH_UPDATE_END the same
regardless of whether it's the first message or not.
This is obviously the correct thing to do: once we've received the
DEVLINK_CMD_FLASH_UPDATE_END the flash update must be finished. For new
kernels this is always true, because we send this message in the core
stack after the driver flash update routine finishes.
For older kernels, some drivers may not have sent any
DEVLINK_CMD_FLASH_UPDATE_STATUS or DEVLINK_CMD_FLASH_UPDATE_END. This is
handled by the while loop conditional that exits if we get a return
value from the child process without having received any status
notifications.
An argument could be made that we should exit immediately when we get
either the DEVLINK_CMD_FLASH_UPDATE_END or an exit code from the child
process. However, at a minimum it makes no sense to ever process
DEVLINK_CMD_FLASH_UPDATE_END as if it were a DEVLINK_CMD_FLASH_UPDATE_STATUS.
This is easy to test as it is triggered by the selftests for the
netdevsim driver, which has a test case for both with and without status
notifications.
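A sketch of the fixed state handling, using hypothetical names for the two notification types and a reduced context struct: END always records completion, and the exit test combines the child's exit with either END or the absence of any status notification.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two notification types. */
enum flash_notif { NOTIF_FLASH_STATUS, NOTIF_FLASH_END };

struct flash_ctx {
	bool not_first;		/* at least one status notification seen */
	bool received_end;	/* END notification seen */
};

/* Fixed handling: END is never treated as a status message. */
static void flash_notif_cb(struct flash_ctx *ctx, enum flash_notif type)
{
	if (type == NOTIF_FLASH_END) {
		ctx->received_end = true;
		return;
	}
	ctx->not_first = true;
}

/* Done once the child exited and either END arrived or the driver never
 * sent any status notification (older kernels). */
static bool flash_done(const struct flash_ctx *ctx, bool child_exited)
{
	return child_exited && (ctx->received_end || !ctx->not_first);
}
```

With this logic, an END that arrives as the very first message ends the loop once the child exits, whereas the old handling would have set not_first and waited forever.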
Fixes: 9b13cddfe2 ("devlink: implement flash status monitoring")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
When tc is used without verbose mode and bpf_btf_attach() fails,
the condition
"if (fd < 0 && (errno == ENOSPC || !ctx->log_size))"
causes ctx->log_size to become non-zero. bpf_prog_attach() then sees
ctx->log_size != 0 and enables the debug log.
The verifier log is sometimes very chatty on larger programs, so
bpf_prog_attach() fails with:
"Log buffer too small to dump verifier log 16777215 bytes (9 tries)!"
A BTF load failure does not affect the program load; the program still
works. So only when the BTF or program load fails, enlarge log_size and
retry with verbose logging.
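A generic sketch of the retry pattern described above, assuming a hypothetical loader callback that returns an fd or -1 with errno set. It does not reproduce iproute2's bpf_btf_attach() or bpf_prog_attach(); it only shows loading quietly first, then retrying with a verifier log that grows on ENOSPC so the failure reason can be printed. The two demo loaders are stand-ins for testing.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical loader: fd >= 0 on success, -1 with errno on failure. */
typedef int (*load_fn)(char *log, size_t log_size);

static int load_quiet_then_verbose(load_fn load, size_t start, size_t max)
{
	int fd = load(NULL, 0);	/* first attempt: no verifier log */

	if (fd >= 0)
		return fd;

	/* Failed: retry with a log buffer, doubling it on ENOSPC. */
	for (size_t size = start; size != 0 && size <= max; size *= 2) {
		char *log = calloc(1, size);
		int err;

		if (!log)
			return -1;
		fd = load(log, size);
		err = errno;
		if (fd >= 0 || err != ENOSPC) {
			if (fd < 0)
				fprintf(stderr, "%s\n", log);
			free(log);
			errno = err;
			return fd;
		}
		free(log);
	}
	errno = ENOSPC;
	return -1;
}

/* Demo stand-in: succeeds without needing a log. */
static int demo_ok_loader(char *log, size_t log_size)
{
	(void)log;
	(void)log_size;
	return 7;
}

/* Demo stand-in: always fails; the log needs at least 4096 bytes. */
static int demo_failing_loader(char *log, size_t log_size)
{
	if (log_size == 0) {
		errno = EINVAL;
		return -1;
	}
	if (log_size < 4096) {
		errno = ENOSPC;
		return -1;
	}
	snprintf(log, log_size, "demo verifier log");
	errno = EINVAL;
	return -1;
}
```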
Signed-off-by: Feng Zhou <zhoufeng.zf@bytedance.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Recently we added SKBMOD_F_ECN option support to the kernel; support it in
the tc-skbmod(8) front end, and update its man page accordingly.
The 2 least significant bits of the IPv4 TOS field and the IPv6 Traffic
Class field are used to represent different ECN states [1]:
0b00: "Non ECN-Capable Transport", Non-ECT
0b10: "ECN Capable Transport", ECT(0)
0b01: "ECN Capable Transport", ECT(1)
0b11: "Congestion Encountered", CE
This new option, "ecn", marks ECT(0) and ECT(1) IPv{4,6} packets as CE,
which is useful for ECN-based rate limiting. For example:
$ tc filter add dev eth0 parent 1: protocol ip prio 10 \
u32 match ip protocol 1 0xff flowid 1:2 \
action skbmod \
ecn
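A sketch of the rewrite applied to the 8-bit byte carrying the ECN bits (the IPv4 TOS field or the IPv6 Traffic Class value), following the table above. It is not the kernel's act_skbmod code; in particular, a real IPv4 implementation must also update the header checksum, which is omitted here. The helper name is hypothetical.

```c
#include <stdint.h>

#define ECN_MASK    0x03u	/* two least significant bits */
#define ECN_NOT_ECT 0x00u
#define ECN_CE      0x03u

/* ECT(0) (0b10) and ECT(1) (0b01) become CE (0b11); Non-ECT and CE are
 * left unchanged, and the upper six (DSCP) bits are always preserved. */
static uint8_t ecn_mark_ce(uint8_t tos)
{
	uint8_t ecn = tos & ECN_MASK;

	if (ecn == ECN_NOT_ECT || ecn == ECN_CE)
		return tos;
	return (uint8_t)(tos | ECN_CE);
}
```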
The updated tc-skbmod SYNOPSIS looks like the following:
tc ... action skbmod { set SETTABLE | swap SWAPPABLE | ecn } ...
Only one of "set", "swap" or "ecn" shall be used in a single tc-skbmod
command. Trying to use more than one of them at a time is considered
undefined behavior; pipe multiple tc-skbmod commands together instead.
"set" and "swap" only affect Ethernet packets, while "ecn" only affects
IP packets.
Depends on kernel patch "net/sched: act_skbmod: Add SKBMOD_F_ECN option
support", as well as iproute2 patch "tc/skbmod: Remove misinformation
about the swap action".
[1] https://en.wikipedia.org/wiki/Explicit_Congestion_Notification
Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Justin Iurman says:
====================
The IOAM patchset was merged recently (see net-next commits [1,2,3,4,5,6]).
Therefore, this patchset provides support for IOAM inside iproute2, as well as
manpage documentation. Here is a summary of added features inside iproute2.
(1) configure IOAM namespaces and schemas:
$ ip ioam
Usage: ip ioam { COMMAND | help }
ip ioam namespace show
ip ioam namespace add ID [ data DATA32 ] [ wide DATA64 ]
ip ioam namespace del ID
ip ioam schema show
ip ioam schema add ID DATA
ip ioam schema del ID
ip ioam namespace set ID schema { ID | none }
(2) provide a new encap type to insert the IOAM pre-allocated trace:
$ ip -6 ro ad fc00::1/128 encap ioam6 trace prealloc type 0x800000 ns 1 size 12 dev eth0
[1] db67f219fc9365a0c456666ed7c134d43ab0be8a
[2] 9ee11f0fff205b4b3df9750bff5e94f97c71b6a0
[3] 8c6f6fa6772696be0c047a711858084b38763728
[4] 3edede08ff37c6a9370510508d5eeb54890baf47
[5] de8e80a54c96d2b75377e0e5319a64d32c88c690
[6] 968691c777af78d2daa2ee87cfaeeae825255a58
====================
Signed-off-by: David Ahern <dsahern@kernel.org>
This patch provides man8 documentation for IOAM inside ip, ip-ioam and ip-route.
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David Ahern <dsahern@kernel.org>
This patch provides a new encap type for routes to insert an IOAM pre-allocated
trace:
$ ip -6 ro ad fc00::1/128 encap ioam6 trace prealloc type 0x800000 ns 1 size 12 dev eth0
where:
- "trace" and "prealloc" may appear useless, but they anticipate future
implementations of other IOAM option types.
- "type" is a bitfield (=u32) defining the IOAM pre-allocated trace type (see
the corresponding uapi).
- "ns" is an IOAM namespace ID attached to the pre-allocated trace.
- "size" is the trace pre-allocated size in bytes; must be a 4-octet multiple;
limited size (see IOAM6_TRACE_DATA_SIZE_MAX).
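A minimal sketch of the size check described above. The limit is passed in as a parameter rather than hard-coded; in practice it would be IOAM6_TRACE_DATA_SIZE_MAX from the kernel uapi header. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* size must be a multiple of 4 octets and no larger than max_size. */
static bool ioam6_trace_size_ok(uint32_t size, uint32_t max_size)
{
	return size % 4 == 0 && size <= max_size;
}
```

The "size 12" in the example command above passes this check.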
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David Ahern <dsahern@kernel.org>
This patch provides support for adding, listing and removing IOAM namespaces
and schemas with iproute2. When adding an IOAM namespace, both "data" (=u32)
and "wide" (=u64) are optional. Therefore, you can either have none, one of
them, or both at the same time. When adding an IOAM schema, there is no
restriction on "DATA" except its size (see IOAM6_MAX_SCHEMA_DATA_LEN). By
default, an IOAM namespace has no active IOAM schema (meaning an IOAM namespace
is not linked to an IOAM schema), and an IOAM schema is not considered
as "active" (meaning an IOAM schema is not linked to an IOAM namespace). It is
possible to link an IOAM namespace with an IOAM schema, thanks to the last
command below (meaning the IOAM schema will be considered as "active" for the
specific IOAM namespace).
$ ip ioam
Usage: ip ioam { COMMAND | help }
ip ioam namespace show
ip ioam namespace add ID [ data DATA32 ] [ wide DATA64 ]
ip ioam namespace del ID
ip ioam schema show
ip ioam schema add ID DATA
ip ioam schema del ID
ip ioam namespace set ID schema { ID | none }
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David Ahern <dsahern@kernel.org>
Currently man 8 tc-skbmod says that "...the swap action will occur after
any smac/dmac substitutions are executed, if they are present."
This is false. In fact, trying to "set" and "swap" in a single skbmod
command causes the "set" part to be completely ignored. As an example:
$ tc filter add dev eth0 parent 1: protocol ip prio 10 \
matchall action skbmod \
set dmac AA:AA:AA:AA:AA:AA smac BB:BB:BB:BB:BB:BB \
swap mac
The above command simply does a "swap", without setting DMAC or SMAC to
AA's or BB's. The root cause of this is in the kernel, see
net/sched/act_skbmod.c:tcf_skbmod_init():
parm = nla_data(tb[TCA_SKBMOD_PARMS]);
index = parm->index;
if (parm->flags & SKBMOD_F_SWAPMAC)
lflags = SKBMOD_F_SWAPMAC;
^^^^^^^^^^^^^^^^^^^^^^^^^^
Doing a "=" instead of "|=" clears all other "set" flags when doing a
"swap". Discourage using "set" and "swap" in the same command by
documenting it as undefined behavior, and update the "SYNOPSIS" section
as well as tc -help text accordingly.
If one really needs to e.g. "set" DMAC to all AA's then "swap" DMAC and
SMAC, one should do two separate commands and "pipe" them together.
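A sketch that mimics the quoted flag handling with illustrative flag values (not the real SKBMOD_F_* constants) to show how "=" loses the set flags while "|=" would keep them. It only illustrates the root cause; the change described here documents the combination as undefined rather than altering the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bits only; not the kernel's SKBMOD_F_* values. */
#define F_DMAC    (1u << 0)
#define F_SMAC    (1u << 1)
#define F_SWAPMAC (1u << 2)

static uint32_t skbmod_flags(bool set_dmac, bool set_smac, bool swap,
			     bool buggy)
{
	uint32_t lflags = 0;

	if (set_dmac)
		lflags |= F_DMAC;
	if (set_smac)
		lflags |= F_SMAC;
	if (swap) {
		if (buggy)
			lflags = F_SWAPMAC;	/* "=": drops the set flags */
		else
			lflags |= F_SWAPMAC;	/* "|=": keeps them */
	}
	return lflags;
}
```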
Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The JSON support fix changed the normal (non-JSON) output;
set it back to what it was.
Print overhead with print_size().
Print newline before ref.
Fixes: 0d5cf51e0d ("police: Add support for json output")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
A successful call to recvmsg() causes msg.msg_controllen to contain the length
of the received ancillary data. However, the current code in the 'ip' utility
doesn't reset this value after each recvmsg().
This means that if a call to recvmsg() doesn't have ancillary data, then
'msg.msg_controllen' will be set to 0, causing later recvmsg() calls that do
contain ancillary data to get MSG_CTRUNC set in msg.msg_flags.
This fixes 'ip monitor' running with the all-nsid option. With this option the
kernel passes the nsid as ancillary data. If, while 'ip monitor' is running, an
event on the current netns is received, then no ancillary data will be sent,
causing 'msg.msg_controllen' to be set to 0, which causes 'ip monitor' to
indefinitely print "[nsid current]" instead of the real nsid.
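A minimal sketch of the fix, assuming a long-lived struct msghdr reused across calls as in the 'ip' receive loop; the helper name is hypothetical. The control length is restored before every recvmsg(), since the kernel overwrites it with the amount of ancillary data actually received.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* msg_controllen is in/out: reset it to the full control buffer size
 * before each call so a later message carrying ancillary data (such as
 * the nsid) is not truncated and flagged with MSG_CTRUNC. */
static ssize_t recvmsg_reset(int fd, struct msghdr *msg, size_t cbuf_size)
{
	msg->msg_controllen = cbuf_size;
	return recvmsg(fd, msg, 0);
}
```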
Fixes: 449b824ad1 ("ipmonitor: allows to monitor in several netns")
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Lahav Schlesinger <lschlesinger@drivenets.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Fix nullptr dereference of errhndlr from rtnl_dump_filter_arg
struct in rtnl_dump_done and rtnl_dump_error functions.
Fixes: 459ce6e3d7 ("ip route: ignore ENOENT during save if RT_TABLE_MAIN is being dumped")
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Roi Dayan <roid@nvidia.com>
Cc: Alexander Mikhalitsyn <alexander@mihalicyn.com>
Reported-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Don't initialize arguments that are NULL, and format initialization
in a more logical way.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
We started to use the in-kernel filtering feature, which allows fetching only
the needed tables (see iproute_dump_filter()). On the kernel side it is
implemented in net/ipv4/fib_frontend.c (inet_dump_fib) and net/ipv6/ip6_fib.c
(inet6_dump_fib). The problem is that the behaviour of "ip route save"
was changed after
c7e6371bc ("ip route: Add protocol, table id and device to dump request").
If filters are used, the kernel returns an ENOENT error if the requested table
is absent, but in a newly created net namespace even the RT_TABLE_MAIN table
doesn't exist. It is only allocated later, for instance after issuing
"ip l set lo up".
Reproducer is fairly simple:
$ unshare -n ip route save > dump
Error: ipv4: FIB table does not exist.
Dump terminated
The expected result is an empty dump file (as it was before this
change).
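A sketch of the error-handler approach with hypothetical types (the real entry point named in the v4 note below is rtnl_dump_filter_errhndlr()). The handler used for "ip route save" treats ENOENT as an empty table; the dispatcher also checks that a handler was installed before calling it, since plain dumps have none.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hook: returns true if the error should be ignored. */
typedef bool (*dump_errhndlr_t)(int err, void *arg);

/* A missing FIB table just means there is nothing to dump. */
static bool ignore_missing_table(int err, void *arg)
{
	(void)arg;
	return err == ENOENT;
}

/* Returns 0 if the error was suppressed, otherwise -err. */
static int handle_dump_error(dump_errhndlr_t errhndlr, void *arg, int err)
{
	if (errhndlr && errhndlr(err, arg))
		return 0;
	return -err;
}
```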
v2: reworked, so, now it takes into account NLMSGERR_ATTR_MSG
(see nl_dump_ext_ack_done() function). We want to suppress error messages
in stderr about absent FIB table from kernel too.
v3: reworked to make code clearer. Introduced rtnl_suppressed_errors(),
rtnl_suppress_error() helpers. User may suppress up to 3 errors (may be
easily extended by changing SUPPRESS_ERRORS_INIT macro).
v4: reworked, rtnl_dump_filter_errhndlr() was introduced. Thanks
to Stephen Hemminger for comments and suggestions
v5: space fixes, commit message reformat, empty initializers
Fixes: c7e6371bc ("ip route: Add protocol, table id and device to dump request")
Cc: David Ahern <dsahern@gmail.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Alexander Mikhalitsyn <alexander@mihalicyn.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
When attaching BPF programs that consist of multiple executable sections via
iproute2+libbpf (configured with LIBBPF_FORCE=on), we noticed that a
wrong section can be attached to a device. E.g.:
# tc qdisc replace dev lxc_health clsact
# tc filter replace dev lxc_health ingress prio 1 \
handle 1 bpf da obj bpf_lxc.o sec from-container
# tc filter show dev lxc_health ingress
filter protocol all pref 1 bpf chain 0
filter protocol all pref 1 bpf chain 0 handle 0x1 bpf_lxc.o:[__send_drop_notify] <-- WRONG SECTION
direct-action not_in_hw id 38 tag 7d891814eda6809e jited
After taking a closer look into load_bpf_object() in lib/bpf_libbpf.c,
we noticed that the filter used in the program iterator does not check
whether a program section name matches a requested section name
(cfg->section). This can lead to a wrong prog FD being used to attach
the program.
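A sketch of the intended lookup over a hypothetical array of loaded programs: only a program whose section name equals the requested one (cfg->section) is accepted. With libbpf, the per-program section name is available from bpf_program__section_name().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical view of the programs in a loaded object. */
struct prog_entry {
	const char *section;	/* ELF section the program came from */
	int fd;
};

/* Return the fd of the program from the requested section, or -1. */
static int find_prog_fd(const struct prog_entry *progs, size_t n,
			const char *want)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(progs[i].section, want) == 0)
			return progs[i].fd;
	return -1;
}
```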
Fixes: 6d61a2b557 ("lib: add libbpf support")
Signed-off-by: Martynas Pumputis <m@lambda.lt>
Acked-by: Hangbin Liu <haliu@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
devlink currently uses "%lu" to format values of type uint64_t,
but on 32-bit architectures uint64_t is defined as unsigned
long long and this does not work correctly.
Fix this by using the standard macro PRIu64 instead.
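A minimal example of the portable form; the helper name is hypothetical. PRIu64 from <inttypes.h> expands to the right length modifier for uint64_t on each ABI (typically "lu" on 64-bit Linux and "llu" on 32-bit), so the same format string works on both.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Format a uint64_t portably; returns the snprintf() result. */
static int format_u64(char *buf, size_t len, uint64_t v)
{
	return snprintf(buf, len, "%" PRIu64, v);
}
```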
Signed-off-by: Ben Hutchings <ben.hutchings@mind.be>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
devlink and vdpa use BIT() together with 64-bit flag fields. devlink
is already using bit numbers greater than 31 and so does not work
correctly on 32-bit architectures.
Fix this by making BIT() use uint64_t instead of unsigned long.
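A sketch of the widened macro, plus a hypothetical helper using it. Shifting a 32-bit unsigned long by 32 or more is undefined behavior in C, which is why bit numbers above 31 broke on 32-bit architectures.

```c
#include <stdint.h>

/* 64-bit constant, so bit numbers up to 63 are valid on every ABI. */
#define BIT(nr) (UINT64_C(1) << (nr))

/* Hypothetical helper: test bit `nr` in a 64-bit flag field. */
static int flag_is_set(uint64_t flags, unsigned int nr)
{
	return (flags & BIT(nr)) != 0;
}
```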
Signed-off-by: Ben Hutchings <ben.hutchings@mind.be>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
On some systems devlink fails to link because the math library is
missing. Add -lm to devlink.
LINK devlink
../lib/libutil.a(utils_math.o): In function `get_rate':
utils_math.c:(.text+0xcc): undefined reference to `floor'
../lib/libutil.a(utils_math.o): In function `get_size':
utils_math.c:(.text+0x384): undefined reference to `floor'
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:16: devlink] Error 1
make: *** [Makefile:64: all] Error 2
Fixes: 6c70aca76e ("devlink: Add port func rate support")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Implement a decrement operation for ttl and hoplimit.
Since this is just syntactic sugar, it follows that:
tc filter add ... action pedit ex munge ip ttl dec ...
tc filter add ... action pedit ex munge ip6 hoplimit dec ...
is just a more readable version of this:
tc filter add ... action pedit ex munge ip ttl add 0xff ...
tc filter add ... action pedit ex munge ip6 hoplimit add 0xff ...
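A sketch of the arithmetic behind the equivalence, assuming the add wraps within the 8-bit field as the equivalence above implies; the helper name is hypothetical. Adding 0xff modulo 256 is the same as subtracting 1.

```c
#include <stdint.h>

/* 8-bit add with wraparound, as assumed for the ttl/hoplimit field. */
static uint8_t ttl_add(uint8_t ttl, uint8_t delta)
{
	return (uint8_t)(ttl + delta);
}
```

Like the raw add, this wraps a TTL of 0 to 255.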
This feature was suggested by some pseudo tc examples in Mellanox's
documentation[1], but was present in neither their mlnx-iproute2
nor iproute2.
Tested with skip_sw on Mellanox ConnectX-6 Dx.
[1] https://docs.mellanox.com/pages/viewpage.action?pageId=47033989
v3:
- Use dedicated flags argument in parse_cmd() (David Ahern)
- Minor rewording of the man page
v2:
- Fix whitespace issue (Stephen Hemminger)
- Add to usage info in explain()
Signed-off-by: Asbjørn Sloth Tønnesen <asbjorn@asbjorn.st>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
This patch just prepares the flags argument, so it's
available to the next patch.
Signed-off-by: Asbjørn Sloth Tønnesen <asbjorn@asbjorn.st>
Signed-off-by: David Ahern <dsahern@kernel.org>
The WWAN subsystem has been extended to generalize the management of
per-data-channel network interfaces. This change implements support for
handling WWAN links, and uses the previously introduced ip-link
capability to specify the parent by its device name.
The WWAN interface for a new data channel should be created with a
command like this:
ip link add dev wwan0-2 parentdev wwan0 type wwan linkid 2
Here, wwan0 is the modem HW device name (as listed in
/sys/class/wwan), and linkid is the identifier of the opened data
channel.
Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Add support for specifying a parent device (struct device) by its name
during link creation, and for printing the parent name in the link list.
This option will be used to create WWAN links and possibly by other
device classes that do not have a "natural parent netdev".
For completeness, also print the parent device bus name in the link list
info, but do not add a corresponding command-line argument, as there is
no use case for this attribute.
Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
The ip link property add/delete commands require a device, but the
device argument was not shown on the man page.
It is correct in the usage message.
Fixes: 3aa0e51be6 ("ip: add support for alternative name addition/deletion/list")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
We introduce the new "End.DT46" action for supporting the SRv6 End.DT46
Behavior in iproute2.
The SRv6 End.DT46 Behavior, defined in RFC 8986 [1] section 4.8, can be
used to implement L3 VPNs based on Segment Routing over IPv6 networks in
multi-tenant environments, and it is capable of handling both IPv4 and
IPv6 tenant traffic at the same time.
The SRv6 End.DT46 Behavior decapsulates the received packets and
performs the IPv4 or IPv6 routing lookup in the routing table of the
tenant.
As with End.DT4 and End.DT6 in VRF mode, the SRv6 End.DT46 Behavior
leverages a VRF device to force the routing lookup into the associated
routing table, specified with the "vrftable" attribute.
To make the End.DT46 work properly, it must be guaranteed that the
routing table used for routing lookup operations is bound to one and
only one VRF during the tunnel creation. Such constraint has to be
enforced by enabling the VRF strict_mode sysctl parameter, i.e.:
$ sysctl -wq net.vrf.strict_mode=1
Note that the same approach is used for the End.DT4 Behavior and for the
End.DT6 Behavior in VRF mode.
An SRv6 End.DT46 Behavior instance can be created as follows:
$ ip -6 route add 2001:db8::1 encap seg6local action End.DT46 vrftable 100 dev vrf100
Standard Output:
$ ip -6 route show 2001:db8::1
2001:db8::1 encap seg6local action End.DT46 vrftable 100 dev vrf100 metric 1024 pref medium
JSON Output:
$ ip -6 -j -p route show 2001:db8::1
[ {
"dst": "2001:db8::1",
"encap": "seg6local",
"action": "End.DT46",
"vrftable": 100,
"dev": "vrf100",
"metric": 1024,
"flags": [ ],
"pref": "medium"
} ]
This patch updates the route.8 man page and the ip route help with the
information related to End.DT46.
Since the same information was missing for the SRv6 End.DT4 and End.DT6
Behaviors, it has been added for them as well.
[1] https://www.rfc-editor.org/rfc/rfc8986.html#name-enddt46-decapsulation-and-s
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Signed-off-by: Paolo Lungaroni <paolo.lungaroni@uniroma2.it>
Signed-off-by: David Ahern <dsahern@kernel.org>
Dmytro Linkin says:
====================
This series implements the following devlink rate commands:
- Dump particular or all rate objects (JSON or non-JSON)
- Add/Delete node rate object
- Set tx rate share/max values for rate object
- Set/Unset parent rate object for other rate object
Examples:
Display all rate objects:
# devlink port function rate show
pci/0000:03:00.0/1 type leaf parent some_group
pci/0000:03:00.0/2 type leaf tx_share 12Mbit
pci/0000:03:00.0/some_group type node tx_share 1Gbps tx_max 5Gbps
Display leaf rate object bound to the 1st devlink port of the
pci/0000:03:00.0 device:
# devlink port function rate show pci/0000:03:00.0/1
pci/0000:03:00.0/1 type leaf
Display node rate object with name some_group of the pci/0000:03:00.0
device:
# devlink port function rate show pci/0000:03:00.0/some_group
pci/0000:03:00.0/some_group type node
Display leaf rate object rate values using IEC units:
# devlink -i port function rate show pci/0000:03:00.0/2
pci/0000:03:00.0/2 type leaf tx_share 11718Kibit
Display pci/0000:03:00.0/2 leaf rate object as pretty JSON output:
# devlink -jp port function rate show pci/0000:03:00.0/2
{
"rate": {
"pci/0000:03:00.0/2": {
"type": "leaf",
"tx_share": 1500000
}
}
}
Create node rate object with name "1st_group" on pci/0000:03:00.0 device:
# devlink port function rate add pci/0000:03:00.0/1st_group
Create node rate object with specified parameters:
# devlink port function rate add pci/0000:03:00.0/2nd_group \
tx_share 10Mbit tx_max 30Mbit parent 1st_group
Set parameters to the specified leaf rate object:
# devlink port function rate set pci/0000:03:00.0/1 \
tx_share 2Mbit tx_max 10Mbit
Set leaf's parent to "1st_group":
# devlink port function rate set pci/0000:03:00.0/1 parent 1st_group
Unset leaf's parent:
# devlink port function rate set pci/0000:03:00.0/1 noparent
Delete node rate object:
# devlink port function rate del pci/0000:03:00.0/2nd_group
Rate values can be specified in bits or bytes per second (bit|bps), with
any SI (k, m, g, t) or IEC (ki, mi, gi, ti) prefix. A bare number means
bits per second. Units are also printed in the "show" command output, but
not necessarily the same ones that were specified with the "set" or "add"
command. The -i/--iec switch forces output in IEC units. JSON output
always prints values in bytes per second.
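To make the unit rules concrete, here is a hypothetical parser that follows them and converts a rate string to bytes per second, the unit used in JSON output. The function names are made up for illustration, and devlink's own parsing code may behave differently in edge cases such as rounding:

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

/* Case-insensitive string equality. */
static int eq_nocase(const char *a, const char *b)
{
	while (*a && tolower((unsigned char)*a) == tolower((unsigned char)*b)) {
		a++;
		b++;
	}
	return tolower((unsigned char)*a) == tolower((unsigned char)*b);
}

/* Hypothetical parser for "<number>[k|m|g|t][i][bit|bps]".
 * A bare number means bits per second; "bps" means bytes per second.
 * On success stores the rate in bytes per second and returns 0. */
static int parse_rate_bytes(const char *str, uint64_t *bytes_per_sec)
{
	static const char prefixes[] = "kmgt";
	double val, mult = 1.0, base = 1000.0, bytes;
	char *end;
	int exp = 0;

	if (!isdigit((unsigned char)*str))
		return -1;	/* rejects signs, "inf", "nan", ... */
	val = strtod(str, &end);

	/* Optional SI prefix: k = 10^3, m = 10^6, g = 10^9, t = 10^12. */
	for (int i = 0; prefixes[i]; i++) {
		if (tolower((unsigned char)*end) == prefixes[i]) {
			exp = i + 1;
			end++;
			break;
		}
	}
	/* A following 'i' selects the IEC base: ki = 2^10, mi = 2^20, ... */
	if (exp && tolower((unsigned char)*end) == 'i') {
		base = 1024.0;
		end++;
	}
	for (int i = 0; i < exp; i++)
		mult *= base;

	if (*end == '\0' || eq_nocase(end, "bit"))
		bytes = val * mult / 8;		/* bits -> bytes */
	else if (eq_nocase(end, "bps"))
		bytes = val * mult;
	else
		return -1;

	if (bytes >= 18446744073709551616.0)	/* 2^64: does not fit */
		return -1;
	*bytes_per_sec = (uint64_t)bytes;
	return 0;
}
```

For example, "12Mbit" yields 1500000, which matches the tx_share value in the JSON output above.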
====================
Signed-off-by: David Ahern <dsahern@kernel.org>
Implement user commands to manage devlink port func rate objects.
List all rate commands:
$ devlink port func rate help
or just
$ devlink port func rate
To list all rate objects or a particular one:
$ devlink port func rate show
pci/0000:03:00.0/some_group: type node
pci/0000:03:00.0/0: type leaf
pci/0000:03:00.0/1: type leaf
$ devlink port func rate show pci/0000:03:00.0/1
pci/0000:03:00.0/1: type leaf
$ devlink port func rate show pci/0000:03:00.0/some_group
pci/0000:03:00.0/some_group: type node
Rate objects of type "leaf" are created by their driver, and their name is
the name of the corresponding devlink port. Rate objects of type "node"
represent rate groups created by the user with commands:
$ devlink port func rate add pci/0000:03:00.0/some_group
or, defining tx rate limits:
$ devlink port func rate add pci/0000:03:00.0/some_group \
tx_share 10kbit tx_max 100mbit
NOTE: node name cannot be a decimal value because it conflicts with
devlink port indexes.
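A hypothetical validation helper for this rule could look like the following. The name is an assumption of this sketch, and as a side effect it also rejects empty names:

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical check: a name consisting only of decimal digits would
 * clash with devlink port indexes, so it is rejected. */
static bool rate_node_name_ok(const char *name)
{
	const char *p = name;

	while (isdigit((unsigned char)*p))
		p++;
	/* Valid only if at least one non-digit character is present. */
	return *p != '\0';
}
```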
To delete node object:
$ devlink port func rate del pci/0000:03:00.0/some_group
Set rate limits of existing rate object:
$ devlink port func rate set pci/0000:03:00.0/0 \
tx_share 5MBps tx_max 25GBps
$ devlink port func rate set pci/0000:03:00.0/some_group \
tx_share 0
Both the SET and ADD commands accept any rate units defined in the IEC
60027-2 standard.
NOTE: a rate value of 0 means that the rate is unlimited. Such a value is
also omitted from the show command output.
NOTE: In the SHOW command output, rate values are printed with suffixes
as well, but in JSON output they are always in units of Bps.
Set or unset parent of existing rate object:
$ devlink port func rate set pci/0000:03:00.0/0 parent some_group
$ devlink port func rate set pci/0000:03:00.0/0 noparent
NOTE: Due to kernel logic, setting the parent to an empty ("") name unsets
the parent; avoid it to prevent unexpected parent unsets.
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Every handler argument is validated in two steps. The first step, form
checking, expects the identifier to be a few words separated by slashes.
For device and region handlers, it only checks whether the identifier has
the expected number of slashes.
Add a generic function to do that and make the code cleaner and more
consistent.
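A sketch of such a generic form check is shown below. The name and signature are hypothetical. It counts slashes to decide whether an identifier has the expected number of words:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical generic form check: an identifier made of N words
 * separated by slashes contains exactly N - 1 '/' characters. */
static bool ident_form_ok(const char *ident, unsigned int words)
{
	unsigned int slashes = 0;

	for (const char *p = strchr(ident, '/'); p; p = strchr(p + 1, '/'))
		slashes++;
	return words > 0 && slashes == words - 1;
}
```

With this helper, a device handle such as "pci/0000:03:00.0" is checked with 2 words and a port handle such as "pci/0000:03:00.0/1" with 3.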
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
A user can optionally provide the external controller number when
creating a devlink port for an external controller.
An example on an eswitch system:
$ devlink dev eswitch set pci/0033:01:00.0 mode switchdev
$ devlink port show
pci/0033:01:00.0/196607: type eth netdev enP51p1s0f0np0 flavour physical port 0 splittable false
pci/0033:01:00.0/131072: type eth netdev eth0 flavour pcipf controller 1 pfnum 0 external true splittable false
function:
hw_addr 00:00:00:00:00:00
$ devlink port add pci/0033:01:00.0 flavour pcisf pfnum 0 sfnum 77 controller 1
pci/0033:01:00.0/163840: type eth netdev eth1 flavour pcisf controller 1 pfnum 0 sfnum 77 external true splittable false
function:
hw_addr 00:00:00:00:00:00 state inactive opstate detached
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
More and more global environment variables are scattered throughout
configure, which makes it hard for users to know which one does what.
Using command-line options would make it easier for users to learn or
remember the config options.
This patch converts the INCLUDE variable to a command-line option first.
Check whether the first argument starts with '-'; if it does not, keep
using the old INCLUDE path setting method.
Signed-off-by: Hangbin Liu <haliu@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
The '-b' option allows requesting BPF filter opcodes; however, the kernel
currently returns only classic BPF filters, so reflect this in the man
page.
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Add support for matching on the ct_state "related" (rel) flag.
The related state indicates a packet is associated with an existing
connection.
Example:
$ tc filter add dev ens1f0_0 ingress prio 1 chain 1 proto ip flower \
ct_state -est-rel+trk \
action mirred egress redirect dev ens1f0_1
$ tc filter add dev ens1f0_0 ingress prio 1 chain 1 proto ip flower \
ct_state +rel+trk \
action mirred egress redirect dev ens1f0_1
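The ct_state argument combines "+flag" (must be set) and "-flag" (must be clear) terms into a value/mask pair, and a packet matches when (state & mask) == value. The parser below is a hypothetical sketch of that idea. The flag bit values are local assumptions, not copied from the kernel UAPI header:

```c
#include <stdint.h>
#include <string.h>

/* Flag bits assumed for this sketch only. */
enum {
	CT_NEW = 1 << 0,
	CT_EST = 1 << 1,
	CT_REL = 1 << 2,
	CT_TRK = 1 << 3,
};

static const struct {
	const char *name;
	uint16_t bit;
} ct_flags[] = {
	{ "new", CT_NEW }, { "est", CT_EST }, { "rel", CT_REL }, { "trk", CT_TRK },
};

/* Parse strings such as "-est-rel+trk": '+' requires the flag to be set,
 * '-' requires it to be clear, and both add the flag to the mask. */
static int parse_ct_state(const char *str, uint16_t *value, uint16_t *mask)
{
	*value = 0;
	*mask = 0;
	while (*str) {
		char sign = *str++;
		size_t i, n = sizeof(ct_flags) / sizeof(ct_flags[0]);

		if (sign != '+' && sign != '-')
			return -1;
		for (i = 0; i < n; i++)
			if (strncmp(str, ct_flags[i].name, strlen(ct_flags[i].name)) == 0)
				break;
		if (i == n)
			return -1;	/* unknown flag name */
		*mask |= ct_flags[i].bit;
		if (sign == '+')
			*value |= ct_flags[i].bit;
		str += strlen(ct_flags[i].name);
	}
	return 0;
}
```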
Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
genl_add_mcast_grp doesn't set errno in all cases.
On kernels that support mptcp but lack event support (all kernels <= 5.11)
MPTCP_PM_EV_GRP_NAME won't be found and ip will exit with
"can't subscribe to mptcp events: Success"
Set errno to a meaningful value (ENOENT) when the group name isn't found
and also cover other spots where it returns nonzero with errno unset.
Fixes: ff619e4fd3 ("mptcp: add support for event monitoring")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Commit d5e6ee0dac introduced the use of the functions name_to_handle_at()
and open_by_handle_at(). However, these functions are not available in
some C libraries, e.g. uclibc-ng < 1.0.35. For backward compatibility,
check for their availability in the configure script and, if they are
absent, invoke the syscalls directly.
Fixes: d5e6ee0dac ("ss: introduce cgroup2 cache and helper functions")
Cc: Dmitry Yakunin <zeil@yandex-team.ru>
Cc: Petr Vorel <petr.vorel@gmail.com>
Signed-off-by: Heiko Thiery <heiko.thiery@gmail.com>
Reviewed-by: Petr Vorel <petr.vorel@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
config.mk needs to be re-generated any time configure is changed.
Rename the existing make target and add a check that the config.mk
file exists and is newer than the configure script.
Signed-off-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Petr Vorel <petr.vorel@gmail.com>
Tested-by: Petr Vorel <petr.vorel@gmail.com>
We introduce the optional "count" attribute for supporting counters in SRv6
Behaviors as defined in [1], section 6. For each SRv6 Behavior instance,
the counters defined in [1] are:
- the total number of packets that have been correctly processed;
- the total amount of traffic in bytes of all packets that have been
correctly processed.
In addition, we introduce a new counter that counts the number of packets
that have NOT been properly processed (i.e. errors) by an SRv6 Behavior
instance.
Each SRv6 Behavior instance can be configured, at the time of its creation,
to use counters by specifying the "count" attribute as follows:
$ ip -6 route add 2001:db8::1 encap seg6local action End count dev eth0
Per-behavior counters can be shown by adding "-s" to the iproute2 command
line, i.e.:
$ ip -s -6 route show 2001:db8::1
2001:db8::1 encap seg6local action End packets 0 bytes 0 errors 0 dev eth0
[1] https://www.rfc-editor.org/rfc/rfc8986.html#name-counters
v2:
- add help and route.8 man page updates
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Signed-off-by: Paolo Lungaroni <paolo.lungaroni@uniroma2.it>
Signed-off-by: David Ahern <dsahern@kernel.org>
When a wrong value is provided for "burst" or "cburst" parameters, the
resulting error message is unclear and can be misleading:
$ tc class add dev dummy0 parent 1: classid 1:1 htb rate 100KBps burst errtrigger
Illegal "buffer"
The message claims an illegal "buffer" is provided, but neither the
inline help nor the man page list "buffer" among the htb parameters, and
the only way to know that "burst", "maxburst" and "buffer" are synonyms
is to look into tc/q_htb.c.
This commit improves this by simply changing the error string to the
parameter name given in the user's command, clearly pointing out where
the wrong value is.
$ tc class add dev dummy0 parent 1: classid 1:1 htb rate 100KBps burst errtrigger
Illegal "burst"
$ tc class add dev dummy0 parent 1: classid 1:1 htb rate 100Kbps maxburst errtrigger
Illegal "maxburst"
Reported-by: Sebastian Mitterle <smitterl@redhat.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
tipc segfaults when called with an abnormally long key:
$ tipc node set key 0123456789abcdef0123456789abcdef0123456789abcdef
*** buffer overflow detected ***: terminated
Fix this by returning an error if the key length is longer than
TIPC_AEAD_KEYLEN_MAX.
Fixes: 24bee3bf97 ("tipc: add new commands to set TIPC AEAD key")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
tipc segfaults when called with an abnormally long algname:
$ tipc node set key 0x1234 algname supercalifragilistichespiralidososupercalifragilistichespiralidoso
*** buffer overflow detected ***: terminated
Fix this by returning an error if the provided algname is longer than
TIPC_AEAD_ALG_NAME.
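The fix boils down to validating the length before copying into a fixed-size buffer. Here is a minimal sketch, assuming a 32-byte buffer in place of TIPC_AEAD_ALG_NAME and a hypothetical helper name:

```c
#include <errno.h>
#include <string.h>

/* Assumed buffer size standing in for TIPC_AEAD_ALG_NAME. */
#define ALG_NAME_LEN 32

/* Hypothetical helper: copy algname only if it fits in the buffer
 * together with its terminating NUL, instead of overflowing it. */
static int copy_alg_name(char dst[ALG_NAME_LEN], const char *algname)
{
	size_t len = strlen(algname);

	if (len >= ALG_NAME_LEN)
		return -EINVAL;
	memcpy(dst, algname, len + 1);
	return 0;
}
```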
Fixes: 24bee3bf97 ("tipc: add new commands to set TIPC AEAD key")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
When handling the result of a first netlink query, we may execute
another query inside the callback. If this sub-routine uses the same
socket, the result of the previous execution is discarded.
To avoid this, perform the nested query on a separate socket.
Fixes: 2021028306 ("tipc: use the libmnl functions in lib/mnl_utils.c")
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Linux kernel commit b8392808eb3fc28e ("sch_cake: add RFC 8622 LE PHB
support to CAKE diffserv handling") added packets with LE diffserv to
the Bulk priority tin. Update the documentation to reflect this change.
Signed-off-by: Tyson Moore <tyson@tyson.me>
Signed-off-by: David Ahern <dsahern@kernel.org>
main() dynamically allocates dcb, but when dcb_help() is called it
returns without freeing it.
Fix this by using a goto, as is already done in the same function.
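The single-exit cleanup pattern looks roughly like this. The types and handlers are hypothetical stand-ins, not dcb's real code:

```c
#include <stdlib.h>

struct dcb_ctx { int json; };	/* hypothetical, trimmed-down context */

/* Hypothetical stand-ins for the real help and command handlers. */
static int show_help(void) { return 0; }
static int run_cmd(struct dcb_ctx *ctx) { return ctx->json; }

static int dcb_main_sketch(int help_requested)
{
	struct dcb_ctx *ctx;
	int ret;

	ctx = calloc(1, sizeof(*ctx));
	if (!ctx)
		return EXIT_FAILURE;

	if (help_requested) {
		ret = show_help();
		goto out;	/* a plain "return" here would leak ctx */
	}

	ret = run_cmd(ctx);
out:
	free(ctx);
	return ret;
}
```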
Fixes: 67033d1c1c ("Add skeleton of a new tool, dcb")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Reviewed-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
dcb_cmd_app_show() is supposed to return EINVAL if an incorrect argument
is provided.
Fixes: 8e9bed1493 ("dcb: Add a subtool for the DCB APP object")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Reviewed-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
In function bpf_obj_open(), if bpf_fetch_prog_arg() returns an error, we
end up in the out: path with a negative value for fd, and pass it to
close().
Avoid this by checking that fd is not negative before closing it.
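Here is a sketch of the pattern, with hypothetical names: fd starts at -1 and is closed only if it holds a valid descriptor. Since 0 is a valid descriptor, the test is fd >= 0:

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical sketch: fd stays -1 until open() succeeds, and the cleanup
 * path closes it only when it is a valid descriptor. */
static int open_and_use(const char *path, int fail_before_open)
{
	int fd = -1;
	int ret = -1;

	if (fail_before_open)	/* e.g. argument parsing failed */
		goto out;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		goto out;

	ret = 0;		/* ... use fd here ... */
out:
	if (fd >= 0)
		close(fd);
	return ret;
}
```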
Fixes: 32e93fb7f6 ("{f,m}_bpf: allow for sharing maps")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Checking for nbands to be at least 1 at this point is useless. Indeed:
- ets requires "bands", "quanta" or "strict" to be specified
- if "bands" is specified, nbands cannot be negative, see parse_nbands()
- if "strict" is specified, nstrict cannot be negative, see
parse_nbands()
- if "quanta" is specified, nquanta cannot be negative, see
parse_quantum()
- if "bands" is not specified, nbands is set to nstrict+nquanta
- the previous if statement takes care of the case when none of them are
specified and nbands is 0, terminating execution.
Thus nbands cannot be < 1 at this point and this code cannot be executed.
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Gal Pressman says:
====================
This is the userspace part for the new copy-on-fork attribute added to
the get sys netlink command.
The new attribute indicates that the kernel copies DMA pages on fork,
hence fork support through madvise and MADV_DONTFORK is not needed.
The kernel series was merged:
https://lore.kernel.org/linux-rdma/20210418121025.66849-1-galpress@amazon.com/
====================
Signed-off-by: David Ahern <dsahern@kernel.org>
The new attribute indicates that the kernel copies DMA pages on fork,
hence fork support through madvise and MADV_DONTFORK is not needed.
If the attribute is not reported (expected on older kernels),
copy-on-fork is disabled.
Example:
$ rdma sys
netns shared copy-on-fork on
Signed-off-by: Gal Pressman <galpress@amazon.com>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
When an address is added with a port, it is meant to be sent in an
ADD_ADDR to the remote peer, so it must have the signal flag set.
Fixes: 42fbca91cd ("mptcp: add support for port based endpoint")
Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn>
Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David Ahern <dsahern@kernel.org>
The default behavior for source MACVLAN is to duplicate packets to the
appropriate devices of type source, and then do the normal destination
MACVLAN flow. This patch adds an option to skip destination MACVLAN
processing if any matching source MACVLAN device has the option set.
This allows setting up a "catch all" device for source MACVLAN: create one
or more devices with type source nodst, and one device with e.g. type vepa,
and incoming traffic will be received on exactly one device.
Signed-off-by: Jethro Beekman <kernel@jbeekman.nl>
Signed-off-by: David Ahern <dsahern@kernel.org>
Leon Romanovsky says:
====================
This is the user space part of the kernel series, already accepted,
that extends the RDMA netlink interface to return uverbs context and SRQ
information.
The accepted kernel series can be seen here:
https://lore.kernel.org/linux-rdma/20210422133459.GA2390260@nvidia.com/
====================
Signed-off-by: David Ahern <dsahern@kernel.org>
Sample output:
$ rdma res show srq
dev ibp8s0f0 srqn 0 type BASIC pdn 3 comm [ib_ipoib]
dev ibp8s0f0 srqn 4 type BASIC lqpn 125-128,130-140 pdn 9 pid 3581 comm ibv_srq_pingpon
dev ibp8s0f0 srqn 5 type BASIC lqpn 141-156 pdn 10 pid 3584 comm ibv_srq_pingpon
dev ibp8s0f0 srqn 6 type BASIC lqpn 157-172 pdn 11 pid 3590 comm ibv_srq_pingpon
dev ibp8s0f1 srqn 0 type BASIC pdn 3 comm [ib_ipoib]
dev ibp8s0f1 srqn 1 type BASIC lqpn 329-344 pdn 4 pid 3586 comm ibv_srq_pingpon
$ rdma res show srq lqpn 126-141
dev ibp8s0f0 srqn 4 type BASIC lqpn 126-128,130-140 pdn 9 pid 3581 comm ibv_srq_pingpon
dev ibp8s0f0 srqn 5 type BASIC lqpn 141 pdn 10 pid 3584 comm ibv_srq_pingpon
$ rdma res show srq lqpn 127
dev ibp8s0f0 srqn 4 type BASIC lqpn 127 pdn 9 pid 3581 comm ibv_srq_pingpon
Reviewed-by: Ido Kalir <idok@nvidia.com>
Reviewed-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Neta Ostrovsky <netao@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Sample output:
$ rdma res show ctx
dev ibp8s0f0 ctxn 0 pid 980 comm ibv_rc_pingpong
dev ibp8s0f0 ctxn 1 pid 981 comm ibv_rc_pingpong
dev ibp8s0f0 ctxn 2 pid 992 comm ibv_rc_pingpong
dev ibp8s0f1 ctxn 0 pid 984 comm ibv_rc_pingpong
dev ibp8s0f1 ctxn 1 pid 987 comm ibv_rc_pingpong
$ rdma res show ctx dev ibp8s0f1
dev ibp8s0f1 ctxn 0 pid 984 comm ibv_rc_pingpong
dev ibp8s0f1 ctxn 1 pid 987 comm ibv_rc_pingpong
Reviewed-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Ido Kalir <idok@nvidia.com>
Signed-off-by: Neta Ostrovsky <netao@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Update kernel headers to commit:
99ba0ea616aa ("sfc: adjust efx->xdp_tx_queue_count with the real number of initialized queues")
Signed-off-by: David Ahern <dsahern@kernel.org>
When I added support for new vlan rtm dumping, I made a mistake in the
output format when there are no vlans on the port. This patch fixes it by
not printing ports without vlan entries (similar to current situation).
Example (no vlans):
$ bridge -d vlan show
port vlan-id
Fixes: e5f87c8341 ("bridge: vlan: add support for the new rtm dump call")
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
The 'ip' utility hardcodes the assumption of being a 2-char command, where
any follow-on characters are passed as an argument:
$ ./ip-full help
Object "-full" is unknown, try "ip help".
This confusing behaviour isn't seen with 'tc' for example, and was added in
a 2005 commit without documentation. It was noticed during testing of 'ip'
variants built/packaged with different feature sets (e.g. w/o BPF support).
Mitigate the problem by redoing the command without the 2-char assumption
if the follow-on characters fail to parse as a valid command.
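Here is a hypothetical sketch of the fallback logic. It derives an object name from the program name only when the characters after the first two form a known object, and otherwise parses the command line normally. The object table is a small illustrative subset:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of ip objects for this sketch. */
static const char *const objects[] = { "link", "addr", "route", "neigh", NULL };

static int is_known_object(const char *name)
{
	for (size_t i = 0; objects[i]; i++)
		if (strcmp(objects[i], name) == 0)
			return 1;
	return 0;
}

/* Return the object implied by the program name (e.g. "iproute" ->
 * "route"), or NULL if the name should be ignored and the command line
 * parsed normally (e.g. "ip-full"). */
static const char *object_from_progname(const char *argv0)
{
	const char *base = strrchr(argv0, '/');

	base = base ? base + 1 : argv0;
	if (strlen(base) > 2 && is_known_object(base + 2))
		return base + 2;
	return NULL;
}
```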
Fixes: 351efcde4e ("Update header files to 2.6.14")
Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
The function get_task_name() is used to get the name of a process from
its pid, and its implementation is similar to ip/iptuntap.c:pid_name().
Move it to lib/fs.c to use a single implementation and make it easily
reusable.
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Nikolay Aleksandrov says:
====================
From: Nikolay Aleksandrov <nikolay@nvidia.com>
This set extends the bridge vlan code to use the new vlan RTM calls,
which allow dumping detailed per-port, per-vlan information and also
manipulating the per-vlan options. It also allows monitoring any vlan
changes (add/del/option change). The rtm vlan dumps have an extensible
format which allows us to add new options and attributes easily, and
also to request the kernel to filter on different vlan information when
dumping. The new kernel dump code tries to use compressed vlan format as
much as possible (it includes netlink attributes for vlan start and
end) to reduce the number of generated messages and netlink traffic.
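To illustrate the compression idea, the hypothetical helper below collapses a sorted list of VLAN IDs into start-end ranges. This is similar in spirit to how the dump describes consecutive VLANs with start and end attributes. It only formats text and does not use the netlink API:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write a sorted VLAN ID list as comma-separated
 * ranges, e.g. {1,2,3,10} -> "1-3,10". Returns the string length and
 * stops early if buf (len > 0) is too small. */
static size_t format_vlan_ranges(const unsigned short *vids, size_t n,
				 char *buf, size_t len)
{
	size_t used = 0, i = 0;

	buf[0] = '\0';
	while (i < n) {
		size_t j = i;
		int w;

		/* Extend the range while the next VID is consecutive. */
		while (j + 1 < n && vids[j + 1] == vids[j] + 1)
			j++;

		if (j == i)
			w = snprintf(buf + used, len - used, "%s%u",
				     used ? "," : "", (unsigned int)vids[i]);
		else
			w = snprintf(buf + used, len - used, "%s%u-%u",
				     used ? "," : "", (unsigned int)vids[i],
				     (unsigned int)vids[j]);
		if (w < 0 || (size_t)w >= len - used) {
			buf[used] = '\0';	/* drop the truncated piece */
			break;
		}
		used += (size_t)w;
		i = j + 1;
	}
	return used;
}
```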
The iproute2 support is activated by using the "-d" flag when showing
vlan information, which causes it to use the new rtm dump call and get
all the detailed information; if "-s" is also specified, it dumps
per-vlan statistics as well. Obviously, in that case the vlans cannot be
compressed. To change per-vlan options (currently only STP state is
supported), a new vlan command is added: "set". It can be used to set
options of bridge or port vlans, and vlan ranges can be used. All of the
new vlan option code uses extack to show more understandable errors.
The set adds the first supported per-vlan option - STP state.
Man pages and usage information are updated accordingly.
Example:
$ bridge -d vlan show
port vlan-id
ens13 1 PVID Egress Untagged
state forwarding
bridge 1 PVID Egress Untagged
state forwarding
$ bridge vlan set vid 1 dev ens13 state blocking
$ bridge -d vlan show
port vlan-id
ens13 1 PVID Egress Untagged
state blocking
bridge 1 PVID Egress Untagged
state forwarding
====================
Signed-off-by: David Ahern <dsahern@kernel.org>
Add support for vlan activity monitoring: display vlan notifications on
vlan add/del/option changes. The man page and help are also updated
accordingly.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Use the new bridge vlan rtm dump helper to dump all of the available
vlan information when -details (-d) is used with vlan show. It is also
capable of dumping vlan stats if -statistics (-s) is added.
Currently this is the only interface capable of dumping per-vlan
options. The vlan dump format is compatible with current vlan show, it
uses the same helpers to dump vlan information. The new addition is one
line which will contain the per-vlan options (similar to ip -d link show
for ports). Currently only the vlan STP state is printed.
The call uses compressed vlan format by default.
Example:
$ bridge -s -d vlan show
port vlan-id
virbr1 1 PVID Egress Untagged
state forwarding
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Add rtnl bridge vlan dump request helper which will be used to retrieve
bridge vlan information and options.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Add a new per-vlan option set command. It allows manipulating vlan
options, which can be bridge-wide or per-port depending on the specified
device. The first option that can be set is the vlan STP state, which is
identical to the bridge port STP state. The man page is also updated
accordingly.
Example:
$ bridge vlan set vid 10 dev br0 state learning
or a range:
$ bridge vlan set vid 10-20 dev swp1 state blocking
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Add a helper which parses an STP state string to its numeric value.
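Here is a minimal sketch of such a helper, using a hypothetical name. The table order follows the kernel's BR_STATE_* values, from disabled (0) through blocking (4):

```c
#include <string.h>

/* Index in this table equals the kernel BR_STATE_* value. */
static const char *const stp_state_names[] = {
	"disabled", "listening", "learning", "forwarding", "blocking",
};

/* Hypothetical helper: return the numeric STP state for a name, or -1. */
static int parse_stp_state_sketch(const char *arg)
{
	int n = (int)(sizeof(stp_state_names) / sizeof(stp_state_names[0]));

	for (int i = 0; i < n; i++)
		if (strcmp(arg, stp_state_names[i]) == 0)
			return i;
	return -1;
}
```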
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Rename print_portstate to print_stp_state in preparation for use by vlan
code as well (per-vlan state), and export it. To be in line with the new
naming rename also port_states to stp_states as they'll be used for
vlans, too.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
This adds iproute2 support for mptcp event monitoring, e.g. creation,
establishment, address announcements from the peer, subflow establishment
and so on.
While the kernel-generated events are primarily aimed at mptcpd (e.g. for
subflow management), this is also useful for debugging.
This adds print support for the existing events.
Sample output of 'ip mptcp monitor':
[ CREATED] token=83f3a692 remid=0 locid=0 saddr4=10.0.1.2 daddr4=10.0.1.1 sport=58710 dport=10011
[ ESTABLISHED] token=83f3a692 remid=0 locid=0 saddr4=10.0.1.2 daddr4=10.0.1.1 sport=58710 dport=10011
[SF_ESTABLISHED] token=83f3a692 remid=0 locid=1 saddr4=10.0.2.2 daddr4=10.0.1.1 sport=40195 dport=10011 backup=0
[ CLOSED] token=83f3a692
Signed-off-by: Florian Westphal <fw@strlen.de>
Since the id is unique for a nexthop, dumping all nexthops to flush one
is needlessly heavy. Use the existing delete_nexthop to support flush by id.
Signed-off-by: Chunmei Xu <xuchunmei@linux.alibaba.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
To avoid code duplication, tipc should be converted to use the helper
functions for working with libmnl in lib/mnl_utils.c
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David Ahern <dsahern@kernel.org>
Allow a policer action to enforce a rate-limit based on packets-per-second,
configurable using a packet-per-second rate and burst parameters.
e.g.
# $TC actions add action police pkts_rate 1000 pkts_burst 200 index 1
# $TC actions ls action police
total acts 1
action order 0: police 0x1 rate 0bit burst 0b mtu 4096Mb pkts_rate 1000 pkts_burst 200
ref 1 bind 0
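Conceptually, a packets-per-second policer is a token bucket whose tokens are packets instead of bytes. Tokens refill at pkts_rate per second up to pkts_burst, and each packet consumes one. The sketch below illustrates only that model. The struct and function names are hypothetical, and this is not the act_police implementation:

```c
/* Hypothetical packet-based token bucket. */
struct pkt_policer {
	double rate_pps;	/* refill rate (pkts_rate) */
	double burst;		/* bucket depth (pkts_burst) */
	double tokens;		/* packets that may currently pass */
	double last;		/* time of the previous update, seconds */
};

static void pkt_policer_init(struct pkt_policer *p, double rate_pps,
			     double burst, double now)
{
	p->rate_pps = rate_pps;
	p->burst = burst;
	p->tokens = burst;	/* start with a full bucket */
	p->last = now;
}

/* Return 1 if a packet arriving at 'now' conforms, 0 if it exceeds. */
static int pkt_policer_conform(struct pkt_policer *p, double now)
{
	p->tokens += (now - p->last) * p->rate_pps;
	if (p->tokens > p->burst)
		p->tokens = p->burst;
	p->last = now;
	if (p->tokens >= 1.0) {
		p->tokens -= 1.0;
		return 1;
	}
	return 0;
}
```

With pkts_rate 1000 and pkts_burst 200, a back-to-back burst of 300 packets lets the first 200 through, and later packets pass as tokens refill.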
Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Louis Peens <louis.peens@netronome.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
- Open Routing is using ID 99 for its installed routes
- https://github.com/facebook/openr
- Kernel has accepted 99 in `rtnetlink.h`
Signed-of-by: Cooper Lees <me@cooperlees.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
This patch adds support for setting and displaying the Traffic Flow
Confidentiality attribute for an XFRM state, which allows padding ESP
packets to a specified length.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David Ahern <dsahern@kernel.org>
Petr Machata says:
====================
Support for resilient next-hop groups was recently accepted to Linux
kernel[1]. Resilient next-hop groups add a layer of indirection between the
SKB hash and the next hop. Thus the hash is used to reference a hash table
bucket, which is then used to reference a particular next hop. This allows
the system more flexibility when assigning SKB hash space to next hops.
Previously, each next hop had to be assigned a continuous range of SKB hash
space. With a hash table as an intermediate layer, it is possible to
reassign next hops with a hash table bucket granularity. In turn, this
mends issues with traffic flow redirection resulting from next hop removal
or adjustments in next-hop weights.
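Here is a hypothetical sketch of the extra indirection. The hash selects a bucket and the bucket names the next hop, so moving one bucket only affects flows that hash into it. The modulo mapping from hash to bucket is an assumption made for illustration:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical resilient group: bucket index -> next-hop ID. */
struct res_group {
	size_t num_buckets;
	uint32_t *bucket_nhid;
};

/* The packet hash picks a bucket; the bucket picks the next hop. */
static uint32_t res_select(const struct res_group *g, uint32_t hash)
{
	return g->bucket_nhid[hash % g->num_buckets];
}

/* Rebalancing moves individual buckets; other buckets keep their flows. */
static void res_reassign(struct res_group *g, size_t bucket, uint32_t nhid)
{
	g->bucket_nhid[bucket] = nhid;
}
```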
In this patch set, introduce support for resilient next-hop groups to
iproute2.
- Patch #1 brings include/uapi/linux/nexthop.h and /rtnetlink.h up to date.
- Patches #2 and #3 add new helpers that will be useful later.
- Patch #4 extends the ip/nexthop sub-tool to accept group type as a
command line argument, and to dispatch based on the specified type.
- Patch #5 adds the support for resilient next-hop groups.
- Patch #6 adds the support for resilient next-hop group bucket interface.
To illustrate the usage, consider the following commands:
# ip nexthop add id 1 via 192.0.2.2 dev dummy1
# ip nexthop add id 2 via 192.0.2.3 dev dummy1
# ip nexthop add id 10 group 1/2 type resilient \
buckets 8 idle_timer 60 unbalanced_timer 300
The last command creates a resilient next-hop group. It will have 8
buckets, each bucket will be considered idle when no traffic hits it for at
least 60 seconds, and if the table remains out of balance for 300 seconds,
it will be forcefully brought into balance.
And this is how the next-hop group bucket interface looks:
# ip nexthop bucket show id 10
id 10 index 0 idle_time 5.59 nhid 1
id 10 index 1 idle_time 5.59 nhid 1
id 10 index 2 idle_time 8.74 nhid 2
id 10 index 3 idle_time 8.74 nhid 2
id 10 index 4 idle_time 8.74 nhid 1
id 10 index 5 idle_time 8.74 nhid 1
id 10 index 6 idle_time 8.74 nhid 1
id 10 index 7 idle_time 8.74 nhid 1
[1] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=2a0186a37700b0d5b8cc40be202a62af44f02fa2
====================
Signed-off-by: David Ahern <dsahern@kernel.org>
Add ability to dump multiple nexthop buckets and get a specific one.
Example:
# ip nexthop add id 10 group 1/2 type resilient buckets 8
# ip nexthop
id 1 via 192.0.2.2 dev dummy10 scope link
id 2 via 192.0.2.19 dev dummy20 scope link
id 10 group 1/2 type resilient buckets 8 idle_timer 120 unbalanced_timer 0 unbalanced_time 0
# ip nexthop bucket
id 10 index 0 idle_time 28.1 nhid 2
id 10 index 1 idle_time 28.1 nhid 2
id 10 index 2 idle_time 28.1 nhid 2
id 10 index 3 idle_time 28.1 nhid 2
id 10 index 4 idle_time 28.1 nhid 1
id 10 index 5 idle_time 28.1 nhid 1
id 10 index 6 idle_time 28.1 nhid 1
id 10 index 7 idle_time 28.1 nhid 1
# ip nexthop bucket show nhid 1
id 10 index 4 idle_time 53.59 nhid 1
id 10 index 5 idle_time 53.59 nhid 1
id 10 index 6 idle_time 53.59 nhid 1
id 10 index 7 idle_time 53.59 nhid 1
# ip nexthop bucket get id 10 index 5
id 10 index 5 idle_time 81 nhid 1
# ip -j -p nexthop bucket get id 10 index 5
[ {
"id": 10,
"bucket": {
"index": 5,
"idle_time": 104.89,
"nhid": 1
},
"flags": [ ]
} ]
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Add ability to configure resilient nexthop groups and show their current
configuration. Example:
# ip nexthop add id 10 group 1/2 type resilient buckets 8
# ip nexthop show id 10
id 10 group 1/2 type resilient buckets 8 idle_timer 120 unbalanced_timer 0
# ip -j -p nexthop show id 10
[ {
"id": 10,
"group": [ {
"id": 1
},{
"id": 2
} ],
"type": "resilient",
"resilient_args": {
"buckets": 8,
"idle_timer": 120,
"unbalanced_timer": 0
},
"flags": [ ]
} ]
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Next patches are going to add a 'resilient' nexthop group type, so allow
users to specify the type using the 'type' argument. Currently, only
'mpath' type is supported.
These two commands are equivalent:
# ip nexthop add id 10 group 1/2/3
# ip nexthop add id 10 group 1/2/3 type mpath
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
NH ID extraction is a common operation, and will become more common still
with the resilient NH groups support. Add a helper that does what is
usually done and returns the parsed NH ID.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Add a helper to dump a timeval. Print by first converting to double and
then dispatching to print_color_float().
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
When the user specifies an unknown flavour or an unknown state in
devlink port commands, return an appropriate error message.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Introduce a generic socket receive helper, and a helper to build a
command with a custom family and version.
These APIs are used in a subsequent devlink patch.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Use the helper routines provided by the library for counting slashes
and splitting a string on a delimiter.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
The feature has been supported by the kernel since 5.11-net-next;
let's allow user space to use it.
Just parse and dump an additional per-endpoint u16 attribute.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>