Compare commits

628 Commits
v5.7.0 ... main

Author SHA1 Message Date
Paul Blakey 73590d9573 tc: flower: Fix buffer overflow on large labels
The buffer is 64 bytes, but printing a label in hex can take 66 bytes,
so it overflows when the string delimiter ('\0') is set.

Fix that by increasing the print buffer size.

Example of overflowing ct_label:
ct_label 11111111111111111111111111111111/11111111111111111111111111111111
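
The 66-byte figure can be checked with a little arithmetic. The sketch below is illustrative only: it assumes the label and mask are each 128 bits, printed as 32 hex digits and joined by '/', as in the example above. The helper name hex_print_size() is hypothetical and is not taken from tc.

```python
# Illustrative arithmetic only; hex_print_size() is a hypothetical helper,
# not a function from tc.
LABEL_BYTES = 16  # assumption: 128-bit label, matching the 32 hex digits above


def hex_print_size(label_bytes: int) -> int:
    """Bytes needed to hold 'label/mask' in hex plus the string terminator."""
    hex_digits = 2 * label_bytes            # two hex digits per byte
    return hex_digits + 1 + hex_digits + 1  # label + '/' + mask + terminator


example = "1" * 32 + "/" + "1" * 32
print(len(example) + 1)             # bytes needed, including the terminator
print(hex_print_size(LABEL_BYTES))  # same figure from the formula
```

Both figures exceed the original 64-byte buffer by two bytes.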

Fixes: 2fffb1c030 ("tc: flower: Add matching on conntrack info")
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-12-06 13:44:50 -08:00
Stephen Hemminger 3f77bc6253 uapi: update to if_ether.h
Merged from 5.16-rc3

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-12-03 12:20:02 -08:00
Maxim Petrov 5f8bb902e1 ip/ipnexthop: fix unsigned overflow in parse_nh_group_type_res()
~0UL has type 'unsigned long', which is likely to be 64 bits on modern machines.
At the same time, the '{idle,unbalanced}_timer' variables are declared as u32, so
they can never be greater than '~0UL / 100' when 'unsigned long' is 64 bits. In
that case every value passes the check, but the overflow can still happen later
when the timers are multiplied by 100 in 'addattr32'.

Fix the possible overflow by changing '~0UL' to 'UINT32_MAX'.
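
A small model of the arithmetic shows why the old bound was ineffective. This is a sketch, assuming a 64-bit 'unsigned long' and a check that rejects timers above LIMIT / 100. The function names are hypothetical, and the real check in parse_nh_group_type_res() may be written differently.

```python
UINT32_MAX = 0xFFFFFFFF
ULONG_MAX_64 = 0xFFFFFFFFFFFFFFFF  # ~0UL when 'unsigned long' is 64 bits


def accepted(timer: int, limit: int) -> bool:
    """Model of the range check: accept timers up to limit / 100."""
    return timer <= limit // 100


def times_100_as_u32(timer: int) -> int:
    """Model of storing timer * 100 in a 32-bit attribute (wraps modulo 2**32)."""
    return (timer * 100) & UINT32_MAX


timer = 50_000_000  # fits in a u32, but timer * 100 does not
print(accepted(timer, ULONG_MAX_64))  # old bound: the value is accepted
print(accepted(timer, UINT32_MAX))    # new bound: the value is rejected
print(times_100_as_u32(timer))        # wrapped value the old code would send
```

With the 64-bit bound, even UINT32_MAX itself passes the check, so the check can never fire for a u32 variable.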

Fixes: 9167671822 ("nexthop: Add support for resilient nexthop groups")
Signed-off-by: Maxim Petrov <mmrmaximuzz@gmail.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-11-18 15:01:48 -08:00
Maxim Petrov 3184de3797 lib/bpf_legacy: remove always-true check
The 'name' field of the 'struct bpf_prog_info' is a plain C array. Thus, the
logical condition in bpf_dump_prog_info() is useless as the array address is
always true, so just remove it.

Signed-off-by: Maxim Petrov <mmrmaximuzz@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-11-18 15:01:04 -08:00
Stephen Hemminger 79026c1262 rdma: update uapi headers
Update the RDMA uapi headers from 5.16.0-rc1

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-11-18 10:00:19 -08:00
Stephen Hemminger fa58de9b0c vdpa: align uapi headers
Update vdpa headers based on 5.16.0-rc1 and remove redundant
copy.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-11-18 09:56:57 -08:00
jiangheng be31c26484 lnstat: fix buffer overflow in header output
Running lnstat will cause core dump from reading past end of array.

Segmentation fault (core dumped)

The maximum value of th.num_lines is HDR_LINES (10), so h must not be equal to
th.num_lines; otherwise the index into array th.hdr may be out of bounds.

Signed-off-by: jiangheng <jiangheng12@huawei.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-11-17 13:41:10 -08:00
Maxim Petrov 0e94972590 tc/m_vlan: fix print_vlan() conditional on TCA_VLAN_ACT_PUSH_ETH
Fix the stray bracket in the if clause, which made the condition wrong.

Fixes: d61167dd88 ("m_vlan: add pop_eth and push_eth actions")
Signed-off-by: Maxim Petrov <mmrmaximuzz@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-11-17 11:13:12 -08:00
Davide Caratti 9bd5ab0f09 mptcp: fix JSON output when dumping endpoints by id
iproute ignores '-j' command line argument when dumping endpoints by id:

 [dcaratti@dcaratti iproute2]$ ./ip/ip -j mptcp endpoint show
 [{"address":"1.2.3.4","id":42,"signal":true,"backup":true}]
 [dcaratti@dcaratti iproute2]$ ./ip/ip -j mptcp endpoint show id 42
 1.2.3.4 id 42 signal backup

Fix mptcp_addr_show() to use the proper JSON helpers.

Fixes: 7e0767cd86 ("add support for mptcp netlink interface")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-11-11 10:07:26 -08:00
Anssi Hannula a787d9ae10 man: tc-u32: Fix page to match new firstfrag behavior
Commit 690b11f4a6 ("tc: u32: Fix firstfrag filter.") applied in 2012
changed the "ip firstfrag" selector to not match non-fragmented packets
anymore.

However, the documentation added in f15a23966f ("tc: add a man page
for u32 filter") in 2015 includes an example that relies on the previous
behavior (non-fragmented packet counted as first fragment).

Due to this, the example does not work correctly and does not actually
classify regular SSH packets.

Modify the example to use a raw u16 selector on the fragment offset to
make it work, and also make the firstfrag description more clear about
the current behavior.

Fixes: f15a23966f ("tc: add a man page for u32 filter")
Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi>
Cc: Phil Sutter <phil@nwl.cc>
Cc: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-11-09 10:46:17 -08:00
Luca Boccassi af96c7b5dd Fix some typos detected by Lintian in manpages
Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-11-09 10:45:44 -08:00
Stephen Hemminger 35c81b18c4 uapi: update vdpa.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-11-09 10:40:40 -08:00
David Ahern 50b668bdbf Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-11-04 09:45:31 -06:00
David Ahern 9c56d693f6 Merge branch 'can-tdc-plus-cleanups' into next
Vincent Mailhol  says:

====================

The main purpose is to add commandline support for Transmitter Delay
Compensation (TDC) in iproute. Other issues found during the
development of this feature also get addressed.

This patch series contains five patches which respectively:

  1. Correct the bittiming ranges in the print_usage function and add
  the units to give more clarity: some parameters are in milliseconds,
  some in nanoseconds, some in time quanta and the new TDC
  parameters introduced in this series are in clock periods.

  2. Do some code refactoring on function print_ctrlmode().

  3. Factorize the many print_*(PRINT_JSON, ...) and fprintf
  occurrences in a single print_*(PRINT_ANY, ...) call and fix the
  signedness while doing that.

  4. Report the value of the bitrate prescalers (brp and dbrp).

  5. Add command line support for the TDC in iproute and go together
  with the series below in the kernel:
  https://lore.kernel.org/linux-can/20210814091750.73931-1-mailhol.vincent@wanadoo.fr/T/#t

** Changelog **

From RFC v5 to v6:
  * Dropped the RFC tag because the related patch series on the kernel
    side were pulled into net-next.
  * Remove the changes in include/uapi/linux/can/netlink.h because
    these should be pulled separately.
  * Add another patch (the second of this series) to do some cleanup
    on function print_ctrlmode().
  * Minor fixes in the patch comments (grammar, rephrasing).

From RFC v4 to RFC v5:
  * Add the unit (bps, tq, ns or ms) in print_usage()
  * Rewrote void can_print_timing_min_max() to better factorize the
    code.
  * Rewrote the commit message of the two last patches (those related
    to TDC) to either add clarification or fix inaccuracies.

From v3 to RFC v4:
  * Reflect the changes made on the kernel side.

From RFC v2 to v3:
  * Dropped the RFC tag. Now that the kernel patch has reached the testing
    branch, I am finally ready.
  * Regression fix: configuring a link with only nominal bittiming
    returned -EOPNOTSUPP
  * Added two more patches to the series:
      - iplink_can: fix configuration ranges in print_usage()
      - iplink_can: print brp and dbrp bittiming variables
  * Other small fixes on formatting.

From RFC v1 to RFC v2:
  * Add an additional patch to the series to fix the issues reported
    by Stephen Hemminger
    Ref: https://lore.kernel.org/linux-can/20210506112007.1666738-1-mailhol.vincent@wanadoo.fr/T/#t

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-11-04 09:44:56 -06:00
Vincent Mailhol 0c263d7c36 iplink_can: add new CAN FD bittiming parameters: Transmitter Delay Compensation (TDC)
At high bit rates, the propagation delay from the TX pin to the RX pin
of the transceiver causes measurement errors: the sample point on the
RX pin might occur on the previous bit.

This issue is addressed in ISO 11898-1 section 11.3.3 "Transmitter
delay compensation" (TDC).

This patch brings command line support to nine TDC parameters which
were recently added to the kernel's CAN netlink interface in order to
implement TDC:
  - IFLA_CAN_TDC_TDCV_MIN: Transmitter Delay Compensation Value
    minimum value
  - IFLA_CAN_TDC_TDCV_MAX: Transmitter Delay Compensation Value
    maximum value
  - IFLA_CAN_TDC_TDCO_MIN: Transmitter Delay Compensation Offset
    minimum value
  - IFLA_CAN_TDC_TDCO_MAX: Transmitter Delay Compensation Offset
    maximum value
  - IFLA_CAN_TDC_TDCF_MIN: Transmitter Delay Compensation Filter
    window minimum value
  - IFLA_CAN_TDC_TDCF_MAX: Transmitter Delay Compensation Filter
    window maximum value
  - IFLA_CAN_TDC_TDCV: Transmitter Delay Compensation Value
  - IFLA_CAN_TDC_TDCO: Transmitter Delay Compensation Offset
  - IFLA_CAN_TDC_TDCF: Transmitter Delay Compensation Filter window

All those new parameters are nested together into the attribute
IFLA_CAN_TDC.

The TDC parameters extend the FD parameters. As such, the TDC
parameters must be specified together with the "fd on" flag.

When the "fd on" flag is provided, a tdc-mode parameter specifies
how to operate.  Valid options for tdc-mode are:

  * auto: the transmitter dynamically measures TDCV for each of the
    transmitted frames. As such, TDCV can not be manually provided. In
    this mode, the user must specify TDCO and may also specify TDCF if
    supported.

  * manual: use a static TDCV provided by the user. In this mode, the
    user must specify both TDCV and TDCO and may also specify TDCF if
    supported.

  * off: TDC is explicitly disabled.

  * tdc-mode parameter omitted (default mode): the kernel decides
    whether TDC should be enabled or not and if so, it calculates the
    TDC values. TDC parameters are an expert option and the average
    user is not expected to provide those, thus the presence of this
    "default mode".

If the fd flag is omitted, all the FD values (including TDC values)
remain unchanged.

If "fd off" flag is specified, all FD values (including TDC values)
are zeroed.
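
The tdc-mode rules above can be summarised as a small validation table. The sketch below is a Python model for illustration, not the iplink_can parser: the function name and error strings are hypothetical, and rejecting TDC values when tdc-mode is "off" or omitted is an assumption rather than something the commit message states.

```python
from typing import Optional, Set

# Rules transcribed from the mode descriptions above.
REQUIRED = {"auto": {"tdco"}, "manual": {"tdcv", "tdco"}}
OPTIONAL = {"auto": {"tdcf"}, "manual": {"tdcf"}}


def check_tdc_args(mode: Optional[str], given: Set[str]) -> Optional[str]:
    """Return an error string, or None if the combination looks valid."""
    if mode in (None, "off"):
        # Assumption: no TDC values are accepted when TDC is off or when
        # the decision is left to the kernel (tdc-mode omitted).
        return "TDC parameters not expected" if given else None
    if mode not in REQUIRED:
        return f"unknown tdc-mode {mode!r}"
    missing = REQUIRED[mode] - given
    if missing:
        return "missing " + ", ".join(sorted(missing))
    extra = given - REQUIRED[mode] - OPTIONAL[mode]
    if extra:
        return "not allowed: " + ", ".join(sorted(extra))
    return None


print(check_tdc_args("auto", {"tdco", "tdcf"}))  # valid combination
print(check_tdc_args("auto", {"tdcv", "tdco"}))  # TDCV is measured, not set
print(check_tdc_args("manual", {"tdco"}))        # TDCV is mandatory here
```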

TDCV is always reported in manual mode. In auto mode, TDCV is reported
only if the value is available. In particular, the TDCV might not be
available if the controller has no feature to report it or if the
value is not yet available (i.e. no data sent yet and measurement did
not occur).

TDCF is reported only if tdcf_max is not zero (i.e. if supported by
the controller).

For reference, here are a few samples of what the output looks like:

| $ ip link set can0 type can bitrate 1000000 dbitrate 8000000 fd on tdco 7 tdcf 8 tdc-mode auto

| $ ip --details link show can0
| 1:  can0: <NOARP,ECHO> mtu 72 qdisc noop state DOWN mode DEFAULT group default qlen 10
|     link/can  promiscuity 0 minmtu 0 maxmtu 0
|     can <FD,TDC-AUTO> state STOPPED (berr-counter tx 0 rx 0) restart-ms 0
| 	  bitrate 1000000 sample-point 0.750
| 	  tq 12 prop-seg 29 phase-seg1 30 phase-seg2 20 sjw 1 brp 1
| 	  ES582.1/ES584.1: tseg1 2..256 tseg2 2..128 sjw 1..128 brp 1..512 brp_inc 1
| 	  dbitrate 8000000 dsample-point 0.700
| 	  dtq 12 dprop-seg 3 dphase-seg1 3 dphase-seg2 3 dsjw 1 dbrp 1
| 	  tdco 7 tdcf 8
| 	  ES582.1/ES584.1: dtseg1 2..32 dtseg2 1..16 dsjw 1..8 dbrp 1..32 dbrp_inc 1
| 	  tdco 0..127 tdcf 0..127
| 	  clock 80000000 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535

| $ ip --details --json --pretty link show can0
| [ {
|         "ifindex": 1,
|         "ifname": "can0",
|         "flags": [ "NOARP","ECHO" ],
|         "mtu": 72,
|         "qdisc": "noop",
|         "operstate": "DOWN",
|         "linkmode": "DEFAULT",
|         "group": "default",
|         "txqlen": 10,
|         "link_type": "can",
|         "promiscuity": 0,
|         "min_mtu": 0,
|         "max_mtu": 0,
|         "linkinfo": {
|             "info_kind": "can",
|             "info_data": {
|                 "ctrlmode": [ "FD","TDC-AUTO" ],
|                 "state": "STOPPED",
|                 "berr_counter": {
|                     "tx": 0,
|                     "rx": 0
|                 },
|                 "restart_ms": 0,
|                 "bittiming": {
|                     "bitrate": 1000000,
|                     "sample_point": "0.750",
|                     "tq": 12,
|                     "prop_seg": 29,
|                     "phase_seg1": 30,
|                     "phase_seg2": 20,
|                     "sjw": 1,
|                     "brp": 1
|                 },
|                 "bittiming_const": {
|                     "name": "ES582.1/ES584.1",
|                     "tseg1": {
|                         "min": 2,
|                         "max": 256
|                     },
|                     "tseg2": {
|                         "min": 2,
|                         "max": 128
|                     },
|                     "sjw": {
|                         "min": 1,
|                         "max": 128
|                     },
|                     "brp": {
|                         "min": 1,
|                         "max": 512
|                     },
|                     "brp_inc": 1
|                 },
|                 "data_bittiming": {
|                     "bitrate": 8000000,
|                     "sample_point": "0.700",
|                     "tq": 12,
|                     "prop_seg": 3,
|                     "phase_seg1": 3,
|                     "phase_seg2": 3,
|                     "sjw": 1,
|                     "brp": 1,
|                     "tdc": {
|                         "tdco": 7,
|                         "tdcf": 8
|                     }
|                 },
|                 "data_bittiming_const": {
|                     "name": "ES582.1/ES584.1",
|                     "tseg1": {
|                         "min": 2,
|                         "max": 32
|                     },
|                     "tseg2": {
|                         "min": 1,
|                         "max": 16
|                     },
|                     "sjw": {
|                         "min": 1,
|                         "max": 8
|                     },
|                     "brp": {
|                         "min": 1,
|                         "max": 32
|                     },
|                     "brp_inc": 1,
|                     "tdc": {
|                         "tdco": {
|                             "min": 0,
|                             "max": 127
|                         },
|                         "tdcf": {
|                             "min": 0,
|                             "max": 127
|                         }
|                     }
|                 },
|                 "clock": 80000000
|             }
|         },
|         "num_tx_queues": 1,
|         "num_rx_queues": 1,
|         "gso_max_size": 65536,
|         "gso_max_segs": 65535
|     } ]

Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-11-04 09:43:10 -06:00
Vincent Mailhol 0f7bb8d842 iplink_can: print brp and dbrp bittiming variables
Report the value of the bit-rate prescaler (brp) for both the nominal
and the data bittiming.

Currently, only the constant brp values (brp_{min,max,inc}) are being
reported. Also, brp is the only member of struct can_bittiming not
being reported.

Notably, brp could be calculated by hand from the other bittiming
parameters with the formula below:

        brp = clock * tq / 1000000000

with clock in hertz and tq in nanoseconds (thus the need for a 1
billion factor to convert it back to seconds).

But because the above formula is not so trivial to remember and is
subject to rounding errors, it makes sense to directly output
{d,}brp.
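
To make the rounding problem concrete, the sketch below applies the formula to values from the TDC sample output in this series (clock 80000000 Hz, tq 12 ns, reported brp 1). It is illustrative arithmetic rather than code from iplink_can, and the function name is hypothetical.

```python
def brp_from_tq(clock_hz: int, tq_ns: int) -> float:
    """brp = clock * tq / 1e9, with clock in hertz and tq in nanoseconds."""
    return clock_hz * tq_ns / 1_000_000_000


exact = brp_from_tq(80_000_000, 12)
print(exact)         # slightly below the reported brp
print(int(exact))    # truncating loses the prescaler entirely
print(round(exact))  # rounding recovers the reported brp
```

The result falls short of 1 presumably because one clock period at 80 MHz is 12.5 ns while tq is reported as a whole number of nanoseconds.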

Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-11-04 09:42:54 -06:00
Vincent Mailhol 67f3c7a5cc iplink_can: use PRINT_ANY to factorize code and fix signedness
Current implementation heavily relies on some "if (is_json_context())"
switches to decide the context and then does some print_*(PRINT_JSON,
...) when in json context and some fprintf(...) else.

Furthermore, current implementation uses either print_int() or the
conversion specifier %d to print unsigned integers.

This patch factorizes each pairs of print_*(PRINT_JSON, ...) and
fprintf() into a single print_*(PRINT_ANY, ...) call. While doing this
replacement, it uses proper unsigned function print_uint() as well as
the conversion specifier %u when the parameter is an unsigned integer.
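
The two ideas in this patch can be modelled outside C. The sketch below is a Python analogy, not the iproute2 print_*() API: emit_uint() is a hypothetical stand-in for a single print_*(PRINT_ANY, ...) call that serves both output modes, and as_signed_32() shows what a 32-bit value above INT_MAX turns into when it is treated as a signed integer.

```python
import ctypes
import json


def emit_uint(json_mode: bool, key: str, fmt: str, value: int) -> str:
    """One call site for both outputs: a JSON member or formatted text."""
    if json_mode:
        return json.dumps({key: value})
    return fmt % value


def as_signed_32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    return ctypes.c_int32(value).value


print(emit_uint(True, "brp", "brp %u", 1))   # JSON context
print(emit_uint(False, "brp", "brp %u", 1))  # plain-text context
print(as_signed_32(3_000_000_000))           # large unsigned value seen as signed
```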

Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-11-04 09:42:50 -06:00
Vincent Mailhol fd5e958c49 iplink_can: code refactoring of print_ctrlmode()
This patch only does cleanup and does not introduce any functional
changes.

We do some code refactoring of print_ctrlmode() in preparation for the
upcoming patch:

  - remove the first argument of print_ctrlmode(). It is a pointer to
    FILE and is never used.

  - add a new function argument: enum output_type t in order to
    specify the output type (i.e. PRINT_{FP,JSON,ANY}).

  - add a new function argument: const char *key in order to specify
    the name of the json array (e.g. "ctrlmode").

  - replace the _PF() macro with the print_flag() function to increase
    readability.

  - directly return if none of the flags are set (previously, this
    check was done before calling the function).

Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-11-04 09:42:44 -06:00
Vincent Mailhol 8316df6e6d iplink_can: fix configuration ranges in print_usage() and add unit
The configuration ranges in print_usage() are taken from "Table 8 -
Time segments' minimum configuration ranges" in section 11.3.1.2
"Configuration of the bit time parameters" of ISO 11898-1.

The standard clearly specifies that "implementations may allow time
segments that exceed the minimum required configuration ranges
specified in Table 8".

Because no maximum ranges are given in the standard, all given ranges
{ a..b } are simply replaced with { NUMBER }.

The actual ranges are specific to each device and can be confirmed
doing:

$ ip --details link show can0
1: can0: <NOARP,ECHO> mtu 16 qdisc noop state DOWN mode DEFAULT group default qlen 10
    link/can  promiscuity 0 minmtu 0 maxmtu 0
    can state STOPPED restart-ms 0
	  ES582.1/ES584.1: tseg1 2..256 tseg2 2..128 sjw 1..128 brp 1..512 brp-inc 1
	  ES582.1/ES584.1: dtseg1 2..32 dtseg2 1..16 dsjw 1..8 dbrp 1..32 dbrp-inc 1
	  clock 80000000 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535

Finally, the units (bps, tq, ns or ms) are given. The rationale to add
the units is that the TDC parameters (that will be introduced in the
upcoming patches) are measured in a different unit than the other
bittiming parameters: clock period (a.k.a. minimum time quantum)
instead of time quantum. Adding the units disambiguates things.

For reference, before the change:
$ ip link set can0 type can help
Usage: ip link set DEVICE type can
	[ bitrate BITRATE [ sample-point SAMPLE-POINT] ] |
	[ tq TQ prop-seg PROP_SEG phase-seg1 PHASE-SEG1
 	  phase-seg2 PHASE-SEG2 [ sjw SJW ] ]

	[ dbitrate BITRATE [ dsample-point SAMPLE-POINT] ] |
	[ dtq TQ dprop-seg PROP_SEG dphase-seg1 PHASE-SEG1
 	  dphase-seg2 PHASE-SEG2 [ dsjw SJW ] ]

	[ loopback { on | off } ]
	[ listen-only { on | off } ]
	[ triple-sampling { on | off } ]
	[ one-shot { on | off } ]
	[ berr-reporting { on | off } ]
	[ fd { on | off } ]
	[ fd-non-iso { on | off } ]
	[ presume-ack { on | off } ]

	[ restart-ms TIME-MS ]
	[ restart ]

	[ termination { 0..65535 } ]

	Where: BITRATE	:= { 1..1000000 }
		  SAMPLE-POINT	:= { 0.000..0.999 }
		  TQ		:= { NUMBER }
		  PROP-SEG	:= { 1..8 }
		  PHASE-SEG1	:= { 1..8 }
		  PHASE-SEG2	:= { 1..8 }
		  SJW		:= { 1..4 }
		  RESTART-MS	:= { 0 | NUMBER }

...and after it:
$ ip link set can0 type can help
Usage: ip link set DEVICE type can
	[ bitrate BITRATE [ sample-point SAMPLE-POINT] ] |
	[ tq TQ prop-seg PROP_SEG phase-seg1 PHASE-SEG1
 	  phase-seg2 PHASE-SEG2 [ sjw SJW ] ]

	[ dbitrate BITRATE [ dsample-point SAMPLE-POINT] ] |
	[ dtq TQ dprop-seg PROP_SEG dphase-seg1 PHASE-SEG1
 	  dphase-seg2 PHASE-SEG2 [ dsjw SJW ] ]

	[ loopback { on | off } ]
	[ listen-only { on | off } ]
	[ triple-sampling { on | off } ]
	[ one-shot { on | off } ]
	[ berr-reporting { on | off } ]
	[ fd { on | off } ]
	[ fd-non-iso { on | off } ]
	[ presume-ack { on | off } ]
	[ cc-len8-dlc { on | off } ]

	[ restart-ms TIME-MS ]
	[ restart ]

	[ termination { 0..65535 } ]

	Where: BITRATE	:= { NUMBER in bps }
		  SAMPLE-POINT	:= { 0.000..0.999 }
		  TQ		:= { NUMBER in ns }
		  PROP-SEG	:= { NUMBER in tq }
		  PHASE-SEG1	:= { NUMBER in tq }
		  PHASE-SEG2	:= { NUMBER in tq }
		  SJW		:= { NUMBER in tq }
		  RESTART-MS	:= { 0 | NUMBER in ms }

Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-11-04 09:42:23 -06:00
Taehee Yoo 6e15d27aae ip: add AMT support
Add basic support for Automatic Multicast Tunneling (AMT) network devices.

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
2021-11-03 13:24:13 -06:00
David Ahern 9cae1de564 Import amt.h
Import amt.h uapi from last kernel sync point

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-11-03 13:23:38 -06:00
David Ahern 258e350ca9 Update kernel headers
Update kernel headers to commit:
    cc0356d6a02e ("Merge tag 'x86_core_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-11-03 13:22:15 -06:00
Moshe Shemesh 047e9ae516 devlink: Fix cmd_dev_param_set() to check configuration mode
This patch fixes a bug: when the param set user command includes a
configuration mode which is not supported, the tool may not respond
with an error if the requested value is 0. In such a case
cmd_dev_param_set_cb() won't find the requested configuration mode and
returns ctx->value as initialized (equal to 0). Then cmd_dev_param_set()
may find that the requested value equals the current value and return success.

Fix the bug by adding a flag cmode_found which is set only if
cmd_dev_param_set_cb() finds the requested configuration mode.

Fixes: 13925ae9eb ("devlink: Add param command support")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-11-02 08:34:33 -07:00
Stephen Hemminger 7a8b7573a4 v5.15.0 2021-11-01 16:41:02 -07:00
Neta Ostrovsky ad3a118f88 rdma: Fix SRQ resource tracking information json
Fix the JSON output for the QPs that are associated with the SRQ:
the QPN ranges are now displayed as separate elements of a JSON array.

Sample output before the fix:
$ rdma res show srq lqpn 126-141 -j -p
[ {
        "ifindex":0,
	"ifname":"ibp8s0f0",
	"srqn":4,
	"type":"BASIC",
	"lqpn":["126-128,130-140"],
	"pdn":9,
	"pid":3581,
	"comm":"ibv_srq_pingpon"
    },{
	"ifindex":0,
	"ifname":"ibp8s0f0",
	"srqn":5,
	"type":"BASIC",
	"lqpn":["141"],
	"pdn":10,
	"pid":3584,
	"comm":"ibv_srq_pingpon"
    } ]

Sample output after the fix:
$ rdma res show srq lqpn 126-141 -j -p
[ {
        "ifindex":0,
	"ifname":"ibp8s0f0",
	"srqn":4,
	"type":"BASIC",
	"lqpn":["126-128","130-140"],
	"pdn":9,
	"pid":3581,
	"comm":"ibv_srq_pingpon"
    },{
	"ifindex":0,
	"ifname":"ibp8s0f0",
	"srqn":5,
	"type":"BASIC",
	"lqpn":["141"],
	"pdn":10,
	"pid":3584,
	"comm":"ibv_srq_pingpon"
    } ]
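
The practical difference shows up when a consumer parses the output. The sketch below is an illustration only: it feeds a trimmed copy of the two "lqpn" values above to Python's json module, and expand() is a hypothetical consumer-side helper, not part of the rdma tool.

```python
import json

before = json.loads('{"srqn": 4, "lqpn": ["126-128,130-140"]}')
after = json.loads('{"srqn": 4, "lqpn": ["126-128", "130-140"]}')


def expand(ranges):
    """Expand entries such as '126-128' or '141' into individual QP numbers."""
    qpns = []
    for item in ranges:
        lo, _, hi = item.partition("-")
        qpns.extend(range(int(lo), int(hi or lo) + 1))
    return qpns


print(len(before["lqpn"]), len(after["lqpn"]))  # element counts before and after
print(expand(after["lqpn"]))                    # each element is one range
```

With the old output, the single element still contains a comma-separated list, so a consumer like expand() would need an extra splitting step before it could parse the ranges.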

Fixes: 9b272e138d ("rdma: Add SRQ resource tracking information")
Signed-off-by: Neta Ostrovsky <netao@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-10-29 15:04:45 -07:00
Antoine Tenart 7a235a101b man: devlink-port: fix pfnum for devlink port add
When configuring a devlink PCI port, the pfnumber can be specified
using 'pfnum' and not 'pcipf' as stated in the man page. Fix this.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-10-29 15:03:44 -07:00
David Ahern e2947f6fd8 Merge branch 'managed-neighbor' into next
Daniel Borkmann  says:

====================

iproute2 patches to add support for managed neighbor entries as per recent
net-next commits:

  2ed08b5ead3c ("Merge branch 'Managed-Neighbor-Entries'")
  c47fedba94bc ("Merge branch 'minor-managed-neighbor-follow-ups'")

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-28 09:00:26 -06:00
Daniel Borkmann 9e009e78e7 ip, neigh: Add NTF_EXT_MANAGED support
Currently, ip neigh does not support the NTF_EXT_MANAGED flag. Add cmdline
support.

Usage example:

  # ./ip/ip n replace 192.168.178.30 dev enp5s0 managed extern_learn
  # ./ip/ip n
  192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a managed extern_learn REACHABLE
  [...]

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-28 08:59:03 -06:00
Daniel Borkmann 040e52526c ip, neigh: Add missing NTF_USE support
Currently, ip neigh does not support the NTF_USE flag. Similar to other flags
such as extern_learn, add cmdline support. The flag dump support is explicitly
missing here, since the kernel does not propagate the flag back to user space.

Usage example:

  # ./ip/ip n replace 192.168.178.30 dev enp5s0 use extern_learn
  # ./ip/ip n
  192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a extern_learn REACHABLE
  [...]

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-28 08:58:55 -06:00
Daniel Borkmann c76a3849ec ip, neigh: Fix up spacing in netlink dump
Fix up spacing to consistently add a single ' ' after an attribute has
been printed. Currently, it is a bit of a mix of before and after which
can lead to double spacing to be printed.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-28 08:58:50 -06:00
Nicolas Dichtel 76b30805f9 xfrm: enable to manage default policies
Two new commands to manage default policies:
 - ip xfrm policy setdefault
 - ip xfrm policy getdefault

And the corresponding part in 'ip xfrm monitor'.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-28 08:58:28 -06:00
David Ahern 2be7d99960 Merge branch 'rdma-optional-stats' into next
Mark Zhang  says:

====================

This is the supplementary part of kernel series [1]; it provides an
extension to the rdma statistics tool that allows setting or listing
optional counters dynamically, using netlink.

Thanks

[1] https://www.spinics.net/lists/linux-rdma/msg106283.html

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-16 12:52:02 -06:00
Stephen Hemminger 229eaba507 uapi: pickup fix for xfrm ABI breakage
See kernel
Commit 844f7eaaed9 ("include/uapi/linux/xfrm.h: Fix XFRM_MSG_MAPPING ABI breakage")

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-10-15 17:40:30 -07:00
Nicolas Dichtel 95cd2a6204 iplink: enable to specify index when changing netns
When an interface is moved to another netns, it's possible to specify a
new ifindex. Let's add this support.

Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=eeb85a14ee34
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-15 18:05:09 -06:00
David Ahern a936a73fc2 Merge branch 'config-libdir' into next
Andrea Claudi  says:

====================

This series adds support for the libdir parameter in iproute2 configure
script. The idea is to make use of the fact that packaging systems may
assume that 'configure' comes from autotools allowing a syntax similar
to the autotools one, and using it to tell iproute2 where the distro
expects to find its lib files.

Patches 1-2 fix a parsing issue on current configure options, that may
trigger an endless loop when no value is provided with some options;

Patch 3 fixes a parsing issue bailing out when more than one value is
provided for a single option;

Patch 4 simplifies options parsing, moving semantic checks out of the
while loop processing options;

Patch 5 introduces support for the --opt=value style on current options,
for uniformity;

Patch 6 adds the --prefix option, that may be used by some packaging
systems when calling the configure script;

Patch 7 finally adds the --libdir option, and also drops the static
LIBDIR var from the Makefile.

Changelog:
----------
v4 -> v5
  - bail out when multiple values are provided with a single option
  - simplify option parsing and reduce code duplication, as suggested
    by Phil Sutter
  - remove a nasty eval on libdir option processing

v3 -> v4
  - fix parsing issue on '--include_dir' and '--libbpf_dir'
  - split '--opt value' and '--opt=value' use cases, avoid code
    duplication moving semantic checks on value to dedicated functions

v2 -> v3
  - fix parsing error on prefix and libdir options.

v1 -> v2
  - consolidate '--opt value' and '--opt=value' use cases, as suggested
    by David Ahern.
  - added patch 2 to manage the --prefix option, used by the Debian
    packaging system, as reported by Luca Boccassi, and use it when
    setting lib directory.

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-15 17:59:33 -06:00
Andrea Claudi cee0cf84bd configure: add the --libdir option
This commit allows users/packagers to choose a lib directory to store
iproute2 lib files.

At the moment iproute2 ships lib files in /usr/lib and offers no way to
modify this setting. However, according to the FHS, distros may choose
"one or more variants of the /lib directory on systems which support
more than one binary format" (e.g. /usr/lib64 on Fedora).

As Luca states in commit a3272b9372 ("configure: restore backward
compatibility"), packaging systems may assume that 'configure' is from
autotools, and try to pass it some parameters.

By allowing the '--libdir=/path/to/libdir' syntax, we can use this to our
advantage, and let the lib directory be chosen by the distro
packaging system.

Note that LIBDIR uses "\${prefix}/lib" as default value because autoconf
allows this to be expanded to the --prefix value at configure runtime.
"\${prefix}" is replaced with the PREFIX value in check_lib_dir().

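The placeholder substitution described above can be modelled in a few lines. This is a Python sketch of the idea, not the shell code in configure: resolve_libdir() is a hypothetical name, and the real check_lib_dir() may differ in detail.

```python
DEFAULT_LIBDIR = "${prefix}/lib"  # default value quoted in the commit message


def resolve_libdir(prefix: str, libdir: str = DEFAULT_LIBDIR) -> str:
    """Replace the literal ${prefix} placeholder with the chosen prefix."""
    return libdir.replace("${prefix}", prefix)


print(resolve_libdir("/usr"))                # default libdir under the prefix
print(resolve_libdir("/usr", "/usr/lib64"))  # explicit libdir, no placeholder
```
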
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-15 17:57:20 -06:00
Andrea Claudi 0ee1950b5c configure: add the --prefix option
This commit adds the '--prefix' option to the iproute2 configure script.

This mimics the '--prefix' option that autotools configure provides, and
will be used later to allow users or packagers to set the lib directory.

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-15 17:57:17 -06:00
Andrea Claudi 4b8bca5f9e configure: support --param=value style
This commit makes it possible to specify values for configure params
using the common autotools configure syntax '--param=value'.
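
As a rough model of accepting both styles, the sketch below parses '--opt value' and '--opt=value' in one loop. It is written in Python for illustration and is not the shell code in configure. The option names come from this series, while parse_opts() and the choice to raise an error on a missing value are assumptions.

```python
from typing import Dict, List

# Options that take a value; names as they appear in this series.
VALUE_OPTS = {"--include_dir", "--libbpf_dir", "--libbpf_force"}


def parse_opts(argv: List[str]) -> Dict[str, str]:
    """Accept both '--opt value' and '--opt=value'; ignore unknown options."""
    opts: Dict[str, str] = {}
    i = 0
    while i < len(argv):
        name, eq, value = argv[i].partition("=")
        if name in VALUE_OPTS:
            if not eq:
                # '--opt value' style: the value is the next argument, if any.
                if i + 1 >= len(argv):
                    raise ValueError(f"{name} requires a value")
                i += 1
                value = argv[i]
            opts[name] = value
        i += 1  # always advance, so a missing value cannot cause a loop
    return opts


print(parse_opts(["--include_dir", "/usr/include"]))
print(parse_opts(["--libbpf_dir=/opt/libbpf", "--build=x86_64"]))
```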

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-15 17:57:05 -06:00
Andrea Claudi 99245d1741 configure: simplify options parsing
This commit simplifies options parsing moving all the code not related to
parsing out of the case statement.

- The conditional shift after the assignments is moved right after the
  case, reducing code duplication.
- The semantic checks on the LIBBPF_FORCE value are moved after the loop
  like we already did for INCLUDE and LIBBPF_DIR.
- Finally, the loop condition is changed to check remaining arguments, thus
  making it possible to get rid of the null string case break.

As a bonus, now the help message states that on or off should follow
--libbpf_force

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-15 17:57:02 -06:00
Andrea Claudi c330d09794 configure: fix parsing issue with more than one value per option
With commit a9c3d70d90 ("configure: add options ability") users are no
longer able to provide wrong command lines like:

$ ./configure --include_dir foo bar

The script simply bails out when the user provides more than one value for
a single option. However, in doing so, it breaks backward compatibility with
some packaging systems, which expect unknown options to be ignored.

Commit a3272b9372 ("configure: restore backward compatibility") fixes this
issue, but makes it possible again for users to provide wrong command lines
such as the one above.

This fixes the issue by simply ignoring autoconf-like options such as
'--opt=value'.

Fixes: a3272b9372 ("configure: restore backward compatibility")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-15 17:56:57 -06:00
Andrea Claudi 48c379bc2a configure: fix parsing issue on libbpf_dir option
configure is stuck in an endless loop if '--libbpf_dir' option is used
without a value:

$ ./configure --libbpf_dir
./configure: line 515: shift: 2: shift count out of range
./configure: line 515: shift: 2: shift count out of range
[...]

Fix it by splitting 'shift 2' into two consecutive shifts, and making the
second one conditional on the number of remaining arguments.

A check is also provided after the while loop to verify the libbpf dir
exists; also, as LIBBPF_DIR does not have a default value, configure bails
out if the user does not specify a value after --libbpf_dir, thus avoiding
producing an erroneous configuration.
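
The shift handling described above can be modelled outside the shell. The
following is a minimal sketch, not the configure code: shift() imitates the
shell builtin (a count larger than the number of remaining arguments fails
and removes nothing, which is why the loop never made progress), and
parse_libbpf_dir() is a hypothetical name. The directory existence check is
left out.

```python
def shift(args, count=1):
    """Model of the shell builtin: an out-of-range count removes nothing."""
    if count > len(args):
        return args, False          # "shift count out of range"
    return args[count:], True


def parse_libbpf_dir(argv):
    """Consume the option name first, then its value only if one is left."""
    args = list(argv)
    seen = False
    libbpf_dir = None
    while args:
        option = args[0]
        args, _ = shift(args)       # first shift: always drops the option name
        if option == "--libbpf_dir":
            seen = True
            if args:                # second shift is conditional
                libbpf_dir = args[0]
                args, _ = shift(args)
    # LIBBPF_DIR has no default value, so a missing value is an error.
    if seen and libbpf_dir is None:
        raise ValueError("--libbpf_dir requires a value")
    return libbpf_dir


# The old 'shift 2' with a single argument left consumes nothing:
print(shift(["--libbpf_dir"], 2))   # (['--libbpf_dir'], False)
```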

Fixes: 7ae2585b86 ("configure: convert LIBBPF environment variables to command-line options")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-15 17:56:53 -06:00
Andrea Claudi 1d819dcc74 configure: fix parsing issue on include_dir option
configure is stuck in an endless loop if '--include_dir' option is used
without a value:

$ ./configure --include_dir
./configure: line 506: shift: 2: shift count out of range
./configure: line 506: shift: 2: shift count out of range
[...]

Fix it by splitting 'shift 2' into two consecutive shifts, and making the
second one conditional on the number of remaining arguments.

A check is also provided after the while loop to verify the include dir
exists; this avoids producing an erroneous configuration.

Fixes: a9c3d70d90 ("configure: add options ability")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-15 17:56:48 -06:00
Neta Ostrovsky 19ba785f16 rdma: Add optional-counters set/unset support
This patch provides an extension to the rdma statistics tool
that allows optional counters to be set/unset dynamically,
using new netlink commands.
Note that the optional counter statistic implementation is
driver-specific and may impact performance.

Examples:
To enable a set of optional counters on link rocep8s0f0/1:
    $ sudo rdma statistic set link rocep8s0f0/1 optional-counters cc_rx_ce_pkts,cc_rx_cnp_pkts
To disable all optional counters on link rocep8s0f0/1:
    $ sudo rdma statistic unset link rocep8s0f0/1 optional-counters

Signed-off-by: Neta Ostrovsky <netao@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-15 17:52:57 -06:00
Neta Ostrovsky 7d5cb70e94 rdma: Add stat "mode" support
This patch introduces the "mode" command, which presents the enabled or
supported (when the "supported" argument is available) optional
counters.

An optional counter is a vendor-specific counter that may be
dynamically enabled/disabled. This enhancement of hwcounters allows
exposing counters which are, for example, mutually exclusive and cannot
be enabled at the same time, counters that might degrade performance,
optional debug counters, etc.

Examples:
To present currently enabled optional counters on link rocep8s0f0/1:
    $ rdma statistic mode link rocep8s0f0/1
    link rocep8s0f0/1 optional-counters cc_rx_ce_pkts

To present supported optional counters on link rocep8s0f0/1:
    $ rdma statistic mode supported link rocep8s0f0/1
    link rocep8s0f0/1 supported optional-counters cc_rx_ce_pkts,cc_rx_cnp_pkts,cc_tx_cnp_pkts

Signed-off-by: Neta Ostrovsky <netao@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-15 17:52:53 -06:00
Neta Ostrovsky d480cb71f5 rdma: Update uapi headers
Update rdma_netlink.h file up to kernel commit 7301d0a9834c
("RDMA/nldev: Add support to get status of all counters")

Signed-off-by: Neta Ostrovsky <netao@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-15 17:52:47 -06:00
David Ahern e4ca6a4965 Update kernel headers
Update kernel headers to commit:
    295711fa8fec ("Merge branch 'dpaa2-irq-coalescing'")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-15 17:49:19 -06:00
Stephen Hemminger a31e7b7967 mptcp: cleanup include section.
David reported that ipmptcp breaks the build hard when updating the
relevant kernel headers.

We should be more careful in the header section, explicitly
including all the required dependencies and respecting the usual order
between system and local headers.

Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-15 17:48:36 -06:00
Paul Chaignon a500c5ac87 lib/bpf: fix map-in-map creation without prepopulation
When creating map-in-maps, the outer map can be prepopulated using the
inner_idx field of inner maps. That field defines the index of the inner
map in the outer map. It is ignored if set to -1.

Commit 6d61a2b557 ("lib: add libbpf support") however started using
that field to identify inner maps. While iterating over all maps looking
for inner maps, maps with inner_idx set to -1 are erroneously skipped.
As a result, trying to create a map-in-map with prepopulation disabled
fails because the inner_id of the outer map is not correctly set.

This bug can be observed with strace -ebpf (notice the zero inner_map_fd
for the outer map creation):

    bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_ARRAY, key_size=4, value_size=130996, max_entries=1, map_flags=0, inner_map_fd=0, map_name="maglev_inner", map_ifindex=0, btf_fd=0, btf_key_type_id=0, btf_value_type_id=0}, 128) = 32
    bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_HASH_OF_MAPS, key_size=2, value_size=4, max_entries=65536, map_flags=BPF_F_NO_PREALLOC, inner_map_fd=0, map_name="maglev_outer", map_ifindex=0, btf_fd=0, btf_key_type_id=0, btf_value_type_id=0}, 128) = -1 EINVAL (Invalid argument)

Fixes: 6d61a2b557 ("lib: add libbpf support")
Signed-off-by: Paul Chaignon <paul@isovalent.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-10-14 14:37:51 -07:00
Antoine Tenart 7c032cac10 man: devlink-port: remove extra .br
.br was added between options of the same command. That is not needed
and splits the output across 3 lines for no particular reason.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-10-11 19:27:12 -07:00
Antoine Tenart 04ee8e6f06 man: devlink-port: fix style
Values should be .I, square brackets should be used for optional values,
curly brackets for lists. Follow this in the devlink-port man page.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-10-11 19:27:12 -07:00
Antoine Tenart 14802d84d3 man: devlink-port: fix the devlink port add synopsis
When configuring a devlink PCI SF port, the sfnumber can be specified
using 'sfnum' and not 'pcisf' as stated in the man page. Fix this.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-10-11 19:27:12 -07:00
David Ahern 8cd517a805 Merge branch 'main' into next
Conflicts:
	ip/ipneigh.c

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-09 17:47:47 -06:00
David Ahern 763fd793fe Merge branch 'ioam-encap-modes' into next
Justin Iurman  says:

====================

Following the series applied to net-next (see [1]), here are the corresponding
changes to iproute2.

In the current implementation, IOAM can only be inserted directly (i.e., only
inside packets generated locally) by default, to be compliant with RFC8200.

This patch adds support for in-transit packets and provides the ip6ip6
encapsulation of IOAM (RFC8200 compliant). Therefore, three ioam6 encap modes
are defined:

 - inline: directly inserts IOAM inside packets (by default).

 - encap:  ip6ip6 encapsulation of IOAM inside packets.

 - auto:   either inline mode for packets generated locally or encap mode for
           in-transit packets.

With current iproute2 implementation, it is configured this way:

$ ip -6 r [...] encap ioam6 trace prealloc [...]

The old syntax does not change (for backwards compatibility) and implicitly uses
the inline mode. With the new syntax, an encap mode can be specified:

(inline mode)
$ ip -6 r [...] encap ioam6 mode inline trace prealloc [...]

(encap mode)
$ ip -6 r [...] encap ioam6 mode encap tundst fc00::2 trace prealloc [...]

(auto mode)
$ ip -6 r [...] encap ioam6 mode auto tundst fc00::2 trace prealloc [...]

A tunnel destination address must be configured when using the encap mode or the
auto mode.
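
The mode rules above can be summarised in a small validation sketch. It is
only an illustration under stated assumptions: check_ioam6_encap is a
hypothetical helper, not part of iproute2, and it does not say what happens
when tundst is given together with inline mode because the text above does
not cover that case.

```python
IOAM6_MODES = ("inline", "encap", "auto")


def check_ioam6_encap(mode="inline", tundst=None):
    """Validate a mode/tundst pair; the old syntax implies inline mode."""
    if mode not in IOAM6_MODES:
        raise ValueError(f"unknown ioam6 mode: {mode}")
    # encap and auto both need a tunnel destination address.
    if mode in ("encap", "auto") and tundst is None:
        raise ValueError(f"mode {mode} requires tundst")
    return mode, tundst


print(check_ioam6_encap())                    # ('inline', None)
print(check_ioam6_encap("encap", "fc00::2"))  # ('encap', 'fc00::2')
```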

  [1] https://lore.kernel.org/netdev/163335001045.30570.12527451523558030753.git-patchwork-notify@kernel.org/T/#m3b428d4142ee3a414ec803466c211dfdec6e0c09

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-09 17:37:12 -06:00
Justin Iurman 41020eb0fd Update documentation
This patch updates the IOAM documentation (ip-route man page) to reflect the
three encap modes that were introduced.

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-09 17:35:54 -06:00
Justin Iurman 8fb522cde3 Add support for IOAM encap modes
This patch adds support for the three IOAM encap modes that were introduced:
inline, encap and auto.

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-09 17:35:29 -06:00
Frank Villaro-Dixon 897772a735 cmd: use spaces instead of tabs for usage indentation
Fix rogue "tab after spaces" used for indentation of the documentation.
This causes rendering issues on terminals using a non-standard tab width.

Signed-off-by: Frank Villaro-Dixon <frank.villaro@infomaniak.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-10-06 10:00:49 -07:00
Nikolay Aleksandrov b840c620fe ip: nexthop: keep cache netlink socket open
Since we use the cache netlink socket for each nexthop we can keep it open
instead of opening and closing it on every add call. The socket is opened
once, on the first add call and then reused for the rest.

Suggested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-05 08:34:29 -06:00
Jacob Keller b90174354d devlink: print maximum number of snapshots if available
Recently the kernel gained the ability to report the maximum number of
snapshots a region can have. Print this value out if it was reported.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-05 08:31:30 -06:00
David Ahern 6448ed373c Update kernel headers
Update kernel headers to commit:
    49ed8dde3715 ("net: usb: use eth_hw_addr_set() for dev->addr_len cases")

Update to linux/mptcp.h is removed because it breaks compilation
of ipmptcp.c in a nontrivial way.

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-05 08:28:28 -06:00
Davide Caratti e7a98a96f0 mptcp: unbreak JSON endpoint list
The following command:

 # ip -j mptcp endpoint show

prints a JSON array that misses the terminating bracket. Fix this by calling
delete_json_obj() to balance the call to new_json_obj().

Fixes: 7e0767cd86 ("add support for mptcp netlink interface")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Andrea Claudi <aclaudi@redhat.com>
2021-10-04 14:07:09 -07:00
David Ahern ec703e0629 Merge branch 'nexthop-cache' into next
Nikolay Aleksandrov  says:

====================

This set tries to help with an old ask that we've had for some time
which is to print nexthop information while monitoring or dumping routes.
The core problem is that people cannot follow nexthop changes while
monitoring route changes, by the time they check the nexthop it could be
deleted or updated to something else. In order to help them out I've
added a nexthop cache which is populated (only used if -d / show_details
is specified) while decoding routes and kept up to date while monitoring.
The nexthop information is printed on its own line starting with the
"nh_info" attribute, and is embedded inside that attribute when printing
JSON. To cache the nexthop entries I parse them into structures; in order
to reuse most of the code, the print helpers have been altered so they
rely on prepared structures. Nexthops are now always parsed into a
structure, even if they won't be cached; that structure is later used to
print the nexthop and is destroyed if it is not going to be cached. New nexthops (not found
in the cache) are retrieved from the kernel using a private netlink
socket so they don't disrupt an ongoing dump, similar to how interfaces
are retrieved and cached.

I have tested the set with the kernel forwarding selftests and also by
stressing it with nexthop create/update/delete in loops while monitoring.

Comments are very welcome as usual. :)

Changes since RFC:
 - reordered parse/print splits, in order to do that I have to parse
   resilient groups first, then add nh entry parsing so code has been
   reordered as well and patch order has changed, but there have been
   no functional changes (as before refactoring of old code is done in
   the first 8 patches and then patches 9-12 add the new cache and use it)
 - re-run all tests above

Patch breakdown:
Patches 1-2: update current route helpers to take parsed arguments so we
             can directly pass them from the nh_entry structure later
Patch     3: adds new nha_res_grp structure which describes a resilient
             nexthop group
Patch     4: splits print_nh_res_group into parse and print parts
             which use the new nha_res_grp structure
Patch     5: adds new nh_entry structure which describes a nexthop
Patch     6: factors out print_nexthop's attribute parsing into nh_entry
             structure used before printing
Patch     7: factors out print_nexthop's nh_entry structure printing
Patch     8: factors out ipnh_get's rtnl talk part and allows using a
             different rt handle for the communication
Patch     9: adds nexthop cache and helpers to manage it, it uses the
             new __ipnh_get to retrieve nexthops
Patch    10: adds a new helper print_cache_nexthop_id that prints nexthop
             information from its id, if the nexthop is not found in the
             cache it fetches it
Patch    11: the new print_cache_nexthop_id helper is used when printing
             routes with show_details (-d) to output detailed nexthop
             information, the format after nh_info is the same as
             ip nexthop show
Patch    12: changes print_nexthop into print_cache_nexthop which always
             outputs the nexthop information and can also update the cache
             (based on process_cache argument), it's used to keep the
             cache up to date while monitoring

Example outputs (monitor):
[NEXTHOP]id 101 via 169.254.2.22 dev veth2 scope link proto unspec
[NEXTHOP]id 102 via 169.254.3.23 dev veth4 scope link proto unspec
[NEXTHOP]id 103 group 101/102 type resilient buckets 512 idle_timer 0 unbalanced_timer 0 unbalanced_time 0 scope global proto unspec
[ROUTE]unicast 192.0.2.0/24 nhid 203 table 4 proto boot scope global
	nh_info id 203 group 201/202 type resilient buckets 512 idle_timer 0 unbalanced_timer 0 unbalanced_time 0 scope global proto unspec
	nexthop via 169.254.2.12 dev veth3 weight 1
	nexthop via 169.254.3.13 dev veth5 weight 1

[NEXTHOP]id 204 via fe80:2::12 dev veth3 scope link proto unspec
[NEXTHOP]id 205 via fe80:3::13 dev veth5 scope link proto unspec
[NEXTHOP]id 206 group 204/205 type resilient buckets 512 idle_timer 0 unbalanced_timer 0 unbalanced_time 0 scope global proto unspec
[ROUTE]unicast 2001:db8:1::/64 nhid 206 table 4 proto boot scope global metric 1024 pref medium
	nh_info id 206 group 204/205 type resilient buckets 512 idle_timer 0 unbalanced_timer 0 unbalanced_time 0 scope global proto unspec
	nexthop via fe80:2::12 dev veth3 weight 1
	nexthop via fe80:3::13 dev veth5 weight 1

[NEXTHOP]id 2  encap mpls  200/300 via 10.1.1.1 dev ens20 scope link proto unspec onlink
[ROUTE]unicast 2.3.4.10 nhid 2 table main proto boot scope global
	nh_info id 2  encap mpls  200/300 via 10.1.1.1 dev ens20 scope link proto unspec onlink

JSON:
 {
        "type": "unicast",
        "dst": "198.51.100.0/24",
        "nhid": 103,
        "table": "3",
        "protocol": "boot",
        "scope": "global",
        "flags": [ ],
        "nh_info": {
            "id": 103,
            "group": [ {
                    "id": 101,
                    "weight": 11
                },{
                    "id": 102,
                    "weight": 45
                } ],
            "type": "resilient",
            "resilient_args": {
                "buckets": 512,
                "idle_timer": 0,
                "unbalanced_timer": 0,
                "unbalanced_time": 0
            },
            "scope": "global",
            "protocol": "unspec",
            "flags": [ ]
        },
        "nexthops": [ {
                "gateway": "169.254.2.22",
                "dev": "veth2",
                "weight": 11,
                "flags": [ ]
            },{
                "gateway": "169.254.3.23",
                "dev": "veth4",
                "weight": 45,
                "flags": [ ]
            } ]
  }

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-03 18:31:44 -06:00
Nikolay Aleksandrov 7ca868a7aa ip: nexthop: add print_cache_nexthop which prints and manages the nh cache
Add a new helper print_cache_nexthop replacing print_nexthop which can
update the nexthop cache if the process_cache argument is true. It is
used when monitoring netlink messages to keep the nexthop cache up to
date with nexthop changes happening. For the old callers and anyone
who's just dumping nexthops its _nocache version is used which is a
wrapper for print_cache_nexthop.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-03 18:24:59 -06:00
Nikolay Aleksandrov 5d5dc549ce ip: route: print and cache detailed nexthop information when requested
If -d (show_details) is used when printing/monitoring routes then print
detailed nexthop information in the field "nh_info". The nexthop is also
cached for future searches.

Output looks like:
 unicast 198.51.100.0/24 nhid 103 table 3 proto boot scope global
	 nh_info id 103 group 101/102 type resilient buckets 512 idle_timer 0 unbalanced_timer 0 unbalanced_time 0 scope global proto unspec
	 nexthop via 169.254.2.22 dev veth2 weight 1
	 nexthop via 169.254.3.23 dev veth4 weight 1

The nh_info field has the same format as ip -d nexthop show would've had
for the same nexthop id.

For completeness the JSON version looks like:
 {
        "type": "unicast",
        "dst": "198.51.100.0/24",
        "nhid": 103,
        "table": "3",
        "protocol": "boot",
        "scope": "global",
        "flags": [ ],
        "nh_info": {
            "id": 103,
            "group": [ {
                    "id": 101
                },{
                    "id": 102
                } ],
            "type": "resilient",
            "resilient_args": {
                "buckets": 512,
                "idle_timer": 0,
                "unbalanced_timer": 0,
                "unbalanced_time": 0
            },
            "scope": "global",
            "protocol": "unspec",
            "flags": [ ]
        },
        "nexthops": [ {
                "gateway": "169.254.2.22",
                "dev": "veth2",
                "weight": 1,
                "flags": [ ]
            },{
                "gateway": "169.254.3.23",
                "dev": "veth4",
                "weight": 1,
                "flags": [ ]
            } ]
 }

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-03 18:24:55 -06:00
Nikolay Aleksandrov cb3d18c29e ip: nexthop: add a helper which retrieves and prints cached nh entry
Add a helper which looks for a nexthop in the cache and if not found
reads the entry from the kernel and caches it. Finally the entry is
printed.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-03 18:24:51 -06:00
Nikolay Aleksandrov 60a9703032 ip: nexthop: add cache helpers
Add a static nexthop cache in a hash with 1024 buckets and helpers to
manage it (link, unlink, find, add nexthop, del nexthop). Adding new
nexthops is done by creating a new rtnl handle and using it to retrieve
the nexthop so the helper is safe to use while already reading a
response (i.e. using the global rth).
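
A minimal sketch of such a bucketed cache follows. It is not the C code
from this commit: the class and method names are hypothetical, the hash
(id modulo the bucket count) is an assumption, and fetch stands in for the
lookup that the real helper performs over its own rtnl handle.

```python
NH_CACHE_SIZE = 1024   # bucket count taken from the commit message


class NexthopCache:
    def __init__(self):
        self.buckets = [[] for _ in range(NH_CACHE_SIZE)]

    def _bucket(self, nh_id):
        # Assumed hash function: nexthop id modulo the number of buckets.
        return self.buckets[nh_id % NH_CACHE_SIZE]

    def find(self, nh_id):
        for entry in self._bucket(nh_id):
            if entry["id"] == nh_id:
                return entry
        return None

    def link(self, entry):
        self._bucket(entry["id"]).append(entry)

    def unlink(self, entry):
        self._bucket(entry["id"]).remove(entry)

    def add(self, nh_id, fetch):
        """Return the cached entry, fetching and linking it on a miss."""
        entry = self.find(nh_id)
        if entry is None:
            entry = fetch(nh_id)   # stand-in for the private-socket query
            self.link(entry)
        return entry

    def delete(self, nh_id):
        entry = self.find(nh_id)
        if entry is not None:
            self.unlink(entry)
```

Ids 101 and 1125 land in the same bucket under the assumed hash, so find()
has to compare ids rather than trust the bucket alone.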

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-03 18:24:41 -06:00
Nikolay Aleksandrov 53d7c43bd3 ip: nexthop: factor out ipnh_get_id rtnl talk into a helper
Factor out ipnh_get_id's rtnl talk portion into a separate helper which
will be reused later to retrieve nexthops for caching.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-03 18:24:36 -06:00
Nikolay Aleksandrov a2ca431215 ip: nexthop: factor out print_nexthop's nh entry printing
Factor out nexthop entry structure printing from print_nexthop,
effectively splitting it into parse and print parts.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-03 18:24:32 -06:00
Nikolay Aleksandrov 945c26db68 ip: nexthop: parse attributes into nh entry structure before printing
Factor out the nexthop attribute parsing and parse attributes into a
nexthop entry structure which is then used to print.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-03 18:24:28 -06:00
Nikolay Aleksandrov 7ec1cee630 ip: nexthop: add nh entry structure
Add a structure which describes a nexthop, it will be later used to
parse, print and cache nexthops.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-03 18:24:24 -06:00
Nikolay Aleksandrov 60a7515b89 ip: nexthop: split print_nh_res_group into parse and print parts
Now that we have a resilient group structure, split print_nh_res_group
into parse and print functions. print_nexthop calls the parse function
first to parse the attributes into the structure and then uses the print
function to print the parsed structure.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-03 18:24:19 -06:00
Nikolay Aleksandrov cfb0a8729e ip: nexthop: add resilient group structure
Add a structure which describes a resilient nexthop group. It will be
later used for parsing.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-03 18:24:11 -06:00
Nikolay Aleksandrov 371e889da7 ip: export print_rta_gateway version which outputs prepared gateway string
Export a new __print_rta_gateway that takes a prepared gateway string to
print which is also used by print_rta_gateway for consistent format.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-03 18:24:06 -06:00
Nikolay Aleksandrov f72789965e ip: print_rta_if takes ifindex as device argument instead of attribute
We need print_rta_if() to take ifindex directly so later we can use it
with cached converted nexthop objects.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-10-03 18:23:31 -06:00
David Ahern c8c9111a4c Merge branch 'ax.25-netrom-rose' into next
Ralf Baechle  says:

====================

net-tools contains support for these three protocols but is deprecated and
no longer installed by default by many distributions.  Iproute2 otoh has
no support at all and will dump the addresses of these protocols, which
actually are pretty human readable, as hex numbers:

 # ip link show dev bpq0
3: bpq0: <UP,LOWER_UP> mtu 256 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ax25 88:98:60:a0:92:40:02 brd a2:a6:a8:40:40:40:00
 # ip link show dev nr0
4: nr0: <NOARP,UP,LOWER_UP> mtu 236 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/netrom 88:98:60:a0:92:40:0a brd 00:00:00:00:00:00:00
 # ip link show dev rose0
8: rose0: <NOARP,UP,LOWER_UP> mtu 249 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/rose 65:09:33:30:00 brd 00:00:00:00:00

This series adds basic support for the three protocols to print addresses:

 # ip link show dev bpq0
3: bpq0: <UP,LOWER_UP> mtu 256 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ax25 DL0PI-1 brd QST-0
 # ip link show dev nr0
4: nr0: <NOARP,UP,LOWER_UP> mtu 236 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/netrom DL0PI-5 brd *
 # ip link show dev rose0
8: rose0: <NOARP,UP,LOWER_UP> mtu 249 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/rose 6509333000 brd 0000000000

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-23 20:03:11 -06:00
Ralf Baechle e2cc9840ea ROSE: Print decoded addresses rather than hex numbers.
ROSE is an OSI layer 3 protocol sitting on top of AX.25.  It uses
BCD-encoded 10 digit telephone numbers as addresses.  Without this ip will
print ROSE addresses like

  link/rose 12:34:56:78:90 brd 00:00:00:00:00

which is readable but ugly.  With this applied ROSE addresses will be
printed as

  link/rose 1234567890 brd 0000000000

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-23 20:02:51 -06:00
Ralf Baechle 26c5782fab ROSE: Add rose_ntop implementation.
ROSE addresses are ten digit numbers, basically like North American
telephone numbers.
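
A sketch of the decoding, assuming the 5-octet BCD layout implied by the
example addresses in this series (two decimal digits per octet, high nibble
first). This is an illustration and not the C implementation; only the name
rose_ntop is taken from the commit subject.

```python
def rose_ntop(addr: bytes) -> str:
    """Decode a 5-octet BCD ROSE address into its 10 decimal digits."""
    if len(addr) != 5:
        raise ValueError("ROSE addresses are 5 octets long")
    digits = []
    for octet in addr:
        high, low = octet >> 4, octet & 0x0F
        if high > 9 or low > 9:
            raise ValueError("not a BCD digit")
        digits.append(f"{high}{low}")
    return "".join(digits)


print(rose_ntop(bytes.fromhex("6509333000")))   # 6509333000
```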

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-23 20:02:45 -06:00
Ralf Baechle fd4c1c8168 NETROM: Print decoded addresses rather than hex numbers.
NETROM is an OSI layer 3 protocol sitting on top of AX.25.  It also uses
AX.25 addresses.  Without this commit ip will print NETROM addresses like

  link/generic 98:92:9c:aa:b0:40:02 brd 00:00:00:00:00:00:00

while with this commit the decoded result

  link/generic LINUX-1 brd *

is much more eye friendly.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-23 20:02:40 -06:00
Ralf Baechle c63b769ad4 NETROM: Add netrom_ntop implementation.
NETROM uses AX.25 addresses so this is a simple wrapper around ax25_ntop1.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-23 20:02:37 -06:00
Ralf Baechle 399ae00af5 AX.25: Print decoded addresses rather than hex numbers.
Before this, ip would have printed the default addresses configured for
an AX.25 interface as:

  link/ax25 98:92:9c:aa:b0:40:02 brd a2:a6:a8:40:40:40:00

which is pretty unreadable.  With this commit ip will decode AX.25
addresses like

  link/ax25 LINUX-1 brd QST-0

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-23 20:02:34 -06:00
Ralf Baechle 3a92669b3a AX.25: Add ax25_ntop implementation.
AX.25 addresses are based on Amateur radio callsigns followed by an SSID
like XXXXXX-SS where the callsign is up to 6 characters which are either
letters or digits and the SSID is a decimal number in the range 0..15.
Amateur radio callsigns are assigned by a country's relevant authorities
and are 3..6 characters though a few countries have assigned callsigns
longer than that.  AX.25 is not able to handle such longer callsigns.

Being based on HDLC, AX.25 encodes addresses by shifting them one bit left,
thus zeroing bit 0, the HDLC extension bit, for all but the last octet of
a packet's address field. For our purposes here we are not considering
the HDLC extension bit, that is, it will always be zero.

Linux' internal representation of AX.25 addresses is very similar
to the on-air or on-the-wire format.  The callsign is padded to
6 octets by adding spaces, followed by the SSID octet, then all 7 octets
are left-shifted by one bit.

This for example turns "LINUX-1" where the callsign is LINUX and SSID is 1
into 98:92:9c:aa:b0:40:02.
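
The encoding can be checked with a short sketch. This is not the C
implementation added by the commit: ax25_encode is a hypothetical helper,
ax25_ntop here only borrows the name, and special cases such as the all-zero
address (printed as "*" elsewhere in this series) are not handled.

```python
def ax25_encode(callsign: str, ssid: int) -> bytes:
    """Pad the callsign to 6 octets, append the SSID, shift each octet left by one bit."""
    if not (1 <= len(callsign) <= 6 and callsign.isascii() and callsign.isalnum()):
        raise ValueError("callsign must be 1..6 ASCII letters or digits")
    if not 0 <= ssid <= 15:
        raise ValueError("SSID must be in the range 0..15")
    octets = [ord(c) for c in callsign.upper().ljust(6)] + [ssid]
    return bytes(o << 1 for o in octets)


def ax25_ntop(addr: bytes) -> str:
    """Undo the shift and print CALLSIGN-SSID."""
    if len(addr) != 7:
        raise ValueError("AX.25 addresses are 7 octets long")
    callsign = "".join(chr(o >> 1) for o in addr[:6]).rstrip(" ")
    ssid = (addr[6] >> 1) & 0x0F
    return f"{callsign}-{ssid}"


print(ax25_encode("LINUX", 1).hex())              # 98929caab04002
print(ax25_ntop(bytes.fromhex("a2a6a840404000"))) # QST-0
```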

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-23 20:02:30 -06:00
Andrea Claudi 2f5825cb38 lib: bpf_legacy: fix bpffs mount when /sys/fs/bpf exists
bpf selftests using iproute2 fail with:

$ ip link set dev veth0 xdp object ../bpf/xdp_dummy.o section xdp_dummy
Continuing without mounted eBPF fs. Too old kernel?
mkdir (null)/globals failed: No such file or directory
Unable to load program

This happens when the /sys/fs/bpf directory exists. In this case, mkdir
in bpf_mnt_check_target() fails with errno == EEXIST, and the function
returns -1. Thus bpf_get_work_dir() does not call bpf_mnt_fs() and the
bpffs is not mounted.

Fix this in bpf_mnt_check_target(), returning 0 when the mountpoint
exists.
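
The intended behaviour can be sketched as follows. This only models the
return value convention described above; it is not the C function, the name
mnt_check_target and the directory mode are assumptions, and the mount step
is left out entirely.

```python
import os


def mnt_check_target(target: str) -> int:
    """Return 0 if the mountpoint is usable, -1 otherwise."""
    try:
        os.mkdir(target, 0o700)    # mode is an assumption for this sketch
    except FileExistsError:
        return 0                   # EEXIST: the mountpoint is already there
    except OSError:
        return -1                  # any other failure is still an error
    return 0
```

With this convention a caller can go on to mount the filesystem whether the
directory was just created or already existed.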

Fixes: d4fcdbbec9 ("lib/bpf: Fix and simplify bpf_mnt_check_target()")
Reported-by: Mingyu Shi <mshi@redhat.com>
Reported-by: Jiri Benc <jbenc@redhat.com>
Suggested-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-09-22 17:30:52 -07:00
Puneet Sharma d756c08a3d tc/f_flower: fix port range parsing
A port range provided in a tc rule is parsed incorrectly.
Even though the range is passed as min-max, it throws an error.

$ tc filter add dev eth0 ingress handle 100 priority 10000 protocol ipv4 flower ip_proto tcp dst_port 10368-61000 action pass
max value should be greater than min value
Illegal "dst_port"
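
The commit message does not say what goes wrong in the parser. One way a
valid range can fail a min/max check, shown here purely as an assumption
and not as a description of the actual patch, is comparing the two ports
after they have been converted to network byte order on a little-endian
host. All names below are hypothetical.

```python
def network_order_on_le(port: int) -> int:
    """Value a little-endian host sees after htons(): the two bytes swapped."""
    return int.from_bytes(port.to_bytes(2, "big"), "little")


def range_ok_swapped(low: int, high: int) -> bool:
    # Assumed faulty check: compares the byte-swapped values.
    return network_order_on_le(low) < network_order_on_le(high)


def range_ok(low: int, high: int) -> bool:
    # Check on host-order values, which is what the user typed.
    return low < high


print(network_order_on_le(10368), network_order_on_le(61000))  # 32808 18670
print(range_ok_swapped(10368, 61000), range_ok(10368, 61000))  # False True
```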

Fixes: 8930840e67 ("tc: flower: Classify packets based port ranges")
Signed-off-by: Puneet Sharma <pusharma@akamai.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-09-22 17:28:48 -07:00
Gokul Sivakumar ebbb701714 lib: bpf_legacy: add prog name, load time, uid and btf id in prog info dump
The BPF program name is included when dumping the BPF program info and the
kernel only stores the first (BPF_PROG_NAME_LEN - 1) bytes for the program
name.

$ sudo ip link show dev docker0
4: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdpgeneric qdisc noqueue state UP mode DEFAULT group default
    link/ether 02:42:4c:df:a4:54 brd ff:ff:ff:ff:ff:ff
    prog/xdp id 789 name xdp_drop_func tag 57cd311f2e27366b jited

The BPF program load time (ns since boottime), the UID of the user who
loaded the program and the BTF ID are also included in the dump when the
user requests detailed ip link output.

$ sudo ip -details link show dev docker0
4: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdpgeneric qdisc noqueue state UP mode DEFAULT group default
    link/ether 02:42:4c:df:a4:54 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
    bridge forward_delay 1500 hello_time 200 max_age 2000 ageing_time 30000 stp_state 0 priority 32768 vlan_filtering 0 vlan_protocol 802.1Q bridge_id 8000.2:42:4c:df:a4:54 designated_root 8000.2:42:4c:df:a4:54 root_port 0 root_path_cost 0 topology_change 0 topology_change_detected 0 hello_timer    0.00 tcn_timer    0.00 topology_change_timer    0.00 gc_timer  265.36 vlan_default_pvid 1 vlan_stats_enabled 0 vlan_stats_per_port 0 group_fwd_mask 0 group_address 01:80:c2:00:00:00 mcast_snooping 1 mcast_router 1 mcast_query_use_ifaddr 0 mcast_querier 0 mcast_hash_elasticity 16 mcast_hash_max 4096 mcast_last_member_count 2 mcast_startup_query_count 2 mcast_last_member_interval 100 mcast_membership_interval 26000 mcast_querier_interval 25500 mcast_query_interval 12500 mcast_query_response_interval 1000 mcast_startup_query_interval 3124 mcast_stats_enabled 0 mcast_igmp_version 2 mcast_mld_version 1 nf_call_iptables 0 nf_call_ip6tables 0 nf_call_arptables 0 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
    prog/xdp id 789 name xdp_drop_func tag 57cd311f2e27366b jited load_time 2676682607316255 created_by_uid 0 btf_id 708

Signed-off-by: Gokul Sivakumar <gokulkumar792@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-21 09:16:32 -06:00
David Ahern 75c5054e7a Merge branch 'main' into next
Conflicts:
	include/uapi/linux/virtio_ids.h

Signed-off-by: David Ahern <dsahern@gmail.com>
2021-09-14 10:46:48 -06:00
Stephen Hemminger 92e32f7791 uapi: updates from 5.15-rc1
Small changes to virtio etc.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-09-13 15:07:58 -07:00
Lahav Schlesinger 0431e1e724 ip: Support filter links/neighs with no master
Commit d3432bf10f17 ("net: Support filtering interfaces on no master")
in the kernel added support for filtering interfaces/neighbours that
have no master interface.

This patch completes it and adds this support to iproute2:
1. ip link show nomaster
2. ip address show nomaster
3. ip neighbour {show | flush} nomaster
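
For comparison, a similar selection can be approximated in user space by
filtering the JSON output of "ip -j link show". The sketch below assumes that
a link object carries a "master" key only when the link is enslaved; the
sample data and the helper name are hypothetical.

```python
import json

# Hypothetical sample in the shape of "ip -j link show" output.
SAMPLE = """
[
  {"ifname": "lo"},
  {"ifname": "eth0", "master": "br0"},
  {"ifname": "br0"}
]
"""


def links_without_master(links):
    """Return the names of links that have no master interface."""
    return [link["ifname"] for link in links if "master" not in link]


if __name__ == "__main__":
    print(links_without_master(json.loads(SAMPLE)))
```

The nomaster keyword performs this filtering in the kernel, so the full link
list does not need to be dumped to user space first.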

Signed-off-by: Lahav Schlesinger <lschlesinger@drivenets.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2021-09-12 11:17:18 -06:00
Lennert Buytenhek 12b3d6a2ad man: ip-macsec: fix gcm-aes-256 formatting issue
The 'ip link add' invocation template at the top of the ip-macsec man
page formats with a pair of extra double quotes:

   ip  link  add  link DEVICE name NAME type macsec [ [ address <lladdr> ]
   port PORT | sci <u64> ]  [  cipher  {  default  |  gcm-aes-128  |  gcm-
   aes-256"}][" icvlen ICVLEN ] [ encrypt { on | off } ] [ send_sci { on |

This is due to missing whitespace around the gcm-aes-256 identifier
in the source file.

Fixes: b16f525323 ("Add support for configuring MACsec gcm-aes-256 cipher type.")
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2021-09-12 11:13:26 -06:00
David Ahern 917d913b2e Merge branch 'main' into next
Conflicts:
	include/uapi/linux/virtio_ids.h

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-08 15:13:49 -06:00
David Ahern d0cba0d1f6 Merge branch 'bridge-mcast_router' into next
Nikolay Aleksandrov says:

====================

This set adds support for vlan port/bridge multicast router option. It is
similar to the already existing bridge-wide mcast_router control. Patch 01
moves attribute adding and parsing together for vlan option setting,
similar to global vlan option setting. It simplifies adding new options
because we can avoid reserved values and additional checks. Patch 02
adds the new mcast_router option and updates the related man page.

Example:
 # mark port ens16 as a permanent mcast router for vlan 100
 $ bridge vlan set dev ens16 vid 100 mcast_router 2
 # disable mcast router for port ens16 and vlan 200
 $ bridge vlan set dev ens16 vid 200 mcast_router 0
 $ bridge -d vlan show
 port              vlan-id
 ens16             1 PVID Egress Untagged
                     state forwarding mcast_router 1
                   100
                     state forwarding mcast_router 2
                   200
                     state forwarding mcast_router 0

Note that this set depends on the latest kernel uapi headers.

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-06 17:03:58 -06:00
Nikolay Aleksandrov ae895504c6 bridge: vlan: add support for mcast_router option
Add support for setting and dumping per-vlan/interface mcast_router
option. It controls the mcast router mode of a vlan/interface pair.
For bridge devices only modes 0 - 2 are allowed. The possible modes
are:
 0 - disabled
 1 - automatic router presence detection (default)
 2 - permanent router
 3 - temporary router (available only for ports)
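
The mode table above can be sketched as an enumeration with the bridge-device
restriction applied. This is an illustration in Python; the names
McastRouter and check_mcast_router() are hypothetical and do not appear in
the bridge tool.

```python
from enum import IntEnum


class McastRouter(IntEnum):
    DISABLED = 0
    AUTO = 1       # automatic router presence detection (default)
    PERMANENT = 2
    TEMPORARY = 3  # available only for ports


def check_mcast_router(mode: int, is_bridge_dev: bool) -> McastRouter:
    """Validate a mcast_router mode for a port or for the bridge device."""
    value = McastRouter(mode)  # raises ValueError for unknown modes
    if is_bridge_dev and value is McastRouter.TEMPORARY:
        raise ValueError("temporary router mode is only valid for ports")
    return value


if __name__ == "__main__":
    print(check_mcast_router(2, is_bridge_dev=False).name)
```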

Example:
 # mark port ens16 as a permanent mcast router for vlan 100
 $ bridge vlan set dev ens16 vid 100 mcast_router 2
 # disable mcast router for port ens16 and vlan 200
 $ bridge vlan set dev ens16 vid 200 mcast_router 0
 $ bridge -d vlan show
 port              vlan-id
 ens16             1 PVID Egress Untagged
                     state forwarding mcast_router 1
                   100
                     state forwarding mcast_router 2
                   200
                     state forwarding mcast_router 0

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-06 17:00:31 -06:00
Nikolay Aleksandrov 12fbe3e4eb bridge: vlan: set vlan option attributes while parsing
Set vlan option attributes immediately while parsing to simplify the
checks, avoid having reserved values (e.g. -1 for unset var) and have
more limited scope for the variables. This is also similar to how global
vlan options are set. The attribute setting and checks are moved with
option parsing, no functional changes intended.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-06 17:00:31 -06:00
David Ahern db28c944d8 Update kernel headers
Update kernel headers to commit:
    27151f177827 ("Merge tag 'perf-tools-for-v5.15-2021-09-04' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-06 16:59:38 -06:00
Stephen Hemminger 6d676ad934 ip: rewrite routel in python
Not sure if anyone uses the routel script. The script was
a combination of ip route, shell and awk doing command scraping.
It is now possible to do this much better using the JSON
output formats and python.
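
The JSON-based approach can be sketched roughly as below. This is not the
routel script itself; the helper names, the column widths and the sample
route are made up for illustration. It relies only on "ip -j route show"
emitting a JSON list of route objects with keys such as "dst", "gateway"
and "dev".

```python
import json
import subprocess


def load_routes(family: str = "-4"):
    """Run "ip -j <family> route show" and return the parsed JSON list."""
    out = subprocess.run(
        ["ip", "-j", family, "route", "show"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def format_routes(routes):
    """Format route dicts as aligned text lines; "-" marks a missing field."""
    lines = []
    for route in routes:
        lines.append("{:<24} {:<24} {:<16}".format(
            route.get("dst", "-"),
            route.get("gateway", "-"),
            route.get("dev", "-"),
        ).rstrip())
    return lines


if __name__ == "__main__":
    # Hypothetical sample data; call load_routes() on a real system instead.
    sample = [{"dst": "default", "gateway": "192.0.2.1", "dev": "eth0"}]
    print("\n".join(format_routes(sample)))
```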

Rewriting also fixes the bug where the old script could not parse
the current output format.  At the end it was getting:
/usr/bin/routel: 48: shift: can't shift that many

The new script also has IPv6 as an option.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-06 16:31:24 -06:00
Stephen Hemminger 1eaebad2c5 ip: remove routef script
This script is old and limited to IPv4.
Using the ip route command directly is a better option.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-06 16:31:23 -06:00
Stephen Hemminger adddf30cd8 ip: remove ifcfg script
This script was from olden days of ifcfg.
I don't see any distribution using it and it is time to put
it out to pasture.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-06 16:31:19 -06:00
Stephen Hemminger 2c8110881b ip: remove old rtpr script
This script was a one-off hack for a special case.
Now that ip commands have better formatting, there is no
real reason for it.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-06 16:30:36 -06:00
David Marchand e7e0e2ce65 iptuntap: fix multi-queue flag display
When creating a tap with multi_queue flag, this flag is not displayed
when dumping:

$ ip tuntap add tap23 mode tap multi_queue
$ ip tuntap
tap23: tap persist0x100

While at it, add a space between known flags and hexdump of unknown
ones.
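
The display logic can be sketched as follows: known flag bits are printed by
name, and whatever remains is printed as hex, separated by a space. The flag
values are those of linux/if_tun.h; the function describe_flags() and the
table layout are hypothetical and are not the iptuntap.c code.

```python
IFF_TUN = 0x0001
IFF_TAP = 0x0002
IFF_MULTI_QUEUE = 0x0100
IFF_PERSIST = 0x0800
IFF_ONE_QUEUE = 0x2000
IFF_VNET_HDR = 0x4000

# Hypothetical name table used only by this sketch.
KNOWN_FLAGS = [
    (IFF_ONE_QUEUE, "one_queue"),
    (IFF_VNET_HDR, "vnet_hdr"),
    (IFF_PERSIST, "persist"),
    (IFF_MULTI_QUEUE, "multi_queue"),
]


def describe_flags(flags: int, known=KNOWN_FLAGS) -> str:
    """Name the known flag bits; append any unknown remainder as hex."""
    flags &= ~(IFF_TUN | IFF_TAP)  # the mode is reported separately
    words = []
    for bit, name in known:
        if flags & bit:
            words.append(name)
            flags &= ~bit
    if flags:
        words.append(hex(flags))
    return " ".join(words)


if __name__ == "__main__":
    print(describe_flags(IFF_TAP | IFF_PERSIST | IFF_MULTI_QUEUE))
```

Passing a table without the multi_queue entry makes the same input come out
as "persist 0x100", which is the unknown-flag path with the added space.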

Fixes: c41e038f48 ("iptuntap: allow creation of multi-queue tun/tap device")
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-09-02 08:41:17 -07:00
Nikolay Aleksandrov deef844b1e man: ip-link: remove double of
Remove double "of".

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-09-02 08:40:36 -07:00
Luca Boccassi a3272b9372 configure: restore backward compatibility
Commit a9c3d70d90 broke backward compatibility
by making 'configure' error out if parameters are passed, instead of
ignoring them.
Sometimes packaging systems detect 'configure' and assume it's from
autotools, and pass a bunch of options. Eg:

 dh_auto_configure
	./configure --build=x86_64-linux-gnu --prefix=/usr --includedir=${prefix}/include --mandir=${prefix}/share/man --infodir=${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --disable-option-checking --disable-silent-rules --libdir=${prefix}/lib/x86_64-linux-gnu --runstatedir=/run --disable-maintainer-mode --disable-dependency-tracking

Ignore unknown options again instead of erroring out.

Fixes: a9c3d70d90 ("configure: add options ability")

Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-09-02 08:39:48 -07:00
Luca Boccassi ceba59308d tree-wide: fix some typos found by Lintian
Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-09-02 08:39:48 -07:00
Stephen Hemminger 7a70524270 ip: remove leftovers from IPX and DECnet
Iproute2 has not supported DECnet or IPX since version 5.0.
There was some leftover support in the ip options flags
and parsing; remove it.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-09-01 14:03:53 -07:00
Stephen Hemminger 8ab1834e56 uapi: update headers from 5.15 merge
New headers from 5.15 early merge.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-09-01 14:02:50 -07:00
Hangbin Liu 6d0d35bab9 ip/bond: add lacp active support
lacp_active specifies whether to send LACPDU frames periodically.
If set on, the LACPDU frames are sent along with the configured lacp_rate
setting. If set off, the LACPDU frames act as "speak when spoken to".

v2: use strcmp instead of match for new options.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
2021-09-01 12:51:44 -07:00
David Ahern 926ad64104 Update kernel headers
Update kernel headers to commit:
    88be32634905 ("Merge branch 'dsa-tagger-helpers'")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-01 12:51:44 -07:00
Ilya Dmitrichenko c730bd0b11 ip/tunnel: always print all known attributes
Presently, if a Geneve or VXLAN interface was created with 'external',
it's not possible for a user to determine e.g. the value of 'dstport'
after creation. This change fixes that by avoiding early returns.

This change partly reverts commit 00ff4b8e31 ("ip/tunnel: Be consistent
when printing tunnel collect metadata").

Signed-off-by: Ilya Dmitrichenko <errordeveloper@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-01 12:51:44 -07:00
Justin Iurman df8912ede2 ipioam6: use print_nl instead of print_null
This patch addresses Stephen's comment:

"""
> +        print_null(PRINT_ANY, "", "\n", NULL);

Use print_nl() since it handles the case of oneline output.
Plus in JSON the newline is meaningless.
"""

It also removes two useless print_null's.

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-01 12:51:44 -07:00
Peilin Ye 7e7270bb1f tc/skbmod: Introduce SKBMOD_F_ECN option
Recently we added SKBMOD_F_ECN option support to the kernel; support it in
the tc-skbmod(8) front end, and update its man page accordingly.

The 2 least significant bits of the Traffic Class field in IPv4 and IPv6
headers are used to represent different ECN states [1]:

	0b00: "Non ECN-Capable Transport", Non-ECT
	0b10: "ECN Capable Transport", ECT(0)
	0b01: "ECN Capable Transport", ECT(1)
	0b11: "Congestion Encountered", CE

This new option, "ecn", marks ECT(0) and ECT(1) IPv{4,6} packets as CE,
which is useful for ECN-based rate limiting.  For example:

	$ tc filter add dev eth0 parent 1: protocol ip prio 10 \
		u32 match ip protocol 1 0xff flowid 1:2 \
		action skbmod \
		ecn
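
The marking rule can be sketched on the Traffic Class byte alone. This Python
snippet only illustrates the bit manipulation described above; the name
mark_ce() is hypothetical, and a real implementation also has to update the
IPv4 header checksum after changing the byte.

```python
ECN_MASK = 0b11
ECN_ECT_0 = 0b10
ECN_ECT_1 = 0b01
ECN_CE = 0b11


def mark_ce(traffic_class: int) -> int:
    """Rewrite ECT(0)/ECT(1) to CE; leave Non-ECT and CE untouched."""
    ecn = traffic_class & ECN_MASK
    if ecn in (ECN_ECT_0, ECN_ECT_1):
        return traffic_class | ECN_CE
    return traffic_class


if __name__ == "__main__":
    # DSCP bits (upper six) are preserved, only the ECN bits change.
    print(bin(mark_ce(0b10111010)))
```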

The updated tc-skbmod SYNOPSIS looks like the following:

	tc ... action skbmod { set SETTABLE | swap SWAPPABLE | ecn } ...

Only one of "set", "swap" or "ecn" shall be used in a single tc-skbmod
command.  Trying to use more than one of them at a time is considered
undefined behavior; pipe multiple tc-skbmod commands together instead.
"set" and "swap" only affect Ethernet packets, while "ecn" only affects
IP packets.

Depends on kernel patch "net/sched: act_skbmod: Add SKBMOD_F_ECN option
support", as well as iproute2 patch "tc/skbmod: Remove misinformation
about the swap action".

[1] https://en.wikipedia.org/wiki/Explicit_Congestion_Notification

Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-01 12:51:44 -07:00
Justin Iurman 86c596ed91 IOAM man8
This patch provides man8 documentation for IOAM inside ip, ip-ioam and ip-route.

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-01 12:51:44 -07:00
Justin Iurman 2d83c71082 New IOAM6 encap type for routes
This patch provides a new encap type for routes to insert an IOAM pre-allocated
trace:

$ ip -6 ro ad fc00::1/128 encap ioam6 trace prealloc type 0x800000 ns 1 size 12 dev eth0

where:
 - "trace" and "prealloc" may appear useless, but they anticipate future
   implementations of other ioam option types.
 - "type" is a bitfield (=u32) defining the IOAM pre-allocated trace type (see
   the corresponding uapi).
 - "ns" is an IOAM namespace ID attached to the pre-allocated trace.
 - "size" is the trace pre-allocated size in bytes; must be a 4-octet multiple;
   limited size (see IOAM6_TRACE_DATA_SIZE_MAX).
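
The constraint on "size" can be sketched as a small check. The limit of 244
used for IOAM6_TRACE_DATA_SIZE_MAX is an assumption here and should be
verified against the uapi header; check_trace_size() is a hypothetical name.

```python
# Assumed value; verify against the ioam6 uapi header before relying on it.
IOAM6_TRACE_DATA_SIZE_MAX = 244


def check_trace_size(size: int, limit: int = IOAM6_TRACE_DATA_SIZE_MAX) -> int:
    """Validate the pre-allocated trace size given in bytes."""
    if size <= 0 or size % 4 != 0:
        raise ValueError("size must be a positive multiple of 4 octets")
    if size > limit:
        raise ValueError("size exceeds the maximum of %d bytes" % limit)
    return size


if __name__ == "__main__":
    print(check_trace_size(12))
```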

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-01 12:51:44 -07:00
Justin Iurman f0b3808afa Add, show, link, remove IOAM namespaces and schemas
This patch provides support for adding, listing and removing IOAM namespaces
and schemas with iproute2. When adding an IOAM namespace, both "data" (=u32)
and "wide" (=u64) are optional. Therefore, you can either have none, one of
them, or both at the same time. When adding an IOAM schema, there is no
restriction on "DATA" except its size (see IOAM6_MAX_SCHEMA_DATA_LEN). By
default, an IOAM namespace has no active IOAM schema (meaning an IOAM namespace
is not linked to an IOAM schema), and an IOAM schema is not considered
as "active" (meaning an IOAM schema is not linked to an IOAM namespace). It is
possible to link an IOAM namespace with an IOAM schema, thanks to the last
command below (meaning the IOAM schema will be considered as "active" for the
specific IOAM namespace).

$ ip ioam
Usage:	ip ioam { COMMAND | help }
	ip ioam namespace show
	ip ioam namespace add ID [ data DATA32 ] [ wide DATA64 ]
	ip ioam namespace del ID
	ip ioam schema show
	ip ioam schema add ID DATA
	ip ioam schema del ID
	ip ioam namespace set ID schema { ID | none }

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-01 12:51:44 -07:00
David Ahern acbdef9386 Import ioam6 uapi headers
Import ioam6 uapi headers from kernel headers at last sync commit.

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-01 12:51:44 -07:00
David Ahern 2d6fa30bb8 Update kernel headers
Update kernel headers to commit:
    1187c8c4642d ("net: phy: mscc: make some arrays static const, makes object smaller")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-01 12:51:44 -07:00
Gokul Sivakumar 508ad89c82 ipneigh: add support to print brief output of neigh cache in tabular format
Make use of the already available brief flag and print the basic details of
the IPv4 or IPv6 neighbour cache in a tabular format for better readability
when the brief output is expected.

$ ip -br neigh
172.16.12.100                           bridge0          b0:fc:36:2f:07:43
172.16.12.174                           bridge0          8c:16:45:2f:bc:1c
172.16.12.250                           bridge0          04:d9:f5:c1:0c:74
fe80::267b:9f70:745e:d54d               bridge0          b0:fc:36:2f:07:43
fd16:a115:6a62:0:8744:efa1:9933:2c4c    bridge0          8c:16:45:2f:bc:1c
fe80::6d9:f5ff:fec1:c74                 bridge0          04:d9:f5:c1:0c:74
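
The tabular layout can be reproduced from JSON output with fixed-width
columns. The widths below (39 and 16 characters, each followed by a space)
are inferred from the sample above and are an assumption, as is the helper
name format_brief(); the input has the shape of "ip -j neigh" entries.

```python
def format_brief(entries):
    """Format neighbour entries as address, device and lladdr columns."""
    lines = []
    for entry in entries:
        lines.append("{:<39} {:<16} {}".format(
            entry["dst"], entry["dev"], entry.get("lladdr", "")))
    return lines


if __name__ == "__main__":
    sample = [
        {"dst": "172.16.12.100", "dev": "bridge0",
         "lladdr": "b0:fc:36:2f:07:43"},
        {"dst": "fe80::6d9:f5ff:fec1:c74", "dev": "bridge0",
         "lladdr": "04:d9:f5:c1:0c:74"},
    ]
    print("\n".join(format_brief(sample)))
```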

And add "ip neigh show" to the list of ip sub commands mentioned in the man
page that support the brief output in tabular format.

Signed-off-by: Gokul Sivakumar <gokulkumar792@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-09-01 12:51:44 -07:00
David Ahern fb843668fb Merge branch 'bridge-vlan-global-mcast' into next
Nikolay Aleksandrov says:

====================

This set adds support for vlan multicast options. The feature is
globally controlled by a new bridge option called mcast_vlan_snooping
which is added by patch 01. Then patches 2-5 add support for dumping
global vlan options and filtering on vlan id. Patch 06 adds support for
setting global vlan options and then patches 07-18 add all the new
global vlan options, finally patch 19 adds support for dumping vlan
multicast router ports. These options are identical in meaning, names and
functionality to the bridge-wide ones.

All the new vlan global commands are under the global keyword:
 $ bridge vlan global show [ vid VID dev DEVICE ]
 $ bridge vlan global set vid VID dev DEVICE ...

I've added command examples in each commit message. The patch-set is a
bit bigger but the global options follow the same pattern so I don't see
a point in breaking them. All man page descriptions have been taken from
the same current bridge-wide mcast options. The only additional iproute2
change which is left to do is the per-vlan mcast router control which
I'll send separately. Note to properly use this set you'll need the
updated kernel headers where mcast router was moved from a global option
to per-vlan/per-device one (changed uapi enum which was in net-next).

Example:
 # enable vlan mcast snooping globally
 $ ip link set dev bridge type bridge mcast_vlan_snooping 1
 # enable mcast querier on vlan 100
 $ bridge vlan global set dev bridge vid 100 mcast_querier 1
 # show vlan 100's global options
 $ bridge -s vlan global show vid 100
port              vlan-id
bridge            100
                    mcast_snooping 1 mcast_querier 1 mcast_igmp_version 2 mcast_mld_version 1 mcast_last_member_count 2 mcast_last_member_interval 100 mcast_startup_query_count 2 mcast_startup_query_interval 3125 mcast_membership_interval 26000 mcast_querier_interval 25500 mcast_query_interval 12500 mcast_query_response_interval 1000

A following kernel patch-set will add selftests which use these commands.

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:32:31 -06:00
Nikolay Aleksandrov 72222cd467 bridge: vlan: add support for dumping router ports
Add dump support for vlan multicast router ports and their details if
requested. If details are requested we print 1 entry per line, otherwise
we print all router ports on a single line similar to how mdb prints
them.

Looks like:
$ bridge vlan global show vid 100
 port              vlan-id
 bridge            100
                     mcast_snooping 1 mcast_querier 0 mcast_igmp_version 2 mcast_mld_version 1 mcast_last_member_count 2 mcast_last_member_interval 100 mcast_startup_query_count 2 mcast_startup_query_interval 3125 mcast_membership_interval 26000 mcast_querier_interval 25500 mcast_query_interval 12500 mcast_query_response_interval 1000
                     router ports: ens20 ens16

Looks like (with -s):
 $ bridge -s vlan global show vid 100
 port              vlan-id
 bridge            100
                     mcast_snooping 1 mcast_querier 0 mcast_igmp_version 2 mcast_mld_version 1 mcast_last_member_count 2 mcast_last_member_interval 100 mcast_startup_query_count 2 mcast_startup_query_interval 3125 mcast_membership_interval 26000 mcast_querier_interval 25500 mcast_query_interval 12500 mcast_query_response_interval 1000
                     router ports: ens20   187.57 temp
                                   ens16   118.27 temp

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:29:31 -06:00
Nikolay Aleksandrov 7ad5505bb5 bridge: vlan: add global mcast_querier option
Add control and dump support for the global mcast_querier option which
controls if the bridge will act as a multicast querier for that vlan.
Syntax: $ bridge vlan global set dev bridge vid 1 mcast_querier 1

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:29:26 -06:00
Nikolay Aleksandrov 061da2e222 bridge: vlan: add global mcast_startup_query_interval option
Add control and dump support for the global mcast_startup_query_interval
option which controls the interval between queries in the startup phase.
To be consistent with the same bridge-wide option the value is reported
with USER_HZ granularity and the same granularity is expected when setting
it.
Syntax:
 $ bridge vlan global set dev bridge vid 1 mcast_startup_query_interval 15000

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:29:12 -06:00
Nikolay Aleksandrov 60dcd5c318 bridge: vlan: add global mcast_query_response_interval option
Add control and dump support for the global mcast_query_response_interval
option which sets the Max Response Time/Maximum Response Delay for IGMP/MLD
queries sent by the bridge. To be consistent with the same bridge-wide
option the value is reported with USER_HZ granularity and the same
granularity is expected when setting it.
Syntax:
 $ bridge vlan global set dev bridge vid 1 mcast_query_response_interval 13000

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:28:47 -06:00
Nikolay Aleksandrov 0e4cfa0370 bridge: vlan: add global mcast_query_interval option
Add control and dump support for the global mcast_query_interval
option which controls the interval between queries sent by the bridge
after the end of the startup phase. To be consistent with the same
bridge-wide option the value is reported with USER_HZ granularity and
the same granularity is expected when setting it.
Syntax:
 $ bridge vlan global set dev bridge vid 1 mcast_query_interval 13000

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:28:29 -06:00
Nikolay Aleksandrov ebcee09ca1 bridge: vlan: add global mcast_querier_interval option
Add control and dump support for the global mcast_querier_interval
option which controls the interval after which if no other router
queries are seen the bridge will start sending its own queries.
To be consistent with the same bridge-wide option the value is reported
with USER_HZ granularity and the same granularity is expected when
setting it.
Syntax:
 $ bridge vlan global set dev bridge vid 1 mcast_querier_interval 13000

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:28:12 -06:00
Nikolay Aleksandrov 3ae784f589 bridge: vlan: add global mcast_membership_interval option
Add control and dump support for the global mcast_membership_interval
option which controls the interval after which the bridge will leave a
group if no reports have been received for it. To be consistent with the
same bridge-wide option the value is reported with USER_HZ granularity and
the same granularity is expected when setting it.
The default is 26000 (260 seconds).
Syntax:
 $ bridge vlan global set dev bridge vid 1 mcast_membership_interval 13000
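
The USER_HZ granularity can be illustrated with a small conversion. The value
USER_HZ = 100 is an assumption that matches the "26000 (260 seconds)" default
quoted above; the helper names are hypothetical.

```python
USER_HZ = 100  # assumption: consistent with 26000 meaning 260 seconds


def user_hz_to_seconds(value: int) -> float:
    """Convert an interval in USER_HZ ticks to seconds."""
    return value / USER_HZ


def seconds_to_user_hz(seconds: float) -> int:
    """Convert seconds to the tick value expected on the command line."""
    return round(seconds * USER_HZ)


if __name__ == "__main__":
    print(user_hz_to_seconds(26000))
    print(seconds_to_user_hz(130))
```

Under that assumption, the value 13000 in the syntax example corresponds to
130 seconds.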

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:27:56 -06:00
Nikolay Aleksandrov 2b6cc38d52 bridge: vlan: add global mcast_last_member_interval option
Add control and dump support for the global mcast_last_member_interval
option which controls the interval between queries to find remaining
members of a group after a leave message. To be consistent with the same
bridge-wide option the value is reported with USER_HZ granularity and
the same granularity is expected when setting it.
The default is 100 (1 second).
Syntax:
 $ bridge vlan global set dev bridge vid 1 mcast_last_member_interval 200

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:27:45 -06:00
Nikolay Aleksandrov 7cc7dbf447 bridge: vlan: add global mcast_startup_query_count option
Add control and dump support for the global mcast_startup_query_count
option which controls the number of queries the bridge will send on the
vlan during startup phase (default 2).
Syntax:
 $ bridge vlan global set dev bridge vid 1 mcast_startup_query_count 5

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:27:28 -06:00
Nikolay Aleksandrov 3399c0759f bridge: vlan: add global mcast_last_member_count option
Add control and dump support for the global mcast_last_member_count option
which controls the number of queries the bridge will send on the vlan after
a leave is received (default 2).
Syntax:
 $ bridge vlan global set dev bridge vid 1 mcast_last_member_count 10

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:26:07 -06:00
Nikolay Aleksandrov a8d7212a4f bridge: vlan: add global mcast_mld_version option
Add control and dump support for the global mcast_mld_version option
which controls the MLD version on the vlan (default 1).
Syntax: $ bridge vlan global set dev bridge vid 1 mcast_mld_version 2

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:25:17 -06:00
Nikolay Aleksandrov 29fada0f41 bridge: vlan: add global mcast_igmp_version option
Add control and dump support for the global mcast_igmp_version option
which controls the IGMP version on the vlan (default 2).
Syntax: $ bridge vlan global set dev bridge vid 1 mcast_igmp_version 3

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:24:09 -06:00
Nikolay Aleksandrov 1f608d590c bridge: vlan: add global mcast_snooping option
Add control and dump support for the global mcast_snooping option which
controls if multicast snooping is enabled or disabled for a single vlan.
Syntax: $ bridge vlan global set dev bridge vid 1 mcast_snooping 1

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:23:26 -06:00
Nikolay Aleksandrov dee5eb05e5 bridge: vlan: add support to set global vlan options
Add support to change global vlan options via a new vlan global
set subcommand similar to the current vlan set subcommand. The man page
and help are updated accordingly. The command works only with bridge
devices. It doesn't support any options yet.

Syntax: $ bridge vlan global set vid VID dev DEV

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:21:13 -06:00
Nikolay Aleksandrov ecf6d8b4a1 bridge: vlan: add support for vlan filtering when dumping options
In order to allow vlan filtering when dumping options we need to move
all print operations into the option dumping functions and add the
filtering after we've parsed the nested attributes so we can extract the
start and end vlan ids.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:21:09 -06:00
Nikolay Aleksandrov 720f8613bd bridge: vlan: add support to show global vlan options
Add support for new bridge vlan command grouping called global which
operates on global options. The first command it supports is "show".
To do that we update print_vlan_rtm to recognize the global vlan options
attribute and parse it properly.
Man page and help are also updated with the new command.

Syntax is: $ bridge vlan global show [ vid VID ] [ dev DEV ]

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:21:04 -06:00
Nikolay Aleksandrov d3a961a9b1 bridge: vlan: skip unknown attributes when printing options
Skip unknown attributes when printing vlan options in print_vlan_rtm.
Make sure print_vlan_opts doesn't accept attributes it doesn't understand.
Currently we print only one type, later global vlan options support will
be added.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:21:00 -06:00
Nikolay Aleksandrov 312e22fe79 bridge: vlan: factor out vlan option printing
Factor out the code which prints current per-vlan options from
print_vlan_rtm without any changes. Later we'll filter based on the vlan
attribute and add support for global vlan option printing.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:20:53 -06:00
Nikolay Aleksandrov d2eecb9d1d ip: bridge: add support for mcast_vlan_snooping
Add support for mcast_vlan_snooping option which controls per-vlan
multicast snooping, also update the man page.
Syntax: $ ip link set dev bridge type bridge mcast_vlan_snooping 0/1

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-31 21:20:03 -06:00
Stephen Hemminger 169f36a0c9 v5.14.0 2021-08-31 11:57:59 -07:00
Jakub Kicinski 85b0e73c77 ss: fix fallback to procfs for raw sockets
Jonas reports that ss -awp does not display any RAW sockets
on a Knoppix 4.4 kernel.

sockdiag_send() diverts to tcpdiag_send() to try the older
netlink interface. tcpdiag_send() works for TCP and DCCP
but not other protocols. Instead of rejecting unsupported
protocols (and missing RAW and SCTP) match on supported ones.
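
The change in approach, from rejecting known-unsupported protocols to
accepting only supported ones, can be sketched as below. The protocol numbers
are the standard IP protocol numbers; the function name and the set are
hypothetical and do not reproduce the ss.c code.

```python
IPPROTO_TCP = 6
IPPROTO_UDP = 17
IPPROTO_DCCP = 33
IPPROTO_SCTP = 132
IPPROTO_RAW = 255

# Protocols the older netlink interface is said to handle.
LEGACY_DIAG_SUPPORTED = {IPPROTO_TCP, IPPROTO_DCCP}


def can_use_legacy_diag(protocol: int) -> bool:
    """Allow-list check: anything not listed falls back to procfs."""
    return protocol in LEGACY_DIAG_SUPPORTED


if __name__ == "__main__":
    for proto in (IPPROTO_TCP, IPPROTO_RAW, IPPROTO_SCTP):
        print(proto, can_use_legacy_diag(proto))
```

With an allow-list, a protocol nobody remembered to reject (RAW, SCTP) takes
the procfs fallback by default instead of being sent down an interface that
cannot serve it.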

Link: https://lore.kernel.org/netdev/20210815231738.7b42bad4@mmluhan/
Reported-and-tested-by: Jonas Bechtel <post@jbechtel.de>
Fixes: 41fe6c34de ("ss: Add inet raw sockets information gathering via netlink diag interface")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-08-18 15:03:46 -07:00
Stephen Hemminger 1afde09498 uapi: update neighbour.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-08-18 14:09:34 -07:00
Gokul Sivakumar 10ecd12690 man: bridge: fix the typo to change "-c[lor]" into "-c[olor]" in man page
Fixes: 3a1ca9a5b ("bridge: update man page for new color and json changes")
Signed-off-by: Gokul Sivakumar <gokulkumar792@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-08-18 14:04:53 -07:00
Gokul Sivakumar 057d3c6d37 bridge: fdb: don't colorize the "dev" & "dst" keywords in "bridge -c fdb"
To be consistent with the colorized output of "ip" command and to increase
readability, stop highlighting the "dev" & "dst" keywords in the colorized
output of "bridge -c fdb" cmd.

Example: in the following "bridge -c fdb" entry, only "00:00:00:00:00:00",
"vxlan100" and "2001:db8:2::1" fields should be highlighted in color.

00:00:00:00:00:00 dev vxlan100 dst 2001:db8:2::1 self permanent

Signed-off-by: Gokul Sivakumar <gokulkumar792@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-08-18 14:04:53 -07:00
Gokul Sivakumar 82149efee9 bridge: reorder cmd line arg parsing to let "-c" detected as "color" option
As per the man/man8/bridge.8 page, the shorthand cmd line arg "-c" can be
used to colorize the bridge cmd output. But while parsing the args in while
loop, matches() detects "-c" as "-compressedvlans" instead of "-color", so
fix this by doing the check for "-color" option first before checking for
"-compressedvlans".

Signed-off-by: Gokul Sivakumar <gokulkumar792@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-08-18 14:04:53 -07:00
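The ordering problem fixed by the "-c" parsing commit above can be illustrated with a small sketch. It assumes a simple prefix matcher; `is_prefix()` and `first_match()` are hypothetical helpers written for this note, not the iproute2 `matches()` function or the bridge argument loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a prefix matcher: true when arg is a
 * non-empty prefix of name. It is not the iproute2 matches() itself. */
static bool is_prefix(const char *arg, const char *name)
{
	size_t len = strlen(arg);

	return len > 0 && len <= strlen(name) && strncmp(arg, name, len) == 0;
}

/* Return the first option in names[] that arg abbreviates, or NULL. */
static const char *first_match(const char *arg,
			       const char *const *names, size_t count)
{
	assert(arg != NULL && names != NULL);

	for (size_t i = 0; i < count; i++)
		if (is_prefix(arg, names[i]))
			return names[i];
	return NULL;
}
```

With prefix matching, whichever option is checked first wins an ambiguous abbreviation, so listing "-color" before "-compressedvlans" makes "-c" resolve to "-color" while "-com" still resolves to "-compressedvlans".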
Hangbin Liu 3a09567f7d ip/bond: add arp_validate filter support
Add arp_validate filter support based on kernel commit 896149ff1b2c
("bonding: extend arp_validate to be able to receive unvalidated arp-only traffic")

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-08-18 14:02:44 -07:00
Parav Pandit 355c49ffa5 devlink: Show port state values in man page and in the help command
Port function state can have either of the two values - active or
inactive. Update the documentation and help command for these two
values to tell user about it.

With the introduction of state, hw_addr and state are optional.
Hence mark them as optional in man page that also aligns with the help
command output.

Fixes: bdfb9f1bd6 ("devlink: Support set of port function state")
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-08-11 15:02:30 -07:00
Hangbin Liu ebaa603b30 ip/bond: add lacp active support
lacp_active specifies whether to send LACPDU frames periodically.
If set on, the LACPDU frames are sent along with the configured lacp_rate
setting. If set off, the LACPDU frames act as "speak when spoken to".

v2: use strcmp instead of match for new options.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
2021-08-11 12:26:20 -06:00
David Ahern 8d6134b204 Update kernel headers
Update kernel headers to commit:
    88be32634905 ("Merge branch 'dsa-tagger-helpers'")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-11 12:23:33 -06:00
Ilya Dmitrichenko 51d8fc708c ip/tunnel: always print all known attributes
Presently, if a Geneve or VXLAN interface was created with 'external',
it's not possible for a user to determine e.g. the value of 'dstport'
after creation. This change fixes that by avoiding early returns.

This change partly reverts commit 00ff4b8e31 ("ip/tunnel: Be consistent
when printing tunnel collect metadata").

Signed-off-by: Ilya Dmitrichenko <errordeveloper@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-11 12:17:52 -06:00
Justin Iurman 71ba9c18e0 ipioam6: use print_nl instead of print_null
This patch addresses Stephen's comment:

"""
> +        print_null(PRINT_ANY, "", "\n", NULL);

Use print_nl() since it handles the case of oneline output.
Plus in JSON the newline is meaningless.
"""

It also removes two useless print_null's.

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-11 12:16:09 -06:00
Phil Sutter 9b7ea92b9e tc: u32: Fix key folding in sample option
In between Linux kernel 2.4 and 2.6, key folding for hash tables changed
in kernel space. When iproute2 dropped support for the older algorithm,
the wrong code was removed and kernel 2.4 folding method remained in
place. To get things functional for recent kernels again, restoring the
old code alone was not sufficient - additional byteorder fixes were
needed.

While being at it, make use of ffs() and thereby align the code with how
kernel determines the shift width.
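
As a rough illustration of deriving a shift width with ffs(), the sketch below takes the index of the lowest set bit of a hash mask and uses it to fold a key. The helper names are hypothetical, and the byte order conversions mentioned in this commit are deliberately left out (host byte order is assumed).

```c
#include <assert.h>
#include <stdint.h>
#include <strings.h>

/* Shift width for a hash mask: index of its lowest set bit,
 * or 0 when the mask is empty. */
static unsigned int shift_width(uint32_t mask)
{
	return mask ? (unsigned int)ffs((int)mask) - 1 : 0;
}

/* Fold a key into a bucket index by masking, then shifting the
 * masked bits down to bit 0 (host byte order assumed). */
static uint32_t fold_key(uint32_t key, uint32_t mask)
{
	unsigned int shift = shift_width(mask);

	assert(shift < 32);
	return (key & mask) >> shift;
}
```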

Fixes: 267480f553 ("Backout the 2.4 utsname hash patch.")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-08-10 20:02:43 -07:00
Andrea Claudi d1eacf12b5 lib: bpf_glue: remove useless assignment
The value of s used inside the cycle is the result of strstr(), so this
assignment is useless.

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-08-10 20:01:54 -07:00
Andrea Claudi 50a4127022 lib: bpf_legacy: fix potential NULL-pointer dereference
If bpf_map_fetch_name() returns NULL, strlen() hits a NULL-pointer
dereference on outer_map_name.

Fix this by checking the outer_map_name value and returning false when it
is NULL, as already done for inner_map_name before.

Fixes: 6d61a2b557 ("lib: add libbpf support")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-08-10 19:55:12 -07:00
Jacob Keller 954a0077c8 devlink: fix infinite loop on flash update for drivers without status
When processing device flash update, cmd_dev_flash function waits until
the flash process has completed. This requires the following two
conditions to both be true:

a) we've received an exit status from the child process
b) we've received the DEVLINK_CMD_FLASH_UPDATE_END *or*
   we haven't received any status notifications from the driver.

The original devlink flash status monitoring code in 9b13cddfe2
("devlink: implement flash status monitoring") was written assuming that
a driver will either send no status updates, or it will send at least
one DEVLINK_CMD_FLASH_UPDATE_STATUS before DEVLINK_CMD_FLASH_UPDATE_END.

Newer versions of the kernel since commit 52cc5f3a166a ("devlink: move flash
end and begin to core devlink") in v5.10 moved handling of the
DEVLINK_CMD_FLASH_UPDATE_END into the core stack, and will send this
regardless of whether or not the driver sends any of its own status
notifications.

The handling of DEVLINK_CMD_FLASH_UPDATE_END in cmd_dev_flash_status_cb
has an additional condition that it must not be the first message.
Otherwise, it falls back to treating it like
a DEVLINK_CMD_FLASH_UPDATE_STATUS.

This is wrong because it can lead to an infinite loop if a driver does
not send any status updates.

In this case, the kernel will send DEVLINK_CMD_FLASH_UPDATE_END without
any DEVLINK_CMD_FLASH_UPDATE_STATUS. The devlink application will see
that ctx->not_first is false, and will treat this like any other status
message. Thus, ctx->not_first will be set to 1.

The loop condition to exit flash update will thus never be true, since
we will wait forever, because ctx->not_first is true, and
ctx->received_end is false.

This leads to the application appearing to process the flash update, but
it will never exit.

Fix this by simply always treating DEVLINK_CMD_FLASH_UPDATE_END the same
regardless of whether it's the first message or not.
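
The state handling described in this message can be modelled with a minimal sketch. The struct, enum and function names are hypothetical stand-ins for the devlink flash context and callback, and the `end_needs_prior_status` switch exists only to contrast the old and new behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the DEVLINK_CMD_FLASH_UPDATE_STATUS / _END commands. */
enum flash_msg { MSG_STATUS, MSG_END };

struct flash_ctx {
	bool not_first;     /* a message was already handled as a status update */
	bool received_end;  /* the END notification has been seen */
	bool child_exited;  /* exit status received from the child process */
};

/* Handle one notification. With end_needs_prior_status set, END is only
 * honoured after an earlier status message (the old behaviour). */
static void on_msg(struct flash_ctx *ctx, enum flash_msg msg,
		   bool end_needs_prior_status)
{
	assert(ctx != NULL);

	if (msg == MSG_END && (ctx->not_first || !end_needs_prior_status)) {
		ctx->received_end = true;
		return;
	}
	/* Everything else is treated as a status update. */
	ctx->not_first = true;
}

/* Loop condition: wait for the child, and for END once any status was seen. */
static bool keep_waiting(const struct flash_ctx *ctx)
{
	return !ctx->child_exited || (ctx->not_first && !ctx->received_end);
}
```

In this model, an END arriving as the very first message under the old rule sets not_first without setting received_end, so keep_waiting() stays true even after the child exits.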

This is obviously the correct thing to do: once we've received the
DEVLINK_CMD_FLASH_UPDATE_END the flash update must be finished. For new
kernels this is always true, because we send this message in the core
stack after the driver flash update routine finishes.

For older kernels, some drivers may not have sent any
DEVLINK_CMD_FLASH_UPDATE_STATUS or DEVLINK_CMD_FLASH_UPDATE_END. This is
handled by the while loop conditional that exits if we get a return
value from the child process without having received any status
notifications.

An argument could be made that we should exit immediately when we get
either the DEVLINK_CMD_FLASH_UPDATE_END or an exit code from the child
process. However, at a minimum it makes no sense to ever process
DEVLINK_CMD_FLASH_UPDATE_END as if it were a DEVLINK_CMD_FLASH_UPDATE_STATUS.

This is easy to test as it is triggered by the selftests for the
netdevsim driver, which has a test case for both with and without status
notifications.

Fixes: 9b13cddfe2 ("devlink: implement flash status monitoring")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-08-10 19:54:39 -07:00
Feng Zhou be99929d60 lib/bpf: Fix btf_load error lead to enable debug log
When tc is run without verbose and bpf_btf_attach fails, the condition
"if (fd < 0 && (errno == ENOSPC || !ctx->log_size))"
makes ctx->log_size != 0. bpf_prog_attach then sees ctx->log_size != 0
and enables the debug log. The verifier log can be very chatty on larger
programs, so bpf_prog_attach fails with:
"Log buffer too small to dump verifier log 16777215 bytes (9 tries)!"

A BTF load failure does not affect the prog load; the prog still works.
So when the BTF/PROG load fails, enlarge log_size and re-run the failing
load only when verbose is enabled.

Signed-off-by: Feng Zhou <zhoufeng.zf@bytedance.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-08-10 19:53:54 -07:00
Peilin Ye e78411948d tc/skbmod: Introduce SKBMOD_F_ECN option
Recently we added SKBMOD_F_ECN option support to the kernel; support it in
the tc-skbmod(8) front end, and update its man page accordingly.

The 2 least significant bits of the Traffic Class field in IPv4 and IPv6
headers are used to represent different ECN states [1]:

	0b00: "Non ECN-Capable Transport", Non-ECT
	0b10: "ECN Capable Transport", ECT(0)
	0b01: "ECN Capable Transport", ECT(1)
	0b11: "Congestion Encountered", CE

This new option, "ecn", marks ECT(0) and ECT(1) IPv{4,6} packets as CE,
which is useful for ECN-based rate limiting.  For example:

	$ tc filter add dev eth0 parent 1: protocol ip prio 10 \
		u32 match ip protocol 1 0xff flowid 1:2 \
		action skbmod \
		ecn

The updated tc-skbmod SYNOPSIS looks like the following:

	tc ... action skbmod { set SETTABLE | swap SWAPPABLE | ecn } ...

Only one of "set", "swap" or "ecn" shall be used in a single tc-skbmod
command.  Trying to use more than one of them at a time is considered
undefined behavior; pipe multiple tc-skbmod commands together instead.
"set" and "swap" only affect Ethernet packets, while "ecn" only affects
IP packets.
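
A minimal sketch of the marking rule described above, operating only on the 8-bit Traffic Class / TOS value. The function and macro names are hypothetical, and the sketch does not touch the rest of the header (for example, it does not update the IPv4 checksum).

```c
#include <assert.h>
#include <stdint.h>

#define ECN_MASK    0x03u  /* 2 least significant bits of the field */
#define ECN_NOT_ECT 0x00u  /* 0b00: Non ECN-Capable Transport */
#define ECN_CE      0x03u  /* 0b11: Congestion Encountered */

/* Mark ECT(0) and ECT(1) as CE; leave Non-ECT untouched. */
static uint8_t mark_ce(uint8_t tclass)
{
	uint8_t out;

	if ((tclass & ECN_MASK) == ECN_NOT_ECT)
		return tclass;

	out = (uint8_t)(tclass | ECN_CE);
	/* The upper six (DSCP) bits must be unchanged. */
	assert((out & ~ECN_MASK) == (tclass & ~ECN_MASK));
	return out;
}
```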

Depends on kernel patch "net/sched: act_skbmod: Add SKBMOD_F_ECN option
support", as well as iproute2 patch "tc/skbmod: Remove misinformation
about the swap action".

[1] https://en.wikipedia.org/wiki/Explicit_Congestion_Notification

Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-08 11:56:55 -06:00
David Ahern 09d8ce3db1 Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-04 09:24:12 -06:00
David Ahern e8763fc9ab Merge branch 'ipv6-oam' into next
Justin Iurman says:

====================

The IOAM patchset was merged recently (see net-next commits [1,2,3,4,5,6]).
Therefore, this patchset provides support for IOAM inside iproute2, as well as
manpage documentation. Here is a summary of added features inside iproute2.

(1) configure IOAM namespaces and schemas:

$ ip ioam
Usage:  ip ioam { COMMAND | help }
        ip ioam namespace show
        ip ioam namespace add ID [ data DATA32 ] [ wide DATA64 ]
        ip ioam namespace del ID
        ip ioam schema show
        ip ioam schema add ID DATA
        ip ioam schema del ID
        ip ioam namespace set ID schema { ID | none }

(2) provide a new encap type to insert the IOAM pre-allocated trace:

$ ip -6 ro ad fc00::1/128 encap ioam6 trace prealloc type 0x800000 ns 1 size 12 dev eth0

  [1] db67f219fc9365a0c456666ed7c134d43ab0be8a
  [2] 9ee11f0fff205b4b3df9750bff5e94f97c71b6a0
  [3] 8c6f6fa6772696be0c047a711858084b38763728
  [4] 3edede08ff37c6a9370510508d5eeb54890baf47
  [5] de8e80a54c96d2b75377e0e5319a64d32c88c690
  [6] 968691c777af78d2daa2ee87cfaeeae825255a58

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-02 11:34:09 -06:00
Justin Iurman 78832863ef IOAM man8
This patch provides man8 documentation for IOAM inside ip, ip-ioam and ip-route.

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-02 11:33:35 -06:00
Justin Iurman 32f4969d44 New IOAM6 encap type for routes
This patch provides a new encap type for routes to insert an IOAM pre-allocated
trace:

$ ip -6 ro ad fc00::1/128 encap ioam6 trace prealloc type 0x800000 ns 1 size 12 dev eth0

where:
 - "trace" and "prealloc" may appear useless, but they anticipate future
   implementations of other ioam option types.
 - "type" is a bitfield (=u32) defining the IOAM pre-allocated trace type (see
   the corresponding uapi).
 - "ns" is an IOAM namespace ID attached to the pre-allocated trace.
 - "size" is the trace pre-allocated size in bytes; must be a 4-octet multiple;
   limited size (see IOAM6_TRACE_DATA_SIZE_MAX).
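
The "size" constraint above can be sketched as a small validity check. The helper is hypothetical; the upper bound is passed in as a parameter rather than hard-coded, and rejecting a size of zero is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* True when size is a non-zero multiple of 4 octets that does not
 * exceed max_size (the caller supplies the limit). */
static bool trace_size_ok(unsigned int size, unsigned int max_size)
{
	assert(max_size > 0);
	return size > 0 && size % 4 == 0 && size <= max_size;
}
```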

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-02 11:33:31 -06:00
Justin Iurman 2909812583 Add, show, link, remove IOAM namespaces and schemas
This patch provides support for adding, listing and removing IOAM namespaces
and schemas with iproute2. When adding an IOAM namespace, both "data" (=u32)
and "wide" (=u64) are optional. Therefore, you can either have none, one of
them, or both at the same time. When adding an IOAM schema, there is no
restriction on "DATA" except its size (see IOAM6_MAX_SCHEMA_DATA_LEN). By
default, an IOAM namespace has no active IOAM schema (meaning an IOAM namespace
is not linked to an IOAM schema), and an IOAM schema is not considered
as "active" (meaning an IOAM schema is not linked to an IOAM namespace). It is
possible to link an IOAM namespace with an IOAM schema, thanks to the last
command below (meaning the IOAM schema will be considered as "active" for the
specific IOAM namespace).

$ ip ioam
Usage:	ip ioam { COMMAND | help }
	ip ioam namespace show
	ip ioam namespace add ID [ data DATA32 ] [ wide DATA64 ]
	ip ioam namespace del ID
	ip ioam schema show
	ip ioam schema add ID DATA
	ip ioam schema del ID
	ip ioam namespace set ID schema { ID | none }

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-02 11:33:05 -06:00
David Ahern e53f4cd504 Import ioam6 uapi headers
Import ioam6 uapi headers from kernel headers at last sync commit.

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-02 11:32:26 -06:00
David Ahern 236696e52c Update kernel headers
Update kernel headers to commit:
    1187c8c4642d ("net: phy: mscc: make some arrays static const, makes object smaller")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-02 10:25:09 -06:00
Gokul Sivakumar cf866f0a5a ipneigh: add support to print brief output of neigh cache in tabular format
Make use of the already available brief flag and print the basic details of
the IPv4 or IPv6 neighbour cache in a tabular format for better readability
when the brief output is expected.

$ ip -br neigh
172.16.12.100                           bridge0          b0:fc:36:2f:07:43
172.16.12.174                           bridge0          8c:16:45:2f:bc:1c
172.16.12.250                           bridge0          04:d9:f5:c1:0c:74
fe80::267b:9f70:745e:d54d               bridge0          b0:fc:36:2f:07:43
fd16:a115:6a62:0:8744:efa1:9933:2c4c    bridge0          8c:16:45:2f:bc:1c
fe80::6d9:f5ff:fec1:c74                 bridge0          04:d9:f5:c1:0c:74

And add "ip neigh show" to the list of ip sub commands mentioned in the man
page that support the brief output in tabular format.

Signed-off-by: Gokul Sivakumar <gokulkumar792@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-08-02 10:14:50 -06:00
Peilin Ye c06d313d86 tc/skbmod: Remove misinformation about the swap action
Currently man 8 tc-skbmod says that "...the swap action will occur after
any smac/dmac substitutions are executed, if they are present."

This is false.  In fact, trying to "set" and "swap" in a single skbmod
command causes the "set" part to be completely ignored.  As an example:

	$ tc filter add dev eth0 parent 1: protocol ip prio 10 \
		matchall action skbmod \
		set dmac AA:AA:AA:AA:AA:AA smac BB:BB:BB:BB:BB:BB \
		swap mac

The above command simply does a "swap", without setting DMAC or SMAC to
AA's or BB's.  The root cause of this is in the kernel, see
net/sched/act_skbmod.c:tcf_skbmod_init():

	parm = nla_data(tb[TCA_SKBMOD_PARMS]);
	index = parm->index;
	if (parm->flags & SKBMOD_F_SWAPMAC)
		lflags = SKBMOD_F_SWAPMAC;
		^^^^^^^^^^^^^^^^^^^^^^^^^^

Doing a "=" instead of "|=" clears all other "set" flags when doing a
"swap".  Discourage using "set" and "swap" in the same command by
documenting it as undefined behavior, and update the "SYNOPSIS" section
as well as tc -help text accordingly.

If one really needs to e.g. "set" DMAC to all AA's then "swap" DMAC and
SMAC, one should do two separate commands and "pipe" them together.
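
The effect of "=" versus "|=" described above can be reproduced in isolation. The flag values and the `build_flags()` helper below are hypothetical and only mimic the shape of the kernel logic; the real SKBMOD_F_* constants live in the kernel uapi header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits for illustration only. */
#define F_DMAC    0x1u
#define F_SMAC    0x2u
#define F_SWAPMAC 0x4u

/* Collect the "set" flags, then handle "swap" either by assignment
 * (accumulate == false) or by OR-ing it in (accumulate == true). */
static uint32_t build_flags(uint32_t requested, bool accumulate)
{
	uint32_t lflags = requested & (F_DMAC | F_SMAC);

	assert((requested & ~(F_DMAC | F_SMAC | F_SWAPMAC)) == 0);
	if (requested & F_SWAPMAC) {
		if (accumulate)
			lflags |= F_SWAPMAC;
		else
			lflags = F_SWAPMAC;  /* drops the "set" flags */
	}
	return lflags;
}
```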

Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-07-22 15:14:29 -07:00
Roi Dayan 71d36000dc police: Fix normal output back to what it was
The json support fix changed the normal output.
Set it back to what it was.
Print overhead with print_size().
Print newline before ref.

Fixes: 0d5cf51e0d ("police: Add support for json output")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-07-17 11:14:30 -07:00
Lahav Schlesinger f760bff328 ipmonitor: Fix recvmsg with ancillary data
A successful call to recvmsg() causes msg.msg_controllen to contain the length
of the received ancillary data. However, the current code in the 'ip' utility
doesn't reset this value after each recvmsg().

This means that if a call to recvmsg() doesn't have ancillary data, then
'msg.msg_controllen' will be set to 0, causing future recvmsg() which do
contain ancillary data to get MSG_CTRUNC set in msg.msg_flags.
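
A minimal sketch of the pattern this implies: restore the ancillary buffer fields of the msghdr before every recvmsg() call. The helper names are hypothetical and this is not the code from the 'ip' utility.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* recvmsg() overwrites msg_controllen (and msg_flags) on return, so
 * point the header back at the full ancillary buffer each time. */
static void reset_ancillary(struct msghdr *msg, void *ctrl, size_t ctrl_cap)
{
	assert(msg != NULL);

	msg->msg_control = ctrl;
	msg->msg_controllen = ctrl_cap;
	msg->msg_flags = 0;
}

/* Receive one message with a freshly reset ancillary buffer. */
static ssize_t recv_one(int fd, struct msghdr *msg, void *ctrl, size_t ctrl_cap)
{
	reset_ancillary(msg, ctrl, ctrl_cap);
	return recvmsg(fd, msg, 0);
}
```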

This fixes 'ip monitor' running with the all-nsid option - With this option the
kernel passes the nsid as ancillary data. If, while 'ip monitor' is running, an
event on the current netns is received, then no ancillary data will be sent,
causing 'msg.msg_controllen' to be set to 0, which causes 'ip monitor' to
indefinitely print "[nsid current]" instead of the real nsid.

Fixes: 449b824ad1 ("ipmonitor: allows to monitor in several netns")
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Lahav Schlesinger <lschlesinger@drivenets.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-07-17 11:13:36 -07:00
Stephen Hemminger 7a7e9ed98f uapi: headers update
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-07-17 11:12:47 -07:00
Christian Schürmann 1f2c908d53 man8/ip-tunnel.8: fix typo, 'encaplim' is not a valid option
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-07-15 09:31:51 -07:00
Alexander Mikhalitsyn 115e987035 libnetlink: check error handler is present before a call
Fix nullptr dereference of errhndlr from rtnl_dump_filter_arg
struct in rtnl_dump_done and rtnl_dump_error functions.

Fixes: 459ce6e3d7 ("ip route: ignore ENOENT during save if RT_TABLE_MAIN is being dumped")
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Roi Dayan <roid@nvidia.com>
Cc: Alexander Mikhalitsyn <alexander@mihalicyn.com>
Reported-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-07-11 10:33:44 -07:00
Stephen Hemminger 0015ada629 libnetlink: cosmetic changes
Don't initialize arguments that are NULL, and format initialization
in a more logical way.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-07-07 07:39:07 -07:00
Alexander Mikhalitsyn 459ce6e3d7 ip route: ignore ENOENT during save if RT_TABLE_MAIN is being dumped
We started to use the in-kernel filtering feature, which allows getting only
the needed tables (see iproute_dump_filter()). From the kernel side it's
implemented in net/ipv4/fib_frontend.c (inet_dump_fib), net/ipv6/ip6_fib.c
(inet6_dump_fib). The problem here is that behaviour of "ip route save"
was changed after
c7e6371bc ("ip route: Add protocol, table id and device to dump request").
If filters are used, then kernel returns ENOENT error if requested table
is absent, but in newly created net namespace even RT_TABLE_MAIN table
doesn't exist. It is really allocated, for instance, after issuing
"ip l set lo up".

Reproducer is fairly simple:
$ unshare -n ip route save > dump
Error: ipv4: FIB table does not exist.
Dump terminated

Expected result here is to get empty dump file (as it was before this
change).

v2: reworked, so, now it takes into account NLMSGERR_ATTR_MSG
(see nl_dump_ext_ack_done() function). We want to suppress error messages
in stderr about absent FIB table from kernel too.

v3: reworked to make code clearer. Introduced rtnl_suppressed_errors(),
rtnl_suppress_error() helpers. User may suppress up to 3 errors (may be
easily extended by changing SUPPRESS_ERRORS_INIT macro).

v4: reworked, rtnl_dump_filter_errhndlr() was introduced. Thanks
to Stephen Hemminger for comments and suggestions

v5: space fixes, commit message reformat, empty initializers

Fixes: c7e6371bc ("ip route: Add protocol, table id and device to dump request")
Cc: David Ahern <dsahern@gmail.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Alexander Mikhalitsyn <alexander@mihalicyn.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-07-07 07:32:56 -07:00
Stephen Hemminger 8f85d085fe uapi: update kernel headers from 5.14-rc1
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-07-06 17:07:24 -07:00
Martynas Pumputis 83d4d61bc9 libbpf: fix attach of prog with multiple sections
When loading BPF programs which consist of multiple executable sections via
iproute2+libbpf (configured with LIBBPF_FORCE=on), we noticed that a
wrong section can be attached to a device. E.g.:

    # tc qdisc replace dev lxc_health clsact
    # tc filter replace dev lxc_health ingress prio 1 \
        handle 1 bpf da obj bpf_lxc.o sec from-container
    # tc filter show dev lxc_health ingress
    filter protocol all pref 1 bpf chain 0
    filter protocol all pref 1 bpf chain 0
        handle 0x1 bpf_lxc.o:[__send_drop_notify] <-- WRONG SECTION
        direct-action not_in_hw id 38 tag 7d891814eda6809e jited

After taking a closer look into load_bpf_object() in lib/bpf_libbpf.c,
we noticed that the filter used in the program iterator does not check
whether a program section name matches a requested section name
(cfg->section). This can lead to a wrong prog FD being used to attach
the program.

Fixes: 6d61a2b557 ("lib: add libbpf support")
Signed-off-by: Martynas Pumputis <m@lambda.lt>
Acked-by: Hangbin Liu <haliu@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-07-06 16:59:39 -07:00
David Ahern 02c06ffc13 Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-07-01 14:29:42 +00:00
Stephen Hemminger fc3511962d lib: remove blank line at eof
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-06-29 13:20:44 -07:00
Stephen Hemminger 0e7ea3e8fe v5.13.0 2021-06-29 11:24:17 -07:00
Ben Hutchings 33cf9306c8 devlink: Fix printf() type mismatches on 32-bit architectures
devlink currently uses "%lu" to format values of type uint64_t,
but on 32-bit architectures uint64_t is defined as unsigned
long long and this does not work correctly.

Fix this by using the standard macro PRIu64 instead.
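
As a small illustration of the portable form, the sketch below formats a uint64_t with PRIu64 from <inttypes.h>. The `format_u64()` wrapper is a hypothetical helper, not a devlink function.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Format value as decimal into buf; returns what snprintf() returns.
 * PRIu64 expands to the right length modifier for uint64_t on both
 * 32-bit and 64-bit platforms, unlike a hard-coded "%lu". */
static int format_u64(char *buf, size_t len, uint64_t value)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%" PRIu64, value);
}
```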

Signed-off-by: Ben Hutchings <ben.hutchings@mind.be>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-06-29 11:10:14 -07:00
Ben Hutchings 4ac0383a59 utils: Fix BIT() to support up to 64 bits on all architectures
devlink and vdpa use BIT() together with 64-bit flag fields.  devlink
is already using bit numbers greater than 31 and so does not work
correctly on 32-bit architectures.

Fix this by making BIT() use uint64_t instead of unsigned long.
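
A sketch of a 64-bit-safe BIT() along these lines, using UINT64_C from <stdint.h>. The exact macro text in the tree may differ, and `flag_is_set()` is a hypothetical helper added for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Shift a 64-bit one, so bit numbers above 31 work even where
 * unsigned long is only 32 bits wide. */
#define BIT(nr) (UINT64_C(1) << (nr))

/* Test a single bit in a 64-bit flag field. */
static bool flag_is_set(uint64_t flags, unsigned int nr)
{
	assert(nr < 64);
	return (flags & BIT(nr)) != 0;
}
```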

Signed-off-by: Ben Hutchings <ben.hutchings@mind.be>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-06-29 11:10:14 -07:00
Stephen Hemminger c73fb66070 uapi: update headers to 5.13
Final 5.13 header update

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-06-28 10:19:08 -07:00
Roi Dayan 6f15c21719 devlink: Fix link errors on some systems
On some systems we fail to link because of a missing math lib.
Add -lm to devlink.

    LINK     devlink
../lib/libutil.a(utils_math.o): In function `get_rate':
utils_math.c:(.text+0xcc): undefined reference to `floor'
../lib/libutil.a(utils_math.o): In function `get_size':
utils_math.c:(.text+0x384): undefined reference to `floor'
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:16: devlink] Error 1
make: *** [Makefile:64: all] Error 2

Fixes: 6c70aca76e ("devlink: Add port func rate support")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-06-26 14:57:27 -07:00
Asbjørn Sloth Tønnesen 2ff4761db4 tc: pedit: add decrement operation
Implement a decrement operation for ttl and hoplimit.

Since this is just syntactic sugar, it goes that:

  tc filter add ... action pedit ex munge ip ttl dec ...
  tc filter add ... action pedit ex munge ip6 hoplimit dec ...

is just a more readable version of this:

  tc filter add ... action pedit ex munge ip ttl add 0xff ...
  tc filter add ... action pedit ex munge ip6 hoplimit add 0xff ...
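
The equivalence relies on 8-bit wrap-around arithmetic: adding 0xff to a one-byte field gives the same result as subtracting 1. A minimal sketch with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Add to a one-byte field with wrap-around (modulo 256). */
static uint8_t add_u8(uint8_t field, uint8_t addend)
{
	return (uint8_t)(field + addend);
}

/* Decrement expressed as "add 0xff". */
static uint8_t dec_u8(uint8_t field)
{
	uint8_t out = add_u8(field, 0xff);

	assert(out == (uint8_t)(field - 1));
	return out;
}
```

Note that in this arithmetic a field that is already 0 wraps around to 255.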

This feature was suggested by some pseudo tc examples in Mellanox's
documentation[1], but was present in neither their mlnx-iproute2
nor iproute2.

Tested with skip_sw on Mellanox ConnectX-6 Dx.

[1] https://docs.mellanox.com/pages/viewpage.action?pageId=47033989

v3:
   - Use dedicated flags argument in parse_cmd() (David Ahern)
   - Minor rewording of the man page

v2:
   - Fix whitespace issue (Stephen Hemminger)
   - Add to usage info in explain()

Signed-off-by: Asbjørn Sloth Tønnesen <asbjorn@asbjorn.st>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-26 04:45:19 +00:00
Asbjørn Sloth Tønnesen bc5e8473aa tc: pedit: parse_cmd: add flags argument
This patch just prepares the flags argument, so it's
available to the next patch.

Signed-off-by: Asbjørn Sloth Tønnesen <asbjorn@asbjorn.st>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-26 04:44:35 +00:00
Sergey Ryazanov 6acccd52a2 iplink: support for WWAN devices
The WWAN subsystem has been extended to generalize the per data channel
network interfaces management. This change implements support for WWAN
link handling and makes active use of the earlier introduced ip-link
capability to specify the parent by its device name.

The WWAN interface for a new data channel should be created with a
command like this:

ip link add dev wwan0-2 parentdev wwan0 type wwan linkid 2

Where: wwan0 is the modem HW device name (should be taken from
/sys/class/wwan) and linkid is an identifier of the opened data
channel.

Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-26 04:40:57 +00:00
Sergey Ryazanov 362da458a4 iplink: add support for parent device
Add support for specifying a parent device (struct device) by its name
during the link creation and printing parent name in the links list.
This option will be used to create WWAN links and possibly by other
device classes that do not have a "natural parent netdev".

Add the parent device bus name printing for links list info
completeness. But do not add a corresponding command line argument, as
we do not have a use case for this attribute.

Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-26 04:40:22 +00:00
David Ahern 083e2706e1 Import wwan.h uapi file
Import wwan.h uapi file at version from last kernel headers sync.

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-26 04:39:47 +00:00
Stephen Hemminger 8316825a52 man: fix syntax for ip link property
The ip link property add/delete requires a device; but the
device argument was not shown on the man page.
It is correct in the usage message.

Fixes: 3aa0e51be6 ("ip: add support for alternative name addition/deletion/list")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-06-24 11:54:04 -07:00
Paolo Lungaroni 3e26254f31 seg6: add support for SRv6 End.DT46 Behavior
We introduce the new "End.DT46" action for supporting the SRv6 End.DT46
Behavior in iproute2.
The SRv6 End.DT46 Behavior, defined in RFC 8986 [1] section 4.8, can be
used to implement L3 VPNs based on Segment Routing over IPv6 networks in
multi-tenants environments and it is capable of handling both IPv4 and
IPv6 tenant traffic at the same time.
The SRv6 End.DT46 Behavior decapsulates the received packets and it
performs the IPv4 or IPv6 routing lookup in the routing table of the
tenant.

As for the End.DT4 and for the End.DT6 in VRF mode, the SRv6 End.DT46
Behavior leverages a VRF device in order to force the routing lookup into
the associated routing table using the "vrftable" attribute.

To make the End.DT46 work properly, it must be guaranteed that the
routing table used for routing lookup operations is bound to one and
only one VRF during the tunnel creation. Such constraint has to be
enforced by enabling the VRF strict_mode sysctl parameter, i.e.:

 $ sysctl -wq net.vrf.strict_mode=1

Note that the same approach is used for the End.DT4 Behavior and for the
End.DT6 Behavior in VRF mode.

An SRv6 End.DT46 Behavior instance can be created as follows:

 $ ip -6 route add 2001:db8::1 encap seg6local action End.DT46 vrftable 100 dev vrf100

Standard Output:
 $ ip -6 route show 2001:db8::1
 2001:db8::1  encap seg6local action End.DT46 vrftable 100 dev vrf100 metric 1024 pref medium

JSON Output:
$ ip -6 -j -p route show 2001:db8::1
[ {
        "dst": "2001:db8::1",
        "encap": "seg6local",
        "action": "End.DT46",
        "vrftable": 100,
        "dev": "vrf100",
        "metric": 1024,
        "flags": [ ],
        "pref": "medium"
} ]

This patch updates the route.8 man page and the ip route help with the
information related to End.DT46.
Considering that the same information was missing for the SRv6 End.DT4 and
the End.DT6 Behaviors, we have also added it.

[1] https://www.rfc-editor.org/rfc/rfc8986.html#name-enddt46-decapsulation-and-s

Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Signed-off-by: Paolo Lungaroni <paolo.lungaroni@uniroma2.it>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-22 15:36:17 +00:00
David Ahern 1d11326a57 Update kernel headers
Update kernel headers to commit:
    ef2c3ddaa4ed ("ibmvnic: Use strscpy() instead of strncpy()")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-22 15:33:45 +00:00
Guillaume Nault f8879e85f0 utils: bump max args number to 512 for batch files
Large tc filters can have many arguments. For example the following
filter matches the first 7 MPLS LSEs, pops all of them, then updates
the Ethernet header and redirects the resulting packet to eth1.

filter add dev eth0 ingress handle 44 priority 100 \
  protocol mpls_uc flower mpls                     \
    lse depth 1 label 1040076 tc 4 bos 0 ttl 175   \
    lse depth 2 label 89648 tc 2 bos 0 ttl 9       \
    lse depth 3 label 63417 tc 5 bos 0 ttl 185     \
    lse depth 4 label 593135 tc 5 bos 0 ttl 67     \
    lse depth 5 label 857021 tc 0 bos 0 ttl 181    \
    lse depth 6 label 239239 tc 1 bos 0 ttl 254    \
    lse depth 7 label 30 tc 7 bos 1 ttl 237        \
  action mpls pop protocol mpls_uc pipe            \
  action mpls pop protocol mpls_uc pipe            \
  action mpls pop protocol mpls_uc pipe            \
  action mpls pop protocol mpls_uc pipe            \
  action mpls pop protocol mpls_uc pipe            \
  action mpls pop protocol mpls_uc pipe            \
  action mpls pop protocol ipv6 pipe               \
  action vlan pop_eth pipe                         \
  action vlan push_eth                             \
    dst_mac 00:00:5e:00:53:7e                      \
    src_mac 00:00:5e:00:53:03 pipe                 \
  action mirred egress redirect dev eth1

This filter has 149 arguments, so it can't be used with tc -batch,
which is limited to 100.

Let's bump the limit to 512. That should leave a lot of room for big
batch commands.
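
As a rough illustration of why the limit matters, the sketch below counts
the arguments of one batch command after joining its backslash
continuations. The helper names and the shell-style splitting are
assumptions made for this example; the real batch reader is C code in
iproute2 and may treat quoting differently.

```python
import shlex

OLD_LIMIT = 100  # limit quoted in the commit message above
NEW_LIMIT = 512  # limit after this change


def count_batch_args(command: str) -> int:
    """Count the arguments of one batch command (hypothetical helper)."""
    # Join backslash-newline continuations into a single logical line.
    joined = command.replace("\\\n", " ")
    # Split roughly the way a shell would split the line into words.
    return len(shlex.split(joined))


def fits(command: str, limit: int) -> bool:
    """Return True when the command stays within the argument limit."""
    return count_batch_args(command) <= limit


example = (
    "filter add dev eth0 ingress \\\n"
    "  protocol mpls_uc flower \\\n"
    "  action mirred egress redirect dev eth1"
)
print(count_batch_args(example))
```

A command with 149 words fails fits(..., OLD_LIMIT) and passes
fits(..., NEW_LIMIT), which is the situation the commit describes.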

v2:
   -Define the limit in utils.h (Stephen Hemminger)
   -Bump the limit even higher (256 -> 512) (Stephen Hemminger)

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-18 02:57:05 +00:00
Stephen Hemminger e1d3ac755d uapi: update kernel headers to 5.13-rc6
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-06-17 15:54:05 -07:00
David Ahern d8b3b9d32d Merge branch 'devlink-rate-support' into next
Dmytro Linkin says:

====================

Series implements devlink rate commands, which are:
- Dump particular or all rate objects (JSON or non-JSON)
- Add/Delete node rate object
- Set tx rate share/max values for rate object
- Set/Unset parent rate object for other rate object

Examples:

Display all rate objects:

    # devlink port function rate show
    pci/0000:03:00.0/1 type leaf parent some_group
    pci/0000:03:00.0/2 type leaf tx_share 12Mbit
    pci/0000:03:00.0/some_group type node tx_share 1Gbps tx_max 5Gbps

Display leaf rate object bound to the 1st devlink port of the
pci/0000:03:00.0 device:

    # devlink port function rate show pci/0000:03:00.0/1
    pci/0000:03:00.0/1 type leaf

Display node rate object with name some_group of the pci/0000:03:00.0
device:

    # devlink port function rate show pci/0000:03:00.0/some_group
    pci/0000:03:00.0/some_group type node

Display leaf rate object rate values using IEC units:

    # devlink -i port function rate show pci/0000:03:00.0/2
    pci/0000:03:00.0/2 type leaf 11718Kibit

Display pci/0000:03:00.0/2 leaf rate object as pretty JSON output:

    # devlink -jp port function rate show pci/0000:03:00.0/2
    {
        "rate": {
            "pci/0000:03:00.0/2": {
                "type": "leaf",
                "tx_share": 1500000
            }
        }
    }

Create node rate object with name "1st_group" on pci/0000:03:00.0 device:

    # devlink port function rate add pci/0000:03:00.0/1st_group

Create node rate object with specified parameters:

    # devlink port function rate add pci/0000:03:00.0/2nd_group \
        tx_share 10Mbit tx_max 30Mbit parent 1st_group

Set parameters to the specified leaf rate object:

    # devlink port function rate set pci/0000:03:00.0/1 \
        tx_share 2Mbit tx_max 10Mbit

Set leaf's parent to "1st_group":

    # devlink port function rate set pci/0000:03:00.0/1 parent 1st_group

Unset leaf's parent:

    # devlink port function rate set pci/0000:03:00.0/1 noparent

Delete node rate object:

    # devlink port function rate del pci/0000:03:00.0/2nd_group

Rate values can be specified in bits or bytes per second (bit|bps), with
any SI (k, m, g, t) or IEC (ki, mi, gi, ti) prefix. A bare number means
bits per second. Units are also printed in "show" command output, but not
necessarily the same ones that were specified with the "set" or "add"
command. The -i/--iec switch forces output in IEC units. JSON output
always prints values as bytes per second.
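
The unit handling can be checked against the examples above: a tx_share
of 12Mbit appears as 1500000 in JSON and as 11718Kibit with -i. The sketch
below reproduces that arithmetic. The function names and the regular
expression are assumptions made for illustration, not the devlink parser
itself, and the real tool picks its output unit dynamically rather than
always printing Kibit.

```python
import re

# SI prefixes are powers of 1000, IEC prefixes are powers of 1024.
SI = {"": 1, "k": 10**3, "m": 10**6, "g": 10**9, "t": 10**12}
IEC = {"ki": 2**10, "mi": 2**20, "gi": 2**30, "ti": 2**40}


def rate_to_bytes_per_sec(text: str) -> int:
    """Convert a rate such as '12Mbit' or '5MBps' to bytes per second."""
    m = re.fullmatch(r"(\d+)([kmgt]i?)?(bit|bps)?", text.strip().lower())
    if m is None:
        raise ValueError(f"unrecognised rate: {text!r}")
    number = int(m.group(1))
    prefix = m.group(2) or ""
    unit = m.group(3) or "bit"  # a bare number means bits per second
    scale = IEC[prefix] if prefix.endswith("i") else SI[prefix]
    value = number * scale
    # 'bit' counts bits per second, 'bps' counts bytes per second.
    return value // 8 if unit == "bit" else value


def to_kibit(bytes_per_sec: int) -> str:
    """Render a bytes-per-second value in Kibit, rounding down."""
    return f"{bytes_per_sec * 8 // 1024}Kibit"


print(rate_to_bytes_per_sec("12Mbit"))
print(to_kibit(rate_to_bytes_per_sec("12Mbit")))
```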

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-12 04:38:34 +00:00
Dmytro Linkin dedf895184 devlink: Add ISO/IEC switch
Add -i/--iec switch to print rate values using binary prefixes.
Update devlink(8) and devlink-rate(8) pages.

Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-12 04:38:13 +00:00
Dmytro Linkin 6c70aca76e devlink: Add port func rate support
Implement user commands to manage devlink port func rate objects.
List all rate commands:

    $ devlink port func rate help

or just

    $ devlink port func rate

To list all rate objects or a particular one:

    $ devlink port func rate show
    pci/0000:03:00.0/some_group: type node
    pci/0000:03:00.0/0: type leaf
    pci/0000:03:00.0/1: type leaf

    $ devlink port func rate show pci/0000:03:00.0/1
    pci/0000:03:00.0/0: type leaf

    $ devlink port func rate show pci/0000:03:00.0/some_group
    pci/0000:03:00.0/some_group: type node

A rate object of type "leaf" is created by its driver, and its name is
the name of the corresponding devlink port. A rate object of type "node"
represents a rate group created by the user with the command:

    $ devlink port func rate add pci/0000:03:00.0/some_group

or, defining tx rate limits at the same time:

    $ devlink port func rate add pci/0000:03:00.0/some_group \
        tx_share 10kbit tx_max 100mbit

NOTE: node name cannot be a decimal value because it conflicts with
devlink port indexes.
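
A minimal sketch of why a decimal node name would be ambiguous: the last
component of a rate handle is either a port index (leaf) or a group name
(node), and the two are told apart only by whether it is a number. The
function name and the exact rule are assumptions for illustration, not
the devlink implementation.

```python
def classify_rate_handle(handle: str) -> str:
    """Return 'leaf' for bus/device/<port index>, 'node' for bus/device/<name>."""
    parts = handle.split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected bus/device/name, got {handle!r}")
    last = parts[2]
    # A purely decimal last component is read as a devlink port index,
    # which is why a node cannot be given a decimal name.
    return "leaf" if last.isascii() and last.isdigit() else "node"


print(classify_rate_handle("pci/0000:03:00.0/1"))
print(classify_rate_handle("pci/0000:03:00.0/some_group"))
```

A name such as 1st_group is still classified as a node, since it is not
purely decimal.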

To delete node object:

    $ devlink port func rate del pci/0000:03:00.0/some_group

Set rate limits of existing rate object:

    $ devlink port func rate set pci/0000:03:00.0/0 \
        tx_share 5MBps tx_max 25GBps

    $ devlink port func rate set pci/0000:03:00.0/some_group \
        tx_share 0

Both SET and ADD commands accept any rate units defined in the IEC
60027-2 standard.

NOTE: rate value 0 means that the rate is unlimited. Such a value is also
omitted in show command output.

NOTE: In SHOW command output rate values will be printed with suffixes
as well, but in JSON output they are always in units of Bps.

Set or unset parent of existing rate object:

    $ devlink port func rate set pci/0000:03:00.0/0 parent some_group

    $ devlink port func rate set pci/0000:03:00.0/0 noparent

NOTE: Due to kernel logic, setting parent to an empty ("") name unsets
the parent; it shouldn't be used, to avoid unexpected parent unsets.

Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-12 04:38:06 +00:00
Dmytro Linkin 95339955c5 devlink: Add helper function to validate object handler
Every handler argument is validated in two steps. The first, a form
check, expects the identifier to be a few words separated by slashes.
For device and region handlers, the code just checks that the identifier
has the expected number of slashes.
Add a generic function to do that and make the code cleaner and more
consistent.

Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-12 04:37:21 +00:00
David Ahern 85903c9a29 Update kernel headers
Update kernel headers to commit:
    76cf404c40ae ("Merge branch 'ipa-mem-2'")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-11 02:38:23 +00:00
Parav Pandit fbd4b581cb devlink: Add optional controller user input
A user can optionally provide the external controller number when
creating a devlink port for an external controller.

An example on eswitch system:
$ devlink dev eswitch set pci/0033:01:00.0 mode switchdev

$ devlink port show
pci/0033:01:00.0/196607: type eth netdev enP51p1s0f0np0 flavour physical port 0 splittable false
pci/0033:01:00.0/131072: type eth netdev eth0 flavour pcipf controller 1 pfnum 0 external true splittable false
  function:
    hw_addr 00:00:00:00:00:00

$ devlink port add pci/0033:01:00.0 flavour pcisf pfnum 0 sfnum 77 controller 1
pci/0033:01:00.0/163840: type eth netdev eth1 flavour pcisf controller 1 pfnum 0 sfnum 77 external true splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-11 02:28:49 +00:00
Roi Dayan 0d5cf51e0d police: Add support for json output
Change to use the print wrappers instead of fprintf().

This is example output of the options part before this commit:

        "options": {
            "handle": 1,
            "in_hw": true,
            "actions": [ {
                    "order": 1 police 0x2 ,
                    "control_action": {
                        "type": "drop"
                    },
                    "control_action": {
                        "type": "continue"
                    }overhead 0b linklayer unspec
        ref 1 bind 1
,
                    "used_hw_stats": [ "delayed" ]
                } ]
        }

This is the output of the same dump with this commit:

        "options": {
            "handle": 1,
            "in_hw": true,
            "actions": [ {
                    "order": 1,
                    "kind": "police",
                    "index": 2,
                    "control_action": {
                        "type": "drop"
                    },
                    "control_action": {
                        "type": "continue"
                    },
                    "overhead": 0,
                    "linklayer": "unspec",
                    "ref": 1,
                    "bind": 1,
                    "used_hw_stats": [ "delayed" ]
                } ]
        }
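
To illustrate the difference, the sketch below feeds a shortened copy of
each dump to Python's json module. Both fragments are abbreviated from
the output above (the control_action entries are left out), and
is_valid_json is a helper introduced only for this example.

```python
import json

# Shortened from the "before" dump: raw text is mixed into the object.
before = '{ "order": 1 police 0x2 , "used_hw_stats": [ "delayed" ] }'

# Shortened from the "after" dump: every field is a proper key/value pair.
after = """
{
    "order": 1,
    "kind": "police",
    "index": 2,
    "overhead": 0,
    "linklayer": "unspec",
    "ref": 1,
    "bind": 1,
    "used_hw_stats": [ "delayed" ]
}
"""


def is_valid_json(text: str) -> bool:
    """Return True when the text parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(is_valid_json(before), is_valid_json(after))
print(json.loads(after)["kind"])
```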

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-11 02:28:36 +00:00
Eric Dumazet 52f136f640 tc: fq: add horizon attributes
Commit 39d010504e6b ("net_sched: sch_fq: add horizon attribute")
added kernel support for horizon attributes in linux-5.8

$ tc -s -d qd sh dev wlp2s0
qdisc fq 8006: root refcnt 2 limit 10000p flow_limit 100p buckets 1024 orphan_mask 1023 quantum 3028b initial_quantum 15140b low_rate_threshold 550Kbit refill_delay 40ms timer_slack 10us horizon 10s horizon_drop
 Sent 690924 bytes 3234 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  flows 112 (inactive 104 throttled 0)
  gc 0 highprio 0 throttled 2 latency 8.25us

$ tc qd change dev wlp2s0 root fq horizon 500ms horizon_cap

$ tc -s -d qd sh dev wlp2s0
qdisc fq 8006: root refcnt 2 limit 10000p flow_limit 100p buckets 1024 orphan_mask 1023 quantum 3028b initial_quantum 15140b low_rate_threshold 550Kbit refill_delay 40ms timer_slack 10us horizon 500ms horizon_cap
 Sent 831220 bytes 3844 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  flows 122 (inactive 120 throttled 0)
  gc 0 highprio 0 throttled 2 latency 8.25us

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-07 02:56:01 +00:00
Hangbin Liu 7ae2585b86 configure: convert LIBBPF environment variables to command-line options
Signed-off-by: Hangbin Liu <haliu@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-03 03:25:59 +00:00
Hangbin Liu a9c3d70d90 configure: add options ability
More and more global environment variables are scattered throughout
configure, which makes it hard for users to know which one does what.
Command-line options would make it easier for users to learn or
remember the config options.

This patch first converts the INCLUDE variable to a command-line option.
If the first argument does not start with '-', configure falls back to
the old INCLUDE path setting method.

Signed-off-by: Hangbin Liu <haliu@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-06-03 03:25:11 +00:00
Roman Mashak 9d9b1a84a5 ss: update ss man page
The '-b' option requests BPF filter opcodes; however, the kernel
currently returns only classic BPF filters, so reflect this in the
man page.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-06-01 15:55:06 -07:00
Ariel Levkovich 825bd5dacb tc: f_flower: Add missing ct_state flags to usage description
Add ct_state flags rpl and inv to the commands usage
description

Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-27 14:40:05 +00:00
Ariel Levkovich 7fda6c588a tc: f_flower: Add option to match on related ct state
Add support for matching on ct_state flag related.
The related state indicates a packet is associated with an existing
connection.

Example:
$ tc filter add dev ens1f0_0 ingress prio 1 chain 1 proto ip flower \
  ct_state -est-rel+trk \
  action mirred egress redirect dev ens1f0_1

$ tc filter add dev ens1f0_0 ingress prio 1 chain 1 proto ip flower \
  ct_state +rel+trk \
  action mirred egress redirect dev ens1f0_1
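
As an informal reading aid for the ct_state expressions above, the sketch
below splits an expression into the flags that must be set ("+") and the
flags that must be unset ("-"). The function name is hypothetical and the
parser does not validate flag names; it is not the tc implementation.

```python
import re


def parse_ct_state(spec: str):
    """Split e.g. '-est-rel+trk' into (must_be_set, must_be_unset)."""
    tokens = re.findall(r"([+-])([a-z]+)", spec)
    # Reject input with anything other than +flag / -flag tokens.
    if not tokens or "".join(s + n for s, n in tokens) != spec:
        raise ValueError(f"malformed ct_state: {spec!r}")
    must_be_set = {name for sign, name in tokens if sign == "+"}
    must_be_unset = {name for sign, name in tokens if sign == "-"}
    return must_be_set, must_be_unset


print(parse_ct_state("-est-rel+trk"))
print(parse_ct_state("+rel+trk"))
```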

Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-27 14:39:14 +00:00
Florian Westphal d3740fdc26 libgenl: make genl_add_mcast_grp set errno on error
genl_add_mcast_grp doesn't set errno in all cases.

On kernels that support mptcp but lack event support (all kernels <= 5.11)
MPTCP_PM_EV_GRP_NAME won't be found and ip will exit with

    "can't subscribe to mptcp events: Success"

Set errno to a meaningful value (ENOENT) when the group name isn't found
and also cover other spots where it returns nonzero with errno unset.
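
The misleading message comes from formatting an error while errno is
still 0. A small sketch of the effect, using Python's os.strerror as a
stand-in for the C library call; on glibc-based systems the text for 0 is
typically "Success", which matches the message quoted above. The helper
name is made up for this example.

```python
import errno
import os


def subscribe_error(err: int) -> str:
    """Format the failure message for a given errno value."""
    return f"can't subscribe to mptcp events: {os.strerror(err)}"


# errno left at 0: the message describes no real failure.
print(subscribe_error(0))
# errno set to ENOENT: the message names the actual problem.
print(subscribe_error(errno.ENOENT))
```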

Fixes: ff619e4fd3 ("mptcp: add support for event monitoring")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-05-17 11:59:37 -07:00
Heiko Thiery c5b72cc56b lib/fs: fix issue when {name,open}_to_handle_at() is not implemented
Commit d5e6ee0dac introduced the use of name_to_handle_at() and
open_by_handle_at(), but these functions are not available in, e.g.,
uclibc-ng < 1.0.35. For backward compatibility, check for their
availability in the configure script and, if they are absent, do a
direct syscall.

Fixes: d5e6ee0dac ("ss: introduce cgroup2 cache and helper functions")
Cc: Dmitry Yakunin <zeil@yandex-team.ru>
Cc: Petr Vorel <petr.vorel@gmail.com>
Signed-off-by: Heiko Thiery <heiko.thiery@gmail.com>
Reviewed-by: Petr Vorel <petr.vorel@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-17 02:31:29 +00:00
David Ahern 62c88ed940 config.mk: Rerun configure when it is newer than config.mk
config.mk needs to be re-generated any time configure is changed.
Rename the existing make target and add a check that the config.mk
file needs to exist and must be newer than configure script.
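
The rule amounts to a timestamp comparison. A minimal sketch, assuming
make-like semantics (re-run when config.mk is missing or configure is
strictly newer); the helper names are hypothetical and the real check is
written in the Makefile, not in Python.

```python
import os
from typing import Optional


def mtime_or_none(path: str) -> Optional[float]:
    """Return the modification time of path, or None if it does not exist."""
    try:
        return os.path.getmtime(path)
    except FileNotFoundError:
        return None


def must_rerun_configure(config_mk_mtime: Optional[float],
                         configure_mtime: float) -> bool:
    """Decide whether config.mk has to be regenerated."""
    if config_mk_mtime is None:
        return True  # config.mk does not exist yet
    return configure_mtime > config_mk_mtime  # configure changed afterwards


print(must_rerun_configure(None, 1.0))
print(must_rerun_configure(2.0, 1.0))
```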

Signed-off-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Petr Vorel <petr.vorel@gmail.com>
Tested-by: Petr Vorel <petr.vorel@gmail.com>
2021-05-17 02:13:56 +00:00
Jakub Kicinski 49437375b6 ip: dynamically size columns when printing stats
This change makes ip -s -s output size the columns
automatically. I often find myself using json
output because the normal output is unreadable.
Even on a laptop after 2 days of uptime byte
and packet counters almost overflow their columns,
let alone a busy server.

For max readability switch to right align.

Before:

    RX: bytes  packets  errors  dropped missed  mcast
    8227918473 8617683  0       0       0       0
    RX errors: length   crc     frame   fifo    overrun
               0        0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    691937917  4727223  0       0       0       0
    TX errors: aborted  fifo   window heartbeat transns
               0        0       0       0       10

After:

    RX:  bytes packets errors dropped  missed   mcast
    8228633710 8618408      0       0       0       0
    RX errors:  length    crc   frame    fifo overrun
                     0      0       0       0       0
    TX:  bytes packets errors dropped carrier collsns
     692006303 4727740      0       0       0       0
    TX errors: aborted   fifo  window heartbt transns
                     0      0       0       0      10

More importantly, with large values before:

    RX: bytes  packets  errors  dropped overrun mcast
    126570234447969 15016149200 0       0       0       0
    RX errors: length   crc     frame   fifo    missed
               0        0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    126570234447969 15016149200 0       0       0       0
    TX errors: aborted  fifo   window heartbeat transns
               0        0       0       0       10

Note that in this case we have full shift by a column,
e.g. the value under "dropped" is actually for "errors" etc.

After:

    RX:       bytes     packets errors dropped  missed   mcast
    126570234447969 15016149200      0       0       0       0
    RX errors:           length    crc   frame    fifo overrun
                              0      0       0       0       0
    TX:       bytes     packets errors dropped carrier collsns
    126570234447969 15016149200      0       0       0       0
    TX errors:          aborted   fifo  window heartbt transns
                              0      0       0       0      10
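
The idea can be sketched as follows: each column is as wide as the longer
of its header and its value, and both are right-aligned. This is a
simplified illustration with made-up names; the real code also handles
the "RX:" and "TX errors:" labels and is written in C.

```python
def format_stats(headers, values, min_width=7):
    """Right-align headers and values in columns sized to fit both."""
    cells = [str(v) for v in values]
    # A column is as wide as its longest entry, but never below min_width.
    widths = [max(min_width, len(h), len(c)) for h, c in zip(headers, cells)]
    head = " ".join(h.rjust(w) for h, w in zip(headers, widths))
    row = " ".join(c.rjust(w) for c, w in zip(cells, widths))
    return head, row


head, row = format_stats(
    ["bytes", "packets", "errors"],
    [126570234447969, 15016149200, 0],
)
print(head)
print(row)
```

Because the widths are derived from the data, a large counter widens its
own column instead of pushing later values under the wrong header.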

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-09 22:51:59 +00:00
Paolo Lungaroni 02ca3aabe9 seg6: add counters support for SRv6 Behaviors
We introduce the "count" optional attribute for supporting counters in SRv6
Behaviors as defined in [1], section 6. For each SRv6 Behavior instance,
counters defined in [1] are:

 - the total number of packets that have been correctly processed;
 - the total amount of traffic in bytes of all packets that have been
   correctly processed;

In addition, we introduce a new counter that counts the number of packets
that have NOT been properly processed (i.e. errors) by an SRv6 Behavior
instance.

Each SRv6 Behavior instance can be configured, at the time of its creation,
to make use of counters by specifying the "count" attribute as follows:

 $ ip -6 route add 2001:db8::1 encap seg6local action End count dev eth0

Per-behavior counters can be shown by adding "-s" to the iproute2 command
line, i.e.:

 $ ip -s -6 route show 2001:db8::1
 2001:db8::1 encap seg6local action End packets 0 bytes 0 errors 0 dev eth0

[1] https://www.rfc-editor.org/rfc/rfc8986.html#name-counters

v2:
 - add help and route.8 man page updates

Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Signed-off-by: Paolo Lungaroni <paolo.lungaroni@uniroma2.it>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-09 22:20:59 +00:00
Andrea Claudi e44786b269 tc: htb: improve burst error messages
When a wrong value is provided for "burst" or "cburst" parameters, the
resulting error message is unclear and can be misleading:

$ tc class add dev dummy0 parent 1: classid 1:1 htb rate 100KBps burst errtrigger
Illegal "buffer"

The message claims an illegal "buffer" is provided, but neither the
inline help nor the man page list "buffer" among the htb parameters, and
the only way to know that "burst", "maxburst" and "buffer" are synonyms
is to look into tc/q_htb.c.

This commit tries to improve this by simply changing the error string to
the parameter name provided in the user-given command, clearly pointing
out where the wrong value is.

$ tc class add dev dummy0 parent 1: classid 1:1 htb rate 100KBps burst errtrigger
Illegal "burst"

$ tc class add dev dummy0 parent 1: classid 1:1 htb rate 100Kbps maxburst errtrigger
Illegal "maxburst"
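
The approach can be sketched as reporting the keyword the user actually
typed instead of one fixed internal name. The helper names below are
hypothetical, and the value check is a simplification (the real parser
accepts sizes with unit suffixes and lives in tc/q_htb.c).

```python
# The three spellings the commit message lists as synonyms.
BURST_KEYWORDS = ("burst", "buffer", "maxburst")


def parse_burst(keyword: str, value: str) -> int:
    """Parse a burst value, naming the user's keyword on failure."""
    if keyword not in BURST_KEYWORDS:
        raise ValueError(f'unknown keyword "{keyword}"')
    if not (value.isascii() and value.isdigit()):
        # Echo the keyword from the command line, not a fixed "buffer".
        raise ValueError(f'Illegal "{keyword}"')
    return int(value)


def error_for(keyword: str, value: str):
    """Return the error text for a keyword/value pair, or None if valid."""
    try:
        parse_burst(keyword, value)
    except ValueError as exc:
        return str(exc)
    return None


print(error_for("burst", "errtrigger"))
print(error_for("maxburst", "errtrigger"))
```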

Reported-by: Sebastian Mitterle <smitterl@redhat.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-09 22:13:22 +00:00
Andrea Claudi 28ee49e515 tipc: bail out if key is abnormally long
tipc segfaults when called with an abnormally long key:

$ tipc node set key 0123456789abcdef0123456789abcdef0123456789abcdef
*** buffer overflow detected ***: terminated

Fix this by returning an error if the key is longer than
TIPC_AEAD_KEYLEN_MAX.

Fixes: 24bee3bf97 ("tipc: add new commands to set TIPC AEAD key")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-09 22:08:47 +00:00
Andrea Claudi 93c267bfb4 tipc: bail out if algname is abnormally long
tipc segfaults when called with an abnormally long algname:

$ tipc node set key 0x1234 algname supercalifragilistichespiralidososupercalifragilistichespiralidoso
*** buffer overflow detected ***: terminated

Fix this by returning an error if the provided algname is longer than
TIPC_AEAD_ALG_NAME.

Fixes: 24bee3bf97 ("tipc: add new commands to set TIPC AEAD key")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-09 22:08:47 +00:00
Hoang Le 459f280813 tipc: call a sub-routine in separate socket
When receiving the result of a first netlink query, we may execute
another query inside the callback. If this sub-routine is called on
the same socket, the result of the previous execution is discarded.
To avoid this, perform the nested query on a separate socket.

Fixes: 2021028306 ("tipc: use the libmnl functions in lib/mnl_utils.c")
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-09 22:08:47 +00:00
Tyson Moore 0d95472a4b tc-cake: update docs to include LE diffserv
Linux kernel commit b8392808eb3fc28e ("sch_cake: add RFC 8622 LE PHB
support to CAKE diffserv handling") added packets with LE diffserv to
the Bulk priority tin. Update the documentation to reflect this change.

Signed-off-by: Tyson Moore <tyson@tyson.me>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-06 14:59:52 +00:00
Andrea Claudi 2d212aae55 dcb: fix memory leak
main() dynamically allocates dcb, but when dcb_help() is called it
returns without freeing it.

Fix this by using a goto, as is already done in the same function.

Fixes: 67033d1c1c ("Add skeleton of a new tool, dcb")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Reviewed-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-06 14:48:02 +00:00
Andrea Claudi cfd89a6f8b dcb: fix return value on dcb_cmd_app_show
dcb_cmd_app_show() is supposed to return EINVAL if an incorrect argument
is provided.

Fixes: 8e9bed1493 ("dcb: Add a subtool for the DCB APP object")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Reviewed-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-06 14:47:57 +00:00
Andrea Claudi 3296d4fe77 lib: bpf_legacy: avoid to pass invalid argument to close()
In function bpf_obj_open, if bpf_fetch_prog_arg() returns an error, we
end up in the out: path with a negative value for fd, and pass it to
close.

Avoid this by checking that fd is positive.

Fixes: 32e93fb7f6 ("{f,m}_bpf: allow for sharing maps")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-06 14:43:54 +00:00
Andrea Claudi a2f1f66075 tc: q_ets: drop dead code from argument parsing
Checking for nbands to be at least 1 at this point is useless. Indeed:
- ets requires "bands", "quanta" or "strict" to be specified
- if "bands" is specified, nbands cannot be negative, see parse_nbands()
- if "strict" is specified, nstrict cannot be negative, see
  parse_nbands()
- if "quantum" is specified, nquanta cannot be negative, see
  parse_quantum()
- if "bands" is not specified, nbands is set to nstrict+nquanta
- the previous if statement takes care of the case when none of them are
  specified and nbands is 0, terminating execution.

Thus nbands cannot be < 1 at this point and this code cannot be executed.

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-06 14:42:44 +00:00
Jakub Kicinski 570d2cf0ec ip: align the name of the 'nohandler' stat
Before:

    RX: bytes  packets  errors  dropped missed  mcast
    8848233056 8548168  0       0       0       0
    RX errors: length   crc     frame   fifo    overrun   nohandler
               0        0       0       0       0       101
    TX: bytes  packets  errors  dropped carrier collsns compressed
    1142925945 4683483  0       0       0       0       101
    TX errors: aborted  fifo   window heartbeat transns
               0        0       0       0       14

After:

    RX: bytes  packets  errors  dropped missed  mcast
    8848297833 8548461  0       0       0       0
    RX errors: length   crc     frame   fifo    overrun nohandler
               0        0       0       0       0       101
    TX: bytes  packets  errors  dropped carrier collsns compressed
    1143049820 4683865  0       0       0       0       101
    TX errors: aborted  fifo   window heartbeat transns
               0        0       0       0       14

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-06 14:41:19 +00:00
David Ahern c3f852754f Update kernel headers
Update kernel headers to commit:
    8621436671f3 ("smc: disallow TCP_ULP in smc_setsockopt()")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-06 14:16:04 +00:00
David Ahern c79fcefaaf Merge branch 'rdma-copy-on-fork' into next
Gal Pressman says:

====================

This is the userspace part for the new copy-on-fork attribute added to
the get sys netlink command.

The new attribute indicates that the kernel copies DMA pages on fork,
hence fork support through madvise and MADV_DONTFORK is not needed.

Kernel series was merged:
https://lore.kernel.org/linux-rdma/20210418121025.66849-1-galpress@amazon.com/

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-03 14:45:19 +00:00
Gal Pressman bce4247869 rdma: Add copy-on-fork to get sys command
The new attribute indicates that the kernel copies DMA pages on fork,
hence fork support through madvise and MADV_DONTFORK is not needed.

If the attribute is not reported (expected on older kernels),
copy-on-fork is disabled.

Example:
$ rdma sys
netns shared copy-on-fork on

Signed-off-by: Gal Pressman <galpress@amazon.com>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-03 14:43:13 +00:00
Gal Pressman 212e2c1d0c rdma: update uapi headers
Update rdma_netlink.h file up to kernel commit
6cc9e215eb27 ("RDMA/nldev: Add copy-on-fork attribute to get sys command")

Signed-off-by: Gal Pressman <galpress@amazon.com>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-05-03 14:43:06 +00:00
Jianguo Wu 7f1d58d1a1 mptcp: make sure flag signal is set when add addr with port
When an address is added with a port, the intent is to send an ADD_ADDR
to the remote, so the signal flag must be set.

Fixes: 42fbca91cd ("mptcp: add support for port based endpoint")
Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn>
Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-30 14:30:24 +00:00
David Ahern e1e089d1f2 Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-28 15:48:28 +00:00
Jethro Beekman d56dcd3549 ip: Add nodst option to macvlan type source
The default behavior for source MACVLAN is to duplicate packets to
appropriate type source devices, and then do the normal destination MACVLAN
flow. This patch adds an option to skip destination MACVLAN processing if
any matching source MACVLAN device has the option set.

This allows setting up a "catch all" device for source MACVLAN: create one
or more devices with type source nodst, and one device with e.g. type vepa,
and incoming traffic will be received on exactly one device.

Signed-off-by: Jethro Beekman <kernel@jbeekman.nl>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-28 15:45:59 +00:00
David Ahern e5f1505e53 Merge branch 'rdma-resource-tracking' into next
Leon Romanovsky says:

====================

This is the user space part of a series, already accepted into the
kernel, that extends the RDMA netlink interface to return uverbs context
and SRQ information.

The accepted kernel series can be seen here:
https://lore.kernel.org/linux-rdma/20210422133459.GA2390260@nvidia.com/

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-28 15:37:32 +00:00
Neta Ostrovsky 9b272e138d rdma: Add SRQ resource tracking information
Sample output:

$ rdma res show srq
dev ibp8s0f0 srqn 0 type BASIC pdn 3 comm [ib_ipoib]
dev ibp8s0f0 srqn 4 type BASIC lqpn 125-128,130-140 pdn 9 pid 3581 comm ibv_srq_pingpon
dev ibp8s0f0 srqn 5 type BASIC lqpn 141-156 pdn 10 pid 3584 comm ibv_srq_pingpon
dev ibp8s0f0 srqn 6 type BASIC lqpn 157-172 pdn 11 pid 3590 comm ibv_srq_pingpon
dev ibp8s0f1 srqn 0 type BASIC pdn 3 comm [ib_ipoib]
dev ibp8s0f1 srqn 1 type BASIC lqpn 329-344 pdn 4 pid 3586 comm ibv_srq_pingpon

$ rdma res show srq lqpn 126-141
dev ibp8s0f0 srqn 4 type BASIC lqpn 126-128,130-140 pdn 9 pid 3581 comm ibv_srq_pingpon
dev ibp8s0f0 srqn 5 type BASIC lqpn 141 pdn 10 pid 3584 comm ibv_srq_pingpon

$ rdma res show srq lqpn 127
dev ibp8s0f0 srqn 4 type BASIC lqpn 127 pdn 9 pid 3581 comm ibv_srq_pingpon
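
In the sample output, filtering by lqpn narrows each printed range to the
part that overlaps the query: 125-128,130-140 filtered by 126-141 becomes
126-128,130-140. The sketch below reproduces that narrowing. The function
names are made up for illustration, and this is not the rdma tool's code.

```python
def parse_ranges(text: str):
    """Parse '125-128,130-140' into [(125, 128), (130, 140)]."""
    ranges = []
    for part in text.split(","):
        lo, _, hi = part.partition("-")
        ranges.append((int(lo), int(hi or lo)))  # '141' means 141-141
    return ranges


def intersect(ranges, query):
    """Keep only the parts of each range that overlap the query range."""
    qlo, qhi = query
    out = []
    for lo, hi in ranges:
        a, b = max(lo, qlo), min(hi, qhi)
        if a <= b:
            out.append((a, b))
    return out


def format_ranges(ranges) -> str:
    """Render ranges back to text, collapsing single-value ranges."""
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


print(format_ranges(intersect(parse_ranges("125-128,130-140"), (126, 141))))
print(format_ranges(intersect(parse_ranges("141-156"), (126, 141))))
```

An SRQ whose ranges do not overlap the query at all, such as 157-172
here, yields an empty result and is not listed.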

Reviewed-by: Ido Kalir <idok@nvidia.com>
Reviewed-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Neta Ostrovsky <netao@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-28 15:37:16 +00:00
Neta Ostrovsky 4278941285 rdma: Add context resource tracking information
Sample output:

$ rdma res show ctx
dev ibp8s0f0 ctxn 0 pid 980 comm ibv_rc_pingpong
dev ibp8s0f0 ctxn 1 pid 981 comm ibv_rc_pingpong
dev ibp8s0f0 ctxn 2 pid 992 comm ibv_rc_pingpong
dev ibp8s0f1 ctxn 0 pid 984 comm ibv_rc_pingpong
dev ibp8s0f1 ctxn 1 pid 987 comm ibv_rc_pingpong

$ rdma res show ctx dev ibp8s0f1
dev ibp8s0f1 ctxn 0 pid 984 comm ibv_rc_pingpong
dev ibp8s0f1 ctxn 1 pid 987 comm ibv_rc_pingpong

Reviewed-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Ido Kalir <idok@nvidia.com>
Signed-off-by: Neta Ostrovsky <netao@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-28 15:36:59 +00:00
Neta Ostrovsky 4c61b5b9df rdma: Update uapi headers
Update rdma_netlink.h file up to kernel commit c6c11ad3ab9f
("RDMA/nldev: Add QP numbers to SRQ information")

Reviewed-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Neta Ostrovsky <netao@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-28 15:36:21 +00:00
David Ahern a5ea744ca2 Update kernel headers
Update kernel headers to commit:
    99ba0ea616aa ("sfc: adjust efx->xdp_tx_queue_count with the real number of initialized queues")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-28 15:35:30 +00:00
Stephen Hemminger 2363bc99f9 Merge git://git.kernel.org/pub/scm/network/iproute2/iproute2-next
Required manual fix of devlink/devlink.c

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-04-27 19:39:39 -07:00
Stephen Hemminger 1fdea28051 v5.12.0 2021-04-27 11:59:09 -07:00
Stephen Hemminger a3fb3fcb7d remove trailing whitespace
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-04-27 11:55:53 -07:00
Andrea Claudi e1ad689545 lib: bpf_legacy: fix missing socket close when connect() fails
In functions bpf_{send,recv}_map_fds(), when connect fails after a
socket is successfully opened, we return with an error without closing
the socket.

Fix this by closing the socket if it was opened and using a single
return point for both functions.

Fixes: 6256f8c9e4 ("tc, bpf: finalize eBPF support for cls and act front-end")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-04-26 21:05:19 -07:00
Andrea Claudi 92af24c907 lib: bpf_legacy: treat 0 as a valid file descriptor
As stated in the open() man page, open() returns a non-negative integer
as a file descriptor. Hence, when checking that its return value is ok,
we should include 0 as a valid value.

This fixes a covscan warning about a missing close() in this function.
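
As an illustration only (can_open_sketch() is a hypothetical name, not code
from this patch), the check should compare against "< 0" so that descriptor
0 is handled and closed like any other:

```c
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if 'path' can be opened read-only, 0 otherwise. */
int can_open_sketch(const char *path)
{
	int fd = open(path, O_RDONLY);

	/* open() returns -1 on error. 0 is a legitimate descriptor (for
	 * example when stdin has been closed), so test fd < 0, not fd <= 0.
	 */
	if (fd < 0)
		return 0;

	close(fd);	/* reached for fd == 0 as well, so nothing leaks */
	return 1;
}
```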

Fixes: ecb05c0f99 ("bpf: improve error reporting around tail calls")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-04-26 21:05:19 -07:00
Andrea Claudi 932fe3453f tc: e_bpf: fix memory leak in parse_bpf()
envp_run is dynamically allocated with malloc() and not freed in the
out: return path. This commit fixes it.

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-04-26 21:05:19 -07:00
Andrea Claudi 38ef5bb7b4 ip: netns: fix missing netns close on some error paths
In functions netns_pids() and netns_identify_pid(), the netns file is
not closed on some error paths.

Fix this by using a conditional close and a single return point in both
functions.

Fixes: 44b563269e ("ip-nexthop: support flush by id")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-04-26 21:04:02 -07:00
Nikolay Aleksandrov c72de3713d bridge: vlan: dump port only if there are any vlans
When I added support for new vlan rtm dumping, I made a mistake in the
output format when there are no vlans on the port. This patch fixes it by
not printing ports without vlan entries (similar to current situation).

Example (no vlans):
$ bridge -d vlan show
port              vlan-id

Fixes: e5f87c8341 ("bridge: vlan: add support for the new rtm dump call")
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-26 02:32:46 +00:00
Tony Ambardar e705b19d48 ip: drop 2-char command assumption
The 'ip' utility hardcodes the assumption of being a 2-char command, where
any follow-on characters are passed as an argument:

  $ ./ip-full help
  Object "-full" is unknown, try "ip help".

This confusing behaviour isn't seen with 'tc' for example, and was added in
a 2005 commit without documentation. It was noticed during testing of 'ip'
variants built/packaged with different feature sets (e.g. w/o BPF support).

Mitigate the problem by redoing the command without the 2-char assumption
if the follow-on characters fail to parse as a valid command.

Fixes: 351efcde4e ("Update header files to 2.6.14")
Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-26 02:29:42 +00:00
Stephen Hemminger b5a6ed9cc9 uapi: add missing virtio related headers
The build of iproute2 relies on having a correct copy of sanitized
kernel headers. The vdpa utility introduced a dependency on
the vdpa related headers, but these headers were not present
in the iproute2 repo.

Fixes: c2ecc82b9d ("vdpa: Add vdpa tool")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-04-23 10:36:17 -07:00
Andrea Claudi 81bfd01a4c lib: move get_task_name() from rdma
The function get_task_name() is used to get the name of a process from
its pid, and its implementation is similar to ip/iptuntap.c:pid_name().

Move it to lib/fs.c to use a single implementation and make it easily
reusable.

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-22 05:22:16 +00:00
David Ahern 75a35d50a3 Merge branch 'bridge-vlan' into next
Nikolay Aleksandrov says:

====================

From: Nikolay Aleksandrov <nikolay@nvidia.com>

This set extends the bridge vlan code to use the new vlan RTM calls
which allow to dump detailed per-port, per-vlan information and also to
manipulate the per-vlan options. It also allows to monitor any vlan
changes (add/del/option change). The rtm vlan dumps have an extensible
format which allows us to add new options and attributes easily, and
also to request the kernel to filter on different vlan information when
dumping. The new kernel dump code tries to use compressed vlan format as
much as possible (it includes netlink attributes for vlan start and
end) to reduce the number of generated messages and netlink traffic.
The iproute2 support is activated by using the "-d" flag when showing
vlan information, that will cause it to use the new rtm dump call and
get all the detailed information, if "-s" is also specified it will dump
per-vlan statistics as well. Obviously in that case the vlans cannot be
compressed. To change per-vlan options (currently only STP state is
supported) a new vlan command is added - "set". It can be used to set
options of bridge or port vlans and vlan ranges can be used, all of the
new vlan option code uses extack to show more understandable errors.
The set adds the first supported per-vlan option - STP state.
Man pages and usage information are updated accordingly.

Example:
 $ bridge -d vlan show
 port              vlan-id
 ens13             1 PVID Egress Untagged
                     state forwarding
 bridge            1 PVID Egress Untagged
                     state forwarding

 $ bridge vlan set vid 1 dev ens13 state blocking
 $ bridge -d vlan show
 port              vlan-id
 ens13             1 PVID Egress Untagged
                     state blocking
 bridge            1 PVID Egress Untagged
                     state forwarding

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-22 05:20:13 +00:00
Nikolay Aleksandrov c311404780 bridge: monitor: add support for vlan monitoring
Add support for vlan activity monitoring: vlan notifications are
displayed on vlan add/del/options change. The man page and help are also
updated accordingly.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-22 05:13:39 +00:00
Nikolay Aleksandrov e5f87c8341 bridge: vlan: add support for the new rtm dump call
Use the new bridge vlan rtm dump helper to dump all of the available
vlan information when -details (-d) is used with vlan show. It is also
capable of dumping vlan stats if -statistics (-s) is added.
Currently this is the only interface capable of dumping per-vlan
options. The vlan dump format is compatible with current vlan show, it
uses the same helpers to dump vlan information. The new addition is one
line which will contain the per-vlan options (similar to ip -d link show
for ports). Currently only the vlan STP state is printed.
The call uses compressed vlan format by default.

Example:
$ bridge -s -d vlan show
port              vlan-id
virbr1            1 PVID Egress Untagged
                    state forwarding

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-22 05:13:34 +00:00
Nikolay Aleksandrov 34c14bea22 libnetlink: add bridge vlan dump request helper
Add rtnl bridge vlan dump request helper which will be used to retrieve
bridge vlan information and options.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-22 05:13:29 +00:00
Nikolay Aleksandrov 04e2783d5e bridge: vlan: add option set command and state option
Add a new per-vlan option set command. It allows to manipulate vlan
options, those can be bridge-wide or per-port depending on what device
is specified. The first option that can be set is the vlan STP state,
it is identical to the bridge port STP state. The man page is also
updated accordingly.

Example:
 $ bridge vlan set vid 10 dev br0 state learning
or a range:
 $ bridge vlan set vid 10-20 dev swp1 state blocking

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-22 05:13:24 +00:00
Nikolay Aleksandrov f2f52fcabe bridge: add parse_stp_state helper
Add a helper which parses an STP state string to its numeric value.
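
A rough sketch of what such a helper can look like. The table and the name
parse_stp_state_sketch() are illustrative assumptions, not the code added
by this patch; the numeric values follow the kernel's BR_STATE_* constants
(disabled 0, listening 1, learning 2, forwarding 3, blocking 4).

```c
#include <stddef.h>
#include <strings.h>

/* Index in this table equals the numeric STP state. */
static const char *const stp_state_names_sketch[] = {
	"disabled", "listening", "learning", "forwarding", "blocking",
};

/* Returns the numeric STP state for 'arg', or -1 if it is unknown. */
int parse_stp_state_sketch(const char *arg)
{
	size_t n = sizeof(stp_state_names_sketch) /
		   sizeof(stp_state_names_sketch[0]);
	size_t i;

	for (i = 0; i < n; i++) {
		if (strcasecmp(arg, stp_state_names_sketch[i]) == 0)
			return (int)i;
	}
	return -1;
}
```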

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-22 05:13:20 +00:00
Nikolay Aleksandrov f07516c3b0 bridge: rename and export print_portstate
Rename print_portstate to print_stp_state in preparation for use by vlan
code as well (per-vlan state), and export it. To be in line with the new
naming rename also port_states to stp_states as they'll be used for
vlans, too.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-22 05:13:09 +00:00
Florian Westphal ff619e4fd3 mptcp: add support for event monitoring
This adds iproute2 support for mptcp event monitoring, e.g. creation,
establishment, address announcements from the peer, subflow establishment
and so on.

While the kernel-generated events are primarily aimed at mptcpd (e.g. for
subflow management), this is also useful for debugging.

This adds print support for the existing events.

Sample output of 'ip mptcp monitor':
[       CREATED] token=83f3a692 remid=0 locid=0 saddr4=10.0.1.2 daddr4=10.0.1.1 sport=58710 dport=10011
[   ESTABLISHED] token=83f3a692 remid=0 locid=0 saddr4=10.0.1.2 daddr4=10.0.1.1 sport=58710 dport=10011
[SF_ESTABLISHED] token=83f3a692 remid=0 locid=1 saddr4=10.0.2.2 daddr4=10.0.1.1 sport=40195 dport=10011 backup=0
[        CLOSED] token=83f3a692

Signed-off-by: Florian Westphal <fw@strlen.de>
2021-04-22 05:10:25 +00:00
David Ahern 98040c2dc1 Update kernel headers
Update kernel headers to commit:
    5d869070569a ("net: phy: marvell: don't use empty switch default case")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-22 05:09:39 +00:00
Andrea Claudi c8216fabe8 rdma: stat: fix return code
libmnl defines MNL_CB_OK as 1 and MNL_CB_ERROR as -1. rdma uses these
return codes, and stat_qp_show_parse_cb() should do the same.

Fixes: 16ce4d2366 ("rdma: stat: initialize ret in stat_qp_show_parse_cb()")
Reported-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-04-20 18:08:38 -07:00
Andrea Claudi 16ce4d2366 rdma: stat: initialize ret in stat_qp_show_parse_cb()
In the unlikely case in which the mnl_attr_for_each_nested() cycle is
not executed, this function returns an uninitialized value.

Fix this by initializing ret to 0.

Fixes: 5937552b42 ("rdma: Add "stat qp show" support")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-04-13 19:16:55 -07:00
Andrea Claudi 6a2c51da99 nexthop: fix memory leak in add_nh_group_attr()
grps is dynamically allocated with calloc() and not freed in a return
path in the for cycle. This commit fixes it.

While at it, make the function use a single return point.

Fixes: 63df8e8543 ("Add support for nexthop objects")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-04-13 19:16:55 -07:00
Andrea Claudi 6801ae8273 q_cake: remove useless check on argv
In cake_parse_opt(), *argv is checked not to be null when parsing for
overhead and mpu parameters. However this is useless, since *argv
matches right before for "overhead" or "mpu".

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-04-13 19:16:55 -07:00
Andrea Claudi 6b8fa2ea2d devlink: always check strslashrsplit() return value
The strslashrsplit() return value is not checked in __dl_argv_handle(),
despite the fact that it can return EINVAL.

This commit fixes it and makes __dl_argv_handle() return an error if
strslashrsplit() returns an error code.

Fixes: 2f85a9c535 ("devlink: allow to parse both devlink and port handle in the same time")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-04-13 19:16:55 -07:00
Stephen Hemminger cc718c191b uapi: update can.h
Upstream commit to force packing on ARM OABI

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-04-13 19:14:34 -07:00
Stephen Hemminger 06d0bbf1ee erspan: fix JSON output
The format for erspan/erspan6 output is not valid JSON, as on version 2 a
valueless key was presented. The direction should be value and erspan_dir
should be the key.

Fixes: 2897636267 ("erspan: add erspan version II support")
Cc: u9012063@gmail.com
Reported-by: Christian Pössinger <christian@poessinger.com>
Signed-off-by: Christian Pössinger <christian@poessinger.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-04-10 09:52:48 -07:00
Chunmei Xu 44b563269e ip-nexthop: support flush by id
Since the id is unique for a nexthop, it is heavy to dump all nexthops
just to flush one. Use the existing delete_nexthop to support flush by id.

Signed-off-by: Chunmei Xu <xuchunmei@linux.alibaba.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-08 15:38:58 +00:00
Hoang Le 2021028306 tipc: use the libmnl functions in lib/mnl_utils.c
To avoid code duplication, tipc should be converted to use the helper
functions for working with libmnl in lib/mnl_utils.c

Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-04-03 01:10:54 +00:00
Stephen Hemminger e77a0d3dc9 uapi: bpf.h update from upstream
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-03-30 16:38:05 -07:00
Baowen Zheng cf9ae1bd31 police: add support for packet-per-second rate limiting
Allow a policer action to enforce a rate-limit based on packets-per-second,
configurable using a packet-per-second rate and burst parameters.

e.g.
 # $TC actions add action police pkts_rate 1000 pkts_burst 200 index 1
 # $TC actions ls action police
 total acts 1

	action order 0:  police 0x1 rate 0bit burst 0b mtu 4096Mb pkts_rate 1000 pkts_burst 200
	ref 1 bind 0

Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Louis Peens <louis.peens@netronome.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-30 03:04:50 +00:00
Cooper Lees 16430e9afd Add Open/R to rt_protos
- Open Routing is using ID 99 for its installed routes
- https://github.com/facebook/openr
- Kernel has accepted 99 in `rtnetlink.h`

Signed-off-by: Cooper Lees <me@cooperlees.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-30 03:04:09 +00:00
Petr Machata 7384c15e0e ip: Fix batch processing
After the commit cited below, batch mode neglects to set the global
variable batch_mode to a non-zero value. Netns and VRF commands use this
variable, and break in batch mode. Fix by setting the value again.

Fixes: 1d9a81b8c9 ("Unify batch processing across tools")
Reported-by: Tim Rice <trice@posteo.net>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-03-22 16:30:21 -07:00
David Ahern 76bfc185f2 Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-21 17:16:01 +00:00
Sabrina Dubroca 3c75135835 ip: xfrm: add support for tfcpad
This patch adds support for setting and displaying the Traffic Flow
Confidentiality attribute for an XFRM state, which allows padding ESP
packets to a specified length.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-21 17:15:07 +00:00
Stephen Hemminger 872689d431 uapi: minor header update for l2tp
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-03-20 09:36:07 -07:00
Stephen Hemminger 87d6d395d1 README: remove doc instructions
The out of date documentation was removed in 2017, but the instructions
in the README were not removed.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-03-20 09:29:02 -07:00
David Ahern fa505da84b Merge branch 'nexthop-resilient-hash' into next
Petr Machata says:

====================

Support for resilient next-hop groups was recently accepted to Linux
kernel[1]. Resilient next-hop groups add a layer of indirection between the
SKB hash and the next hop. Thus the hash is used to reference a hash table
bucket, which is then used to reference a particular next hop. This allows
the system more flexibility when assigning SKB hash space to next hops.
Previously, each next hop had to be assigned a continuous range of SKB hash
space. With a hash table as an intermediate layer, it is possible to
reassign next hops with a hash table bucket granularity. In turn, this
mends issues with traffic flow redirection resulting from next hop removal
or adjustments in next-hop weights.
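
The indirection described above can be sketched as follows. This is only an
illustration: the struct, the function names, the fixed bucket count and
the modulo mapping from hash to bucket are all assumptions, not the kernel
implementation.

```c
#include <stdint.h>

#define NUM_BUCKETS_SKETCH 8

/* Hypothetical resilient group: each bucket references one next hop. */
struct res_group_sketch {
	uint32_t bucket_nhid[NUM_BUCKETS_SKETCH];
};

/* The SKB hash selects a bucket; the bucket selects the next hop. */
uint32_t select_nexthop_sketch(const struct res_group_sketch *g,
			       uint32_t skb_hash)
{
	return g->bucket_nhid[skb_hash % NUM_BUCKETS_SKETCH];
}

/* Reassign a single bucket. Flows hashing to other buckets keep their
 * next hop, which is what limits traffic redirection.
 */
void reassign_bucket_sketch(struct res_group_sketch *g, unsigned int index,
			    uint32_t nhid)
{
	if (index < NUM_BUCKETS_SKETCH)
		g->bucket_nhid[index] = nhid;
}
```

With per-bucket reassignment, moving bucket 2 from next hop 2 to next hop 1
affects only flows whose hash maps to bucket 2.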

In this patch set, introduce support for resilient next-hop groups to
iproute2.

- Patch #1 brings include/uapi/linux/nexthop.h and /rtnetlink.h up to date.

- Patches #2 and #3 add new helpers that will be useful later.

- Patch #4 extends the ip/nexthop sub-tool to accept group type as a
  command line argument, and to dispatch based on the specified type.

- Patch #5 adds the support for resilient next-hop groups.

- Patch #6 adds the support for resilient next-hop group bucket interface.

To illustrate the usage, consider the following commands:

 # ip nexthop add id 1 via 192.0.2.2 dev dummy1
 # ip nexthop add id 2 via 192.0.2.3 dev dummy1
 # ip nexthop add id 10 group 1/2 type resilient \
	buckets 8 idle_timer 60 unbalanced_timer 300

The last command creates a resilient next-hop group. It will have 8
buckets, each bucket will be considered idle when no traffic hits it for at
least 60 seconds, and if the table remains out of balance for 300 seconds,
it will be forcefully brought into balance.

And this is how the next-hop group bucket interface looks:

 # ip nexthop bucket show id 10
 id 10 index 0 idle_time 5.59 nhid 1
 id 10 index 1 idle_time 5.59 nhid 1
 id 10 index 2 idle_time 8.74 nhid 2
 id 10 index 3 idle_time 8.74 nhid 2
 id 10 index 4 idle_time 8.74 nhid 1
 id 10 index 5 idle_time 8.74 nhid 1
 id 10 index 6 idle_time 8.74 nhid 1
 id 10 index 7 idle_time 8.74 nhid 1

[1] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=2a0186a37700b0d5b8cc40be202a62af44f02fa2

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-19 15:05:29 +00:00
Ido Schimmel 2be6d18b30 nexthop: Add support for nexthop buckets
Add ability to dump multiple nexthop buckets and get a specific one.
Example:

 # ip nexthop add id 10 group 1/2 type resilient buckets 8
 # ip nexthop
 id 1 via 192.0.2.2 dev dummy10 scope link
 id 2 via 192.0.2.19 dev dummy20 scope link
 id 10 group 1/2 type resilient buckets 8 idle_timer 120 unbalanced_timer 0 unbalanced_time 0
 # ip nexthop bucket
 id 10 index 0 idle_time 28.1 nhid 2
 id 10 index 1 idle_time 28.1 nhid 2
 id 10 index 2 idle_time 28.1 nhid 2
 id 10 index 3 idle_time 28.1 nhid 2
 id 10 index 4 idle_time 28.1 nhid 1
 id 10 index 5 idle_time 28.1 nhid 1
 id 10 index 6 idle_time 28.1 nhid 1
 id 10 index 7 idle_time 28.1 nhid 1
 # ip nexthop bucket show nhid 1
 id 10 index 4 idle_time 53.59 nhid 1
 id 10 index 5 idle_time 53.59 nhid 1
 id 10 index 6 idle_time 53.59 nhid 1
 id 10 index 7 idle_time 53.59 nhid 1
 # ip nexthop bucket get id 10 index 5
 id 10 index 5 idle_time 81 nhid 1
 # ip -j -p nexthop bucket get id 10 index 5
 [ {
         "id": 10,
         "bucket": {
             "index": 5,
             "idle_time": 104.89,
             "nhid": 1
         },
         "flags": [ ]
     } ]

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-19 15:01:25 +00:00
Ido Schimmel 9167671822 nexthop: Add support for resilient nexthop groups
Add ability to configure resilient nexthop groups and show their current
configuration. Example:

 # ip nexthop add id 10 group 1/2 type resilient buckets 8
 # ip nexthop show id 10
 id 10 group 1/2 type resilient buckets 8 idle_timer 120 unbalanced_timer 0
 # ip -j -p nexthop show id 10
 [ {
         "id": 10,
         "group": [ {
                 "id": 1
             },{
                 "id": 2
             } ],
         "type": "resilient",
         "resilient_args": {
             "buckets": 8,
             "idle_timer": 120,
             "unbalanced_timer": 0
         },
         "flags": [ ]
     } ]

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-19 15:01:18 +00:00
Ido Schimmel b82d6b81fa nexthop: Add ability to specify group type
Next patches are going to add a 'resilient' nexthop group type, so allow
users to specify the type using the 'type' argument. Currently, only
'mpath' type is supported.

These two commands are equivalent:

 # ip nexthop add id 10 group 1/2/3
 # ip nexthop add id 10 group 1/2/3 type mpath

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-19 15:00:49 +00:00
Petr Machata 28fb925d8b nexthop: Extract a helper to parse a NH ID
NH ID extraction is a common operation, and will become more common still
with the resilient NH groups support. Add a helper that does what is
usually done and returns the parsed NH ID.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-19 15:00:43 +00:00
Petr Machata e757f741e9 json_print: Add print_tv()
Add a helper to dump a timeval. Print by first converting to double and
then dispatching to print_color_float().
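
The conversion step can be sketched like this. tv_to_double_sketch() is a
hypothetical name used for illustration; the actual helper may differ in
detail.

```c
#include <sys/time.h>

/* Convert a timeval to seconds as a double, e.g. {5, 590000} is about 5.59. */
double tv_to_double_sketch(const struct timeval *tv)
{
	return (double)tv->tv_sec + (double)tv->tv_usec / 1000000.0;
}
```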

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-19 15:00:08 +00:00
David Ahern a5b355c08c Update kernel headers
Update kernel headers to commit:
    38cb57602369 ("selftests: net: forwarding: Fix a typo")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-19 14:59:17 +00:00
Stephen Hemminger 6639fce430 ip: cleanup help message text
Wrap help message text at 80 characters, and put list of things
in alpha order.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-03-18 11:24:06 -07:00
Tony Ambardar 06bee37c1c lib/bpf: add missing limits.h includes
Several functions in bpf_glue.c and bpf_libbpf.c rely on PATH_MAX, which is
normally included from <limits.h> in other iproute2 source files.

It fixes errors seen using gcc 10.2.0, binutils 2.35.1 and musl 1.1.24:

bpf_glue.c: In function 'get_libbpf_version':
bpf_glue.c:46:11: error: 'PATH_MAX' undeclared (first use in this function);
did you mean 'AF_MAX'?
   46 |  char buf[PATH_MAX], *s;
      |           ^~~~~~~~
      |           AF_MAX

Reported-by: Rui Salvaterra <rsalvaterra@gmail.com>
Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-03-16 22:53:53 -07:00
Sabrina Dubroca 6050055387 ip: xfrm: limit the length of the security context name when printing
Security context names are not guaranteed to be NUL-terminated by the
kernel, so we can't just print them using %s directly. The length of
the string is determined by sctx->ctx_len, so we can use that to limit
what fprintf outputs.
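
A minimal sketch of length-limited printing, assuming a byte buffer and a
separate length as described above. format_ctx_sketch() is a hypothetical
name, and snprintf() is used instead of fprintf() only to keep the example
self-contained.

```c
#include <stdio.h>

/* Format at most 'ctx_len' bytes of 'ctx', which need not be
 * NUL-terminated. The precision in "%.*s" bounds how many bytes are read.
 */
int format_ctx_sketch(char *out, size_t outlen, const char *ctx,
		      unsigned int ctx_len)
{
	return snprintf(out, outlen, "%.*s", (int)ctx_len, ctx);
}
```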

While at it, factor that out to a separate function, since the exact
same code is used to print the security context for both policies and
states.

Fixes: b2bb289a57 ("xfrm security context support")
Reported-by: Paul Wouters <pwouters@redhat.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-03-16 22:53:28 -07:00
David Ahern 27ca8989c1 Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-15 15:08:01 +00:00
Toke Høiland-Jørgensen 60204c81e4 q_cake: Fix incorrect printing of signed values in class statistics
The deficit returned from the kernel is signed, but was printed with a %u
specifier in the format string, causing negative values to be printed as
high unsigned values instead. In addition, we passed a negative value to
sprint_time() even though that expects an unsigned value. Fix this by
changing the format specifier and reversing the sign of negative time
values.
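
To illustrate the idea (this is a sketch with hypothetical names and output
format, not the q_cake code): print signed values with %d, and for a helper
that only accepts unsigned input, pass the magnitude and print the sign
separately.

```c
#include <stdio.h>

/* Signed value: use %d so that -1 is not shown as 4294967295. */
int format_deficit_sketch(char *out, size_t outlen, int deficit)
{
	return snprintf(out, outlen, "%d", deficit);
}

/* Signed time in microseconds: print the sign, then the magnitude. */
int format_signed_usec_sketch(char *out, size_t outlen, int usec)
{
	/* unsigned negation is well defined even for INT_MIN */
	unsigned int mag = usec < 0 ? 0u - (unsigned int)usec
				    : (unsigned int)usec;

	return snprintf(out, outlen, "%s%uus", usec < 0 ? "-" : "", mag);
}
```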

Fixes: 714444c0cb ("Add support for CAKE qdisc")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-03-08 19:05:19 -08:00
Roi Dayan 9f366536ed dcb: Fix compilation warning about reallocarray
Older distros need bsd/stdlib.h but newer distros don't. Older distros
also need libbsd-devel installed and newer ones don't. To remove a
possible dependency on libbsd-devel, replace usage of reallocarray with
realloc.
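
For reference, reallocarray() differs from realloc() mainly by checking the
nmemb * size multiplication for overflow. The commit message does not say
whether the patch keeps such a check; the following is only a sketch, with
a hypothetical name, of how it could be done portably on top of realloc().

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* realloc() for an array of 'nmemb' elements of 'size' bytes each,
 * failing with ENOMEM if the total size would overflow size_t.
 */
void *realloc_array_sketch(void *ptr, size_t nmemb, size_t size)
{
	if (size != 0 && nmemb > SIZE_MAX / size) {
		errno = ENOMEM;
		return NULL;
	}
	return realloc(ptr, nmemb * size);
}
```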

dcb_app.c: In function ‘dcb_app_table_push’:
dcb_app.c:68:25: warning: implicit declaration of function ‘reallocarray’; did you mean ‘realloc’?

Fixes: 8e9bed1493 ("dcb: Add a subtool for the DCB APP object")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-03-03 18:56:39 -08:00
Luca Boccassi 6739068fb0 iproute: fix printing resolved localhost
format_host_rta_r might return a cached hostname
via its return value and not use the input buffer.

Before:

$ ip -resolve -6 route
 dev lo proto kernel metric 256 pref medium

After:

$ ip/ip -resolve -6 route
localhost dev lo proto kernel metric 256 pref medium

Bug-Debian: https://bugs.debian.org/983591

Reported-by: Axel Scheepers <axel.scheepers76@gmail.com>
Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-03-03 18:54:16 -08:00
Parav Pandit c54e7bd605 devlink: Add error print when unknown values specified
When the user specifies either an unknown flavour or an unknown state
during devlink port commands, return an appropriate error message.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-03 04:00:16 +00:00
Parav Pandit 62ff25e51b devlink: Use generic socket helpers from library
Use generic socket helpers from the library for netlink generic socket
access.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-03 04:00:10 +00:00
Parav Pandit e3a4067e52 utils: Introduce helper routines for generic socket recv
Introduce a helper for generic socket receive and a helper to build a
command with a custom family and version.

Use the API in a subsequent devlink patch.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-03 04:00:04 +00:00
Parav Pandit 03662000e4 devlink: Use library provided string processing APIs
Use helper routines provided by the library for counting slashes and
splitting a string on a delimiter.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-03 03:59:58 +00:00
Paolo Abeni 42fbca91cd mptcp: add support for port based endpoint
The feature is supported by the kernel since 5.11-net-next,
let's allow user-space to use it.

Just parse and dump an additional, per endpoint, u16 attribute

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-01 00:15:10 +00:00
David Ahern 455c9f5361 Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-03-01 00:07:57 +00:00
Stephen Hemminger 9d00602f82 vdpa: add .gitignore
Ignore the resulting binary vdpa.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-23 23:12:14 -08:00
Stephen Hemminger 5e0e73c347 Update kernel headers from 5.12-pre rc
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-23 23:11:12 -08:00
Stephen Hemminger 52c5f3f043 Merge git://git.kernel.org/pub/scm/network/iproute2/iproute2-next 2021-02-23 23:03:42 -08:00
Stephen Hemminger bbddfcec6c v5.11.0 2021-02-23 09:34:11 -08:00
Andrea Claudi b2d44b9a95 lib/fs: Fix single return points for get_cgroup2_*
Functions get_cgroup2_id() and get_cgroup2_path() may call close() with
a negative argument.
Avoid that making the calls conditional on the file descriptors.

get_cgroup2_path() may also return NULL leaking a file descriptor.
Ensure this does not happen using a single return point.

Fixes: d5e6ee0dac ("ss: introduce cgroup2 cache and helper functions")
Fixes: 8f1cd119b3 ("lib: fix checking of returned file handle size for cgroup")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-22 18:20:44 -08:00
Andrea Claudi 1de363b180 lib/fs: avoid double call to mkdir on make_path()
The make_path() function calls mkdir two times in a row. The first time
it stores the mkdir return code, and then it calls it again to check for
errno.

This seems unnecessary, as we can use the return code from the first
call and check for errno if not 0.

Fixes: ac3415f5c1 ("lib/fs: Fix and simplify make_path()")
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-22 18:20:44 -08:00
Andrea Claudi d4fcdbbec9 lib/bpf: Fix and simplify bpf_mnt_check_target()
As stated in commit ac3415f5c1 ("lib/fs: Fix and simplify make_path()"),
calling stat() before mkdir() is racy, because the entry might change in
between.

As the call to stat() seems to only check for target existence, we can
simply call mkdir() unconditionally and catch all errors but EEXIST.
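
The approach can be sketched as below. ensure_dir_sketch() is a
hypothetical name, not the function touched by this patch.

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Create 'path' without a prior stat(), so there is no window in which
 * the entry can change between the check and the creation.
 * Returns 0 if the directory was created or the path already exists,
 * or a negative errno value otherwise.
 * Note: EEXIST is also reported when 'path' exists but is not a directory.
 */
int ensure_dir_sketch(const char *path, mode_t mode)
{
	if (mkdir(path, mode) != 0 && errno != EEXIST)
		return -errno;
	return 0;
}
```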

Fixes: 95ae9a4870 ("bpf: fix mnt path when from env")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
2021-02-22 18:19:01 -08:00
Andrea Claudi 1e25de9a92 lib/namespace: fix ip -all netns return code
When ip -all netns {del,exec} are called and no netns is present, ip
exits with status 0. However this does not happen if no netns has been
created since boot time: in that case, indeed, the NETNS_RUN_DIR is not
present and netns_foreach() exits with code 1.

$ ls /var/run/netns
ls: cannot access '/var/run/netns': No such file or directory
$ ip -all netns exec ip link show
$ echo $?
1
$ ip -all netns del
$ echo $?
1
$ ip netns add test
$ ip netns del test
$ ip -all netns del
$ echo $?
0
$ ls -a /var/run/netns
.  ..

This leaves us in the unpleasant situation where the same command, when
no netns is present, does the same stuff (in this case, nothing), but
exits with two different statuses.

Fix this by treating ENOENT in a different way from other errors,
similarly to what we already do in ipnetns.c netns_identify_pid()
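
A sketch of the idea, using a hypothetical directory-walking helper rather
than the real netns_foreach(): a missing directory is treated the same as
an empty one, while other opendir() failures are still reported.

```c
#include <dirent.h>
#include <errno.h>
#include <string.h>

/* Count the entries in 'dir_path', skipping "." and "..".
 * Returns the count, 0 if the directory does not exist, -1 on other errors.
 */
int count_entries_sketch(const char *dir_path)
{
	struct dirent *entry;
	int count = 0;
	DIR *dir;

	dir = opendir(dir_path);
	if (!dir) {
		if (errno == ENOENT)
			return 0;	/* nothing created yet: not an error */
		return -1;
	}

	while ((entry = readdir(dir)) != NULL) {
		if (strcmp(entry->d_name, ".") == 0 ||
		    strcmp(entry->d_name, "..") == 0)
			continue;
		count++;
	}

	closedir(dir);
	return count;
}
```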

Fixes: e998e118dd ("lib: Exec func on each netns")
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-22 18:17:56 -08:00
Andrea Claudi e833dbe140 ip: lwtunnel: seg6: bail out if table ids are invalid
When table and vrftable are used in SRv6, ip should bail out if table
ids are not valid, and return a proper error message to the user.

Achieve this by simply checking the rtnl_rttable_a2n return value, as we
already do in the rest of iproute.

Fixes: 0486388a87 ("add support for table name in SRv6 End.DT* behaviors")
Fixes: 69629b4e43 ("seg6: add support for vrftable attribute in SRv6 End.DT4/DT6 behaviors")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-22 18:11:48 -08:00
Andrea Claudi 546f738220 tc: m_gate: use SPRINT_BUF when needed
sprint_time64() uses SPRINT_BSIZE-1 as a constant buffer length in its
implementation, however m_gate uses shorter buffers when calling it.

Fix this by using the SPRINT_BUF macro to get the buffer, thus getting a
SPRINT_BSIZE-long buffer.

Fixes: 07d5ee70b5 ("iproute2-next:tc:action: add a gate control action")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-22 18:11:03 -08:00
Vladimir Oltean e1d79d49ed man8/bridge.8: be explicit that "flood" is an egress setting
Talking to various people, it became apparent that there is a certain
ambiguity in the description of these flags. They refer to egress
flooding, which should perhaps be stated more clearly.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-22 11:19:38 -08:00
Vladimir Oltean 14f528a556 man8/bridge.8: explain self vs master for "bridge fdb add"
The "usually hardware" and "usually software" distinctions make no
sense. Try to clarify what these do based on the actual kernel behavior.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-22 11:19:38 -08:00
Vladimir Oltean b64ceb687d man8/bridge.8: fix which one of self/master is default for "bridge fdb"
The bridge program does:

fdb_modify:
	/* Assume self */
	if (!(req.ndm.ndm_flags&(NTF_SELF|NTF_MASTER)))
		req.ndm.ndm_flags |= NTF_SELF;

which is clearly against the documented behavior. The only thing we can
do, sadly, is update the documentation.
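
The default can be isolated in a small sketch. The flag values are the ones
from <linux/neighbour.h>, repeated here only to keep the example
self-contained; fdb_default_flags() is a hypothetical helper, not a function
in the bridge program.

```c
#include <stdint.h>

/* Flag values as in <linux/neighbour.h>, repeated for self-containment. */
#define NTF_SELF   0x02
#define NTF_MASTER 0x04

/* Hypothetical helper: if the user gave neither "self" nor "master",
 * assume "self"; otherwise leave the flags untouched. */
uint8_t fdb_default_flags(uint8_t ndm_flags)
{
	if (!(ndm_flags & (NTF_SELF | NTF_MASTER)))
		ndm_flags |= NTF_SELF;
	return ndm_flags;
}
```

An explicit "master" is therefore never combined with an implicit "self".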

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-22 11:19:38 -08:00
Vladimir Oltean 10130bfafe man8/bridge.8: explain what a local FDB entry is
Explaining the "local" flag by saying that it is "a local permanent fdb
entry" is not very helpful. Be more specific.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-22 11:19:38 -08:00
Vladimir Oltean ae3cb3d34d man8/bridge.8: document that "local" is default for "bridge fdb add"
The bridge does this:

fdb_modify:
	/* Assume permanent */
	if (!(req.ndm.ndm_state&(NUD_PERMANENT|NUD_REACHABLE)))
		req.ndm.ndm_state |= NUD_PERMANENT;

So let's make the user aware of the fact that if they don't want local
entries, they need to specify some other flag like "static".

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-22 11:19:38 -08:00
Vladimir Oltean 1261459c64 man8/bridge.8: document the "permanent" flag for "bridge fdb add"
The bridge program parses "local" and "permanent" in just the same way,
so it makes sense to tell that to users:

fdb_modify:
		} else if (matches(*argv, "local") == 0 ||
			   matches(*argv, "permanent") == 0) {
			req.ndm.ndm_state |= NUD_PERMANENT;
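
Put together with the "assume permanent" default, the keyword handling can
be sketched as below. Both functions are hypothetical, the state values are
copied from <linux/neighbour.h>, and exact string comparison is used for
simplicity where the bridge program uses matches().

```c
#include <stdint.h>
#include <string.h>

/* State values as in <linux/neighbour.h>, repeated for self-containment. */
#define NUD_REACHABLE 0x02
#define NUD_PERMANENT 0x80

/* Hypothetical parser for a single state keyword.
 * Returns 0 if the keyword is handled by this sketch, -1 otherwise. */
int fdb_parse_state(const char *arg, uint16_t *state)
{
	if (strcmp(arg, "local") == 0 || strcmp(arg, "permanent") == 0) {
		*state |= NUD_PERMANENT;	/* the two keywords are synonyms */
		return 0;
	}
	return -1;
}

/* Apply the "assume permanent" default when no state was requested. */
uint16_t fdb_default_state(uint16_t state)
{
	if (!(state & (NUD_PERMANENT | NUD_REACHABLE)))
		state |= NUD_PERMANENT;
	return state;
}
```

In this sketch "local", "permanent" and no keyword at all end up with the
same state, which is what the man page needs to spell out.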

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-22 11:19:38 -08:00
Ido Kalir 675e2df632 rdma: Fix statistics bind/unbind argument handling
The dump isn't supported for the statistics bind/unbind commands
because they operate on specific QP counters. This is different
from query commands that can operate on many objects at the same
time.

Let's check the user input and ensure that arguments are valid.

Fixes: a6d0773ebe ("rdma: Add stat manual mode support")
Signed-off-by: Ido Kalir <idok@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-22 10:52:39 -08:00
Thayne McCombs c7897ec2a6 ss: Make leading ":" always optional for sport and dport
The sport and dport conditions in expressions were inconsistent on
whether there should be a ":" at the beginning of the port when only a
port was provided depending on the family. The link and netlink
families required a ":" to work. The vsock family required the ":"
to be absent. The inet and inet6 families work with or without a leading
":".

This makes the leading ":" optional in all cases, so if sport or dport
are used, then it works with a leading ":" or without one, as inet and
inet6 did.
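
A minimal sketch of the "optional leading colon" rule. parse_port() is a
hypothetical helper, not the ss parser; it handles numeric ports only, and
the 16-bit range check is an assumption that fits inet ports rather than
every family.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse "PORT" or ":PORT" into *port.
 * Returns 0 on success, -1 on malformed or out-of-range input. */
int parse_port(const char *arg, unsigned int *port)
{
	char *end;
	unsigned long val;

	if (*arg == ':')	/* the leading ':' is optional */
		arg++;
	if (*arg < '0' || *arg > '9')
		return -1;

	errno = 0;
	val = strtoul(arg, &end, 10);
	if (errno || *end != '\0' || val > 65535)
		return -1;

	*port = (unsigned int)val;
	return 0;
}
```

Both "22" and ":22" give the same result, regardless of the family in use.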

Signed-off-by: Thayne McCombs <astrothayne@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-14 22:09:37 -07:00
Amit Cohen 33e2471e8f ip route: Print "rt_offload_failed" indication
The kernel signals when offload fails using the 'RTM_F_OFFLOAD_FAILED'
flag. Print it to help users understand the offload state of the route.
The "rt_" prefix is used in order to distinguish it from the offload state
of nexthops, similar to "rt_offload" and "rt_trap".

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-13 17:50:15 -07:00
David Ahern 34de4b26bf Update kernel headers
Update kernel headers to commit:
    c4762993129f ("Merge branch 'skbuff-introduce-skbuff_heads-bulking-and-reusing'")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-13 17:48:05 -07:00
Oleksandr Mazur c946f5d3e4 devlink: add support for port params get/set
Add implementation for the port parameters
getting/setting.
Add bash completion for port param.
Add man description for port param.

Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-11 09:21:24 -07:00
David Ahern 143610383d Merge branch 'vdpa' into next
Parav Pandit  says:

====================

Linux vdpa interface allows vdpa device management functionality.
This includes adding, removing, querying vdpa devices.

vdpa interface also includes showing supported management devices
which support such operations.

This patchset includes kernel uapi headers and a vdpa tool.

examples:

$ vdpa mgmtdev show
vdpasim:
  supported_classes net

$ vdpa mgmtdev show -jp
{
    "show": {
        "vdpasim": {
            "supported_classes": [ "net" ]
        }
    }
}

Create a vdpa device of type networking named as "foo2" from
the management device vdpasim_net:

$ vdpa dev add mgmtdev vdpasim_net name foo2

Show the newly created vdpa device by its name:
$ vdpa dev show foo2
foo2: type network mgmtdev vdpasim_net vendor_id 0 max_vqs 2 max_vq_size 256

$ vdpa dev show foo2 -jp
{
    "dev": {
        "foo2": {
            "type": "network",
            "mgmtdev": "vdpasim_net",
            "vendor_id": 0,
            "max_vqs": 2,
            "max_vq_size": 256
        }
    }
}

Delete the vdpa device after its use:
$ vdpa dev del foo2

An example of PCI PF, VF and SF management device:
pci/0000:03.00:0
  supported_classes
    net
pci/0000:03.00:4
  supported_classes
    net
auxiliary/mlx5_core.sf.8
  supported_classes
    net

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-11 09:16:49 -07:00
Parav Pandit c2ecc82b9d vdpa: Add vdpa tool
Add the vdpa tool to create, delete and query vdpa devices.
Examples:
Show the vdpa management device that supports creating and deleting vdpa devices:

$ vdpa mgmtdev show
vdpasim:
  supported_classes net

$ vdpa mgmtdev show -jp
{
    "show": {
        "vdpasim": {
            "supported_classes": [ "net" ]
        }
    }
}

Create a vdpa device of type networking named as "foo2" from
the management device vdpasim_net:

$ vdpa dev add mgmtdev vdpasim_net name foo2

Show the newly created vdpa device by its name:
$ vdpa dev show foo2
foo2: type network mgmtdev vdpasim_net vendor_id 0 max_vqs 2 max_vq_size 256

$ vdpa dev show foo2 -jp
{
    "dev": {
        "foo2": {
            "type": "network",
            "mgmtdev": "vdpasim_net",
            "vendor_id": 0,
            "max_vqs": 2,
            "max_vq_size": 256
        }
    }
}

Delete the vdpa device after its use:
$ vdpa dev del foo2

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-11 09:09:15 -07:00
Parav Pandit 6c76994982 utils: Add helper to map string to unsigned int
A subsequent patch needs to map a string to an unsigned int.
Hence, add an API to map a string to an unsigned int.
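
Such a mapping helper might look like the sketch below. The struct layout,
the function name map_str_to_uint() and the table contents are all
illustrative assumptions, not the API added by this patch.

```c
#include <stddef.h>
#include <string.h>

/* One table entry; a NULL str terminates the table. */
struct str_num_map {
	const char *str;
	unsigned int num;
};

/* Hypothetical lookup: store the number mapped to needle in *num.
 * Returns 0 on a match, -1 if the string is not in the table. */
int map_str_to_uint(const struct str_num_map *map, const char *needle,
		    unsigned int *num)
{
	for (; map->str; map++) {
		if (strcmp(map->str, needle) == 0) {
			*num = map->num;
			return 0;
		}
	}
	return -1;
}

/* Illustrative table; the names and values are examples only. */
const struct str_num_map example_classes[] = {
	{ "net", 1 },
	{ "block", 2 },
	{ NULL, 0 },
};
```

Keeping the mapping in a table means a new name needs one new entry rather
than another branch in the parser.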

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-11 09:09:10 -07:00
Parav Pandit b822275ad8 utils: Add generic socket helpers
Subsequent patch needs to
(a) query and use socket family
(b) send/receive messages using this family

Hence add helper routines to open, close, query family and to perform
send/receive operations.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-11 09:09:07 -07:00
Parav Pandit bd3709c3a7 utils: Add helper routines for indent handling
A subsequent patch needs to use 2 char indentation for nested objects.
Hence introduce generic helpers to allocate, deallocate, increment,
decrement and to print an indent block.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-11 09:08:13 -07:00
Parav Pandit 5a6bf92a95 Add kernel headers
Add kernel headers to commit from kernel tree [1].
  6acba4951632 ("vdpa_sim_net: Add support for user supported devices")

[1] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-11 09:07:47 -07:00
Paul Blakey 049708a002 tc: flower: Add support for ct_state reply flag
Matches on conntrack rpl ct_state.

Example:
$ tc filter add dev ens1f0_0 ingress prio 1 chain 1 proto ip flower \
  ct_state +trk+est+rpl \
  action mirred egress redirect dev ens1f0_1
$ tc filter add dev ens1f0_1 ingress prio 1 chain 1 proto ip flower \
  ct_state +trk+est-rpl \
  action mirred egress redirect dev ens1f0_0

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-04 21:54:28 -07:00
Maxim Mikityanskiy b8b8b6d4c9 tc/htb: Hierarchical QoS hardware offload
This commit adds support for configuring HTB in offload mode. HTB
offload eliminates the single qdisc lock in the datapath and offloads
the algorithm to the NIC. The new 'offload' parameter is added to
enable this mode:

    # tc qdisc replace dev eth0 root handle 1: htb offload

Classes are created as usual, but filters should be moved to clsact for
lock-free classification (filters attached to HTB itself are not
supported in the offload mode):

    # tc filter add dev eth0 egress protocol ip flower dst_port 80 \
        action skbedit priority 1:10

tc qdisc show and tc class show will indicate whether the offload is
enabled. Example output:

$ tc qdisc show dev eth1
qdisc htb 1: root offloaded r2q 10 default 0 direct_packets_stat 0 direct_qlen 1000 offload
qdisc pfifo 0: parent 1: limit 1000p
qdisc pfifo 0: parent 1: limit 1000p
qdisc pfifo 0: parent 1: limit 1000p
qdisc pfifo 0: parent 1: limit 1000p
qdisc pfifo 0: parent 1: limit 1000p
qdisc pfifo 0: parent 1: limit 1000p
qdisc pfifo 0: parent 1: limit 1000p
qdisc pfifo 0: parent 1: limit 1000p
$ tc class show dev eth1
class htb 1:101 parent 1:1 prio 0 rate 4Gbit ceil 4Gbit burst 1000b cburst 1000b  offload
class htb 1:1 root rate 100Gbit ceil 100Gbit burst 0b cburst 0b  offload
class htb 1:103 parent 1:1 prio 0 rate 4Gbit ceil 4Gbit burst 1000b cburst 1000b  offload
class htb 1:102 parent 1:1 prio 0 rate 4Gbit ceil 4Gbit burst 1000b cburst 1000b  offload
class htb 1:105 parent 1:1 prio 0 rate 4Gbit ceil 4Gbit burst 1000b cburst 1000b  offload
class htb 1:104 parent 1:1 prio 0 rate 4Gbit ceil 4Gbit burst 1000b cburst 1000b  offload
class htb 1:107 parent 1:1 prio 0 rate 4Gbit ceil 4Gbit burst 1000b cburst 1000b  offload
class htb 1:106 parent 1:1 prio 0 rate 4Gbit ceil 4Gbit burst 1000b cburst 1000b  offload
class htb 1:108 parent 1:1 prio 0 rate 4Gbit ceil 4Gbit burst 1000b cburst 1000b  offload
$ tc -j qdisc show dev eth1
[{"kind":"htb","handle":"1:","root":true,"offloaded":true,"options":{"r2q":10,"default":"0","direct_packets_stat":0,"direct_qlen":1000,"offload":null}},{"kind":"pfifo","handle":"0:","parent":"1:","options":{"limit":1000}},{"kind":"pfifo","handle":"0:","parent":"1:","options":{"limit":1000}},{"kind":"pfifo","handle":"0:","parent":"1:","options":{"limit":1000}},{"kind":"pfifo","handle":"0:","parent":"1:","options":{"limit":1000}},{"kind":"pfifo","handle":"0:","parent":"1:","options":{"limit":1000}},{"kind":"pfifo","handle":"0:","parent":"1:","options":{"limit":1000}},{"kind":"pfifo","handle":"0:","parent":"1:","options":{"limit":1000}},{"kind":"pfifo","handle":"0:","parent":"1:","options":{"limit":1000}}]

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-04 21:54:13 -07:00
Thayne McCombs b7e5002456 ss: always prefer family as part of host condition to default family
ss accepts an address family both with the -f option and as part of a
host condition. However, if the family in the host condition is
different from the last -f option, then which family is actually
used depends on the order that different families are checked.

This changes parse_hostcond to check all family prefixes before parsing
the rest of the address, so that the host condition's family always has
a higher priority than the "preferred" family.
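
The ordering can be illustrated with a small sketch: look for a family
prefix first, and fall back to the preferred family only when none is
found. hostcond_family() and the prefix table are hypothetical and cover
only a few of the families ss knows about.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

struct family_prefix {
	const char *prefix;
	int family;
};

/* Illustrative subset of family prefixes. */
static const struct family_prefix prefixes[] = {
	{ "inet6:", AF_INET6 },
	{ "inet:",  AF_INET },
	{ "unix:",  AF_UNIX },
};

/* Hypothetical helper: return the family named by a prefix in cond, or
 * the preferred family if there is no prefix. *rest is set to the part
 * of cond that still has to be parsed as an address. */
int hostcond_family(const char *cond, int preferred, const char **rest)
{
	size_t i;

	for (i = 0; i < sizeof(prefixes) / sizeof(prefixes[0]); i++) {
		size_t len = strlen(prefixes[i].prefix);

		if (strncmp(cond, prefixes[i].prefix, len) == 0) {
			*rest = cond + len;
			return prefixes[i].family;
		}
	}
	*rest = cond;
	return preferred;
}
```

Because every prefix is checked before the fallback, the result no longer
depends on which family happened to be the preferred one.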

Signed-off-by: Thayne McCombs <astrothayne@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-04 21:48:16 -07:00
Stephen Hemminger 2741208502 uapi: pick up rpl.h fix
Upstream change to fix byte order issues.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-03 08:16:16 -08:00
Luca Boccassi 5a37254b71 iproute: force rtm_dst_len to 32/128
Since NETLINK_GET_STRICT_CHK was enabled, the kernel rejects commands
that pass a prefix length, e.g.:

 ip route get 1.0.0.0/1
  Error: ipv4: Invalid values in header for route get request.
 ip route get 0.0.0.0/0
  Error: ipv4: rtm_src_len and rtm_dst_len must be 32 for IPv4

Since there's no point in setting a rtm_dst_len that we know is going
to be rejected, just force it to the right value if it's passed on
the command line. Print a warning to stderr to notify users.
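
A rough sketch of the forcing logic. route_get_dst_len() is a hypothetical
helper, the warning text is illustrative rather than the message ip prints,
and leaving other families untouched is an assumption.

```c
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical helper: return the destination length to put in the
 * request. IPv4 and IPv6 are forced to the full host length; a warning
 * goes to stderr when that overrides what the user asked for. */
int route_get_dst_len(int family, int requested_len)
{
	int full_len;

	switch (family) {
	case AF_INET:
		full_len = 32;
		break;
	case AF_INET6:
		full_len = 128;
		break;
	default:
		return requested_len;	/* other families left alone (assumption) */
	}

	if (requested_len != full_len)
		fprintf(stderr,
			"Warning: prefix length /%d ignored, using /%d\n",
			requested_len, full_len);
	return full_len;
}
```

The request then always carries a length the kernel accepts, and the user
still learns that the prefix was dropped.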

Bug-Debian: https://bugs.debian.org/944730
Reported-By: Clément 'wxcafé' Hertling <wxcafe@wxcafe.net>
Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-02 14:32:47 -08:00
Thayne McCombs 38957a2f6c ss: Add clarification about host conditions with multiple families to man
In creating documentation for expressions I ran into an interesting case
where if you use two different family types in the expression, such as
in `ss 'sport inet:ssh or src unix:/run/*'`, then you would only get the
results for one address family (in this case unix sockets).

The reason is that in parse_hostcond if the family is specified we
remove any previously added families from filter->families, and
preserve the "states" if any states are set. I tried changing this to
not reset the families, but ran into some issues with Invalid Argument
errors in inet_show_netlink, I think related to the state.

I can dig into that more if supporting this is useful, but I'm not sure
if these types of expressions would actually be useful in practice. Or
perhaps an error should be given if an expression contains conditions
with multiple families (besides inet and inet6)?

Anyway, for now, this patch just notes the limitation in the man page.

Signed-off-by: Thayne McCombs <astrothayne@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-02 14:30:40 -08:00
Thayne McCombs df361a27c2 Add documentation of ss filter to man page
This adds some documentation of the syntax for the FILTER argument to
the ss command to the ss (8) man page.

Signed-off-by: Thayne McCombs <astrothayne@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-02 14:24:03 -08:00
Edwin Peer 9764761888 iplink: print warning for missing VF data
The kernel might truncate VF info in IFLA_VFINFO_LIST. Compare the
expected number of VFs in IFLA_NUM_VF to how many were found in the
list and warn accordingly.
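
The comparison itself is simple; a sketch under the assumption that both
counts have already been extracted from the netlink attributes.
check_vf_count() and the warning text are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: compare the number of VFs the kernel announced
 * with the number actually found in the list. Returns how many entries
 * are missing (0 if none) and warns on stderr when the list is short. */
int check_vf_count(int expected, int found)
{
	if (found < expected) {
		fprintf(stderr,
			"Warning: VF list truncated, %d of %d VFs shown\n",
			found, expected);
		return expected - found;
	}
	return 0;
}
```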

Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-02 14:18:42 -08:00
Paolo Abeni 3d6d9e6e67 ss: do not emit warn while dumping MPTCP on old kernels
Prior to this commit, running 'ss' on a kernel older than v5.9
emits an error message:

RTNETLINK answers: Invalid argument

When asked to dump protocol number > 255 - that is: MPTCP - 'ss'
adds an INET_DIAG_REQ_PROTOCOL attribute, unsupported by the older
kernel.

Avoid the warning by ignoring filter issues when INET_DIAG_REQ_PROTOCOL
is used.

Additionally, older kernels end up invoking tcpdiag_send(), which
in turn will try to dump DCCP socks. Bail early in that function,
as the kernel does not implement an MPTCPDIAG_GET request.

Reported-by: "Rantala, Tommi T. (Nokia - FI/Espoo)" <tommi.t.rantala@nokia.com>
Fixes: 9c3be2c0ee ("ss: mptcp: add msk diag interface support")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-02 14:17:14 -08:00
Vladimir Oltean 4712a46174 man: tc-taprio.8: document the full offload feature
Since this feature's introduction in commit 9c66d1564676 ("taprio: Add
support for hardware offloading") from kernel v5.4, it never got
documented in the man pages. As a result, we see customer reports
of seemingly contradictory information: the community manpages claim
there is no support for full offload, nonetheless many silicon vendors
have already implemented it.

This patch documents the full offload feature (enabled by specifying
"flags 2" to the taprio qdisc) and gives one more example that tries to
illustrate some of the finer points related to the usage.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-02 14:12:27 -08:00
Guillaume Nault 86d9660dc1 iplink_bareudp: cleanup help message and man page
 * Fix PROTO description in help message (mpls isn't a valid argument).

 * Remove SRCPORTMIN description from help message since it doesn't
   appear in the syntax string.

 * Use same keywords in help message and in man page.

 * Use the "ethertype" option name (.B ethertype) rather than the
   option value (.I ETHERTYPE) in the man page description of
   [no]multiproto.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-02-02 14:11:32 -08:00
David Ahern d10f2a4bd8 Merge branch 'devlink-port-mgmt' into next
Parav Pandit  says:

====================

This patchset implements devlink port add, delete and function state
management commands.

An example sequence for a PCI SF:

Set the device in switchdev mode:
$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

View ports in switchdev mode:
$ devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false

Add a subfunction port for PCI PF 0 with sfnumber 88:
$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
pci/0000:08:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

Show a newly added port:
$ devlink port show pci/0000:06:00.0/32768
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

Set the function state to active:
$ devlink port function set pci/0000:06:00.0/32768 hw_addr 00:00:00:00:88:88 state active

Show the port in JSON format:
$ devlink port show pci/0000:06:00.0/32768 -jp
{
    "port": {
        "pci/0000:06:00.0/32768": {
            "type": "eth",
            "netdev": "ens2f0npf0sf88",
            "flavour": "pcisf",
            "controller": 0,
            "pfnum": 0,
            "sfnum": 88,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:88:88",
                "state": "active",
                "opstate": "attached"
            }
        }
    }
}

Set the function state to inactive:
$ devlink port function set pci/0000:06:00.0/32768 state inactive

Delete the port after use:
$ devlink port del pci/0000:06:00.0/32768

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-02 02:45:49 +00:00
Parav Pandit bdfb9f1bd6 devlink: Support set of port function state
Support set operation of the devlink port function state.

Example of a PCI SF port function which supports the state:

$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

$ devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false

$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
pci/0000:08:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

$ devlink port show pci/0000:06:00.0/32768
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

$ devlink port function set pci/0000:06:00.0/32768 hw_addr 00:00:00:00:88:88 state active

$ devlink port show pci/0000:06:00.0/32768 -jp
{
    "port": {
        "pci/0000:06:00.0/32768": {
            "type": "eth",
            "netdev": "ens2f0npf0sf88",
            "flavour": "pcisf",
            "controller": 0,
            "pfnum": 0,
            "sfnum": 88,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:88:88",
                "state": "active",
                "opstate": "attached"
            }
        }
    }
}

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-02 02:06:48 +00:00
Parav Pandit 249465d3bf devlink: Support get port function state
Print port function state and operational state whenever reported by
kernel.

Example of a PCI SF port function which supports the state:

$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

$ devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false

$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
pci/0000:08:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

$ devlink port show pci/0000:06:00.0/32768
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

$ devlink port function set pci/0000:06:00.0/32768 hw_addr 00:00:00:00:88:88

$ devlink port show pci/0000:06:00.0/32768 -jp
{
    "port": {
        "pci/0000:06:00.0/32768": {
            "type": "eth",
            "netdev": "ens2f0npf0sf88",
            "flavour": "pcisf",
            "controller": 0,
            "pfnum": 0,
            "sfnum": 88,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:88:88",
                "state": "inactive",
                "opstate": "detached"
            }
        }
    }
}

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-02 02:06:41 +00:00
Parav Pandit 331bf89ad0 devlink: Supporting add and delete of devlink port
Enable user to add and delete the devlink port.

Examples for adding and deleting one SF port:

Examples of add, show and delete commands:
$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

$ devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false

Add devlink port of flavour 'pcisf' for PF number 0 SF number 88:

$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
pci/0000:06:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

Delete newly added devlink port
$ devlink port del pci/0000:06:00.0/32768

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-02 02:06:36 +00:00
Parav Pandit 836a1365b7 devlink: Introduce PCI SF port flavour and attribute
Introduce PCI SF port flavour and port attributes such as PF
number and SF number.

$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

$ devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false

$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
pci/0000:08:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

$ devlink port show pci/0000:06:00.0/32768
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

$ devlink port function set pci/0000:06:00.0/32768 hw_addr 00:00:00:00:88:88 state active

$ devlink port show pci/0000:06:00.0/32768 -jp
{
    "port": {
        "pci/0000:06:00.0/32768": {
            "type": "eth",
            "netdev": "ens2f0npf0sf88",
            "flavour": "pcisf",
            "controller": 0,
            "pfnum": 0,
            "sfnum": 88,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:88:88",
                "state": "active",
                "opstate": "attached"
            }
        }
    }
}

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-02 02:06:30 +00:00
Parav Pandit a9642c5fa6 devlink: Introduce and use string to number mapper
Instead of using static mapping in code, introduce a helper routine to
map a value to string.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-02 02:01:53 +00:00
David Ahern 1e61902180 Update kernel headers
Update kernel headers to commit:
    14e8e0f60088 ("tcp: shrink inet_connection_sock icsk_mtup enabled and probe_size")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-02-02 01:58:51 +00:00
Oliver Hartkopp 2ce313d1bb iplink_can: add Classical CAN frame LEN8_DLC support
The len8_dlc element is filled by the CAN interface driver and used for CAN
frame creation by the CAN driver when the CAN_CTRLMODE_CC_LEN8_DLC flag is
supported by the driver and enabled via netlink configuration interface.

Add the command line support for cc-len8-dlc for Linux 5.11+

Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-29 15:49:23 +00:00
Jarod Wilson 7887500008 bond: support xmit_hash_policy=vlan+srcmac
There's a new transmit hash policy being added to the bonding driver that
is a simple XOR of vlan ID and source MAC, xmit_hash_policy vlan+srcmac.
This trivial patch makes it configurable and queryable via iproute2.
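
For intuition, an "XOR of vlan ID and source MAC" hash could look like the
sketch below. It is only loosely modelled on the description above and is
not the bonding driver's code; the way the MAC is folded into two 24-bit
halves and the function name are assumptions.

```c
#include <stdint.h>

/* Illustrative hash: fold the 6-byte source MAC into two 24-bit halves
 * and XOR them with the VLAN ID (pass 0 for untagged traffic). */
uint32_t vlan_srcmac_hash(const uint8_t mac[6], uint16_t vlan_id)
{
	uint32_t upper = 0, lower = 0;
	int i;

	for (i = 0; i < 3; i++)
		upper = (upper << 8) | mac[i];
	for (i = 3; i < 6; i++)
		lower = (lower << 8) | mac[i];

	return (uint32_t)vlan_id ^ upper ^ lower;
}

/* Pick one of n_slaves transmit slaves from the hash (n_slaves > 0). */
unsigned int pick_slave(uint32_t hash, unsigned int n_slaves)
{
	return hash % n_slaves;
}
```

All frames from one source MAC on one VLAN map to the same slave, while
different VLANs or MACs can spread across slaves.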

$ sudo modprobe bonding mode=2 max_bonds=1 xmit_hash_policy=0

$ sudo ip link set bond0 type bond xmit_hash_policy vlan+srcmac

$ ip -d link show bond0
11: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether ce:85:5e:24:ce:90 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
    bond mode balance-xor miimon 0 updelay 0 downdelay 0 peer_notify_delay 0 use_carrier 1 arp_interval 0 arp_validate none arp_all_targets any
primary_reselect always fail_over_mac none xmit_hash_policy vlan+srcmac resend_igmp 1 num_grat_arp 1 all_slaves_active 0 min_links 0 lp_interval 1
packets_per_slave 1 lacp_rate slow ad_select stable tlb_dynamic_lb 1 addrgenmode eui64 numtxqueues 16 numrxqueues 16 gso_max_size 65536 gso_max_segs
65535

$ grep Hash /proc/net/bonding/bond0
Transmit Hash Policy: vlan+srcmac (5)

$ sudo ip link add test type bond help
Usage: ... bond [ mode BONDMODE ] [ active_slave SLAVE_DEV ]
                [ clear_active_slave ] [ miimon MIIMON ]
                [ updelay UPDELAY ] [ downdelay DOWNDELAY ]
                [ peer_notify_delay DELAY ]
                [ use_carrier USE_CARRIER ]
                [ arp_interval ARP_INTERVAL ]
                [ arp_validate ARP_VALIDATE ]
                [ arp_all_targets ARP_ALL_TARGETS ]
                [ arp_ip_target [ ARP_IP_TARGET, ... ] ]
                [ primary SLAVE_DEV ]
                [ primary_reselect PRIMARY_RESELECT ]
                [ fail_over_mac FAIL_OVER_MAC ]
                [ xmit_hash_policy XMIT_HASH_POLICY ]
                [ resend_igmp RESEND_IGMP ]
                [ num_grat_arp|num_unsol_na NUM_GRAT_ARP|NUM_UNSOL_NA ]
                [ all_slaves_active ALL_SLAVES_ACTIVE ]
                [ min_links MIN_LINKS ]
                [ lp_interval LP_INTERVAL ]
                [ packets_per_slave PACKETS_PER_SLAVE ]
                [ tlb_dynamic_lb TLB_DYNAMIC_LB ]
                [ lacp_rate LACP_RATE ]
                [ ad_select AD_SELECT ]
                [ ad_user_port_key PORTKEY ]
                [ ad_actor_sys_prio SYSPRIO ]
                [ ad_actor_system LLADDR ]

BONDMODE := balance-rr|active-backup|balance-xor|broadcast|802.3ad|balance-tlb|balance-alb
ARP_VALIDATE := none|active|backup|all
ARP_ALL_TARGETS := any|all
PRIMARY_RESELECT := always|better|failure
FAIL_OVER_MAC := none|active|follow
XMIT_HASH_POLICY := layer2|layer2+3|layer3+4|encap2+3|encap3+4|vlan+srcmac
LACP_RATE := slow|fast
AD_SELECT := stable|bandwidth|count

Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-23 18:33:15 +00:00
wenxu c94fd71b34 tc: flower: add tc conntrack inv ct_state support
Matches on conntrack inv ct_state.

Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-23 18:16:35 +00:00
David Ahern c81a173f6b Update kernel headers
Update kernel headers to commit:
    59a49d9617e2 ("Merge branch 'mlxsw-expose-number-of-physical-ports'")

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-23 18:15:57 +00:00
Luca Boccassi 8498ca92d7 vrf: fix ip vrf exec with libbpf
The size of bpf_insn is passed to bpf_load_program instead of the number
of elements as it expects, so ip vrf exec fails with:

$ sudo ip link add vrf-blue type vrf table 10
$ sudo ip link set dev vrf-blue up
$ sudo ip/ip vrf exec vrf-blue ls
Failed to load BPF prog: 'Invalid argument'
last insn is not an exit or jmp
processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
Kernel compiled with CGROUP_BPF enabled?
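
The class of bug is a byte size passed where an element count is expected.
A self-contained sketch, with a stand-in instruction struct and a
hypothetical load_program() instead of the real libbpf call:

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for an 8-byte BPF instruction; the layout is illustrative. */
struct insn {
	uint8_t code;
	uint8_t regs;
	int16_t off;
	int32_t imm;
};

#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Hypothetical loader: expects the NUMBER of instructions, not bytes.
 * It just reports how many instructions it would try to process. */
size_t load_program(const struct insn *insns, size_t insn_cnt)
{
	(void)insns;
	return insn_cnt;
}

/* Wrong: passes the size in bytes, so the loader sees too many insns. */
size_t load_with_byte_size(void)
{
	struct insn prog[2] = { { 0 } };

	return load_program(prog, sizeof(prog));
}

/* Right: passes the element count. */
size_t load_with_insn_count(void)
{
	struct insn prog[2] = { { 0 } };

	return load_program(prog, ARRAY_SIZE(prog));
}
```

With a wrong count the verifier is pointed at a different "last"
instruction than the one the program really ends with, which fits the
"last insn is not an exit or jmp" message above.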

https://bugs.debian.org/980046

Reported-by: Emmanuel DECAEN <Emmanuel.Decaen@xsalto.com>

Signed-off-by: Luca Boccassi <bluca@debian.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-01-18 12:32:17 -08:00
Luca Boccassi 8dca565b17 vrf: print BPF log buffer if bpf_program_load fails
Necessary to understand what is going on when bpf_program_load fails

Signed-off-by: Luca Boccassi <bluca@debian.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-01-18 12:32:11 -08:00
Roi Dayan 1a22ad2721 build: Fix link errors on some systems
Since moving get_rate() and get_size() from tc to lib, on some
systems we fail to link because of the missing math lib.
Move the functions that require the math lib to their own C file
and add -lm to dcb, which now uses those functions.

../lib/libutil.a(utils.o): In function `get_rate':
utils.c:(.text+0x10dc): undefined reference to `floor'
../lib/libutil.a(utils.o): In function `get_size':
utils.c:(.text+0x1394): undefined reference to `floor'
../lib/libutil.a(json_print.o): In function `sprint_size':
json_print.c:(.text+0x14c0): undefined reference to `rint'
json_print.c:(.text+0x14f4): undefined reference to `rint'
json_print.c:(.text+0x157c): undefined reference to `rint'

Fixes: f3be0e6366 ("lib: Move get_rate(), get_rate64() from tc here")
Fixes: 44396bdfcc ("lib: Move get_size() from tc here")
Fixes: adbe5de966 ("lib: Move sprint_size() from tc here, add print_size()")

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Tested-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-01-18 12:28:47 -08:00
David Ahern b553cffa9f Merge branch 'dcb-app-dcbx' into next
Petr Machata says:

====================

Add support to the dcb tool for the following two DCB objects:

- APP, which allows configuration of traffic prioritization rules based on
  several possible packet headers.

- DCBX, which is a 1-byte bitfield of flags that configure whether the DCBX
  protocol is implemented in the device or in the host, and which version
  of the protocol should be used.

Patch #1 adds a new helper for finding a name of a given dsfield value.
This is useful for APP DSCP-to-priority rules, which can use human-readable
DSCP names.

Patches #2, #3 and #4 extend existing interfaces for, respectively, parsing
of the X:Y mappings, for setting a DCB object, and for getting a DCB
object.

In patch #5, support for the command line argument -N / --Numeric is
added. The APP tool later uses it to decide whether to format DSCP values
as human-readable strings or as plain numbers.

Patches #6 and #7 add the subtools themselves and their man pages.

v2:
- Two patches dropped and sent to iproute2 branch as "dcb: Fixes".
  This patch set now depends on that one.
- Patch #5:
    - Make it -N / --Numeric instead of -n / --no-nice-names
    - Rename the flag from no_nice_names to numeric as well
- Patch #6:
    - Adjust to s/no_nice_names/numeric/ from another patch.

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-18 04:10:27 +00:00
Petr Machata 89d11ea596 dcb: Add a subtool for the DCBX object
The Linux DCBX object is a 1-byte bitfield of flags that configure whether
the DCBX protocol is implemented in the device or in the host, and which
version of the protocol should be used. Add a tool to access the per-port
Linux DCBX object.

For example:

	# dcb dcbx set dev eni1np1 host ieee
	# dcb dcbx show dev eni1np1
	host ieee
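
As an illustration of how a 1-byte flag field can be rendered as the
keywords shown above, here is a minimal sketch. The DEMO_DCBX_* values
are assumed to mirror the DCB_CAP_DCBX_* bits in linux/dcbnl.h and the
keyword spellings are assumptions as well; check both against the
header and the tool before relying on them.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed to mirror DCB_CAP_DCBX_* in linux/dcbnl.h; verify there. */
#define DEMO_DCBX_HOST		0x01
#define DEMO_DCBX_LLD_MANAGED	0x02
#define DEMO_DCBX_VER_CEE	0x04
#define DEMO_DCBX_VER_IEEE	0x08
#define DEMO_DCBX_STATIC	0x10

static const struct {
	uint8_t bit;
	const char *name;	/* keyword spelling is an assumption */
} demo_dcbx_names[] = {
	{ DEMO_DCBX_HOST,	 "host" },
	{ DEMO_DCBX_LLD_MANAGED, "lld-managed" },
	{ DEMO_DCBX_VER_CEE,	 "cee" },
	{ DEMO_DCBX_VER_IEEE,	 "ieee" },
	{ DEMO_DCBX_STATIC,	 "static" },
};

/* Render the set flags as space-separated keywords into buf. */
const char *demo_dcbx_format(uint8_t flags, char *buf, size_t len)
{
	size_t i, used = 0;

	if (len == 0)
		return buf;
	buf[0] = '\0';
	for (i = 0; i < sizeof(demo_dcbx_names) / sizeof(demo_dcbx_names[0]); i++) {
		int n;

		if (!(flags & demo_dcbx_names[i].bit))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     used ? " " : "", demo_dcbx_names[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* output truncated; stop appending */
		used += (size_t)n;
	}
	return buf;
}
```

Under these assumptions, the flags DEMO_DCBX_HOST | DEMO_DCBX_VER_IEEE
format as "host ieee".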

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-18 04:09:29 +00:00
Petr Machata 8e9bed1493 dcb: Add a subtool for the DCB APP object
DCB APP interfaces are standardized in 802.1q-2018, and allow configuration
of traffic prioritization rules based on several possible headers.

Add a dcb subtool for maintenance and display of the APP table. For
example:

    # dcb app add dev eni1np1 dscp-prio 0:0 CS3:3 CS6:6
    # dcb app show dev eni1np1
    dscp-prio 0:0 CS3:3 CS6:6
    # dcb app add dev eni1np1 dscp-prio CS3:4
    # dcb app show dev eni1np1
    dscp-prio 0:0 CS3:3 CS3:4 CS6:6
    # dcb app replace dev eni1np1 dscp-prio CS3:5
    # dcb app show dev eni1np1
    dscp-prio 0:0 CS3:5 CS6:6
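
The difference between "add" and "replace" in the session above can be
sketched with a small in-memory table. This is only a model of the
observable behavior, not the tool's implementation; the demo_* names
are hypothetical, and DSCP values are plain numbers (CS3 is 24, CS6 is
48).

```c
#include <stddef.h>

#define DEMO_APP_MAX 64

struct demo_app {
	unsigned int dscp;
	unsigned int prio;
};

struct demo_app_table {
	struct demo_app e[DEMO_APP_MAX];
	size_t n;
};

/* "add": keep existing entries for the key, append the new pair. */
int demo_app_add(struct demo_app_table *t, unsigned int dscp,
		 unsigned int prio)
{
	size_t i;

	for (i = 0; i < t->n; i++)
		if (t->e[i].dscp == dscp && t->e[i].prio == prio)
			return 0;	/* identical pair already present */
	if (t->n == DEMO_APP_MAX)
		return -1;		/* table full */
	t->e[t->n].dscp = dscp;
	t->e[t->n].prio = prio;
	t->n++;
	return 0;
}

/* "replace": drop every entry with the same key, then add the pair. */
int demo_app_replace(struct demo_app_table *t, unsigned int dscp,
		     unsigned int prio)
{
	size_t i, out = 0;

	for (i = 0; i < t->n; i++)
		if (t->e[i].dscp != dscp)
			t->e[out++] = t->e[i];
	t->n = out;
	return demo_app_add(t, dscp, prio);
}
```

Adding 24:4 to a table that already holds 24:3 leaves both entries in
place, while replacing with 24:5 leaves a single entry for DSCP 24.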

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-18 04:09:29 +00:00
Petr Machata 0aebd32b82 dcb: Support -N to suppress translation to human-readable names
Some DSCP values can be translated to symbolic names. That may not be
always desirable. Introduce a command-line option similar to other tools,
-N or --Numeric, to suppress this translation.
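
A minimal sketch of what such a switch amounts to, assuming a
hypothetical name table that only knows the class selectors CS1..CS7
(the real name table is larger and lives elsewhere; all demo_* names
are illustrative):

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical subset of a DSCP name table: class selectors only. */
static const char *demo_dscp_name(unsigned int dscp)
{
	static const char *const cs[] = {
		NULL, "CS1", "CS2", "CS3", "CS4", "CS5", "CS6", "CS7",
	};

	if (dscp < 64 && dscp % 8 == 0)
		return cs[dscp / 8];
	return NULL;
}

/* Format a DSCP value; "numeric" models the effect of -N. */
const char *demo_dscp_format(unsigned int dscp, bool numeric,
			     char *buf, size_t len)
{
	const char *name = numeric ? NULL : demo_dscp_name(dscp);

	if (name)
		snprintf(buf, len, "%s", name);
	else
		snprintf(buf, len, "%u", dscp);	/* no name, or -N given */
	return buf;
}
```

In this sketch, DSCP 24 formats as "CS3" by default and as "24" when
numeric output is requested.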

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-18 04:09:29 +00:00
Petr Machata e59876ff55 dcb: Generalize dcb_get_attribute()
The function dcb_get_attribute() assumes that the caller knows the exact
size of the looked-for payload. It also assumes that the response comes
wrapped in a DCB_ATTR_IEEE nest. The former assumption does not hold for
the IEEE APP table, which has variable size. The latter one does not hold
for DCBX, which is not IEEE-nested, and also for any CEE attributes, which
would come CEE-nested.

Factor out the payload extractor from the current dcb_get_attribute() code,
and put into a helper. Then rewrite dcb_get_attribute() compatibly in terms
of the new function. Introduce dcb_get_attribute_va() as a thin wrapper for
IEEE-nested access, and dcb_get_attribute_bare() for access to attributes
that are not nested.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-18 04:09:29 +00:00
Petr Machata 69290c32dc dcb: Generalize dcb_set_attribute()
The function dcb_set_attribute() takes a fully-formed payload as an
argument. For callers that need to build a nested attribute, such as is the
case for DCB APP table, this is not great, because with libmnl, they would
need to construct a separate netlink message just to pluck out the payload
and hand it over to this function.

Currently, dcb_set_attribute() also always wraps the payload in a
DCB_ATTR_IEEE container, because that is what all the dcb subtools so far
needed. But that is not appropriate for DCBX in particular, and in fact a
handful of other attributes, as well as any CEE payloads.

Instead, generalize this code by adding parameters for constructing a
custom payload and for fetching the response from a custom response
attribute. Then add dcb_set_attribute_va(), which takes a callback to
invoke in the right place for the nest to be built, and
dcb_set_attribute_bare(), which is similar to dcb_set_attribute(), but does
not encapsulate the payload in an IEEE container. Rewrite
dcb_set_attribute() compatibly in terms of the new functions.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-18 04:09:29 +00:00
Petr Machata c13216f7a6 lib: Generalize parse_mapping()
The function parse_mapping() assumes the key is a number, with a single
configurable exception, which is using "all" to mean "all possible keys".
If a caller wishes to use symbolic names instead of numbers, they cannot
reuse this function.

To facilitate reuse in these situations, convert parse_mapping() into a
helper, parse_mapping_gen(), which instead of an allow-all boolean takes a
generic key-parsing callback. Rewrite parse_mapping() in terms of this
newly-added helper and add a pair of key parsers, one for just numbers,
another for numbers and the keyword "all". Publish the latter as well.
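
The idea of swapping an allow-all boolean for a key-parsing callback
can be sketched as follows. This is a simplified model that parses a
single "KEY:VALUE" string; the demo_* names and signatures are
hypothetical and do not reproduce the actual iproute2 helpers.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Callback type: turn the textual key into a number, 0 on success. */
typedef int (*demo_key_cb)(uint32_t *keyp, const char *key);

static int demo_parse_u32(uint32_t *out, const char *s)
{
	unsigned long v;
	char *end;

	if (!isdigit((unsigned char)s[0]))
		return -EINVAL;	/* rejects "", signs and whitespace */
	errno = 0;
	v = strtoul(s, &end, 10);
	if (errno || *end != '\0' || v > UINT32_MAX)
		return -EINVAL;
	*out = (uint32_t)v;
	return 0;
}

/* Key parser that accepts only numbers. */
int demo_key_num(uint32_t *keyp, const char *key)
{
	return demo_parse_u32(keyp, key);
}

/* Key parser that accepts numbers and the keyword "all". */
int demo_key_num_all(uint32_t *keyp, const char *key)
{
	if (strcmp(key, "all") == 0) {
		*keyp = UINT32_MAX;	/* sentinel meaning "all keys" */
		return 0;
	}
	return demo_parse_u32(keyp, key);
}

/* Generic parser: the caller decides how keys are interpreted. */
int demo_parse_mapping(const char *arg, demo_key_cb key_cb,
		       uint32_t *keyp, uint32_t *valp)
{
	char buf[64];
	char *colon;

	if (strlen(arg) >= sizeof(buf))
		return -EINVAL;
	strcpy(buf, arg);
	colon = strchr(buf, ':');
	if (!colon)
		return -EINVAL;
	*colon = '\0';
	if (key_cb(keyp, buf))
		return -EINVAL;
	return demo_parse_u32(valp, colon + 1);
}
```

A caller that wants symbolic keys, such as DSCP names, would supply its
own callback in place of the two shown here.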

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-18 04:09:29 +00:00
Petr Machata bf244ee677 lib: rt_names: Add rtnl_dsfield_get_name()
For formatting DSCP (not full dsfield), it would be handy to be able to
just get the name from the name table, and not get any of the remaining
cruft related to formatting. Add a new entry point to just fetch the
name table string uninterpreted. Use it from rtnl_dsfield_n2a().

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-18 04:09:29 +00:00
David Ahern fa2881b664 Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-18 03:57:29 +00:00
Guillaume Nault 676a1a708f tc: flower: fix json output with mpls lse
The json output of the TCA_FLOWER_KEY_MPLS_OPTS attribute was invalid.

Example:

  $ tc filter add dev eth0 ingress protocol mpls_uc flower mpls \
      lse depth 1 label 100                                     \
      lse depth 2 label 200

  $ tc -json filter show dev eth0 ingress
    ...{"eth_type":"8847",
        "  mpls":["    lse":["depth":1,"label":100],
                  "    lse":["depth":2,"label":200]]}...

This is invalid as the arrays, introduced by "[", can't contain raw
string:value pairs. Those must be enclosed into "{}" to form valid json
objects. Also, there are spurious whitespaces before the mpls and lse
strings because of the indentation used for normal output.

Fix this by putting all LSE parameters (depth, label, tc, bos and ttl)
into the same json object. The "mpls" key now directly contains a list
of such objects.

Also, handle strings differently for normal and json output, so that
json strings don't get spurious indentation whitespaces.

Normal output isn't modified.
The json output now looks like:

  $ tc -json filter show dev eth0 ingress
    ...{"eth_type":"8847",
        "mpls":[{"depth":1,"label":100},
                {"depth":2,"label":200}]}...
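
The corrected shape, a list whose elements are objects, can be sketched
with plain snprintf(). This is only an illustration of the output
structure; it does not use iproute2's json_print helpers, and the
demo_* names are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

struct demo_lse {
	unsigned int depth;
	unsigned int label;
};

/* Emit "mpls":[{...},{...}]; returns 0, or -1 if buf is too small. */
int demo_mpls_json(const struct demo_lse *lse, size_t n,
		   char *buf, size_t len)
{
	size_t i, used;
	int w;

	w = snprintf(buf, len, "\"mpls\":[");
	if (w < 0 || (size_t)w >= len)
		return -1;
	used = (size_t)w;

	for (i = 0; i < n; i++) {
		/* Each LSE becomes one object holding all its fields. */
		w = snprintf(buf + used, len - used,
			     "%s{\"depth\":%u,\"label\":%u}",
			     i ? "," : "", lse[i].depth, lse[i].label);
		if (w < 0 || (size_t)w >= len - used)
			return -1;
		used += (size_t)w;
	}

	w = snprintf(buf + used, len - used, "]");
	if (w < 0 || (size_t)w >= len - used)
		return -1;
	return 0;
}
```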

Fixes: eb09a15c12 ("tc: flower: support multiple MPLS LSE match")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-01-16 09:13:36 -08:00
Petr Machata 934919b991 dcb: Change --Netns/-N to --netns/-n
This is to stay compatible with the major tools, ip and tc. Also
document the option in the man page, which was neglected.

Fixes: 67033d1c1c ("Add skeleton of a new tool, dcb")
Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-01-16 09:12:15 -08:00
Petr Machata b4c0cad06e dcb: Plug a leaking DCB socket buffer
DCB socket buffer is allocated in dcb_init(), but never freed. Free it
in dcb_fini().

Fixes: 67033d1c1c ("Add skeleton of a new tool, dcb")
Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-01-16 09:12:15 -08:00
Petr Machata 2e99c28161 dcb: Set values with RTM_SETDCB type
dcb currently sends all netlink messages with a type RTM_GETDCB, even the
set ones. Change to the appropriate type.

Fixes: 67033d1c1c ("Add skeleton of a new tool, dcb")
Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-01-16 09:12:15 -08:00
Stephen Hemminger 8b4b132261 uapi: update if_link.h from upstream
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-01-16 09:09:35 -08:00
Petr Machata ffe58c9185 include: uapi: Carry dcbnl.h
To allow building a new suite of DCB tools on an older kernel, carry a copy
of dcbnl.h.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-01-16 09:09:28 -08:00
Patrisious Haddad 537995c6d5 rdma: Add support for the netlink extack
Add support in rdma for receiving netlink extack errors in userspace,
so that extack error messages sent by the kernel are printed for the
user.

Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-10 17:18:42 +00:00
Ido Schimmel 9bd498bfcd ipmonitor: Mention "nexthop" object in help and man page
Before:

 # ip monitor help
 Usage: ip monitor [ all | LISTofOBJECTS ] [ FILE ] [ label ] [all-nsid] [dev DEVICE]
 LISTofOBJECTS := link | address | route | mroute | prefix |
                  neigh | netconf | rule | nsid
 FILE := file FILENAME

After:

 # ip monitor help
 Usage: ip monitor [ all | LISTofOBJECTS ] [ FILE ] [ label ] [all-nsid] [dev DEVICE]
 LISTofOBJECTS := link | address | route | mroute | prefix |
                  neigh | netconf | rule | nsid | nexthop
 FILE := file FILENAME

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-10 17:17:32 +00:00
Ido Schimmel 043e03a369 nexthop: Fix usage output
Before:

 # ip nexthop help
 Usage: ip nexthop { list | flush } [ protocol ID ] SELECTOR
        ip nexthop { add | replace } id ID NH [ protocol ID ]
        ip nexthop { get| del } id ID
 SELECTOR := [ id ID ] [ dev DEV ] [ vrf NAME ] [ master DEV ]
             [ groups ] [ fdb ]
 NH := { blackhole | [ via ADDRESS ] [ dev DEV ] [ onlink ]
       [ encap ENCAPTYPE ENCAPHDR ] | group GROUP ] }
 GROUP := [ id[,weight]>/<id[,weight]>/... ]
 ENCAPTYPE := [ mpls ]
 ENCAPHDR := [ MPLSLABEL ]

After:

 # ip nexthop help
 Usage: ip nexthop { list | flush } [ protocol ID ] SELECTOR
        ip nexthop { add | replace } id ID NH [ protocol ID ]
        ip nexthop { get | del } id ID
 SELECTOR := [ id ID ] [ dev DEV ] [ vrf NAME ] [ master DEV ]
             [ groups ] [ fdb ]
 NH := { blackhole | [ via ADDRESS ] [ dev DEV ] [ onlink ]
         [ encap ENCAPTYPE ENCAPHDR ] | group GROUP [ fdb ] }
 GROUP := [ <id[,weight]>/<id[,weight]>/... ]
 ENCAPTYPE := [ mpls ]
 ENCAPHDR := [ MPLSLABEL ]

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2021-01-10 17:14:08 +00:00
Stephen Hemminger 2953235e61 uapi: update kernel headers to 5.11 pre rc1
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-12-24 19:38:35 -08:00
Stephen Hemminger 2639bdc176 Merge git://git.kernel.org/pub/scm/network/iproute2/iproute2-next into main 2020-12-24 19:29:15 -08:00
Stephen Hemminger c9c64b8d1e 5.10.0 2020-12-21 10:28:53 -08:00
Guillaume Nault cb0debfe2d testsuite: Add mpls packet matching tests for tc flower
Match all MPLS fields using smallest and highest possible values.
Test the two ways of specifying MPLS header matching:

  * with the basic mpls_{label,tc,bos,ttl} keywords (match only on the
    first LSE),

  * with the more generic "lse" keyword (allows matching at different
    depth of the MPLS label stack).

This test file makes it possible to find problems like the one fixed by
Linux commit 7fdd375e3830 ("net: sched: Fix dump of MPLS_OPT_LSE_LABEL
attribute in cls_flower").
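
For reference, the smallest and highest values follow from the field
widths of an MPLS label stack entry: 20 bits of label, 3 bits of
traffic class, 1 bottom-of-stack bit and 8 bits of TTL. A minimal
sketch that range-checks and packs the four fields (the demo_* names
are hypothetical and not taken from the test suite or from tc):

```c
#include <stdint.h>

#define DEMO_MPLS_LABEL_MAX	0xFFFFFu	/* 20 bits */
#define DEMO_MPLS_TC_MAX	0x7u		/* 3 bits */
#define DEMO_MPLS_BOS_MAX	0x1u		/* 1 bit */
#define DEMO_MPLS_TTL_MAX	0xFFu		/* 8 bits */

/*
 * Pack one label stack entry in host byte order.
 * Returns 0 on success, -1 if any field is out of range.
 */
int demo_mpls_lse_pack(uint32_t label, uint32_t tc, uint32_t bos,
		       uint32_t ttl, uint32_t *lse)
{
	if (label > DEMO_MPLS_LABEL_MAX || tc > DEMO_MPLS_TC_MAX ||
	    bos > DEMO_MPLS_BOS_MAX || ttl > DEMO_MPLS_TTL_MAX)
		return -1;

	*lse = (label << 12) | (tc << 9) | (bos << 8) | ttl;
	return 0;
}
```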

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-16 04:14:26 +00:00
David Ahern c01dec8475 Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-16 04:06:06 +00:00
Thomas Karlsson 42f5642a40 iplink:macvlan: Added bcqueuelen parameter
This patch allows the user to set and retrieve the
IFLA_MACVLAN_BC_QUEUE_LEN parameter via the bcqueuelen
command line argument

This parameter controls the requested size of the queue for
broadcast and multicast packets in the macvlan driver.

If not specified, the driver default (1000) will be used.

Note: The request is per macvlan but the actually used queue
length per port is the maximum of any request to any macvlan
connected to the same port.

For this reason, the used queue length IFLA_MACVLAN_BC_QUEUE_LEN_USED
is also retrieved and displayed in order to aid in the understanding
of the setting. However, it can of course not be directly set.
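
The relationship between the requested and the used queue length
described above can be modelled as a maximum over all requests on the
same port. The sketch below is only that model; it is not the macvlan
driver's code, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Effective per-port queue length: the largest request on that port. */
uint32_t demo_bc_queue_len_used(const uint32_t *requested, size_t n)
{
	uint32_t used = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (requested[i] > used)
			used = requested[i];
	return used;
}
```

With three macvlans on one port requesting 1000, 5000 and 2000, the
model yields a used queue length of 5000 for all of them.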

Signed-off-by: Thomas Karlsson <thomas.karlsson@paneda.se>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-16 04:02:07 +00:00
Andrea Claudi c8faeca5ad ss: mptcp: fix add_addr_accepted stat print
add_addr_accepted value is not printed if add_addr_signal value is 0.
Fix this by checking the add_addr_accepted value instead.

Fixes: 9c3be2c0ee ("ss: mptcp: add msk diag interface support")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-12-15 13:59:13 -08:00
Andrea Claudi 0d78e8eabf tc: pedit: fix memory leak in print_pedit
keys_ex is dynamically allocated with calloc on line 770, but
is not freed in case of error at line 823.
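
A common way to avoid this class of leak is to route every exit,
including error exits, through a single cleanup label. The sketch below
illustrates the pattern with hypothetical demo_* names and an
allocation counter added purely so the behavior can be checked; it is
not the code from print_pedit().

```c
#include <stddef.h>
#include <stdlib.h>

static int demo_live_allocs;	/* outstanding allocations (demo only) */

static void *demo_calloc(size_t n, size_t size)
{
	void *p = calloc(n, size);

	if (p)
		demo_live_allocs++;
	return p;
}

static void demo_free(void *p)
{
	if (p) {
		demo_live_allocs--;
		free(p);
	}
}

/* Returns 0 on success, -1 on error; never leaks keys_ex. */
int demo_print_keys(size_t nkeys, int fail_midway)
{
	int *keys_ex;
	int ret = -1;

	keys_ex = demo_calloc(nkeys, sizeof(*keys_ex));
	if (!keys_ex)
		return -1;

	if (fail_midway)
		goto out;	/* error path still reaches the free */

	ret = 0;
out:
	demo_free(keys_ex);
	return ret;
}

int demo_outstanding(void)
{
	return demo_live_allocs;
}
```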

Fixes: 081d6c310d ("tc: pedit: Support JSON dumping")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-12-14 09:24:08 -08:00
Andrea Claudi ec1346acbe devlink: fix memory leak in cmd_dev_flash()
nlg_ntf is dynamically allocated in mnlg_socket_open(), and is freed on
the out: return path. However, some error paths do not free it,
resulting in a memory leak.

This commit fixes this using mnlg_socket_close(), and reporting the
correct error number when required.

Fixes: 9b13cddfe2 ("devlink: implement flash status monitoring")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-12-14 09:23:24 -08:00
Andrea Claudi 309e6027e5 man: tc-flower: fix manpage
Commit 924c43778a ("man: tc-ct.8: Add manual page for ct tc action")
added a man page for tc-ct, but it brought with it a bogus block of text
at the beginning of the tc-flower man page.

This commit simply removes it.

Fixes: 924c43778a ("man: tc-ct.8: Add manual page for ct tc action")
Reported-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-12-14 09:22:53 -08:00
David Ahern ee50fd58dc Merge branch 'dcb-pfc-buffer-maxrate' into next
Petr Machata says:
====================

Add support to the dcb tool for the following three DCB objects:

- PFC, for "Priority-based Flow Control", allows configuration of priority
  lossiness, and related toggles.

- DCBNL buffer interfaces are an extension to the 802.1q DCB interfaces and
  allow configuration of port headroom buffers.

- DCBNL maxrate interfaces are an extension to the 802.1q DCB interfaces
  and allow configuration of rate with which traffic in a given traffic
  class is sent.

Patches #1-#4 fix small issues in the current DCB code and man pages.

Patch #5 adds new helpers to the DCB dispatcher.

Patches #6 and #7 add support for command line arguments -s and -i. These
enable, respectively, display of statistical counters, and ISO/IEC mode of
rate units.

Patches #8-#10 add the subtools themselves and their man pages.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-14 16:48:38 +00:00
Petr Machata 117939d9bd dcb: Add a subtool for the DCB maxrate object
DCBNL maxrate interfaces are an extension to the 802.1q DCB interfaces and
allow configuration of rate with which traffic in a given traffic class is
sent.

Add a dcb subtool to allow showing and tweaking of this per-TC maximum
rate. For example:

    # dcb maxrate show dev eni1np1
    tc-maxrate 0:25Gbit 1:25Gbit 2:25Gbit 3:25Gbit 4:25Gbit 5:25Gbit 6:100Gbit 7:25Gbit

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-14 16:42:07 +00:00
Petr Machata 2e36f91000 dcb: Add a subtool for the DCB buffer object
DCBNL buffer interfaces are an extension to the 802.1q DCB interfaces and
allow configuration of port headroom buffers.

Add a dcb subtool to allow showing and tweaking of buffer priority mapping
and buffer sizes. For example:

    # dcb buf show dev eni1np1
    prio-buffer 0:0 1:0 2:0 3:3 4:0 5:0 6:6 7:0
    buffer-size 0:10000 1:0 2:0 3:70000 4:0 5:0 6:10000 7:0
    total-size 221072

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-14 16:42:03 +00:00
Petr Machata 6567cb588b dcb: Add a subtool for the DCB PFC object
PFC, for "Priority-based Flow Control", allows configuration of priority
lossiness, and related toggles.

Add a dcb subtool to allow showing and tweaking of individual PFC
configuration options, and querying statistics. For example:

    # dcb pfc show dev eni1np1
    pfc-cap 8 macsec-bypass on delay 0
    pg-pfc 0:off 1:on 2:off 3:off 4:off 5:off 6:off 7:on
    requests 0:0 1:217 2:0 3:0 4:0 5:0 6:0 7:28
    indications 0:0 1:179 2:0 3:0 4:0 5:0 6:0 7:18

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-14 16:41:58 +00:00
Petr Machata 808dd741fc dcb: Add -i to enable IEC mode
Allow switching "dcb" into the ISO/IEC mode of units by passing -i.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-14 16:41:54 +00:00
Petr Machata 6e9687db04 dcb: Add -s to enable statistics
Allow selective display of statistical counters by passing -s.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-14 16:41:50 +00:00
Petr Machata 11a72186a0 dcb: Add dcb_set_u32(), dcb_set_u64()
The DCB buffer object has a settable array of 32-bit quantities, and the
maxrate object of 64-bit ones. Adjust dcb_parse_mapping() and related
helpers to support 64-bit values in mappings, and add appropriate helpers.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-14 16:41:45 +00:00
Petr Machata 7e94711c71 man: dcb-ets: Remove an unnecessary empty line
Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-14 16:41:40 +00:00
Petr Machata a7c2eaac39 dcb: ets: Change the way show parameters are given in synopsis
None, one, or many parameters can be given on the command line, but
the current synopsis allows only none or one. Fix it.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-14 16:41:22 +00:00
Petr Machata 12d41d0184 dcb: ets: Fix help display for "show" subcommand
"dcb ets show dev X help" currently shows full "ets" help instead of just
help for the show command. Fix it.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-14 16:41:19 +00:00
Petr Machata 7fe954ee34 dcb: Remove unsupported command line arguments from getopt_long()
getopt_long() currently includes "c" and "n" in the short option string.
These probably slipped in as a cut'n'paste, and are not actually accepted.
Remove them.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-14 16:40:32 +00:00
Stephen Hemminger 376367d917 uapi: merge in change to bpf.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-12-14 08:07:06 -08:00
David Ahern 6e9bfdcdde Merge branch 'devlink-reload' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-09 02:43:41 +00:00
Moshe Shemesh c2d7c45c32 devlink: Add reload stats to dev show
Show reload statistics through devlink dev show using devlink stats
flag. The reload statistics show the history per reload action type and
limit. Add remote reload statistics to show the history of actions
performed due to devlink reload commands initiated by a remote host.

Output examples:
$ devlink dev show -s
pci/0000:82:00.0:
  stats:
      reload:
          driver_reinit:
            unspecified 2
          fw_activate:
            unspecified 1 no_reset 0
      remote_reload:
          driver_reinit:
            unspecified 0
          fw_activate:
            unspecified 0 no_reset 0
pci/0000:82:00.1:
  stats:
      reload:
          driver_reinit:
            unspecified 0
          fw_activate:
            unspecified 0 no_reset 0
      remote_reload:
          driver_reinit:
            unspecified 1
          fw_activate:
            unspecified 1 no_reset 0

$ devlink dev show -s -jp
{
    "dev": {
        "pci/0000:82:00.0": {
            "stats": {
                "reload": {
                    "driver_reinit": {
                        "unspecified": 2
                    },
                    "fw_activate": {
                        "unspecified": 1,
                        "no_reset": 0
                    }
                },
                "remote_reload": {
                    "driver_reinit": {
                        "unspecified": 0
                    },
                    "fw_activate": {
                        "unspecified": 0,
                        "no_reset": 0
                    }
                }
            }
        },
        "pci/0000:82:00.1": {
            "stats": {
                "reload": {
                    "driver_reinit": {
                        "unspecified": 0
                    },
                    "fw_activate": {
                        "unspecified": 0,
                        "no_reset": 0
                    }
                },
                "remote_reload": {
                    "driver_reinit": {
                        "unspecified": 1
                    },
                    "fw_activate": {
                        "unspecified": 1,
                        "no_reset": 0
                    }
                }
            }
        }
    }
}

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-09 02:42:15 +00:00
Moshe Shemesh 0c0023ad71 devlink: Add pr_out_dev() helper function
Add pr_out_dev() helper function and use it both by cmd_dev_show_cb()
and by cmd_mon_show_cb().

Dev stats will be added to the dev context in the next patch, so
cmd_mon_show_cb() should print the whole dev context and not just dev
handle.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-09 02:42:09 +00:00
Moshe Shemesh f28c910274 devlink: Add devlink reload action and limit options
Add reload action and reload limit options to the devlink reload command,
so that the user can select the required reload action and the limit
constraints that must hold while it is performed.

The following reload actions are supported:
  driver_reinit: driver entities re-initialization, applying
                 devlink-param and devlink-resource values.
  fw_activate: firmware activate.

The uAPI is backward compatible, if the reload action option is omitted
from the reload command, the driver reinit action will be used.
Note that when required to do firmware activation some drivers may need
to reload the driver. On the other hand some drivers may need to reset
the firmware to reinitialize the driver entities. Therefore, the devlink
reload command returns the actions which were actually performed.

By default reload actions are not limited and driver implementation may
include reset or downtime as needed to perform the actions. However, if
reload limit is selected, the driver should perform the action only if it
can do so while keeping the limit constraints.

Reload limit added:
  no_reset: No reset allowed, no down time allowed, no link flap and no
            configuration is lost.

Command examples:
$ devlink dev reload pci/0000:82:00.0 action driver_reinit
reload_actions_performed:
  driver_reinit

$ devlink dev reload pci/0000:82:00.0 action fw_activate
reload_actions_performed:
  driver_reinit fw_activate

$ devlink dev reload pci/0000:82:00.1 action driver_reinit -jp
{
    "reload": {
        "reload_actions_performed": [ "driver_reinit" ]
    }
}

$ devlink dev reload pci/0000:82:00.0 action fw_activate -jp
{
    "reload": {
        "reload_actions_performed": [ "driver_reinit","fw_activate" ]
    }
}

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-09 02:40:00 +00:00
David Ahern 120cdeb1b7 Merge branch 'rate-size-parsing-output' into next
Petr Machata says:
==================

The DCB tool will have commands that deal with buffer sizes and traffic
rates. TC is another tool that has a number of such commands, and functions
to support them: get_size(), get_rate/64(), s/print_size() and
s/print_rate(). In this patchset, these functions are moved from TC to lib/
for possible reuse and modernized.

s/print_rate() has a hidden parameter of a global variable use_iec, which
made the conversion non-trivial. The parameter was made explicit,
print_rate() converted to a mostly json_print-like function, and
sprint_rate() retired in favor of the new print_rate. Patches #1 and #2
deal with this.

The intention was to treat s/print_size() similarly, but unfortunately two
use cases of sprint_size() cannot be converted to a json_print-like
print_size(), and the function sprint_size() had to remain as a discouraged
backdoor to print_size(). This is done in patch #3.

Patch #4 then improves the code of sprint_size() a little bit.

Patch #5 fixes a buglet in formatting small rates in IEC mode.

Patches #6 and #7 handle a routine movement of, respectively,
get_rate/64() and get_size() from tc to lib.

This patchset does not actually add any new uses of these functions. A
follow-up patchset will add subtools for management of DCB buffer and DCB
maxrate objects that will make use of them.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-09 02:32:17 +00:00
Petr Machata 44396bdfcc lib: Move get_size() from tc here
The function get_size() serves for parsing of sizes using a handy notation
that supports units and their prefixes, such as 10Kbit. This will be useful
for the DCB buffer size parsing. Move the function from TC to the general
library, so that it can be reused.
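
To make the notation concrete, here is a simplified sketch of a size
parser with unit suffixes. It assumes 1024-based prefixes, a result in
bytes, and a "bit" suffix meaning one eighth of a byte; these are
assumptions of the sketch, and demo_get_size() is a hypothetical name
that does not reproduce get_size() itself.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <strings.h>

/* Parse e.g. "12", "4k", "1mb", "10kbit" into a byte count. */
int demo_get_size(uint64_t *size, const char *str)
{
	char *unit;
	double v;

	v = strtod(str, &unit);
	if (unit == str || !(v >= 0))
		return -EINVAL;	/* no number, negative, or NaN */

	if (*unit == '\0' || !strcasecmp(unit, "b"))
		;		/* plain bytes */
	else if (!strcasecmp(unit, "k") || !strcasecmp(unit, "kb"))
		v *= 1024;
	else if (!strcasecmp(unit, "m") || !strcasecmp(unit, "mb"))
		v *= 1024 * 1024;
	else if (!strcasecmp(unit, "g") || !strcasecmp(unit, "gb"))
		v *= 1024 * 1024 * 1024;
	else if (!strcasecmp(unit, "kbit"))
		v *= 1024 / 8.0;
	else if (!strcasecmp(unit, "mbit"))
		v *= 1024 * 1024 / 8.0;
	else if (!strcasecmp(unit, "gbit"))
		v *= 1024 * 1024 * 1024 / 8.0;
	else
		return -EINVAL;	/* unknown suffix */

	if (v >= 18446744073709551616.0)
		return -ERANGE;	/* does not fit in 64 bits */

	*size = (uint64_t)v;	/* truncates any fractional byte */
	return 0;
}
```

Under these assumptions "10kbit" parses to 1280 bytes.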

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-09 02:30:50 +00:00
Petr Machata f3be0e6366 lib: Move get_rate(), get_rate64() from tc here
The functions get_rate() and get_rate64() are useful for parsing rate-like
values. The DCB tool will find these useful in the maxrate subtool.
Move them over to lib so that they can be easily reused.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-09 02:30:44 +00:00
Petr Machata aaeda2a768 lib: print_color_rate(): Fix formatting small rates in IEC mode
ISO/IEC units are distinguished from the decadic ones by using prefixes
like "Ki", "Mi" instead of "K" and "M". The current code inserts the letter
"i" after the decadic unit when in IEC mode. However it does so even when
the prefix is an empty string, formatting 1Kbit in IEC mode as "1000ibit".
Fix by omitting the letter if there is no prefix.
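
The fix can be illustrated with a simplified formatter. The sketch
below only scales values that divide evenly by the base, which is a
simplification of this example and not a description of
print_color_rate(); the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Format a rate in bits per second, e.g. "25Gbit" or "1Kibit". */
const char *demo_format_rate(uint64_t rate, bool use_iec,
			     char *buf, size_t len)
{
	static const char *const prefixes[] = { "", "K", "M", "G", "T" };
	uint64_t kilo = use_iec ? 1024 : 1000;
	size_t i = 0;

	/* Simplification: scale only while the value divides evenly. */
	while (i < 4 && rate >= kilo && rate % kilo == 0) {
		rate /= kilo;
		i++;
	}

	/* Append "i" only when a prefix was actually chosen. */
	snprintf(buf, len, "%llu%s%sbit", (unsigned long long)rate,
		 prefixes[i], (use_iec && i > 0) ? "i" : "");
	return buf;
}
```

Here 1000 bit/s in IEC mode comes out as "1000bit" rather than
"1000ibit", while 1024 bit/s comes out as "1Kibit".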

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-09 02:30:41 +00:00
Petr Machata a0a4b6618c lib: sprint_size(): Uncrustify the code a bit
Ideally this and the rate printing would both be converted to a common
helper, but unfortunately the two format differently and this would break
tests and scripts out there. So just make the code look less like a wad of
hay.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-09 02:30:36 +00:00
Petr Machata adbe5de966 lib: Move sprint_size() from tc here, add print_size()
When displaying sizes of various sorts, tc commonly uses the function
sprint_size() to format the size into a buffer as a human-readable string.
This string is then displayed either using print_string(), or in some code
even fprintf(). As a result, a typical sequence of code when formatting a
size is something like the following:

	SPRINT_BUF(b);
	print_uint(PRINT_JSON, "foo", NULL, foo);
	print_string(PRINT_FP, NULL, "foo %s ", sprint_size(foo, b));

For a concept as broadly useful as size, it would be better to have a
dedicated function in json_print.

To that end, move sprint_size() from tc_util to json_print. Add helpers
print_size() and print_color_size() that wrap around sprint_size() and
provide the JSON dispatch as appropriate.

Since print_size() should be the preferred interface, convert vast majority
of uses of sprint_size() to print_size(). Two notable exceptions are:

- q_tbf, which does not show the size as such, but uses the string
  "$human_readable_size/$cell_size" even in JSON. There is simply no way to
  have print_size() emit the same text, because print_size() in JSON mode
  should of course just use the raw number, without human-readable frills.

- q_cake, which relies on the existence of sprint_size() in its macro-based
  formatting helpers. There might be ways to convert this particular case,
  but given q_tbf simply cannot be converted, leave it as is.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-09 02:30:25 +00:00
Petr Machata 60265cc226 lib: Move print_rate() from tc here; modernize
The functions print_rate() and sprint_rate() are useful for formatting
rate-like values. The DCB tool would find these useful in the maxrate
subtool. However, the current interface to these functions uses a global
variable use_iec as a flag indicating whether 1024- or 1000-based powers
should be used when formatting the rate value. For general use, a global
variable is not a great way of passing arguments to a function. Besides, it
is unlike most other printing functions in that it deals in buffers and
ignores JSON.

Therefore make the interface to print_rate() explicit by converting use_iec
to an ordinary parameter. Since the interface changes anyway, convert it to
follow the pattern of other json_print functions (except for the
now-explicit use_iec parameter). Move to json_print.c.

Add a wrapper to tc, so that all the call sites do not need to repeat the
use_iec global variable argument, and convert all call sites.

In q_cake.c, the conversion is not straightforward due to usage of a macro
that is shared across numerous data types. Simply hand-roll the
corresponding code, which seems better than making an extra helper for one
call site.

Drop sprint_rate() now that everybody just uses print_rate().
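
As a rough sketch of the interface change (hypothetical names, simplified
scaling, not the code in this patch), passing use_iec as an ordinary
parameter and keeping the global only in a tool-side wrapper might look
like this:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical formatter: the 1000-vs-1024 choice is an explicit argument.
 * Simplification: only scale while the value divides evenly. */
static void demo_format_rate(bool use_iec, unsigned long long bps,
			     char *buf, size_t len)
{
	static const char *units[] = { "", "K", "M", "G", "T" };
	unsigned long long kilo = use_iec ? 1024 : 1000;
	size_t i = 0;

	while (i < 4 && bps >= kilo && bps % kilo == 0) {
		bps /= kilo;
		i++;
	}
	snprintf(buf, len, "%llu%s%sbit", bps, units[i],
		 (use_iec && i > 0) ? "i" : "");
}

/* Hypothetical tool-local flag and wrapper, so call sites inside the tool
 * do not have to repeat the argument. */
static bool demo_use_iec;

static void demo_tool_format_rate(unsigned long long bps, char *buf, size_t len)
{
	demo_format_rate(demo_use_iec, bps, buf, len);
}
```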

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-09 02:30:15 +00:00
Petr Machata cdd9425315 Move the use_iec declaration to the tools
The tools "ip" and "tc" use a flag "use_iec", which indicates whether, when
formatting rate values, the prefixes "K", "M", etc. should refer to powers
of 1024, or powers of 1000. The flag is currently kept as a global variable
in "ip" and "tc", but is nonetheless declared in util.h.

Instead, move the declaration to tool-specific headers ip/ip_common.h and
tc/tc_common.h.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-09 02:28:43 +00:00
Paolo Lungaroni 69629b4e43 seg6: add support for vrftable attribute in SRv6 End.DT4/DT6 behaviors
We introduce the "vrftable" attribute for supporting the SRv6 End.DT4 and
End.DT6 behaviors in iproute2.
The "vrftable" attribute indicates the routing table associated with
the VRF device used by SRv6 End.DT4/DT6 for routing IPv4/IPv6 packets.

The SRv6 End.DT4/DT6 is used to implement IPv4/IPv6 L3 VPNs based on Segment
Routing over IPv6 networks in multi-tenant environments.
It decapsulates the received packets and it performs the IPv4/IPv6 routing
lookup in the routing table of the tenant.

The SRv6 End.DT4/DT6 leverages a VRF device in order to force the routing
lookup into the associated routing table using the "vrftable" attribute.

Some examples:
 $ ip -6 route add 2001:db8::1 encap seg6local action End.DT4 vrftable 100 dev eth0
 $ ip -6 route add 2001:db8::2 encap seg6local action End.DT6 vrftable 200 dev eth0

Standard Output:
 $ ip -6 route show 2001:db8::1
 2001:db8::1  encap seg6local action End.DT4 vrftable 100 dev eth0 metric 1024 pref medium

JSON Output:
$ ip -6 -j -p route show 2001:db8::2
[ {
        "dst": "2001:db8::2",
        "encap": "seg6local",
        "action": "End.DT6",
        "vrftable": 200,
        "dev": "eth0",
        "metric": 1024,
        "flags": [ ],
        "pref": "medium"
} ]

v2:
 - no changes made: resubmit after pulling out this patch from the kernel
   patchset.

v1:
 - mixing this patch with the kernel patchset confused patchwork.

Signed-off-by: Paolo Lungaroni <paolo.lungaroni@cnit.it>
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-09 02:27:42 +00:00
David Ahern cfad32569f Update kernel headers
Update kernel headers to commit:
    afae3cc2da10 ("net: atheros: simplify the return expression of atl2_phy_setup_autoneg_adv()")

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-09 02:25:34 +00:00
David Ahern 8065d28218 Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-04 16:25:12 +00:00
David Ahern b3c4a55064 Only compile mnl_utils when HAVE_MNL is defined
New lib/mnl_utils.c fails to compile if libmnl is not installed:

  mnl_utils.c:9:10: fatal error: libmnl/libmnl.h: No such file or directory
      9 | #include <libmnl/libmnl.h>

Make it dependent on HAVE_MNL.

Fixes: 72858c7b77 ("lib: Extract from devlink/mnlg a helper, mnlu_socket_open()")
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-04 16:19:05 +00:00
Stephen Hemminger 2e80ae89ca Merge branch 'gcc-10' into main 2020-12-03 08:33:06 -08:00
Luca Boccassi 755b1c584e tc/mqprio: json-ify output
As reported by a Debian user, mqprio output in json mode is
invalid:

{
     "kind": "mqprio",
     "handle": "8021:",
     "dev": "enp1s0f0",
     "root": true,
     "options": { tc 2 map 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0
          queues:(0:3) (4:7)
          mode:channel
          shaper:dcb}
}

json-ify it, while trying to maintain the same formatting
for standard output.

New output:

{
    "kind": "mqprio",
    "handle": "8001:",
    "root": true,
    "options": {
        "tc": 2,
        "map": [ 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ],
        "queues": [ [ 0, 3 ], [ 4, 7 ] ],
        "mode": "channel",
        "shaper": "dcb"
    }
}
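
A minimal sketch of how the "queues" value above can be derived, assuming
each traffic class is described by a queue count and a starting offset and
that a range is shown as first and last queue. The function
demo_queues_json() is hypothetical and writes to a buffer instead of using
the json_print helpers:

```c
#include <stdio.h>

/* Hypothetical: render per-class queue ranges as a JSON array of pairs.
 * Returns 0 on success, -1 if the buffer is too small. */
static int demo_queues_json(const unsigned int *count,
			    const unsigned int *offset, int n,
			    char *buf, size_t len)
{
	int off = snprintf(buf, len, "[");
	int i;

	for (i = 0; i < n && off > 0 && (size_t)off < len; i++)
		off += snprintf(buf + off, len - off, "%s [ %u, %u ]",
				i ? "," : "", offset[i],
				offset[i] + count[i] - 1);

	if (off > 0 && (size_t)off < len)
		off += snprintf(buf + off, len - off, " ]");

	return (off > 0 && (size_t)off < len) ? 0 : -1;
}
```

With counts { 4, 4 } and offsets { 0, 4 } this yields the two pairs shown in
the new output.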

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=972784

Reported-by: Roméo GINON <romeo.ginon@ilexia.com>
Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-12-03 08:32:42 -08:00
Luca Boccassi 975c4944e8 ip/netns: use flock when setting up /run/netns
If multiple ip processes are run at the same time to set up
separate network namespaces, and it is the first time so /run/netns
has to be set up first, and they end up doing it at the same time,
the processes might enter a recursive loop creating thousands of
mount points, which might crash the system depending on resources
available.

Try to take a flock on /run/netns before doing the mount() dance, to
ensure this cannot happen. But do not try too hard, and if it fails
continue after printing a warning, to avoid introducing regressions.
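
The "lock if possible, warn and continue otherwise" pattern can be sketched
as follows. This is an illustration under assumptions, not the code added by
this patch: demo_lock_dir(), demo_setup() and demo_setup_locked() are
hypothetical names, and demo_setup() is a placeholder for the real mount()
sequence:

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical: try to take an exclusive flock on a directory.
 * Returns the lock fd (>= 0), or -1 after printing a warning. */
static int demo_lock_dir(const char *dir)
{
	int fd = open(dir, O_RDONLY);

	if (fd < 0) {
		fprintf(stderr, "warning: cannot open %s for locking\n", dir);
		return -1;
	}
	if (flock(fd, LOCK_EX) < 0) {
		fprintf(stderr, "warning: cannot lock %s\n", dir);
		close(fd);
		return -1;
	}
	return fd;
}

/* Placeholder for the work that must not run concurrently. */
static int demo_setup(void)
{
	return 0;
}

/* Hypothetical caller: the setup runs whether or not the lock was taken,
 * so a locking failure never turns into a new hard error. */
static int demo_setup_locked(const char *dir, int (*setup)(void))
{
	int lock = demo_lock_dir(dir);
	int ret = setup();

	if (lock >= 0) {
		flock(lock, LOCK_UN);
		close(lock);
	}
	return ret;
}
```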

First reported on Debian: https://bugs.debian.org/949235

To reproduce (WARNING: run in a VM to avoid system lockups):

for i in {0..9}
do
        strace -e trace=mount -e inject=mount:delay_exit=1000000 ip \
 netns add "testnetns$i" 2>&1 | tee "$i.log" &
done
wait

The strace is to ensure the problem always reproduces, to add an
artificial synchronization point after the first mount().

Reported-by: Etienne Dechamps <etienne@edechamps.fr>
Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-12-03 08:31:23 -08:00
Vlad Buslov ea130da81e tc: implement support for action terse dump
Implement support for action terse dump using the new TCA_ACT_FLAG_TERSE_DUMP
value of the TCA_ROOT_FLAGS tlv. Set the flag when the user requests it, as in
the following example CLI (-br for 'brief'):

$ tc -s -br actions ls action tunnel_key
total acts 2

        action order 0: tunnel_key       index 1
        Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

        action order 1: tunnel_key       index 2
        Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

In terse mode, the dump outputs only the essential data needed to identify
the action (kind, index) and stats, if requested by the user.

Signed-off-by: Vlad Buslov <vlad@buslov.dev>
Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-03 03:51:06 +00:00
Vlad Buslov 00fffb2d79 tc: use TCA_ACT_ prefix for action flags
Use TCA_ACT_FLAG_LARGE_DUMP_ON alias according to new preferred naming for
action flags.

Signed-off-by: Vlad Buslov <vlad@buslov.dev>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-03 03:49:14 +00:00
David Ahern 23683dec32 Update kernel headers
Update kernel headers to commit:
    cec85994c6b4 ("bareudp: constify device_type declaration")

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-03 03:47:07 +00:00
Sergey Ryazanov d7190d4ced ip: add IP_LIB_DIR environment variable
Do not hardcode /usr/lib/ip as a path and allow libraries path
configuration in run-time.
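
A minimal sketch of the lookup, assuming the environment variable simply
overrides the previous hardcoded default. The helper name
demo_ip_lib_dir() is hypothetical:

```c
#include <stdlib.h>

#define DEMO_DEFAULT_LIB_DIR "/usr/lib/ip"

/* Hypothetical: prefer IP_LIB_DIR from the environment, else the default. */
static const char *demo_ip_lib_dir(void)
{
	const char *dir = getenv("IP_LIB_DIR");

	return dir ? dir : DEMO_DEFAULT_LIB_DIR;
}
```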

Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-12-02 16:37:07 +00:00
Stephen Hemminger fb054cb336 uapi: update devlink.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-29 21:17:22 -08:00
Stephen Hemminger c95d63e4fb uapi: update devlink.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-29 21:16:50 -08:00
Stephen Hemminger cae2e9291a f_u32: fix compiler gcc-10 compiler warning
gcc-10 complains about an array subscript error.

f_u32.c: In function ‘u32_parse_opt’:
f_u32.c:1113:24: warning: array subscript 0 is outside the bounds of an interior zero-length array ‘struct tc_u32_key[0]’ [-Wzero-length-bounds]
 1113 |    hash = sel2.sel.keys[0].val & sel2.sel.keys[0].mask;
      |           ~~~~~~~~~~~~~^~~
In file included from tc_util.h:11,
                 from f_u32.c:26:
../include/uapi/linux/pkt_cls.h:253:20: note: while referencing ‘keys’
  253 |  struct tc_u32_key keys[0];
      |

This is because the keys are actually allocated in the second element
of the parent structure.

Simplest way to address the warning is to assign directly to the keys
in the containing structure.
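
The layout in question can be sketched with stand-in types (all demo_*
names are hypothetical, and the zero-length array is a GNU extension, as in
the uapi header). The selector ends in a zero-length array, and the
containing structure provides the real storage right after it:

```c
/* Hypothetical stand-ins for the selector and its keys. */
struct demo_key {
	unsigned int mask;
	unsigned int val;
};

struct demo_sel {
	unsigned char nkeys;
	struct demo_key keys[0];	/* zero-length tail, no storage here */
};

struct demo_sel_storage {
	struct demo_sel sel;
	struct demo_key keys[4];	/* the actual key storage */
};

/* Index the sized array in the container rather than s->sel.keys[0],
 * which is what the compiler flags as out of bounds. */
static unsigned int demo_first_key_hash(const struct demo_sel_storage *s)
{
	return s->keys[0].val & s->keys[0].mask;
}
```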

This has always been in iproute2 (pre-git) so no Fixes.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-29 16:20:33 -08:00
Stephen Hemminger c014983921 misc: fix compiler warning in ifstat and nstat
The code here was doing strncpy() in a way that causes gcc 10
warning about possible string overflow. Just use strlcpy() which
will null terminate and bound the string as expected.
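
For readers unfamiliar with the semantics, a strlcpy()-style copy can be
sketched as below. demo_strlcpy() is a hypothetical stand-in written for
this note, not the implementation the tools link against:

```c
#include <string.h>

/* Hypothetical strlcpy()-like helper: copies at most size - 1 bytes,
 * always NUL-terminates when size > 0, and returns strlen(src) so the
 * caller can detect truncation. */
static size_t demo_strlcpy(char *dst, const char *src, size_t size)
{
	size_t srclen = strlen(src);

	if (size > 0) {
		size_t n = srclen < size - 1 ? srclen : size - 1;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return srclen;
}
```

Unlike strncpy(), the destination is terminated even when the source is
longer than the buffer.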

This has existed since start of git era so no Fixes tag.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-29 16:20:31 -08:00
Stephen Hemminger 2319db9052 tc: fix compiler warnings in ip6 pedit
Gcc-10 complains about referencing a zero size array.
This occurs because the array of keys is actually in the following
structure which is part of the overall selector.

The original code was safe, but better to just use the key
array directly.

Fixes: 2d9a8dc439 ("tc: p_ip6: Support pedit of IPv6 dsfield")
Cc: petrm@mellanox.com
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-29 16:20:23 -08:00
Stephen Hemminger 5bdc4e9151 bridge: fix string length warning
Gcc-10 complains about possible string length overflow.
This can't happen because the Ethernet address format is always limited to
18 characters or less. Just resize the temp buffer.

Fixes: 70dfb0b883 ("iplink: bridge: export bridge_id and designated_root")
Cc: nikolay@cumulusnetworks.com
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-29 16:20:16 -08:00
Stephen Hemminger f817699939 devlink: fix uninitialized warning
GCC-10 complains about uninitialized variable.

devlink.c: In function ‘cmd_dev’:
devlink.c:2803:12: warning: ‘val_u32’ may be used uninitialized in this function [-Wmaybe-uninitialized]
 2803 |    val_u16 = val_u32;
      |    ~~~~~~~~^~~~~~~~~
devlink.c:2747:11: note: ‘val_u32’ was declared here
 2747 |  uint32_t val_u32;
      |           ^~~~~~~

This is a false positive because it can't figure out the control flow
when the parse returns error.

Fixes: 2557dca2b0 ("devlink: Add string to uint{8,16,32} conversion for generic parameters")
Cc: shalomt@mellanox.com
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-29 16:19:36 -08:00
Vladimir Oltean c29f65db34 bridge: add support for L2 multicast groups
Extend the 'bridge mdb' command for the following syntax:
bridge mdb add dev br0 port swp0 grp 01:02:03:04:05:06 permanent

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-29 20:54:02 +00:00
Luca Boccassi f5c1246e6a Add dcb/.gitignore
Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-29 20:39:47 +00:00
David Ahern f98ce50046 Merge branch 'libbpf' into next
Hangbin Liu says:

====================

This series converts iproute2 to use libbpf for loading and attaching
BPF programs when it is available. This means that iproute2 will
correctly process BTF information and support the new-style BTF-defined
maps, while keeping compatibility with the old internal map definition
syntax.

This is achieved by checking for libbpf at './configure' time, and using
it if available. By default the system libbpf will be used, but static
linking against a custom libbpf version can be achieved by passing
LIBBPF_DIR to configure. LIBBPF_FORCE can be set to on to force configure
to abort if no suitable libbpf is found (useful for automatic packaging
that wants to enforce the dependency), or set to off to disable the libbpf
check and build iproute2 with legacy bpf.

The old iproute2 bpf code is kept and will be used if no suitable libbpf
is available. When using libbpf, wrapper code ensures that iproute2 will
still understand the old map definition format, including populating
map-in-map and tail call maps before load.

The examples in bpf/examples are kept, and a separate set of examples
are added with BTF-based map definitions for those examples where this
is possible (libbpf doesn't currently support declaratively populating
tail call maps).

Finally, thanks a lot to Toke for his help on this patch set.

v6:
a) print runtime libbpf version in ip -V and tc -V

v5:
a) Fix LIBBPF_DIR typo and description, use libbpf DESTDIR as LIBBPF_DIR
   dest.
b) Fix bpf_prog_load_dev typo.
c) rebase to latest iproute2-next.

v4:
a) Make variable LIBBPF_FORCE able to control whether to build iproute2
   with libbpf or not.
b) Add new file bpf_glue.c for libbpf/legacy mixed bpf calls.
c) Fix some build issues and shell compatibility error.

v3:
a) Update configure to check for function bpf_program__section_name() separately
b) Add a new function get_bpf_program__section_name() to choose whether to
use bpf_program__title() or not.
c) Test build the patch on Fedora 33 with libbpf-0.1.0-1.fc33 and
   libbpf-devel-0.1.0-1.fc33

v2:
a) Remove self defined IS_ERR_OR_NULL and use libbpf_get_error() instead.
b) Add ipvrf with libbpf support.

Here are the test results with patched iproute2:
== Show libbpf version
$ ip -V
ip utility, iproute2-5.9.0, libbpf 0.1.0
$ tc -V
tc utility, iproute2-5.9.0, libbpf 0.1.0

== setup env
$ clang -O2 -Wall -g -target bpf -c bpf_graft.c -o btf_graft.o
$ clang -O2 -Wall -g -target bpf -c bpf_map_in_map.c -o btf_map_in_map.o
$ clang -O2 -Wall -g -target bpf -c bpf_shared.c -o btf_shared.o
$ clang -O2 -Wall -g -target bpf -c legacy/bpf_cyclic.c -o bpf_cyclic.o
$ clang -O2 -Wall -g -target bpf -c legacy/bpf_graft.c -o bpf_graft.o
$ clang -O2 -Wall -g -target bpf -c legacy/bpf_map_in_map.c -o bpf_map_in_map.o
$ clang -O2 -Wall -g -target bpf -c legacy/bpf_shared.c -o bpf_shared.o
$ clang -O2 -Wall -g -target bpf -c legacy/bpf_tailcall.c -o bpf_tailcall.o
$ rm -rf /sys/fs/bpf/xdp/globals
$ /root/iproute2/ip/ip link add type veth
$ /root/iproute2/ip/ip link set veth0 up
$ /root/iproute2/ip/ip link set veth1 up

== Load objs
$ /root/iproute2/ip/ip link set veth0 xdp obj bpf_graft.o sec aaa
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 4 tag 3056d2382e53f27c jited
$ ls /sys/fs/bpf/xdp/globals
jmp_tc
$ bpftool map show
1: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
$ bpftool prog show
4: xdp  name cls_aaa  tag 3056d2382e53f27c  gpl
        loaded_at 2020-10-22T08:04:21-0400  uid 0
        xlated 80B  jited 71B  memlock 4096B
        btf_id 5
$ /root/iproute2/ip/ip link set veth0 xdp off
$ /root/iproute2/ip/ip link set veth0 xdp obj bpf_map_in_map.o sec ingress
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 8 tag 4420e72b2a601ed7 jited
$ ls /sys/fs/bpf/xdp/globals
jmp_tc  map_inner  map_outer
$ bpftool map show
1: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
2: array  name map_inner  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
3: array_of_maps  name map_outer  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
$ bpftool prog show
8: xdp  name imain  tag 4420e72b2a601ed7  gpl
        loaded_at 2020-10-22T08:04:23-0400  uid 0
        xlated 336B  jited 193B  memlock 4096B  map_ids 3
        btf_id 10
$ /root/iproute2/ip/ip link set veth0 xdp off
$ /root/iproute2/ip/ip link set veth0 xdp obj bpf_shared.o sec ingress
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 12 tag 9cbab549c3af3eab jited
$ ls /sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef /sys/fs/bpf/xdp/globals
/sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef:
map_sh

/sys/fs/bpf/xdp/globals:
jmp_tc  map_inner  map_outer
$ bpftool map show
1: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
2: array  name map_inner  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
3: array_of_maps  name map_outer  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
4: array  name map_sh  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
$ bpftool prog show
12: xdp  name imain  tag 9cbab549c3af3eab  gpl
        loaded_at 2020-10-22T08:04:25-0400  uid 0
        xlated 224B  jited 139B  memlock 4096B  map_ids 4
        btf_id 15
$ /root/iproute2/ip/ip link set veth0 xdp off

== Load objs again to make sure maps could be reused
$ /root/iproute2/ip/ip link set veth0 xdp obj bpf_graft.o sec aaa
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 16 tag 3056d2382e53f27c jited
$ ls /sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef /sys/fs/bpf/xdp/globals
/sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef:
map_sh

/sys/fs/bpf/xdp/globals:
jmp_tc  map_inner  map_outer
$ bpftool map show
1: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
2: array  name map_inner  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
3: array_of_maps  name map_outer  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
4: array  name map_sh  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
$ bpftool prog show
16: xdp  name cls_aaa  tag 3056d2382e53f27c  gpl
        loaded_at 2020-10-22T08:04:27-0400  uid 0
        xlated 80B  jited 71B  memlock 4096B
        btf_id 20
$ /root/iproute2/ip/ip link set veth0 xdp off
$ /root/iproute2/ip/ip link set veth0 xdp obj bpf_map_in_map.o sec ingress
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 20 tag 4420e72b2a601ed7 jited
$ ls /sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef /sys/fs/bpf/xdp/globals
/sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef:
map_sh

/sys/fs/bpf/xdp/globals:
jmp_tc  map_inner  map_outer
$ bpftool map show
1: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
2: array  name map_inner  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
3: array_of_maps  name map_outer  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
4: array  name map_sh  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
$ bpftool prog show
20: xdp  name imain  tag 4420e72b2a601ed7  gpl
        loaded_at 2020-10-22T08:04:29-0400  uid 0
        xlated 336B  jited 193B  memlock 4096B  map_ids 3
        btf_id 25
$ /root/iproute2/ip/ip link set veth0 xdp off
$ /root/iproute2/ip/ip link set veth0 xdp obj bpf_shared.o sec ingress
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 24 tag 9cbab549c3af3eab jited
$ ls /sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef /sys/fs/bpf/xdp/globals
/sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef:
map_sh

/sys/fs/bpf/xdp/globals:
jmp_tc  map_inner  map_outer
$ bpftool map show
1: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
2: array  name map_inner  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
3: array_of_maps  name map_outer  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
4: array  name map_sh  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
$ bpftool prog show
24: xdp  name imain  tag 9cbab549c3af3eab  gpl
        loaded_at 2020-10-22T08:04:31-0400  uid 0
        xlated 224B  jited 139B  memlock 4096B  map_ids 4
        btf_id 30
$ /root/iproute2/ip/ip link set veth0 xdp off
$ rm -rf /sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef /sys/fs/bpf/xdp/globals

== Testing if we can load new-style objects (using xdp-filter as an example)
$ /root/iproute2/ip/ip link set veth0 xdp obj /usr/lib64/bpf/xdpfilt_alw_all.o sec xdp_filter
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 28 tag e29eeda1489a6520 jited
$ ls /sys/fs/bpf/xdp/globals
filter_ethernet  filter_ipv4  filter_ipv6  filter_ports  xdp_stats_map
$ bpftool map show
5: percpu_array  name xdp_stats_map  flags 0x0
        key 4B  value 16B  max_entries 5  memlock 4096B
        btf_id 35
6: percpu_array  name filter_ports  flags 0x0
        key 4B  value 8B  max_entries 65536  memlock 1576960B
        btf_id 35
7: percpu_hash  name filter_ipv4  flags 0x0
        key 4B  value 8B  max_entries 10000  memlock 1064960B
        btf_id 35
8: percpu_hash  name filter_ipv6  flags 0x0
        key 16B  value 8B  max_entries 10000  memlock 1142784B
        btf_id 35
9: percpu_hash  name filter_ethernet  flags 0x0
        key 6B  value 8B  max_entries 10000  memlock 1064960B
        btf_id 35
$ bpftool prog show
28: xdp  name xdpfilt_alw_all  tag e29eeda1489a6520  gpl
        loaded_at 2020-10-22T08:04:33-0400  uid 0
        xlated 2408B  jited 1405B  memlock 4096B  map_ids 9,5,7,8,6
        btf_id 35
$ /root/iproute2/ip/ip link set veth0 xdp off
$ /root/iproute2/ip/ip link set veth0 xdp obj /usr/lib64/bpf/xdpfilt_alw_ip.o sec xdp_filter
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 32 tag 2f2b9dbfb786a5a2 jited
$ ls /sys/fs/bpf/xdp/globals
filter_ethernet  filter_ipv4  filter_ipv6  filter_ports  xdp_stats_map
$ bpftool map show
5: percpu_array  name xdp_stats_map  flags 0x0
        key 4B  value 16B  max_entries 5  memlock 4096B
        btf_id 35
6: percpu_array  name filter_ports  flags 0x0
        key 4B  value 8B  max_entries 65536  memlock 1576960B
        btf_id 35
7: percpu_hash  name filter_ipv4  flags 0x0
        key 4B  value 8B  max_entries 10000  memlock 1064960B
        btf_id 35
8: percpu_hash  name filter_ipv6  flags 0x0
        key 16B  value 8B  max_entries 10000  memlock 1142784B
        btf_id 35
9: percpu_hash  name filter_ethernet  flags 0x0
        key 6B  value 8B  max_entries 10000  memlock 1064960B
        btf_id 35
$ bpftool prog show
32: xdp  name xdpfilt_alw_ip  tag 2f2b9dbfb786a5a2  gpl
        loaded_at 2020-10-22T08:04:35-0400  uid 0
        xlated 1336B  jited 778B  memlock 4096B  map_ids 7,8,5
        btf_id 40
$ /root/iproute2/ip/ip link set veth0 xdp off
$ /root/iproute2/ip/ip link set veth0 xdp obj /usr/lib64/bpf/xdpfilt_alw_tcp.o sec xdp_filter
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 36 tag 18c1bb25084030bc jited
$ ls /sys/fs/bpf/xdp/globals
filter_ethernet  filter_ipv4  filter_ipv6  filter_ports  xdp_stats_map
$ bpftool map show
5: percpu_array  name xdp_stats_map  flags 0x0
        key 4B  value 16B  max_entries 5  memlock 4096B
        btf_id 35
6: percpu_array  name filter_ports  flags 0x0
        key 4B  value 8B  max_entries 65536  memlock 1576960B
        btf_id 35
7: percpu_hash  name filter_ipv4  flags 0x0
        key 4B  value 8B  max_entries 10000  memlock 1064960B
        btf_id 35
8: percpu_hash  name filter_ipv6  flags 0x0
        key 16B  value 8B  max_entries 10000  memlock 1142784B
        btf_id 35
9: percpu_hash  name filter_ethernet  flags 0x0
        key 6B  value 8B  max_entries 10000  memlock 1064960B
        btf_id 35
$ bpftool prog show
36: xdp  name xdpfilt_alw_tcp  tag 18c1bb25084030bc  gpl
        loaded_at 2020-10-22T08:04:37-0400  uid 0
        xlated 1128B  jited 690B  memlock 4096B  map_ids 6,5
        btf_id 45
$ /root/iproute2/ip/ip link set veth0 xdp off
$ rm -rf /sys/fs/bpf/xdp/globals

== Load new btf defined maps
$ /root/iproute2/ip/ip link set veth0 xdp obj btf_graft.o sec aaa
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 40 tag 3056d2382e53f27c jited
$ ls /sys/fs/bpf/xdp/globals
jmp_tc
$ bpftool map show
10: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
$ bpftool prog show
40: xdp  name cls_aaa  tag 3056d2382e53f27c  gpl
        loaded_at 2020-10-22T08:04:39-0400  uid 0
        xlated 80B  jited 71B  memlock 4096B
        btf_id 50
$ /root/iproute2/ip/ip link set veth0 xdp off
$ /root/iproute2/ip/ip link set veth0 xdp obj btf_map_in_map.o sec ingress
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 44 tag 4420e72b2a601ed7 jited
$ ls /sys/fs/bpf/xdp/globals
jmp_tc  map_outer
$ bpftool map show
10: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
11: array  name map_inner  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
13: array_of_maps  name map_outer  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
$ bpftool prog show
44: xdp  name imain  tag 4420e72b2a601ed7  gpl
        loaded_at 2020-10-22T08:04:41-0400  uid 0
        xlated 336B  jited 193B  memlock 4096B  map_ids 13
        btf_id 55
$ /root/iproute2/ip/ip link set veth0 xdp off
$ /root/iproute2/ip/ip link set veth0 xdp obj btf_shared.o sec ingress
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 48 tag 9cbab549c3af3eab jited
$ ls /sys/fs/bpf/xdp/globals
jmp_tc  map_outer  map_sh
$ bpftool map show
10: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
11: array  name map_inner  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
13: array_of_maps  name map_outer  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
14: array  name map_sh  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
$ bpftool prog show
48: xdp  name imain  tag 9cbab549c3af3eab  gpl
        loaded_at 2020-10-22T08:04:43-0400  uid 0
        xlated 224B  jited 139B  memlock 4096B  map_ids 14
        btf_id 60
$ /root/iproute2/ip/ip link set veth0 xdp off
$ rm -rf /sys/fs/bpf/xdp/globals

== Test load objs by tc
$ /root/iproute2/tc/tc qdisc add dev veth0 ingress
$ /root/iproute2/tc/tc filter add dev veth0 ingress bpf da obj bpf_cyclic.o sec 0xabccba/0
$ /root/iproute2/tc/tc filter add dev veth0 parent ffff: bpf obj bpf_graft.o
$ /root/iproute2/tc/tc filter add dev veth0 ingress bpf da obj bpf_tailcall.o sec 42/0
$ /root/iproute2/tc/tc filter add dev veth0 ingress bpf da obj bpf_tailcall.o sec 42/1
$ /root/iproute2/tc/tc filter add dev veth0 ingress bpf da obj bpf_tailcall.o sec 43/0
$ /root/iproute2/tc/tc filter add dev veth0 ingress bpf da obj bpf_tailcall.o sec classifier
$ /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
$ ls /sys/fs/bpf/xdp/37e88cb3b9646b2ea5f99ab31069ad88db06e73d /sys/fs/bpf/xdp/fc68fe3e96378a0cba284ea6acbe17e898d8b11f /sys/fs/bpf/xdp/globals
/sys/fs/bpf/xdp/37e88cb3b9646b2ea5f99ab31069ad88db06e73d:
jmp_tc

/sys/fs/bpf/xdp/fc68fe3e96378a0cba284ea6acbe17e898d8b11f:
jmp_ex  jmp_tc  map_sh

/sys/fs/bpf/xdp/globals:
jmp_tc
$ bpftool map show
15: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
        owner_prog_type sched_cls  owner jited
16: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
        owner_prog_type sched_cls  owner jited
17: prog_array  name jmp_ex  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
        owner_prog_type sched_cls  owner jited
18: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 2  memlock 4096B
        owner_prog_type sched_cls  owner jited
19: array  name map_sh  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
$ bpftool prog show
52: sched_cls  name cls_loop  tag 3e98a40b04099d36  gpl
        loaded_at 2020-10-22T08:04:45-0400  uid 0
        xlated 168B  jited 133B  memlock 4096B  map_ids 15
        btf_id 65
56: sched_cls  name cls_entry  tag 0fbb4d9310a6ee26  gpl
        loaded_at 2020-10-22T08:04:45-0400  uid 0
        xlated 144B  jited 121B  memlock 4096B  map_ids 16
        btf_id 70
60: sched_cls  name cls_case1  tag e06a3bd62293d65d  gpl
        loaded_at 2020-10-22T08:04:45-0400  uid 0
        xlated 328B  jited 216B  memlock 4096B  map_ids 19,17
        btf_id 75
66: sched_cls  name cls_case1  tag e06a3bd62293d65d  gpl
        loaded_at 2020-10-22T08:04:45-0400  uid 0
        xlated 328B  jited 216B  memlock 4096B  map_ids 19,17
        btf_id 80
72: sched_cls  name cls_case1  tag e06a3bd62293d65d  gpl
        loaded_at 2020-10-22T08:04:45-0400  uid 0
        xlated 328B  jited 216B  memlock 4096B  map_ids 19,17
        btf_id 85
78: sched_cls  name cls_case1  tag e06a3bd62293d65d  gpl
        loaded_at 2020-10-22T08:04:45-0400  uid 0
        xlated 328B  jited 216B  memlock 4096B  map_ids 19,17
        btf_id 90
79: sched_cls  name cls_case2  tag ee218ff893dca823  gpl
        loaded_at 2020-10-22T08:04:45-0400  uid 0
        xlated 336B  jited 218B  memlock 4096B  map_ids 19,18
        btf_id 90
80: sched_cls  name cls_exit  tag e78a58140deed387  gpl
        loaded_at 2020-10-22T08:04:45-0400  uid 0
        xlated 288B  jited 177B  memlock 4096B  map_ids 19
        btf_id 90

I also ran the following upstream kselftests with the patched iproute2, and
all passed.

test_lwt_ip_encap.sh
test_xdp_redirect.sh
test_tc_redirect.sh
test_xdp_meta.sh
test_xdp_veth.sh
test_xdp_vlan.sh

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 22:24:15 -07:00
Hangbin Liu 71c7c1fb4f examples/bpf: add bpf examples with BTF defined maps
Users should try to use the new BTF-defined maps instead of struct
bpf_elf_map defined maps. The tail call examples are not added yet
as libbpf doesn't currently support declaratively populating tail call
maps.

Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Hangbin Liu <haliu@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 22:14:08 -07:00
Hangbin Liu 1ac8285a69 examples/bpf: move struct bpf_elf_map defined maps to legacy folder
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Hangbin Liu <haliu@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 22:14:06 -07:00
Hangbin Liu 6d61a2b557 lib: add libbpf support
This patch converts iproute2 to use libbpf for loading and attaching
BPF programs when it is available, building on Toke's initial
implementation[1]. With libbpf, iproute2 can correctly process BTF
information and support the new-style BTF-defined maps, while keeping
compatibility with the old internal map definition syntax.

The old iproute2 bpf code is kept and will be used if no suitable libbpf
is available. When using libbpf, wrapper code in bpf_legacy.c ensures that
iproute2 will still understand the old map definition format, including
populating map-in-map and tail call maps before load.

In bpf_libbpf.c, we first init the iproute2 ctx and elf info to check the
legacy bytes. When handling the legacy maps, map-in-maps are created
manually and their fds re-used, as they are associated with id/inner_id.
For pinned maps, we only set the pin path and let the libbpf load handle it.
For tail calls, we find the map first and update its elements after prog load.

Other maps/progs will be loaded by libbpf directly.

[1] https://lore.kernel.org/bpf/20190820114706.18546-1-toke@redhat.com/

Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Hangbin Liu <haliu@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 22:14:05 -07:00
Hangbin Liu dc800a4ed4 lib: make ipvrf able to use libbpf and fix function name conflicts
There are direct calls in libbpf for bpf program load/attach.
So we could just use two wrapper functions for ipvrf and convert
them with libbpf support.

Function bpf_prog_load() is removed as it conflicts with the libbpf
function of the same name.

bpf.c is moved to bpf_legacy.c for later main libbpf support in
iproute2.

Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Hangbin Liu <haliu@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 22:14:04 -07:00
Hangbin Liu 503e9229b0 iproute2: add check_libbpf() and get_libbpf_version()
This patch aims to add basic checking functions for later iproute2
libbpf support.

First we add check_libbpf() in configure to see if we have bpf library
support. By default the system libbpf will be used, but static linking
against a custom libbpf version can be achieved by passing libbpf DESTDIR
to variable LIBBPF_DIR for configure.

Another variable LIBBPF_FORCE is used to control whether to build iproute2
with libbpf. If set to on, then force to build with libbpf and exit if
not available. If set to off, then force to not build with libbpf.

When dynamically linking against libbpf, we can't be sure that the
version we discovered at compile time is actually the one we are
using at runtime. This can lead to hard-to-debug errors. So we add
a new file lib/bpf_glue.c and a helper function get_libbpf_version()
to get correct libbpf version at runtime.

Signed-off-by: Hangbin Liu <haliu@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 22:14:02 -07:00
David Ahern ee5d4b24e3 Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 22:04:48 -07:00
Roi Dayan ed40b7e2ae tc flower: fix parsing vlan_id and vlan_prio
When the protocol is vlan, eth_type is set to the vlan ethertype.
So when parsing vlan_id and vlan_prio, we need to check that tc_proto
is vlan, not eth_type.

Fixes: 4c551369e0 ("tc flower: use right ethertype in icmp/arp parsing")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 21:45:20 -07:00
Petr Machata ca5ec9a17a ip: iptuntap: Convert to use print_on_off()
Instead of rolling a custom on-off printer, use the one added to utils.c.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 21:43:41 -07:00
Petr Machata 66e574c4c5 ip: ipnetconf: Convert to use print_on_off()
Instead of rolling a custom on-off printer, use the one added to utils.c.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 21:43:34 -07:00
Petr Machata 07d82b4a79 ip: iplink_bridge_slave: Convert to use print_on_off()
Instead of rolling a custom on-off printer, use the one added to utils.c.
Note that _print_onoff() has an extra parameter for a JSON-specific flag
name. However that argument is not used, and never was. Therefore when
moving over to print_on_off(), drop this argument.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 21:43:30 -07:00
Petr Machata 3e0d2a73ba ip: iplink_bridge_slave: Port over to parse_on_off()
Invoke parse_on_off() from bridge_slave_parse_on_off() instead of
hand-rolling one. Exit on failure, because the invarg that was invoked here
before would.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 21:43:27 -07:00
Petr Machata 5f685d064b ip: iplink: Convert to use parse_on_off()
Invoke parse_on_off() instead of rolling a custom function.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 21:43:23 -07:00
Petr Machata 94d12fd796 bridge: link: Convert to use print_on_off()
Instead of rolling a custom on-off printer, use the one added to utils.c.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 21:43:19 -07:00
Petr Machata 9262ccc3ed bridge: link: Port over to parse_on_off()
Convert bridge/link.c from a custom on_off parser to the new global one.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-24 21:43:14 -07:00
David Ahern e1ae6efbb8 Merge branch 'nexthop-flags' into next
Ido Schimmel  says:

====================

From: Ido Schimmel <idosch@nvidia.com>

Patch #1 prints the recently added 'RTNH_F_TRAP' flag.

Patch #2 makes sure that nexthop flags are always printed for nexthop
objects, even when the nexthop does not have a device, such as a
blackhole nexthop or a group.

Example output with netdevsim:

$ ip nexthop
id 1 via 192.0.2.2 dev eth0 scope link trap
id 2 blackhole trap
id 3 group 2 trap

Example output with mlxsw:

$ ip nexthop
id 1 via 192.0.2.2 dev swp3 scope link offload
id 2 blackhole offload
id 3 group 2 offload

Tested with fib_nexthops.sh that uses "ip nexthop" output:

Tests passed: 164
Tests failed:   0

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-22 12:46:30 -07:00
Ido Schimmel 0788678991 nexthop: Always print nexthop flags
Currently, the nexthop flags are only printed when the nexthop has a
nexthop device. The offload / trap indication is therefore not printed
for nexthop groups.

Instead, always print the nexthop flags, regardless of whether the nexthop
has a nexthop device or not.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-22 12:43:56 -07:00
Ido Schimmel 3de35f41be ip route: Print "trap" nexthop indication
The kernel can now signal that a nexthop is trapping packets instead of
forwarding them. Print the flag to help users understand the offload
state of each nexthop.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-22 12:42:20 -07:00
David Ahern db8b149b16 Update kernel headers
Update kernel headers to commit:
    f9e425e99b07 ("octeontx2-af: Add support for RSS hashing based on Transport protocol field")

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-22 12:41:23 -07:00
Stephen Hemminger 7a49ff9d79 bridge: report correct version
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-15 08:58:52 -08:00
Zahari Doychev 4c551369e0 tc flower: use right ethertype in icmp/arp parsing
Currently the icmp and arp parsing functions are called with incorrect
ethtype in case of vlan or cvlan filter options. In this case either
cvlan_ethtype or vlan_ethtype has to be used. The ethtype is now updated
each time a vlan ethtype is matched during parsing.

Signed-off-by: Zahari Doychev <zahari.doychev@linux.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-13 20:07:38 -07:00
David Ahern 1ed00380b0 Merge branch 'dcb-tool' into next
Petr Machata  says:
====================

The Linux DCB interface allows configuration of a broad range of
hardware-specific attributes, such as TC scheduling, flow control, per-port
buffer configuration, TC rate, etc.

Currently a common libre tool for configuration of DCB is OpenLLDP. This
suite contains a daemon that uses Linux DCB interface to configure HW
according to the DCB TLVs exchanged over an interface. The daemon can also
be controlled by a client, through which the user can adjust and view the
configuration. The downside of using OpenLLDP is that it is somewhat
heavyweight and difficult to use in scripts, and does not support
extensions such as buffer and rate commands.

For access to many HW features, one would be perfectly fine with a
fire-and-forget tool along the lines of "ip" or "tc". For scripting in
particular, this would be ideal. This author is aware of one such tool,
mlnx_qos from Mellanox OFED scripts collection[1].

The downside here is that the tool is very verbose, the command line
language is awkward to use, it is not packaged in Linux distros, and
generally has the appearance of a very vendor-specific tool, despite not
being one.

This patchset addresses the above issues by providing a seed of a clean,
well-documented, easily usable, extensible fire-and-forget tool for DCB
configuration:

    # dcb ets set dev eni1np1 \
                  tc-tsa all:strict 0:ets 1:ets 2:ets \
                  tc-bw all:0 0:33 1:33 2:34

    # dcb ets show dev eni1np1 tc-tsa tc-bw
    tc-tsa 0:ets 1:ets 2:ets 3:strict 4:strict 5:strict 6:strict 7:strict
    tc-bw 0:33 1:33 2:34 3:0 4:0 5:0 6:0 7:0

    # dcb ets set dev eni1np1 tc-bw 1:30 2:37

    # dcb -j ets show dev eni1np1 | jq '.tc_bw[2]'
    37

The patchset proceeds as follows:

- Many tools in iproute2 have an option to work in batch mode, where the
  commands to run are given in a file. The code to handle batching is
  largely the same independent of the tool in question. In patch #1, add a
  helper to handle the batching, and migrate individual tools to use it.

- A number of configuration options come in a form of an on-off switch.
  This in turn can be considered a special case of parsing one of a given
  set of strings. In patch #2, extract helpers to parse one of a number of
  strings, on top of which build an on-off parser.

  Currently each tool open-codes the logic to parse the on-off toggle. A
  future patch set will migrate instances of this code over to the new
  helpers.

- The on/off toggles from previous list item sometimes need to be dumped.
  While in the FP output, one typically wishes to maintain consistency with
  the command line and show actual strings, "on" and "off", in JSON output
  one would rather use booleans. This logic is somewhat annoying to have to
  open-code time and again. Therefore in patch #3, add a helper to do just
  that.

- The DCB tool is built on top of libmnl. Several routines will be
  basically the same in DCB as they are currently in devlink. In patches
  #4-#6, extract them to a new module, mnl_utils, for easy reuse.

- Much of DCB is built around arrays. A syntax similar to the iplink_vlan's
  ingress-qos-map / egress-qos-map is very handy for describing changes
  done to such arrays. Therefore in patch #7, extract a helper,
  parse_mapping(), which manages parsing of key-value arrays. In patch #8,
  fix a buglet in the helper, and in patch #9, extend it to allow setting
  of all array elements in one go.

- In patch #10, add a skeleton of "dcb", which contains common helpers and
  dispatches to subtools for handling of individual objects. The skeleton
  is empty as of this patch.

  In patch #11, add "dcb_ets", a module for handling of specifically DCB
  ETS objects.

  The intention is to gradually add handlers for at least PFC, APP, peer
  configuration, buffers and rates.

[1] https://github.com/Mellanox/mlnx-tools/tree/master/ofed_scripts

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-13 19:48:52 -07:00
Petr Machata ef15b07601 dcb: Add a subtool for the DCB ETS object
ETS, for "Enhanced Transmission Selection", is a set of configurations that
permit configuration of mapping of priorities to traffic classes, traffic
selection algorithm to use per traffic class, bandwidth allocation, etc.

Add a dcb subtool to allow showing and tweaking of individual ETS
configuration options. For example:

    # dcb ets show dev eni1np1
    willing on ets_cap 8 cbs off
    tc-bw 0:0 1:0 2:0 3:0 4:100 5:0 6:0 7:0
    pg-bw 0:0 1:0 2:0 3:0 4:0 5:0 6:0 7:0
    tc-tsa 0:strict 1:strict 2:strict 3:strict 4:ets 5:strict 6:strict 7:strict
    prio-tc 0:1 1:3 2:5 3:0 4:0 5:0 6:0 7:0
    reco-tc-bw 0:0 1:0 2:0 3:0 4:0 5:0 6:0 7:0
    reco-tc-tsa 0:strict 1:strict 2:strict 3:strict 4:strict 5:strict 6:strict 7:strict
    reco-prio-tc 0:0 1:0 2:0 3:0 4:0 5:0 6:0 7:0

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-13 19:43:19 -07:00
Petr Machata 67033d1c1c Add skeleton of a new tool, dcb
The Linux DCB interface allows configuration of a broad range of
hardware-specific attributes, such as TC scheduling, flow control, per-port
buffer configuration, TC rate, etc. Add a new tool to show that
configuration and tweak it.

DCB allows configuration of several objects, and possibly could expand to
pre-standard CEE interfaces. Therefore the tool itself is a lean shell that
dispatches to subtools each dedicated to one of the objects.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-13 19:43:19 -07:00
Petr Machata 66a2d71487 lib: parse_mapping: Recognize a keyword "all"
The DCB tool will have to provide an interface to a number of fixed-size
arrays. Unlike the egress- and ingress-qos-map, it makes good sense to have
an interface to set all members to the same value. For example to set
strict priority on all TCs besides select few, or to reset allocated
bandwidth to all zeroes, again besides several explicitly-given ones.

To support this usage, extend the parse_mapping() with a boolean that
determines whether this special use is supported. If "all" is given and
recognized, mapping_cb is called with the key of -1.

Have iplink_vlan pass false for allow_all.
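
The behaviour described above can be sketched as follows. This is a
minimal illustration, not the utils.c code: the names sketch_parse_mapping()
and sketch_set_bw(), the int key type and the 8-entry array are assumptions
made for the example. It consumes "key:value" arguments, passes -1 as the
key for "all" when allow_all is set, and leaves argc/argv pointing at the
first argument it did not accept.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAX_TC 8

typedef int (*sketch_mapping_cb)(int key, const char *value, void *data);

/* Hypothetical sketch of a "key:value" array parser. Stops at the first
 * argument without a ':'. Returns 0 on success and -1 on a bad mapping.
 * argc/argv are written back in both cases, so after an error they point
 * at the offending argument.
 */
int sketch_parse_mapping(int *argcp, char ***argvp, bool allow_all,
			 sketch_mapping_cb cb, void *data)
{
	int argc = *argcp;
	char **argv = *argvp;
	int ret = 0;

	while (argc > 0) {
		const char *arg = *argv;
		const char *colon = strchr(arg, ':');
		size_t klen;
		int key;

		if (!colon)
			break;	/* not a mapping: leave it for the caller */
		klen = (size_t)(colon - arg);

		if (allow_all && klen == 3 && strncmp(arg, "all", 3) == 0) {
			key = -1;	/* special key meaning "every element" */
		} else {
			char *end;
			long v = strtol(arg, &end, 10);

			if (klen == 0 || end != colon || v < 0 || v > INT_MAX) {
				ret = -1;
				break;
			}
			key = (int)v;
		}

		if (cb(key, colon + 1, data)) {
			ret = -1;
			break;
		}
		argc--;
		argv++;
	}

	*argcp = argc;
	*argvp = argv;
	return ret;
}

/* Example callback: store a percentage per traffic class; key -1 sets all. */
int sketch_set_bw(int key, const char *value, void *data)
{
	int *bw = data;
	char *end;
	long v = strtol(value, &end, 10);
	int i;

	if (end == value || *end != '\0' || v < 0 || v > 100)
		return -1;
	if (key == -1) {
		for (i = 0; i < SKETCH_MAX_TC; i++)
			bw[i] = (int)v;
		return 0;
	}
	if (key >= SKETCH_MAX_TC)
		return -1;
	bw[key] = (int)v;
	return 0;
}
```

With this sketch, the arguments "all:0 0:33 1:33 2:34" first zero all
eight entries and then override entries 0 to 2, which is the kind of usage
the "all" keyword is meant to support.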

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-13 19:43:15 -07:00
Petr Machata bc3523ae70 lib: parse_mapping: Update argc, argv on error
Currently argc and argv are not updated unless parsing of all of the
mapping was successful. As a result, when parsing fails, "ip link" points
at the wrong argument when complaining:

    # ip link add name eth0.100 link eth0 type vlan id 100 egress 1:1 2:foo
    Error: argument "1" is wrong: invalid egress-qos-map

Update argc and argv even in the case of parsing error, so that the right
element is indicated.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-13 19:43:15 -07:00
Petr Machata 28e663ee65 lib: Extract from iplink_vlan a helper to parse key:value arrays
VLAN netdevices have two similar attributes: ingress-qos-map and
egress-qos-map. These attributes can be configured with a series of
802.1-priority-to-skb-priority (and vice versa) mappings. A reusable helper
along those lines will be handy for configuration of various
priority-to-tc, tc-to-algorithm, and other arrays in DCB.

Therefore extract the logic to a function parse_mapping(), move to utils.c,
and dispatch to utils.c from iplink_vlan.c. That necessitates extraction of
a VLAN-specific parse_qos_mapping(). Do that, and propagate addattr_l()
return value up, unlike the original.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-13 19:43:15 -07:00
Petr Machata 6dd778e837 lib: Extract from devlink/mnlg a helper, mnlu_socket_recv_run()
Receiving a message in libmnl is a somewhat involved operation. Devlink's
mnlg library has an implementation that is going to be handy for other
tools as well. Extract it into a new helper.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-13 19:43:15 -07:00
Petr Machata dd78dfc7be lib: Extract from devlink/mnlg a helper, mnlu_msg_prepare()
Allocation of a new netlink message with the two usual headers is reusable
with other netlink message types. Extract it into a helper,
mnlu_msg_prepare(). Take the second header as an argument, instead of
passing in parameters to initialize it, and copy it in.
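
As a rough illustration of "a netlink message with the two usual headers",
the sketch below lays out a struct nlmsghdr followed by a caller-supplied
second header, using only the macros from <linux/netlink.h>. It does not
use libmnl, and sketch_msg_prepare() with its parameters is a hypothetical
name chosen for this example, not the mnlu_msg_prepare() implementation.

```c
#include <assert.h>
#include <linux/netlink.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: initialize a netlink header at the start of buf and
 * copy the second (family-specific) header right behind it. The caller
 * must pass a buffer suitably aligned for struct nlmsghdr. Returns NULL
 * if the buffer is too small for both headers.
 */
struct nlmsghdr *sketch_msg_prepare(void *buf, size_t buflen,
				    uint16_t type, uint16_t flags,
				    const void *extra, size_t extra_len)
{
	struct nlmsghdr *nlh = buf;
	size_t total = (size_t)NLMSG_HDRLEN + NLMSG_ALIGN(extra_len);

	if (buflen < total)
		return NULL;

	memset(buf, 0, total);
	nlh->nlmsg_len = (uint32_t)total;
	nlh->nlmsg_type = type;
	nlh->nlmsg_flags = flags;

	/* The second header is passed in ready-made and copied in. */
	memcpy((char *)buf + NLMSG_HDRLEN, extra, extra_len);
	return nlh;
}
```

Taking the second header as an opaque blob keeps the helper independent of
the protocol family, which matches the design choice described above.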

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-13 19:43:15 -07:00
Petr Machata 72858c7b77 lib: Extract from devlink/mnlg a helper, mnlu_socket_open()
This little dance of mnl_socket_open(), option setting, and bind, is the
same regardless of tool. Extract into a new module that should hold helpers
for working with libmnl, mnl_utils.c.

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-13 19:43:15 -07:00
Petr Machata 9091ff0251 lib: json_print: Add print_on_off()
The value of a number of booleans is shown as "on" and "off" in the plain
output, and as an actual boolean in JSON mode. Add a function that does
that.

RDMA tool already uses a function named print_on_off(). This function
always shows "on" and "off", even in JSON mode. Since there are probably
very few if any consumers of this interface at this point, migrate it to
the new central print_on_off() as well.
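
The difference between the two output modes can be sketched as below. This
is only an illustration under assumptions: sketch_format_on_off() and its
buffer-based interface are invented for the example, and the real helper
goes through the json_print machinery instead of formatting into a string.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: format one boolean attribute either for plain
 * output ("key on" / "key off") or as a JSON member with a real boolean
 * ("key":true / "key":false). Returns what snprintf() returns.
 */
int sketch_format_on_off(char *buf, size_t size, bool json,
			 const char *key, bool value)
{
	if (json)
		return snprintf(buf, size, "\"%s\":%s", key,
				value ? "true" : "false");

	return snprintf(buf, size, "%s %s", key, value ? "on" : "off");
}
```

In plain mode the output mirrors what the user would type on the command
line, while JSON consumers get a boolean they do not have to parse.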

Signed-off-by: Petr Machata <me@pmachata.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-13 19:43:15 -07:00
Petr Machata 82604d2852 lib: Add parse_one_of(), parse_on_off()
Take from the macsec code parse_one_of() and adapt so that it passes the
primary result as the main return value, and error result through a
pointer. That is the simplest way to make the code reusable across data
types without introducing extra magic.

Also from macsec take the specialization of parse_one_of() for parsing
specifically the strings "off" and "on".

Convert the macsec code to the new helpers.
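
A minimal sketch of the calling convention described above, assuming
hypothetical names (sketch_parse_one_of(), sketch_parse_on_off()) and an
invented error message format: the index of the matched string is the
return value, and success or failure is reported through a pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: return the index of val in list. Failure is
 * reported through *p_err, so the return value can be assigned directly
 * to any enum-like type. On failure the return value is 0.
 */
int sketch_parse_one_of(const char *msg, const char *val,
			const char *const *list, size_t len, int *p_err)
{
	size_t i;

	assert(list != NULL && p_err != NULL);

	for (i = 0; i < len; i++) {
		if (strcmp(val, list[i]) == 0) {
			*p_err = 0;
			return (int)i;
		}
	}

	fprintf(stderr, "Error: argument of \"%s\" must be one of", msg);
	for (i = 0; i < len; i++)
		fprintf(stderr, " \"%s\"", list[i]);
	fprintf(stderr, ", not \"%s\"\n", val);

	*p_err = -1;
	return 0;
}

/* Specialization for the two strings "off" (index 0) and "on" (index 1). */
bool sketch_parse_on_off(const char *msg, const char *val, int *p_err)
{
	static const char *const values[] = { "off", "on" };

	return sketch_parse_one_of(msg, val, values, 2, p_err) == 1;
}
```

A caller would typically write something like
"enabled = sketch_parse_on_off("learning", *argv, &err); if (err) return err;",
which keeps the parsed value and the error path separate.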

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-13 19:43:15 -07:00
Petr Machata 1d9a81b8c9 Unify batch processing across tools
The code for handling batches is largely the same across iproute2 tools.
Extract a helper to handle the batch, and adjust the tools to dispatch to
this helper. Sandwich the invocation between prologue / epilogue code
specific for each tool.
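
The shape of such a shared helper might look like the sketch below. It is
deliberately simplified and rests on assumptions: sketch_do_batch() and
sketch_count_cmd() are hypothetical names, lines are split on whitespace
only (no quoting or line continuation), and lines longer than the buffer
are not handled specially.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_MAX_ARGS 32

typedef int (*sketch_batch_cmd)(int argc, char **argv, void *user);

/* Hypothetical sketch: run cmd once per non-empty, non-comment line of fp.
 * Returns 0 if every command succeeded and -1 otherwise. With force set,
 * processing continues after a failing line.
 */
int sketch_do_batch(FILE *fp, bool force, sketch_batch_cmd cmd, void *user)
{
	char line[1024];
	int ret = 0;

	while (fgets(line, sizeof(line), fp)) {
		char *argv[SKETCH_MAX_ARGS];
		int argc = 0;
		char *tok;

		/* Split the line into whitespace-separated arguments. */
		for (tok = strtok(line, " \t\n");
		     tok && argc < SKETCH_MAX_ARGS;
		     tok = strtok(NULL, " \t\n"))
			argv[argc++] = tok;

		if (argc == 0 || argv[0][0] == '#')
			continue;

		if (cmd(argc, argv, user)) {
			ret = -1;
			if (!force)
				break;
		}
	}

	return ret;
}

/* Example per-tool command: counts successful lines, fails on "bad". */
int sketch_count_cmd(int argc, char **argv, void *user)
{
	int *ok = user;

	(void)argc;
	if (strcmp(argv[0], "bad") == 0)
		return -1;
	(*ok)++;
	return 0;
}
```

Each tool would wrap a call like this between its own setup and teardown
code and pass its command dispatcher as the callback.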

Signed-off-by: Petr Machata <me@pmachata.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-11-13 19:43:15 -07:00
Guillaume Nault 8682f588bf tc-mpls: fix manpage example and help message string
Manpage:
 * Remove the extra "and to ip packets" part from command description
   to make it more understandable.

 * Redirect packets to eth1, instead of eth0, as told in the
   description.

Help string:
 * "mpls pop" can be followed by a CONTROL keyword.

 * "mpls modify" can also set the MPLS_BOS field.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-08 10:49:28 -08:00
Guillaume Nault 7c7a0fe0c8 tc-vlan: fix help and error message strings
 * "vlan pop" can be followed by a CONTROL keyword.

 * Add missing space in error message.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-08 10:49:18 -08:00
Stephen Hemminger 72f88bd42a uapi: update kernel headers from 5.10-rc2
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-08 10:47:27 -08:00
Stephen Hemminger b90c39be33 rdma: fix spelling error in comment
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-08 10:44:19 -08:00
Stephen Hemminger c8424b73e1 man: fix spelling errors
Lots of little typo errors on man pages.
Found by running codespell

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-08 10:40:30 -08:00
Stephen Hemminger cbf6481797 tc/m_gate: fix spelling errors
Fix spelling errors in error messages.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-08 10:34:23 -08:00
Stephen Hemminger 14b189f066 uapi: updates from 5.10-rc1
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-11-03 08:29:53 -08:00
David Ahern 51f28eb928 Merge branch 'tc-terse-dump' into next
Vlad Buslov  says:

====================

Implement support for terse dump mode which provides only essential
classifier/action info (handle, stats, cookie, etc.). Use new
TCA_DUMP_FLAGS_TERSE flag to prevent copying of unnecessary data from
kernel.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-31 09:18:43 -06:00
Vlad Buslov 477ca0dfb4 tc: implement support for terse dump
Implement support for classifier/action terse dump using the new TCA_DUMP_FLAGS
TLV with its only available flag value, TCA_DUMP_FLAGS_TERSE. Set the flag when
the user requests it, as in the following example CLI (-br for 'brief'):

$ tc -s -br filter show dev ens1f0 ingress
filter protocol ip pref 49151 flower chain 0
filter protocol ip pref 49151 flower chain 0 handle 0x1
  not_in_hw
        action order 1: gact    Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

filter protocol ip pref 49152 flower chain 0
filter protocol ip pref 49152 flower chain 0 handle 0x1
  not_in_hw
        action order 1: gact    Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

In terse mode dump only outputs essential data needed to identify the
filter and action (handle, cookie, etc.) and stats, if requested by the
user. The intention is to significantly improve rule dump rate by omitting
all static data that do not change after rule is created.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-31 09:15:15 -06:00
Vlad Buslov a99ebeeef2 tc: skip actions that don't have options attribute when printing
Modify implementations that return error from action_util->print_aopt()
callback to silently skip actions that don't have their corresponding
TCA_ACT_OPTIONS attribute set (some actions already behave like this).
Print action kind before returning from action_util->print_aopt()
callbacks. This is necessary to support terse dump mode in the following
patch in the series.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-31 09:14:01 -06:00
Johannes Berg 9fc5bf734f libnetlink: define __aligned conditionally
On some systems (e.g. current Debian/stable) the inclusion
of utils.h pulled in some other things that may end up
defining __aligned, in a possibly different way than what
we had here.

Use our own definition only if there isn't one already.
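
The guard described above follows the usual preprocessor pattern, sketched
here. The attribute syntax assumes a GCC- or Clang-compatible compiler, and
struct sketch_buf is a made-up type that only exists to show the macro in
use.

```c
#include <assert.h>
#include <stddef.h>

/* Provide our own __aligned() only if no earlier header defined one. */
#ifndef __aligned
#define __aligned(x) __attribute__((aligned(x)))
#endif

/* Hypothetical example type: payload is forced onto an 8-byte boundary. */
struct sketch_buf {
	char tag;
	char payload[3] __aligned(8);
};
```

If another header has already defined __aligned, the #ifndef leaves that
definition in place instead of triggering a redefinition.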

Fixes: d5acae244f ("libnetlink: add nl_print_policy() helper")
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-10-28 10:24:02 -07:00
David Ahern eb12cc9ae1 Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-25 15:08:12 -06:00
Guillaume Nault f1298d7660 m_mpls: test the 'mac_push' action after 'modify'
Commit 02a261b5ba ("m_mpls: add mac_push action") added a matches()
test for the "mac_push" string before the test for "modify".
This changes the previous behaviour as 'action m' used to match
"modify" while it now matches "mac_push".

Revert to the original behaviour by moving the "mac_push" test after
"modify".

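The effect of test ordering on abbreviated keywords can be shown with a
small sketch. It assumes a hypothetical sketch_is_prefix() helper and a
keyword table limited to the two names discussed here; it is not the
m_mpls.c parser.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true when arg is a non-empty prefix of keyword. */
int sketch_is_prefix(const char *arg, const char *keyword)
{
	size_t len = strlen(arg);

	return len > 0 && strncmp(arg, keyword, len) == 0;
}

/* Keywords are tested in table order, so an ambiguous abbreviation
 * resolves to whichever keyword is listed first. Returns NULL if no
 * keyword matches.
 */
const char *sketch_mpls_action(const char *arg)
{
	static const char *const order[] = { "modify", "mac_push" };
	size_t i;

	for (i = 0; i < sizeof(order) / sizeof(order[0]); i++) {
		if (sketch_is_prefix(arg, order[i]))
			return order[i];
	}
	return NULL;
}
```

With "modify" listed first, the abbreviation "m" resolves to "modify"
again, while "ma" still selects "mac_push".
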
Fixes: 02a261b5ba ("m_mpls: add mac_push action")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-25 15:07:13 -06:00
David Ahern 2b7a768408 Merge branch 'tipc-encryption' into next
Tuong Lien  says:

====================

This series adds two new options in the 'iproute2/tipc' command, enabling users
to use the new TIPC encryption features, i.e. the master key and rekeying which
have been recently merged in kernel.

The help menu of the "tipc node set key" command is also updated accordingly:

 # tipc node set key --help
Usage: tipc node set key KEY [algname ALGNAME] [PROPERTIES]
       tipc node set key rekeying REKEYING

KEY
  Symmetric KEY & SALT as a composite ASCII or hex string (0x...) in form:
  [KEY: 16, 24 or 32 octets][SALT: 4 octets]

ALGNAME
  Cipher algorithm [default: "gcm(aes)"]

PROPERTIES
  master                - Set KEY as a cluster master key
  <empty>               - Set KEY as a cluster key
  nodeid NODEID         - Set KEY as a per-node key for own or peer

REKEYING
  INTERVAL              - Set rekeying interval (in minutes) [0: disable]
  now                   - Trigger one (first) rekeying immediately

EXAMPLES
  tipc node set key this_is_a_master_key master
  tipc node set key 0x746869735F69735F615F6B657931365F73616C74
  tipc node set key this_is_a_key16_salt algname "gcm(aes)" nodeid 1001002
  tipc node set key rekeying 600
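
The KEY format in the help text above implies a simple length rule, which
the following sketch checks. It is an assumption-laden illustration, not
the tipc tool's parser: sketch_tipc_key_octets() is a hypothetical name,
and the function only validates the composite length (16, 24 or 32 key
octets plus 4 salt octets), for either an ASCII string or a 0x-prefixed
hex string.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_SALT_LEN 4

/* Hypothetical sketch: return the composite KEY+SALT length in octets,
 * or -1 if the string does not describe a valid composite.
 */
int sketch_tipc_key_octets(const char *str)
{
	size_t len = strlen(str);
	size_t octets;
	size_t i;

	if (len > 2 && str[0] == '0' && (str[1] == 'x' || str[1] == 'X')) {
		/* Hex form: two digits per octet after the "0x" prefix. */
		if ((len - 2) % 2)
			return -1;
		for (i = 2; i < len; i++) {
			if (!isxdigit((unsigned char)str[i]))
				return -1;
		}
		octets = (len - 2) / 2;
	} else {
		/* ASCII form: one character per octet. */
		octets = len;
	}

	if (octets == 16 + SKETCH_SALT_LEN ||
	    octets == 24 + SKETCH_SALT_LEN ||
	    octets == 32 + SKETCH_SALT_LEN)
		return (int)octets;

	return -1;
}
```

Both "this_is_a_key16_salt" and the hex string from the examples come out
as 20 octets under this rule, i.e. a 16-octet key followed by a 4-octet
salt.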

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-20 09:05:40 -06:00
Tuong Lien 2bf1ba5a5c tipc: add option to set rekeying for encryption
As supported in kernel, the TIPC encryption rekeying can be tuned using
the netlink attribute - 'TIPC_NLA_NODE_REKEYING'. Now we add the
'rekeying' option correspondingly to the 'tipc node set key' command so
that the user will be able to perform that tuning:

tipc node set key rekeying REKEYING

where the 'REKEYING' value can be:

INTERVAL              - Set rekeying interval (in minutes) [0: disable]
now                   - Trigger one (first) rekeying immediately

For example:
$ tipc node set key rekeying 60
$ tipc node set key rekeying now

The command's help menu is also updated with these descriptions for the
new command option.

Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-20 09:04:45 -06:00
Tuong Lien 5fb3681885 tipc: add option to set master key for encryption
In addition to the support of master key in kernel, we add the 'master'
option to the 'tipc node set key' command for user to be able to
specify a key as master key during the key setting. This is carried out
by turning on the new netlink flag - 'TIPC_NLA_NODE_KEY_MASTER'.
For example:

$ tipc node set key "this_is_a_master_key" master

The command's help menu is also updated to give a better description of
all the available options.

Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-20 09:04:37 -06:00
David Ahern b4edd6a8a6 Merge branch 'tc-mpls-l2-vpn' into next
Guillaume Nault  says:

====================

This patch series adds the possibility for TC to tunnel Ethernet frames
over MPLS.

Patch 1 allows adding or removing the Ethernet header.
Patch 2 allows pushing an MPLS LSE before the MAC header.

By combining these actions, it becomes possible to encapsulate an
entire Ethernet frame into MPLS, then add an outer Ethernet header
and send the resulting frame to the next hop.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-20 08:57:47 -06:00
Guillaume Nault 02a261b5ba m_mpls: add mac_push action
Add support for the new TCA_MPLS_ACT_MAC_PUSH action (kernel commit
a45294af9e96 ("net/sched: act_mpls: Add action to push MPLS LSE before
Ethernet header")). This action lets TC push an MPLS header before the
MAC header of a frame.

Example (encapsulate all outgoing frames with label 20, then add an
outer Ethernet header):
 # tc filter add dev ethX matchall \
       action mpls mac_push label 20 ttl 64 \
       action vlan push_eth dst_mac 0a:00:00:00:00:02 \
                            src_mac 0a:00:00:00:00:01

This patch also adds an alias for ETH_P_TEB, since it is useful when
decapsulating MPLS packets that contain an Ethernet frame.

With MAC_PUSH, there's no previous Ethertype to modify. However, the
"protocol" option is still needed, because the kernel uses it to set
skb->protocol. So rename can_modify_ethtype() to can_set_ethtype().

Also add a test suite for m_mpls, which covers the new action and the
pre-existing ones.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-20 08:57:08 -06:00
Guillaume Nault d61167dd88 m_vlan: add pop_eth and push_eth actions
Add support for the new TCA_VLAN_ACT_POP_ETH and TCA_VLAN_ACT_PUSH_ETH
actions (kernel commit 19fbcb36a39e ("net/sched: act_vlan:
Add {POP,PUSH}_ETH actions")). These actions let TC remove or add the
Ethernet header at the head of a frame.

Drop an Ethernet header:
 # tc filter add dev ethX matchall action vlan pop_eth

Push an Ethernet header (the original frame must have no MAC header):
 # tc filter add dev ethX matchall action vlan \
       push_eth dst_mac 0a:00:00:00:00:02 src_mac 0a:00:00:00:00:01

Also add a test suite for m_vlan, which covers these new actions and
the pre-existing ones.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-20 08:36:38 -06:00
Jacob Keller 3342688a66 devlink: display elapsed time during flash update
For some devices, updating the flash can take significant time during
operations where no status can meaningfully be reported. This can be
somewhat confusing to a user who sees devlink appear to hang on the
terminal waiting for the device to update.

Recent changes to the kernel interface allow such long running commands
to provide a timeout value indicating some upper bound on how long the
relevant action could take.

Provide a ticking counter of the time elapsed since the previous status
message in order to make it clear that the program is not simply stuck.

Display this message whenever the status message from the kernel
indicates a timeout value. Additionally also display the message if
we've received no status for more than a couple of seconds. If more time
elapses than the timeout provided by the status message, replace the
timeout display with "timeout reached".
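
The display rules above can be sketched as a small formatting function.
Everything specific here is an assumption for illustration: the name
sketch_format_elapsed(), the 2-second idle threshold standing in for "a
couple of seconds", and the exact output layout are not taken from the
devlink source.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_IDLE_THRESHOLD_S 2	/* assumed "couple of seconds" */

/* Hypothetical sketch: build the elapsed-time suffix for a status line.
 * timeout_s == 0 means the kernel supplied no timeout. Returns the
 * length of the formatted string.
 */
int sketch_format_elapsed(char *buf, size_t size,
			  unsigned long elapsed_s, unsigned long timeout_s)
{
	unsigned long em = elapsed_s / 60;
	unsigned long es = elapsed_s % 60;

	if (timeout_s == 0) {
		/* No timeout: only show the counter once status goes quiet. */
		if (elapsed_s <= SKETCH_IDLE_THRESHOLD_S) {
			if (size)
				buf[0] = '\0';
			return 0;
		}
		return snprintf(buf, size, "( %lum %lus )", em, es);
	}

	if (elapsed_s > timeout_s)
		return snprintf(buf, size, "( %lum %lus : timeout reached )",
				em, es);

	return snprintf(buf, size, "( %lum %lus : %lum %lus )",
			em, es, timeout_s / 60, timeout_s % 60);
}
```

A caller would re-render this suffix on a periodic tick, so the user sees
the counter advance even when no new status message arrives.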

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-17 09:30:06 -06:00
Stephen Hemminger cb7ce51cc1 v5.9.0 2020-10-15 15:18:35 -07:00
zhangkaiheb@126.com 78ace1c211 tc: fq: clarify the length of orphan_mask.
Signed-off-by: kai zhang <zhangkaiheb@126.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-10-15 15:16:52 -07:00
Jan Engelhardt 0ca1312c20 ip: add error reporting when RTM_GETNSID failed
`ip addr` when run under qemu-user-riscv64, fails. This likely is due
to qemu-5.1 not doing translation of RTM_GETNSID calls. Aborting ip
completely is not helpful for the user however. This patch reworks
the error handling.

Before:

rtest:/ # ip a
2: host0@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
request send failed: Operation not supported
    link/ether 46:3f:2d:88:3d:db brd ff:ff:ff:ff:ff:ffrtest:/ #

Afterwards:

rtest:/ # ip a
2: host0@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
rtnl_send(RTM_GETNSID): Operation not supported. Continuing anyway.
    link/ether 46:3f:2d:88:3d:db brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.72.147/28 brd 192.168.72.159 scope global host0
       valid_lft forever preferred_lft forever
    inet6 fe80::443f:2dff:fe88:3ddb/64 scope link
       valid_lft forever preferred_lft forever

Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-10-12 08:10:25 -07:00
Dmitry Yakunin 58c3c55f38 lib: ignore invalid mounts in cg_init_map
In case of bad entries in /proc/mounts just skip cgroup cache initialization.
Cgroups in output will be shown as "unreachable:cgroup_id".

Fixes: d5e6ee0dac ("ss: introduce cgroup2 cache and helper functions")
Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
Reported-by: Donald Sharp <sharpd@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-10-11 23:02:35 -07:00
Stephen Hemminger 003b9af516 uapi: add new SNMP entry
Update to snmp.h from 5.9

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-10-11 22:50:22 -07:00
David Ahern b5a583fb32 Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-11 20:11:09 -06:00
Johannes Berg 7812012849 genl: ctrl: print op -> policy idx mapping
Newer kernels can dump per-op policies, so print out the new
mapping attribute to indicate which op has which policy.

v2:
 * print out both do/dump policy idx
v3:
 * fix userspace API which renumbered after patch rebasing

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-11 20:10:09 -06:00
David Ahern 91c54917cd Merge branch 'bridge-igmpv3-mldv2' into next
Nikolay Aleksandrov  says:

====================
This set adds support for IGMPv3/MLDv2 attributes; they're mostly
read-only at the moment. The only new "set" option is the source address
for S,G entries. It is added in patch 01 (see the patch commit message for
an example). Patch 02 shows a missing flag (fast_leave) for
completeness, then patch 03 shows the new IGMPv3/MLDv2 flags:
added_by_star_ex and blocked. Patches 04-06 show the new extra
information about the entry's state when IGMPv3/MLDv2 are enabled. That
includes its filter mode (include/exclude), source list with timers and
origin protocol (currently only static/kernel). To show the new
information the user must use "-d"/show_details.
Here's the output of a few IGMPv3 entries:
 dev bridge port ens12 grp 239.0.0.1 src 20.21.22.23 temp filter_mode include proto kernel  blocked    0.00
 dev bridge port ens12 grp 239.0.0.1 src 8.9.10.11 temp filter_mode include proto kernel  blocked    0.00
 dev bridge port ens12 grp 239.0.0.1 src 1.2.3.1 temp filter_mode include proto kernel  blocked    0.00
 dev bridge port ens12 grp 239.0.0.1 temp filter_mode exclude source_list 20.21.22.23/0.00,8.9.10.11/0.00,1.2.3.1/0.00 proto kernel    26.65

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-11 20:09:14 -06:00
Nikolay Aleksandrov 86588450c5 bridge: mdb: print protocol when available
Print the mdb entry's protocol (i.e. who added it)  when it's available if
the user requested to show details (-d). Currently the only possible
values are RTPROT_STATIC (user-space added) or RTPROT_KERNEL
(automatically added by kernel). The value is kernel controlled.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-11 20:07:50 -06:00
Nikolay Aleksandrov 2de81d1eff bridge: mdb: print source list when available
Print the mdb entry's source list when it's available if the user
requested to show details (-d). Each source has an associated timer
which controls if traffic should be forwarded to that S,G entry (if the
timer is non-zero traffic is forwarded, otherwise it's not).
Currently the source list is kernel controlled and can't be changed by
user-space.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-11 20:07:45 -06:00
Nikolay Aleksandrov 1d28c48046 bridge: mdb: print filter mode when available
Print the mdb entry's filter mode when it's available if the user
requested to show details (-d). It can be either include or exclude.
Currently it's kernel controlled and can't be changed by user-space.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-11 20:07:39 -06:00
Nikolay Aleksandrov e331677ea2 bridge: mdb: show igmpv3/mldv2 flags
With IGMPv3/MLDv2 support we have 2 new flags:
 - added_by_star_ex: set when the S,G entry was automatically created
                     because of a *,G entry in EXCLUDE mode
 - blocked: set when traffic for the S,G entry for that port has to be
            blocked
Both flags are used only on the new S,G entries and are currently kernel
managed, i.e. similar to other flags which can't be set from user-space.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-11 20:07:34 -06:00
Nikolay Aleksandrov f94e8b0749 bridge: mdb: print fast_leave flag
We're not showing the fast_leave flag when it's set. Currently that can
be only when an mdb entry is being deleted due to fast leave, so it will
only affect mdb monitor.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-11 20:07:30 -06:00
Nikolay Aleksandrov 547b319762 bridge: mdb: add support for source address
This patch adds user-space control and dump of the mdb entry source
address. When setting, the new MDBA_SET_ENTRY_ATTRS nested attribute is
used, and MDBE_ATTR_SOURCE is added inside it based on the address family.
When dumping, we look for MDBA_MDB_EATTR_SOURCE and, if present, add the
"src x.x.x.x" output. The source address is always shown because it is
needed to match the entry in order to modify it from user-space.

Example:
 $ bridge mdb add dev bridge port ens13 grp 239.0.0.1 src 1.2.3.4 permanent vid 100
 $ bridge mdb show
 dev bridge port ens13 grp 239.0.0.1 src 1.2.3.4 permanent vid 100

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-11 20:07:25 -06:00
David Ahern f905191a48 Update kernel headers
Update kernel headers to commit:
    bc081a693a56 ("Merge branch 'Offload-tc-vlan-mangle-to-mscc_ocelot-switch'")

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-11 20:04:57 -06:00
Antony Antony 4322b13c8d ip xfrm: support setting XFRMA_SET_MARK_MASK attribute in states
The XFRMA_SET_MARK_MASK attribute can be set in states (4.19+).
It is optional and the kernel default is 0xffffffff.
It is the mask of XFRMA_SET_MARK (a.k.a. XFRMA_OUTPUT_MARK in 4.18).

e.g.
./ip/ip xfrm state add output-mark 0x6 mask 0xab proto esp \
 auth digest_null 0 enc cipher_null ''
ip xfrm state
src 0.0.0.0 dst 0.0.0.0
	proto esp spi 0x00000000 reqid 0 mode transport
	replay-window 0
	output-mark 0x6/0xab
	auth-trunc digest_null 0x30 0
	enc ecb(cipher_null)
	anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000
	sel src 0.0.0.0/0 dst 0.0.0.0/0

Signed-off-by: Antony Antony <antony@phenome.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-07 00:10:47 -06:00
Jiri Pirko 8dc1db80e4 devlink: Add health reporter test command support
Add health reporter test command and allow user to trigger a test event.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-07 00:08:53 -06:00
Jacob Keller 012164718b devlink: support setting the overwrite mask attribute
The recently added DEVLINK_ATTR_FLASH_UPDATE_OVERWRITE_MASK allows
userspace to indicate how a device should handle subsections of a flash
component when updating. For example, a flash component might contain
vital data such as PCIe serial number or configuration fields such as
settings that control device bootup.

The overwrite mask allows specifying whether the device should overwrite
these subsections when updating from the provided image. If nothing is
specified, then the update is expected to preserve all vital fields and
configuration.

Add support for specifying the overwrite mask using the new "overwrite"
option to the flash command line.

By specifying "overwrite identifiers", the user requests that the flash
update should overwrite any identifiers in the updated flash component with
identifiers from the provided flash image.

  $devlink dev flash pci/0000:af:00.0 file flash_image.bin overwrite identifiers

By specifying "overwrite settings" the user requests that the flash update
should overwrite any settings in the updated flash component with setting
values from the provided flash image.

  $devlink dev flash pci/0000:af:00.0 file flash_image.bin overwrite settings

These options may be combined, in which case both subsections will be sent
in the overwrite mask, resulting in a request to overwrite all settings and
identifiers stored in the updated flash components.

  $devlink dev flash pci/0000:af:00.0 file flash_image.bin overwrite settings overwrite identifiers

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-07 00:02:16 -06:00
David Ahern 34be2d2619 Update kernel headers
Update kernel headers to commit:
    9faebeb2d800 ("Merge branch 'ethtool-allow-dumping-policies-to-user-space'")

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-10-07 00:01:26 -06:00
Stephen Hemminger be1bea8432 addr: Fix noprefixroute and autojoin for IPv4
These were reported as IPv6-only and ignored:

     # ip address add 192.0.2.2/24 dev dummy5 noprefixroute
     Warning: noprefixroute option can be set only for IPv6 addresses
     # ip address add 224.1.1.10/24 dev dummy5 autojoin
     Warning: autojoin option can be set only for IPv6 addresses

This enables them back for IPv4.

Fixes: 9d59c86e57 ("iproute2: ip addr: Organize flag properties structurally")
Signed-off-by: Adel Belhouane <bugs.a.b@free.fr>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-10-06 15:15:56 -07:00
Eyal Birger e410c963e3 ipntable: add missing ndts_table_fulls ntable stat
Used for tracking neighbour table overflows.

Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-10-06 15:07:10 -07:00
Kamal Heib 10414de9e6 ip: iplink_ipoib.c: Remove extra spaces
Remove the extra space between the reported ipoib attrs - use only one
space instead of two.

Fixes: de0389935f ("iplink: Added support for the kernel IPoIB RTNL ops")
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-09-30 22:29:05 -07:00
Ciara Loftus d2be31d9b6 ss: add support for xdp statistics
The patch exposes statistics for XDP sockets which can be useful for
debugging purposes.

The stats exposed are:
    rx dropped
    rx invalid
    rx queue full
    rx fill ring empty
    tx invalid
    tx ring empty

Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-09-29 09:21:24 -06:00
David Ahern f481515c89 Update kernel headers
Update kernel headers to commit:
    280095713ce2 ("Merge branch 'ibmvnic-refactor-some-send-handle-functions'")

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-09-29 09:13:21 -06:00
Stephen Hemminger 03fb6fa1d8 uapi: update headers from 5.9-rc7
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-09-28 13:50:36 -07:00
Jan Engelhardt fece144abc build: avoid make jobserver warnings
I observe:

	» make -j8 CCOPTS=-ggdb3
	lib
	make[1]: warning: -j8 forced in submake: resetting jobserver mode.
	make[1]: Nothing to be done for 'all'.
	ip
	make[1]: warning: -j8 forced in submake: resetting jobserver mode.
	    CC       ipntable.o

MFLAGS is a historic variable of some kind; removing it fixes the
jobserver issue.

Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-09-28 13:49:24 -07:00
Jakub Kicinski b8663da049 ip: promote missed packets to the -s row
missed_packet_errors are much more commonly reported:

linux$ git grep -c '[.>]rx_missed_errors ' -- drivers/ | wc -l
64
linux$ git grep -c '[.>]rx_over_errors ' -- drivers/ | wc -l
37

Plus those drivers are generally more modern than those
using rx_over_errors.

Since recently merged kernel documentation makes this
preference official, let's make ip -s output more informative
and let rx_missed_errors take the place of rx_over_errors.

Before:

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 00:0a:f7:c1:4d:38 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast
    6.04T      4.67G    0       0       0       67.7M
    RX errors: length   crc     frame   fifo    missed
               0        0       0       0       7
    TX: bytes  packets  errors  dropped carrier collsns
    3.13T      2.76G    0       0       0       0
    TX errors: aborted  fifo   window heartbeat transns
               0        0       0       0       6

After:

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 00:0a:f7:c1:4d:38 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped missed  mcast
    6.04T      4.67G    0       0       7       67.7M
    RX errors: length   crc     frame   fifo    overrun
               0        0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    3.13T      2.76G    0       0       0       0
    TX errors: aborted  fifo   window heartbeat transns
               0        0       0       0       6

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-09-22 20:23:29 -06:00
David Ahern cec67df974 Merge branch 'devlink-controller-external-info' into next
Parav Pandit  says:

====================

For certain devlink port flavours, the controller number and optionally the
external attribute are reported by the kernel.

(a) The controller number indicates which local or external controller a
given port belongs to.
(b) The external port attribute indicates whether a given port is for an
external or a local controller.

This short series shows these attributes to the user.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-09-22 20:17:48 -06:00
Parav Pandit 748cbad33b devlink: Show controller number of a devlink port
Show the controller number of the devlink port whenever kernel reports
it.

Example of a PCI VF port for an external controller number 1:

$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev ens2f0c1pf0vf1 flavour pcivf controller 1 pfnum 0 vfnum 1 external true splittable false
  function:
    hw_addr 00:00:00:00:00:00

$ devlink port show pci/0000:06:00.0/2 -jp
{
    "port": {
        "pci/0000:06:00.0/2": {
            "type": "eth",
            "netdev": "ens2f0c1pf0vf1",
            "flavour": "pcivf",
            "controller": 1,
            "pfnum": 0,
            "vfnum": 1,
            "external": true,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:00:00"
            }
        }
    }
}

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-09-22 20:13:09 -06:00
Parav Pandit 8fadd01101 devlink: Show external port attribute
If a port is for an external controller, the port's external attribute is
set. Show this external attribute.

An example of an external controller port for PCI VF:

$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev ens2f0c1pf0vf1 flavour pcivf pfnum 0 vfnum 1 external true splittable false
  function:
    hw_addr 00:00:00:00:00:00

$ devlink port show pci/0000:06:00.0/2 -jp
{
    "port": {
        "pci/0000:06:00.0/2": {
            "type": "eth",
            "netdev": "ens2f0c1pf0vf1",
            "flavour": "pcivf",
            "pfnum": 0,
            "vfnum": 1,
            "external": true,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:00:00"
            }
        }
    }
}

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-09-22 20:13:04 -06:00
David Ahern 454429e8b4 Update kernel headers
Update kernel headers to commit:
    748d1c8a425e ("Merge branch 'devlink-Use-nla_policy-to-validate-range'")

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-09-22 20:10:43 -06:00
Roman Mashak aba44dc2ea ip: updated ip-link man page
Added description of link flags allmulticast, promisc and trailers.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-09-14 20:42:04 -07:00
Wei Wang ad34d5fadb iproute2: ss: add support to expose various inet sockopts
This commit adds support to expose the following inet socket options:
-- recverr
-- is_icsk
-- freebind
-- hdrincl
-- mc_loop
-- transparent
-- mc_all
-- nodefrag
-- bind_address_no_port
-- recverr_rfc4884
-- defer_connect
with the option --inet-sockopt. The individual option is only shown
when set.

Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-09-08 20:36:06 -06:00
David Ahern c8eb4b52c1 Update kernel headers
Update kernel headers to commit:
4349abdb409b ("net: dsa: don't print non-fatal MTU error if not supported")

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-09-08 20:35:28 -06:00
Hoang Le abee772ff1 tipc: support 128bit node identity for peer removing
Problem:
The upstream kernel added support for setting a 128-bit node identity.
However, the tipc peer remove command still uses the legacy format, so
removing an offline node fails, e.g.:

$ tipc node list
Node Identity                    Hash     State
d6babc1c1c6d                     1cbcd7ca down

$ tipc peer remove address d6babc1c1c6d
invalid network address, syntax: Z.C.N
error: No such device or address

Solution:
Add support for removing a specific down node by its 128-bit
node identifier, as an alternative to the legacy 32-bit node address.

Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Hoang Huu Le <hoang.h.le@dektech.com.au>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-09-01 20:01:39 -06:00
Roopa Prabhu 6fd53b2a1c iplink: add support for protodown reason
This patch adds support for recently
added link IFLA_PROTO_DOWN_REASON attribute.
IFLA_PROTO_DOWN_REASON enumerates reasons
for the already existing IFLA_PROTO_DOWN link
attribute.

$ cat /etc/iproute2/protodown_reasons.d/r.conf
0 mlag
1 evpn
2 vrrp
3 psecurity

$ ip link set dev vx10 protodown on protodown_reason vrrp on
$ip link show dev vx10
14: vx10: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode
DEFAULT group default qlen 1000
    link/ether f2:32:28:b8:35:ff brd ff:ff:ff:ff:ff:ff protodown on
protodown_reason <vrrp>
$ip -p -j link show dev vx10
[ {
	<snip>
        "proto_down": true,
        "proto_down_reason": [ "vrrp" ]
} ]
$ip link set dev vx10 protodown_reason mlag on
$ip link show dev vx10
14: vx10: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode
DEFAULT group default qlen 1000
    link/ether f2:32:28:b8:35:ff brd ff:ff:ff:ff:ff:ff protodown on
protodown_reason <mlag,vrrp>
$ip -p -j link show dev vx10
[ {
	<snip>
        "proto_down": true,
        "proto_down_reason": [ "mlag","vrrp" ]
} ]

$ip -p -j link show dev vx10
$ip link set dev vx10 protodown off protodown_reason vrrp off
Error: Cannot clear protodown, active reasons.
$ip link set dev vx10 protodown off protodown_reason mlag off
$

Note: for some reason the non-json and json keys for protodown
are different (protodown and proto_down). I have kept the
same convention for protodown reason for consistency (protodown_reason
and proto_down_reason).

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-09-01 19:52:13 -06:00
Antony Antony af27494d2e ip xfrm: support printing XFRMA_SET_MARK_MASK attribute in states
The XFRMA_SET_MARK_MASK attribute is set in states (4.19+).
It is the mask of XFRMA_SET_MARK(a.k.a. XFRMA_OUTPUT_MARK in 4.18)

sample output: note the output-mark mask
ip xfrm state
	src 192.1.2.23 dst 192.1.3.33
	proto esp spi 0xSPISPI reqid REQID mode tunnel
	replay-window 32 flag af-unspec
	output-mark 0x3/0xffffff
	aead rfc4106(gcm(aes)) 0xENCAUTHKEY 128
	if_id 0x1

Signed-off-by: Antony Antony <antony@phenome.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-09-01 19:49:29 -06:00
David Ahern 275eed9be5 Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-09-01 19:46:20 -06:00
Phil Sutter 23203b750e ip link: Fix indenting in help text
Indenting of 'ip link set' options below 'link-netns' was wrong, they
should be on the same level as the above.

While being at it, fix closing brackets in vf-specific options. Also
write node/port_guid parameters in upper-case without curly braces: They
are supposed to be replaced by values, not put literally.

Fixes: 8589eb4efd ("treewide: refactor help messages")
Fixes: 5a3ec4ba64 ("iplink: Update usage in help message")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-08-31 12:32:26 -07:00
Johannes Berg cc889b8241 genl: ctrl: support dumping netlink policy
Support dumping the netlink policy of a given generic netlink
family, the policy (with any sub-policies if appropriate) is
exported by the kernel in a general fashion.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-08-24 21:35:14 -06:00
Johannes Berg d5acae244f libnetlink: add nl_print_policy() helper
This prints out the data from the given nested attribute
to the given FILE pointer, interpreting the format that
the kernel has for showing netlink policies.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-08-24 21:35:07 -06:00
Johannes Berg 784fa9f62f libnetlink: add rtattr_for_each_nested() iteration macro
This is useful for iterating elements in a nested attribute,
if they're not parsed with a strict length limit or such.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-08-24 21:34:29 -06:00
Murali Karicheri ea6aeeb90c ip: iplink: prp: update man page for new parameter
PRP support requires a proto parameter which is 0 for hsr and 1 for
prp. Default is hsr and is backward compatible.

Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-08-22 21:14:12 -07:00
Murali Karicheri 68f027724b iplink: hsr: add support for creating PRP device similar to HSR
This patch enhances the iplink command with a proto parameter to
create a PRP device/interface similar to HSR. Both protocols are
quite similar and require a pair of Ethernet interfaces, so re-use
the existing HSR iplink command to create a PRP device/interface as
well. Use the proto parameter to differentiate the two protocols.

Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-08-22 21:14:12 -07:00
Amit Cohen 8e6bce735a devlink: Add fflush() in cmd_mon_show_cb()
Similar to other print functions we need to flush buffered data
in order to work with pipes and output redirects.

Without it, stdout output is buffered and not written to the disk.

This is useful when writing scripts that rely on devlink-monitor output.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-08-22 21:13:11 -07:00
Sascha Hauer 7e7a1d107b iproute2: ip maddress: Check multiaddr length
ip maddress add|del takes a MAC address as argument, so insist on
getting a length of ETH_ALEN bytes. This makes sure the passed argument
is actually a MAC address and especially not an IPv4 address which
was previously accepted and silently taken as a MAC address.

While at it, do not print *argv in the error path as this has been
modified by ll_addr_a2n() and doesn't contain the full string anymore,
which can lead to misleading error messages.

Also while at it, replace the hardcoded buffer size with the actual
buffer size using sizeof().

Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-08-22 21:12:30 -07:00
Stephen Hemminger bf538de59d uapi: update bpf.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-08-16 16:09:52 -07:00
Leon Romanovsky db6d6becb0 rdma: Properly print device and link names in CLI output
The cited commit broke the CLI output and printed ifindex/ifname
instead of dev/link.

Before:
[leonro@vm ~]$ rdma res show qp
link mlx5_0/lqpn 1 type GSI state RTS sq-psn 0 comm ib_core
[leonro@vm ~]$ rdma res show cq
ifindex 0 ifname rocep0s9 cqn 0 cqe 1023 users 2 poll-ctx WORKQUEUE adaptive-moderation on comm ib_core

After:
[leonro@vm ~]$ rdma res show qp
link mlx5_0/- lqpn 1 type GSI state RTS sq-psn 0 comm [ib_core]
[leonro@vm ~]$ rdma res show cq
dev rocep0s9 cqn 0 cqe 1023 users 2 poll-ctx WORKQUEUE adaptive-moderation on comm [ib_core]

It was missed because rdmatool is mostly used in JSON mode.

Fixes: b0a688a542 ("rdma: Rewrite custom JSON and prints logic to use common API")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-08-16 15:50:02 -07:00
Leon Romanovsky 7ded3c97b9 rdma: Fix owner name for the kernel resources
The owner of kernel resources is printed in a different format than that
of user resources, so that the reader can tell them apart simply by looking
at the name. The kernel owner will have "[ ]" around the name.

Before this change:
[leonro@vm ~]$ rdma res show qp
link rocep0s9/1 lqpn 1 type GSI state RTS sq-psn 58 comm ib_core

After this change:
[leonro@vm ~]$ rdma res show qp
link rocep0s9/1 lqpn 1 type GSI state RTS sq-psn 58 comm [ib_core]

Fixes: b0a688a542 ("rdma: Rewrite custom JSON and prints logic to use common API")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-08-16 15:50:02 -07:00
Stephen Hemminger 52d767aff8 uapi: update kernel headers
pre-rc1 version of Linux kernel headers.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-08-11 13:18:41 -07:00
Mark Zhang e8e8f16ed1 rdma: Document the new "pid" criteria for auto mode
Document the new supported criteria of auto mode. Examples:
$ rdma statistic qp set link mlx5_2/1 auto pid on
$ rdma statistic qp set link mlx5_2/1 auto pid,type on

Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Ido Kalir <idok@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-08-06 16:26:12 +00:00
Mark Zhang e28133316d rdma: Add "PID" criteria support for statistic counter auto mode
With this new criterion, QPs that have different PIDs will be bound to
different counters in auto mode. This can be used in combination with
other criteria like "type". Examples:

$ rdma statistic qp set link mlx5_2/1 auto pid on
$ rdma statistic qp set link mlx5_2/1 auto type,pid on
$ rdma statistic qp set link mlx5_2/1 auto off
$ rdma statistic qp show link mlx5_0 qp-type UD

Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Ido Kalir <idok@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-08-06 16:26:04 +00:00
Mark Zhang cb69794736 rdma: update uapi headers
Update the rdma_netlink.h file up to kernel commit 76251e15ea73
("RDMA/counter: Add PID category support in auto mode")

Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Ido Kalir <idok@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-08-06 16:25:04 +00:00
David Ahern e572e3af0d Merge branch 'main' into next
Conflicts:
	bridge/fdb.c
	man/man8/bridge.8

Signed-off-by: David Ahern <dsahern@kernel.org>
2020-08-06 16:21:35 +00:00
Stephen Hemminger 53159d8115 v5.8.0 2020-08-03 10:03:42 -07:00
Stephen Hemminger d530608d33 lnstat: use same version as iproute2
Lnstat was trying to be different and have its own version.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-08-03 10:02:47 -07:00
Stephen Hemminger fbef655568 replace SNAPSHOT with auto-generated version string
Replace the iproute2 snapshot with a version string which is
autogenerated as part of the build process using git describe.

This will also allow seeing whether the command was built
from the same sources as upstream.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-08-03 10:02:47 -07:00
Vasundhara Volam 7332b188a6 devlink: Add board.serial_number to info subcommand.
Add support for reading board serial_number to devlink info
subcommand. Example:

$ devlink dev info pci/0000:af:00.0 -jp
{
    "info": {
        "pci/0000:af:00.0": {
            "driver": "bnxt_en",
            "serial_number": "00-10-18-FF-FE-AD-1A-00",
            "board.serial_number": "433551F+172300000",
            "versions": {
                "fixed": {
                    "board.id": "7339763 Rev 0.",
                    "asic.id": "16D7",
                    "asic.rev": "1"
                },
                "running": {
                    "fw": "216.1.216.0",
                    "fw.psid": "0.0.0",
                    "fw.mgmt": "216.1.192.0",
                    "fw.mgmt.api": "1.10.1",
                    "fw.ncsi": "0.0.0.0",
                    "fw.roce": "216.1.16.0"
                }
            }
        }
    }
}

Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-08-03 15:35:56 +00:00
Petr Vaněk a7f1974f6e ip-xfrm: add support for oseq-may-wrap extra flag
If set, this flag allows creating an SA whose sequence number can wrap
around in outbound packets.

Signed-off-by: Petr Vaněk <pv@excello.cz>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-08-03 14:57:25 +00:00
David Ahern 91922a4121 Update kernel headers
Update kernel headers to commit:
    bd0b33b24897 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")

Signed-off-by: David Ahern <dsahern@kernel.org>
2020-08-03 14:56:28 +00:00
Danielle Ratson e5f4165a9e devlink: Expose port split ability
Add a new attribute that indicates the port split ability to devlink port.

Expose the attribute to user space as a read-only value, for example:

$devlink port show swp1
pci/0000:03:00.0/61: type eth netdev swp1 flavour physical port 1
splittable false lanes 1

Signed-off-by: Danielle Ratson <danieller@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-08-03 14:37:14 +00:00
Danielle Ratson fcbc6c1c71 devlink: Expose number of port lanes
Add a new attribute that indicates the port's number of lanes to devlink port.

Expose the attribute to user space as a read-only value, for example:

$devlink port show swp1
pci/0000:03:00.0/61: type eth netdev swp1 flavour physical port 1 lanes 1

Signed-off-by: Danielle Ratson <danieller@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-08-03 14:36:52 +00:00
Julien Fortin cb17e0cc57 bridge: fdb show: fix fdb entry state output for json context
bridge json fdb show is printing an incorrect / non-machine readable
value. When using -j (json output) we expect machine readable
data that shouldn't require special handling/parsing.

$ bridge -j fdb show | \
python -c \
'import sys,json;print(json.dumps(json.loads(sys.stdin.read()),indent=4))'
[
    {
	"master": "br0",
	"mac": "56:23:28:4f:4f:e5",
	"flags": [],
	"ifname": "vx0",
	"state": "state=0x80"  <<<<<<<<< with the patch: "state": "0x80"
    }
]

Fixes: c7c1a1ef51 ("bridge: colorize output and use JSON print library")

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-29 18:08:46 -07:00
Briana Oursler 41a9e38469 tc: Add space after format specifier
Add space after format specifier in print_string call. Fixes broken
qdisc tests within tdc testing suite. Per suggestion from Petr Machata,
remove a space and change spacing in tc/q_event.c to complete the fix.

Tested fix in tdc using:
./tdc.py -c qdisc

All qdisc RED tests return ok.

Fixes: d0e450438571 ("tc: q_red: Add support for qevents "mark" and "early_drop"")
Signed-off-by: Briana Oursler <briana.oursler@gmail.com>
Tested-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-29 17:03:46 +00:00
Anton Danilov 65c0c4d21b bridge: fdb: the 'dynamic' option in the show/get commands
In most cases a user wants to see only the dynamic MAC addresses
in the fdb output. But currently 'fdb show' displays lots of
various self entries, which only clutter the output without serving
any useful purpose.

The new 'dynamic' option for the 'show' and 'get' commands restricts
the output to the relevant records only.

Signed-off-by: Anton Danilov <littlesmilingcloud@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-27 16:41:39 -07:00
Matthieu Baerts 3a53ff7e58 mptcp: show all endpoints when no ID is specified
According to 'ip mptcp help', 'endpoint show' can accept no argument:

  ip mptcp endpoint show [ id ID ]

It makes sense to print all endpoints when no filter is used.

So here if the following command is used, all endpoints are printed:

  ip mptcp endpoint show

Same as:

  ip mptcp endpoint
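The intended behaviour can be sketched as an optional filter. This is an illustrative Python model only, not the ip-mptcp code; `select_endpoints` and the endpoint dictionaries are hypothetical.

```python
def select_endpoints(endpoints, endpoint_id=None):
    """Return every endpoint when no ID is given, else only the matches."""
    if endpoint_id is None:
        return list(endpoints)
    return [ep for ep in endpoints if ep["id"] == endpoint_id]

# Hypothetical endpoint list used only for illustration.
endpoints = [
    {"id": 1, "address": "10.0.1.1"},
    {"id": 2, "address": "10.0.2.1"},
]
everything = select_endpoints(endpoints)
only_two = select_endpoints(endpoints, endpoint_id=2)
```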

Fixes: 7e0767cd ("add support for mptcp netlink interface")
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-27 16:39:58 -07:00
David Ahern 1ca65af1c5 Merge branch 'devlink-port-health' into next
Moshe Shemesh  says:

====================

Implement commands for interaction with per-port devlink health
reporters. To do this, adapt devlink-health for usage of port handles
with any existing devlink-health subcommands. Add devlink-port health
subcommand as an alias for devlink-health.

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-23 00:34:07 +00:00
Vladyslav Tarasiuk 1fe8c44bd9 devlink: Update devlink-health and devlink-port manpages
Describe support for per-port reporters in devlink-health and
devlink-port commands.

Signed-off-by: Vladyslav Tarasiuk <vladyslavt@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-23 00:32:37 +00:00
Vladyslav Tarasiuk 211c8d6ca9 devlink: Add devlink port health command
Add devlink port health show subcommand which displays information about
specified port reporter or all present port reporters as in the example.
Device and port reporters can be distinguished by a handle being used.

Make other devlink-health subcommands be aliased by devlink port health.
Refactor devlink-health commands for usage of port handles in order to
interact with port reporters.

Change devlink health show output to dump information about both device
and port reporters with correct handles.

Example:
$ devlink health show
pci/0000:00:0b.0:
  reporter fw
    state healthy error 0 recover 0 auto_dump true
  reporter fw_fatal
    state healthy error 0 recover 0 grace_period 1200000 auto_recover true auto_dump true
pci/0000:00:0b.0/1:
  reporter tx
    state healthy error 0 recover 0 grace_period 10000 auto_recover true auto_dump true
  reporter rx
    state healthy error 0 recover 0 grace_period 10000 auto_recover true auto_dump true

$ devlink health show pci/0000:00:0b.0/1 reporter rx
Which is equivalent to:
$ devlink port health show pci/0000:00:0b.0/1 reporter rx
pci/0000:00:0b.0/1:
  reporter rx
    state healthy error 0 recover 0 grace_period 10000 auto_recover true auto_dump true

$ devlink port health show pci/0000:00:0b.0/1 reporter rx -j --pretty
{
    "health": {
         "pci/0000:00:0b.0/1": [ {
                 "reporter": "rx",
                 "state": "healthy",
                 "error": 0,
                 "recover": 0,
                 "grace_period": 500,
                 "auto_recover": true,
                 "auto_dump": true
              } ]
    }
}

$ devlink health set pci/0000:00:0b.0/1 reporter rx grace_period 5000
Which is equivalent to:
$ devlink port health set pci/0000:00:0b.0/1 reporter rx grace_period 5000

$ devlink port health show pci/0000:00:0b.0/1 reporter rx
pci/0000:00:0b.0/1:
  reporter rx
    state healthy error 0 recover 0 grace_period 5000 auto_recover true auto_dump true

Signed-off-by: Vladyslav Tarasiuk <vladyslavt@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-23 00:32:32 +00:00
Vladyslav Tarasiuk e533faa72e devlink: Add a possibility to print arrays of devlink port handles
Add a capability of printing port handles for arrays in non-JSON format
in devlink-health manner.

Signed-off-by: Vladyslav Tarasiuk <vladyslavt@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-23 00:32:26 +00:00
Stephen Hemminger 848b1b8e04 uapi: update bpf.h
Upstream 5.8-rc6 changes.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-21 09:18:15 -07:00
Guillaume Nault 4735df15a2 testsuite: Add tests for bareudp tunnels
Test the plain MPLS (unicast and multicast) and IP (v4 and v6) modes.
Also test the multiproto option for MPLS and for IP.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-20 13:30:55 -07:00
Anton Danilov 8f5a602f7a misc: make the pattern matching case-insensitive
To improve usability, use case-insensitive pattern matching
in the ifstat, nstat and ss tools.
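As a rough illustration of what case-insensitive glob matching means for a user, here is a small Python sketch. It is an assumption that the tools use shell-style patterns; the helper `match_nocase` and the counter names are hypothetical examples, not taken from the patch.

```python
from fnmatch import fnmatchcase

def match_nocase(name, pattern):
    """Shell-style match that ignores case on both the name and the pattern."""
    # fnmatchcase never applies OS-specific case rules, so lowering both
    # sides gives the same result on every platform.
    return fnmatchcase(name.lower(), pattern.lower())

hits = [n for n in ("TcpExtTCPTimeouts", "IpInReceives")
        if match_nocase(n, "tcpext*")]
```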

Signed-off-by: Anton Danilov <littlesmilingcloud@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-20 13:29:55 -07:00
Jamie Gloudon 66702fb9ba tc/m_estimator: Print proper value for estimator interval in raw.
While looking at the estimator code, I noticed an incorrect interval
number printed in raw for the handles. This patch fixes the formatting.

Before patch:

root@bytecenter.fr:~# tc -r filter add dev eth0 ingress estimator
250ms 999ms matchall action police avrate 12mbit conform-exceed drop
[estimator i=4294967294 e=2]

After patch:

root@bytecenter.fr:~# tc -r filter add dev eth0 ingress estimator
250ms 999ms matchall action police avrate 12mbit conform-exceed drop
[estimator i=-2 e=2]
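The two outputs differ in the way one would expect if a small negative interval were formatted as an unsigned 32-bit integer. That reading is an assumption; the commit message only says the formatting was fixed. A short Python sketch of the arithmetic, with hypothetical helper names:

```python
def as_u32(value):
    """Reinterpret a Python int as an unsigned 32-bit value."""
    return value & 0xFFFFFFFF

def as_s8(value):
    """Reinterpret the low 8 bits of a Python int as a signed 8-bit value."""
    value &= 0xFF
    return value - 0x100 if value & 0x80 else value

interval = -2
printed_before = as_u32(interval)   # same digits as the "before" output
printed_after = as_s8(0xFE)         # the byte 0xFE read as signed
```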

Signed-off-by: Jamie Gloudon <jamie.gloudon@gmx.fr>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-20 13:25:56 -07:00
David Ahern 7b6361bf61 Merge branch 'tc-qevent-block' into next
Petr Machata  says:

====================

When a list of filters at a given block is requested, tc first validates
that the block exists before doing the filter query. Currently the
validation routine checks ingress and egress blocks. But now that blocks
can be bound to qevents as well, qevent blocks should be looked for as
well:

    # ip link add up type dummy
    # tc qdisc add dev dummy1 root handle 1: \
         red min 30000 max 60000 avpkt 1000 qevent early_drop block 100
    # tc filter add block 100 pref 1234 handle 102 matchall action drop
    # tc filter show block 100
    Cannot find block "100"

This patchset fixes this issue:

    # tc filter show block 100
    filter protocol all pref 1234 matchall chain 0
    filter protocol all pref 1234 matchall chain 0 handle 0x66
      not_in_hw
            action order 1: gact action drop
             random type none pass val 0
             index 2 ref 1 bind 1

In patch #1, the helpers and necessary infrastructure are introduced,
including a new qdisc_util callback that implements sniffing out bound
blocks in a given qdisc.

In patch #2, RED implements the new callback.

v3:
- Patch #1:
    - Do not pass &ctx->found directly to has_block. Do it through a
      helper variable, so that the callee does not overwrite the result
      already stored in ctx->found.

v2:
- Patch #1:
    - In tc_qdisc_block_exists_cb(), do not initialize 'q'.
    - Propagate upwards errors from q->has_block.

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-20 16:36:41 +00:00
Petr Machata 02dce2fdce tc: q_red: Implement has_block for RED
In order for "tc filter show block X" to find a given block, implement the
has_block callback.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-20 16:34:49 +00:00
Petr Machata af0e036c09 tc: Look for blocks in qevents
When a list of filters at a given block is requested, tc first validates
that the block exists before doing the filter query. Currently the
validation routine checks ingress and egress blocks. But now that blocks
can be bound to qevents as well, qevent blocks should be looked for as
well.

In order to support that, extend struct qdisc_util with a new callback,
has_block. That should report whether, given the attributes in TCA_OPTIONS,
a block with a given number is bound to a qevent. In
tc_qdisc_block_exists_cb(), invoke that callback when set.

Add a helper to the tc_qevent module that walks the list of qevents and
looks for a given block. This is meant to be used by the individual qdiscs.
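The lookup described above can be modelled as an optional per-qdisc callback. The following Python sketch is only an analogy for the C design; `QdiscUtil`, `qevents_have_block`, `block_exists` and the dictionary layout are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class QdiscUtil:
    kind: str
    # Optional callback: does this qdisc bind the block to one of its qevents?
    has_block: Optional[Callable[[dict, int], bool]] = None

def qevents_have_block(options, block_idx):
    """Shared helper: walk the qevents and look for the given block."""
    return any(q.get("block") == block_idx
               for q in options.get("qevents", []))

def block_exists(qdiscs, utils, block_idx):
    for q in qdiscs:
        # Existing behaviour: ingress and egress blocks.
        if block_idx in (q.get("ingress_block"), q.get("egress_block")):
            return True
        # New behaviour: ask the qdisc, but only when it sets the callback.
        util = utils.get(q["kind"])
        if util and util.has_block and util.has_block(q.get("options", {}),
                                                      block_idx):
            return True
    return False

utils = {"red": QdiscUtil("red", has_block=qevents_have_block),
         "pfifo": QdiscUtil("pfifo")}
qdiscs = [{"kind": "red",
           "options": {"qevents": [{"name": "early_drop", "block": 100}]}},
          {"kind": "pfifo", "options": {}}]
```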

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-20 16:34:02 +00:00
Paolo Abeni 9c3be2c0ee ss: mptcp: add msk diag interface support
This implements support for the MPTCP socket type, comprising
extended socket info. Note that we need to add an extended
attribute carrying the actual protocol number to the diag
request.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-14 23:57:36 +00:00
David Ahern beaf281cff Update kernel headers
Update kernel headers to commit:
    81adcd65b685 ("ksz884x: switch from 'pci_' to 'dma_' API")

Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-14 23:56:53 +00:00
David Ahern b78c480532 Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-14 23:52:43 +00:00
Eyal Birger f33a871b80 ip xfrm: policy: support policies with IF_ID in get/delete/deleteall
The XFRMA_IF_ID attribute is set in policies for them to be
associated with an XFRM interface (4.19+).

Add support for getting/deleting policies with this attribute.

For supporting 'deleteall' the XFRMA_IF_ID attribute needs to be
explicitly copied.

Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-13 08:51:37 -07:00
Eyal Birger ee93c1107f ip xfrm: update man page on setting/printing XFRMA_IF_ID in states/policies
In commit aed63ae1ac ("ip xfrm: support setting/printing XFRMA_IF_ID attribute in states/policies")
I added the ability to set/print the xfrm interface ID without updating
the man page.

Fixes: aed63ae1ac ("ip xfrm: support setting/printing XFRMA_IF_ID attribute in states/policies")
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-13 08:51:37 -07:00
Hoang Huu Le ca75a86337 tipc: fixed a compile warning in tipc/link.c
Fixes: 5027f233e3 ("tipc: add link broadcast get")
Signed-off-by: Hoang Huu Le <hoang.h.le@dektech.com.au>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-13 08:43:32 -07:00
Julien Fortin 8fc09aff8d bridge: fdb get: add missing json init (new_json_obj)
'bridge fdb get' has json support but the json object is never initialized

before patch:

$ bridge -j fdb get 56:23:28:4f:4f:e5 dev vx0
56:23:28:4f:4f:e5 dev vx0 master br0 permanent
$

after patch:

$ bridge -j fdb get 56:23:28:4f:4f:e5 dev vx0 | \
python -c \
'import sys,json;print(json.dumps(json.loads(sys.stdin.read()),indent=4))'
[
    {
        "master": "br0",
        "mac": "56:23:28:4f:4f:e5",
        "flags": [],
        "ifname": "vx0",
        "state": "permanent"
    }
]
$

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-13 08:41:42 -07:00
Tony Ambardar 650591a7a7 configure: support ipset version 7 with kernel version 5
The configure script checks for ipset v6 availability but doesn't test
for v7, which is backward compatible and used on kernel v5.x systems.
Update the script to test for both ipset versions. Without this change,
the tc ematch function em_ipset will be disabled.

Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-08 08:48:02 -07:00
Andrea Claudi a8d6f51c84 ip address: remove useless include
utils.h is included two times in ipaddress.c, there is no need for that.

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-08 08:47:28 -07:00
Stephen Hemminger 0689785782 genl: use <> for system includes
Be consistent about local versus system headers.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-08 08:41:24 -07:00
Stephen Hemminger a12b203c78 rtacct: drop unused header 2020-07-08 08:40:20 -07:00
Stephen Hemminger d44bcd2fbf iplink_bareudp: use common include syntax
Follow the precedent of other parts of iproute2 and order includes as:
  Standard libc headers
  Linux headers

  Iproute2 support headers

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-08 08:38:58 -07:00
Louis Peens 7c8d7848c7 devlink: add 'disk' to 'fw_load_policy' string validation
The 'fw_load_policy' devlink parameter has supported the 'disk' value
since kernel v5.4, but it was apparently overlooked in iproute2. This
patch adds it.

Signed-off-by: Louis Peens <louis.peens@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-06 11:14:57 -07:00
Ido Schimmel 2d4c3f65e2 devlink: Document zero policer identifier
When setting a policer to a trap group, a value of "0" will unbind the
currently bound policer from the group.

The behavior is intentional and tested in kernel selftests, so document
it.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Suggested-by: Alex Kushnarov <alexanderk@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-06 11:14:24 -07:00
Guillaume Nault eb09a15c12 tc: flower: support multiple MPLS LSE match
Add the new "mpls" keyword that can be used to match MPLS fields in
arbitrary Label Stack Entries.
LSEs are introduced by the "lse" keyword and followed by LSE options:
"depth", "label", "tc", "bos" and "ttl". The depth is mandatory, the
other options are optional.

For example, the following filter drops MPLS packets having two labels,
where the first label is 21 and has TTL 64 and the second label is 22:

$ tc filter add dev ethX ingress proto mpls_uc flower mpls \
    lse depth 1 label 21 ttl 64 \
    lse depth 2 label 22 bos 1 \
    action drop
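For readers unfamiliar with the fields being matched: an MPLS Label Stack Entry is a 32-bit word with a 20-bit label, a 3-bit traffic class, a 1-bit bottom-of-stack flag and an 8-bit TTL. The Python sketch below packs and matches such entries; it is a standalone illustration, not the flower implementation, and all function names are hypothetical.

```python
def pack_lse(label, tc=0, bos=0, ttl=0):
    """Pack one Label Stack Entry into its 32-bit representation."""
    if not 0 <= label < (1 << 20):
        raise ValueError("label must fit in 20 bits")
    return (label << 12) | ((tc & 0x7) << 9) | ((bos & 0x1) << 8) | (ttl & 0xFF)

def unpack_lse(value):
    """Split a 32-bit Label Stack Entry into its four fields."""
    return {"label": value >> 12, "tc": (value >> 9) & 0x7,
            "bos": (value >> 8) & 0x1, "ttl": value & 0xFF}

def stack_matches(stack, wanted):
    """wanted maps a 1-based depth to the fields that must match there."""
    for depth, fields in wanted.items():
        if depth > len(stack):
            return False
        lse = unpack_lse(stack[depth - 1])
        if any(lse[k] != v for k, v in fields.items()):
            return False
    return True

# Same criteria as the example filter: two labels, 21 (TTL 64) then 22 (BoS).
stack = [pack_lse(21, ttl=64), pack_lse(22, bos=1, ttl=64)]
wanted = {1: {"label": 21, "ttl": 64}, 2: {"label": 22, "bos": 1}}
```

Only the fields listed for a depth are compared, which mirrors the idea that every option other than the depth is optional.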

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-06 11:12:43 -07:00
Guillaume Nault a6c5c952ab ip link: initial support for bareudp devices
Bareudp devices provide a generic L3 encapsulation for tunnelling
different protocols like MPLS, IP, NSH, etc. inside a UDP tunnel.

This patch is based on original work from Martin Varghese:
https://lore.kernel.org/netdev/1570532361-15163-1-git-send-email-martinvarghesenokia@gmail.com/

Examples:

  - ip link add dev bareudp0 type bareudp dstport 6635 ethertype mpls_uc

This creates a bareudp tunnel device which tunnels L3 traffic with
ethertype 0x8847 (unicast MPLS traffic). The destination port of the
UDP header will be set to 6635. The device will listen on UDP port 6635
to receive traffic.

  - ip link add dev bareudp0 type bareudp dstport 6635 ethertype ipv4 multiproto

Same as the MPLS example, but for IPv4. The "multiproto" keyword allows
the device to also tunnel IPv6 traffic.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-06 11:11:05 -07:00
Dmitry Yakunin 8f1cd119b3 lib: fix checking of returned file handle size for cgroup
Before this patch, the check was performed only when looking up a
cgroup at the cgroup2 mount point.

v2:
  - add Fixes line before Signed-off-by (David Ahern)

Fixes: d5e6ee0dac ("ss: introduce cgroup2 cache and helper functions")
Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-06 11:05:54 -07:00
Sorah Fukumori 9e5d246877 ip fou: respect preferred_family for IPv6
ip(8) accepts -family ipv6 (-6) option at the toplevel. It is
straightforward to support the existing option for modifying listener
on IPv6 addresses.

Maintain backward compatibility by keeping the 'ip fou -6' flag
implemented, although it is removed from the usage message.

Signed-off-by: Sorah Fukumori <her@sorah.jp>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-06 11:03:09 -07:00
Anton Danilov d80a05b795 tc: improve the qdisc show command
Previously, it was only possible to show all queue disciplines on an
interface. There was no way to get qdisc info by handle or parent other
than a full dump of the disciplines followed by grep/sed.

Now new and old options work as expected to filter a qdisc by handle or
parent.

Full syntax of the qdisc show command:

tc qdisc { show | list } [ dev STRING ] [ QDISC_ID ] [ invisible ]
  QDISC_ID := { root | ingress | handle QHANDLE | parent CLASSID }
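The effect of the handle and parent selectors can be pictured as a filter over the full dump. This Python sketch is purely illustrative; `filter_qdiscs` and the sample records are hypothetical and do not reflect how tc stores qdisc data.

```python
def filter_qdiscs(qdiscs, handle=None, parent=None):
    """Keep only qdiscs matching every selector that was supplied."""
    selected = []
    for q in qdiscs:
        if handle is not None and q["handle"] != handle:
            continue
        if parent is not None and q["parent"] != parent:
            continue
        selected.append(q)
    return selected

# Hypothetical dump of one interface.
qdiscs = [
    {"kind": "htb", "handle": "1:", "parent": "root"},
    {"kind": "fq_codel", "handle": "10:", "parent": "1:10"},
    {"kind": "fq_codel", "handle": "20:", "parent": "1:20"},
]
by_handle = filter_qdiscs(qdiscs, handle="10:")
by_parent = filter_qdiscs(qdiscs, parent="1:20")
```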

This change doesn't require any changes in the kernel.

Signed-off-by: Anton Danilov <littlesmilingcloud@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-06 11:00:51 -07:00
Stephen Hemminger 085622b1f5 uapi: update bpf.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-06 11:00:51 -07:00
Bjarni Ingi Gislason 860a5d12d5 devlink-health.8: use a single-font macro for a single argument
Use a single font macro for a single argument.

  Remove unnecessary quotes for a single-font macro.

  Join two lines into one.

  The output of "nroff" and "groff" is unchanged.

Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-06 11:00:47 -07:00
Bjarni Ingi Gislason f9bc806c9d devlink-dev.8: use a single-font macro for one argument
Use a single-font macro for one argument.

  Remove unnecessary quotes for a single font macro.

  Join some lines into one.

  The output of "nroff" and "groff" is unchanged, except for a font
change in two lines.

Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-06 11:00:38 -07:00
Bjarni Ingi Gislason 472fb39d55 devlink.8: Use a single-font macro for a single argument
Use a single-font macro for a single argument

Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-06 11:00:34 -07:00
Bjarni Ingi Gislason 57cfcc62af man8/bridge.8: fix misuse of two-fonts macros
Use a single-font macro for a single argument.

Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-07-06 11:00:28 -07:00
Bjarni Ingi Gislason 2df0dc2437 libnetlink.3: display section numbers in roman font, not boldface
Typeset section numbers in roman font, see man-pages(7).

###

  Details:

Output is from: test-groff -b -mandoc -T utf8 -rF0 -t -w w -z

  [ "test-groff" is a developmental version of "groff" ]

<./man/man3/libnetlink.3>:53 (macro BR): only 1 argument, but more are expected
<./man/man3/libnetlink.3>:132 (macro BR): only 1 argument, but more are expected
<./man/man3/libnetlink.3>:134 (macro BR): only 1 argument, but more are expected
<./man/man3/libnetlink.3>:197 (macro BR): only 1 argument, but more are expected
<./man/man3/libnetlink.3>:198 (macro BR): only 1 argument, but more are expected

Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
2020-07-06 10:46:23 -07:00
David Ahern a5c9d01c5c Merge branch 'rdma-raw-format-dumps' into next
Leon Romanovsky  says:

====================

The following series adds support to get the RDMA resource data in RAW
format. The main motivation for doing this is to enable vendors to
return the entire QP/CQ/MR data without a need from the vendor to set
each field separately.

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 18:11:49 +00:00
Maor Gottlieb e2bbf737e6 rdma: Add support to get MR in raw format
Add the required support to print MR data in raw format.
Example:

$rdma res show mr dev mlx5_1 mrn 2 -r -j
[{"ifindex":7,"ifname":"mlx5_1",
"data":[0,4,255,254,0,0,0,0,0,0,0,0,16,28,0,216,...]}]

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 18:11:37 +00:00
Maor Gottlieb 94323e9611 rdma: Add support to get CQ in raw format
Add the required support to print CQ data in raw format.
Example:

$rdma res show cq dev mlx5_2 cqn 1 -r -j
[{"ifindex":8,"ifname":"mlx5_2",
"data":[0,4,255,254,0,0,0,0,0,0,0,0,16,28,...]}]

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 18:11:33 +00:00
Maor Gottlieb 7c01e0fc9c rdma: Add support to get QP in raw format
Add 'raw' argument to get the resource in raw format.
When RDMA_NLDEV_ATTR_RES_RAW is set in the netlink message,
then the resource fields are in raw format, print it as byte array.

Example:
$rdma res show qp link rocep0s12f0/1 lqpn 1137 -j -r
[{"ifindex":7,"ifname":"mlx5_1","port":1,
"data":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...]}]
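Since the raw resource is printed as a JSON array of byte values, a consumer can turn it back into a byte string before handing it to a vendor-specific decoder. A minimal Python sketch follows; the sample payload is a hypothetical four-byte array, because the arrays in the commit messages are elided, and `raw_payload` is an illustrative name.

```python
import json

def raw_payload(entry):
    """Convert the JSON "data" array of byte values into a bytes object."""
    return bytes(entry["data"])

# Hypothetical, shortened payload; real output is much longer.
sample = ('[{"ifindex": 7, "ifname": "mlx5_1", "port": 1, '
          '"data": [0, 4, 255, 254]}]')
payload = raw_payload(json.loads(sample)[0])
hexdump = payload.hex()
```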

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 18:11:00 +00:00
Maor Gottlieb 8f23492823 rdma: update uapi headers
Update rdma_netlink.h file up to kernel commit 65959522f806
("RDMA: Add support to dump resource tracker in RAW format")

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 18:10:50 +00:00
David Ahern 79ea01927c Merge branch 'tc-qevents' into next
Petr Machata  says:

====================

To allow configuring user-defined actions as a result of inner workings of
a qdisc, a concept of qevents was recently introduced to the kernel.
Qevents are attach points for TC blocks, where filters can be put that are
executed as the packet hits well-defined points in the qdisc algorithms.
The attached blocks can be shared, in a manner similar to clsact ingress
and egress blocks, arbitrary classifiers with arbitrary actions can be put
on them, etc.

For example:

 # tc qdisc add dev eth0 root handle 1: \
	red limit 500K avpkt 1K qevent early_drop block 10
 # tc filter add block 10 \
	matchall action mirred egress mirror dev eth1

This patch set introduces the corresponding iproute2 support. Patch #1 adds
the new netlink attribute enumerators. Patch #2 adds a set of helpers to
implement qevents, and #3 adds a generic documentation to tc.8. Patch #4
then adds two new qevents to the RED qdisc: mark and early_drop.

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 15:45:48 +00:00
Petr Machata d0e4504385 tc: q_red: Add support for qevents "mark" and "early_drop"
The "early_drop" qevent matches packets that have been early-dropped. The
"mark" qevent matches packets that have been ECN-marked.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 15:37:49 +00:00
Petr Machata 3cf51fb3c8 man: tc: Describe qevents
Add some general remarks about qevents.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 15:37:45 +00:00
Petr Machata 01bb0bcd00 tc: Add helpers to support qevent handling
Introduce a set of helpers to make it easy to add support for qevents into
qdisc.

The idea behind this is that qevent types will be generally reused between
qdiscs, rather than each having a completely idiosyncratic set of qevents.
The qevent module holds functions for parsing, dumping and formatting of
these common qevent types, and for dispatch to the appropriate set of
handlers based on the qevent name.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 15:37:27 +00:00
Po Liu bc4d9f982f action police: make 'mtu' could be set independently in police action
Currently the police action requires 'rate' and 'burst' to be set. The
'mtu' parameter sets the max frame size and could be set alone, without
'rate' and 'burst', in some situations. When offloading to hardware, for
example, 'mtu' could limit the flow's max frame size.

Signed-off-by: Po Liu <po.liu@nxp.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 15:34:04 +00:00
Po Liu 3c5570706b action police: change the print message quotes style
Change the double quotes to single quotes in fprintf message to make it
more readable.

Signed-off-by: Po Liu <po.liu@nxp.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 15:33:59 +00:00
Alexandre Cassen 30f3beea0d add support to keepalived rtm_protocol
Following inclusion in net-next, extend rtnl_rtprot_tab and rt_protos
to support Keepalived.

Signed-off-by: Alexandre Cassen <acassen@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 15:03:45 +00:00
David Ahern 482e463d6c Merge branch 'devlink-port-mac-addr' into next
Parav Pandit  says:

====================

Currently, ip link set dev <pfndev> vf <vf_num> <param> <value> has
the following limitations.

1. Command is limited to set VF parameters only.
It cannot set the default MAC address for the PCI PF.

2. It can be used only on systems where PCI SR-IOV is supported.
In a smartnic-based system, the eswitch of a NIC resides on a different
embedded CPU, which has the VF and PF representors for the SR-IOV
support on the host system into which this smartnic is plugged.

3. It cannot set up the function attributes of the sub-functions
described in detail in the comprehensive RFCs [1] and [2].

This series covers the first small part to let user query and set MAC
address (hardware address) of a PCI PF/VF which is represented by
devlink port.

[1] https://lore.kernel.org/netdev/20200519092258.GF4655@nanopsycho/
[2] https://marc.info/?l=linux-netdev&m=158555928517777&w=2

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 14:49:53 +00:00
Parav Pandit 4dca81e9a8 devlink: Support setting port function hardware address
Support setting devlink port function hardware address.

Example of a PCI VF port which supports a port function:
Set hardware address of the VF's port function.

$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
  function:
    hw_addr 00:00:00:00:00:00

$ devlink port function set pci/0000:06:00.0/2 hw_addr 00:11:22:33:44:55

$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
  function:
    hw_addr 00:11:22:33:44:55

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 14:49:32 +00:00
Parav Pandit b3adafd154 devlink: Support querying hardware address of port function
Add support to query the hardware address of function represented
by devlink port function.

Example of a PCI VF port which supports a port function:
$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
  function:
    hw_addr 00:11:22:33:44:66

$ devlink port show pci/0000:06:00.0/2 -jp
{
    "port": {
        "pci/0000:06:00.0/2": {
            "type": "eth",
            "netdev": "enp6s0pf0vf1",
            "flavour": "pcivf",
            "pfnum": 0,
            "vfnum": 1,
            "function": {
                "hw_addr": "00:11:22:33:44:66"
            }
        }
    }
}
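The JSON form makes the new attribute easy to consume from scripts. The Python sketch below extracts the hardware address per port handle from output shaped like the example above; `port_hw_addrs` is a hypothetical helper, and ports without a "function" object are simply skipped.

```python
import json

def port_hw_addrs(doc):
    """Map each port handle to its function hw_addr, when one is reported."""
    return {
        handle: attrs["function"]["hw_addr"]
        for handle, attrs in doc.get("port", {}).items()
        if "hw_addr" in attrs.get("function", {})
    }

# Sample copied from the example output in the commit message.
sample = """
{"port": {"pci/0000:06:00.0/2": {
    "type": "eth", "netdev": "enp6s0pf0vf1", "flavour": "pcivf",
    "pfnum": 0, "vfnum": 1,
    "function": {"hw_addr": "00:11:22:33:44:66"}}}}
"""
addrs = port_hw_addrs(json.loads(sample))
```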

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 14:49:22 +00:00
Parav Pandit 2de449df19 devlink: Move devlink port code at start to reuse
To reuse print routines for port function in subsequent patch, move
print routine specific to devlink device at start of the file.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 14:48:34 +00:00
David Ahern e17466e484 Update kernel headers
Update kernel headers to commit:
   e1f046704404 ("Merge branch 'qlogic-use-generic-power-management'")

Signed-off-by: David Ahern <dsahern@kernel.org>
2020-07-05 14:33:15 +00:00
Stephen Hemminger 2f31d12a25 man/tc: remove obsolete reference to ipchains
It isn't Linux 2.2 anymore.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-06-24 12:13:46 -07:00
Roi Dayan 473d18e219 ip address: Fix loop initial declarations are only allowed in C99
On some distros, e.g. RHEL 7.6, compilation fails with the following:

ipaddress.c: In function ‘lookup_flag_data_by_name’:
ipaddress.c:1260:2: error: ‘for’ loop initial declarations are only allowed in C99 mode
  for (int i = 0; i < ARRAY_SIZE(ifa_flag_data); ++i) {
  ^
ipaddress.c:1260:2: note: use option -std=c99 or -std=gnu99 to compile your code

This commit fixes the single place needed for compilation to pass.

Fixes: 9d59c86e57 ("iproute2: ip addr: Organize flag properties structurally")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-06-11 15:05:20 -07:00
Stephen Hemminger 3d66d83d25 uapi: update to magic.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-06-11 09:52:38 -07:00
Ido Schimmel abda1e9d2b devlink: Add 'mirror' trap action
Allow setting 'mirror' trap action for traps that support it. Extend the
devlink-trap man page and bash completion accordingly.

Example:

# devlink -jp trap show netdevsim/netdevsim10 trap igmp_query
{
    "trap": {
        "netdevsim/netdevsim10": [ {
                "name": "igmp_query",
                "type": "control",
                "generic": true,
                "action": "mirror",
                "group": "mc_snooping"
            } ]
    }
}

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-06-11 09:51:10 -07:00
Ido Schimmel fd71244a20 devlink: Add 'control' trap type
This type is used for traps that trap control packets such as ARP
request and IGMP query to the CPU.

Example:

# devlink -jp trap show netdevsim/netdevsim10 trap igmp_v1_report
{
    "trap": {
        "netdevsim/netdevsim10": [ {
                "name": "igmp_v1_report",
                "type": "control",
                "generic": true,
                "action": "trap",
                "group": "mc_snooping"
            } ]
    }
}

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-06-11 09:51:10 -07:00
Stephen Hemminger 12fafa27c7 devlink: update include files
Use the tool iwyu to get more complete list of includes for
all the bits used by devlink.

This should also fix build with musl libc.

Fixes: c4dfddccef ("fix JSON output of mon command")
Reported-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-06-11 09:49:46 -07:00
Roopa Prabhu 468f787f64 bridge: support for nexthop id in fdb entries
This patch adds support to assign a nexthop group
id to an fdb entry.

$ bridge fdb add 02:02:00:00:00:13 dev vx10 nhid 102 self

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-06-11 15:52:58 +00:00
Roopa Prabhu a56d17463c ipnexthop: support for fdb nexthops
This patch adds support to add and delete
ecmp nexthops of type fdb. Such nexthops can
be linked to vxlan fdb entries.

$ ip nexthop add id 12 via 172.16.1.2 fdb
$ ip nexthop add id 13 via 172.16.1.3 fdb
$ ip nexthop add id 102 group 12/13 fdb

$ bridge fdb add 02:02:00:00:00:13 dev vx10 nhid 102 self

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-06-11 15:52:29 +00:00
David Ahern 5f6f17db3b Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-06-08 14:40:54 +00:00
Stephen Hemminger e4932ae6b3 uapi: update headers
Update kernel headers from 5.8.0 merge

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-06-05 08:36:54 -07:00
Stephen Hemminger 0a5dbbeddb Merge git://git.kernel.org/pub/scm/network/iproute2/iproute2-next 2020-06-05 08:33:29 -07:00
Ian K. Coolidge 5413a735a6 iproute2: ip addr: Add support for setting 'optimistic'
Optimistic DAD is controllable via sysctl for an interface
or all interfaces on the system. This would affect addresses
added by the kernel only.

Recent kernels, however, have enabled support for adding optimistic
addresses from userspace. This plumbs that support.

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-31 23:01:33 +00:00
Ian K. Coolidge 9d59c86e57 iproute2: ip addr: Organize flag properties structurally
This creates a nice systematic way to check that the various flags are
mutable from userspace and that the address family is valid.

Mutability properties are preserved to avoid introducing any behavioral
change in this CL. However, previously, immutable flags were ignored and
fell through to this confusing error:

Error: either "local" is duplicate, or "dadfailed" is a garbage.

But now, they just warn more explicitly:

Warning: dadfailed option is not mutable from userspace

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-31 23:01:22 +00:00
Roman Mashak bd4b8c632e tc: report time an action was first used
Have print_tm() dump firstuse value along with install, lastuse
and expires.

v2: Resubmit after 'master' merged into next

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-31 22:51:19 +00:00
David Ahern e50290e687 Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-27 02:08:27 +00:00
Tuong Lien 9a25abde3a tipc: enable printing of broadcast rcv link stats
This commit allows printing the statistics of a broadcast-receiver link
using the same tipc command, but with additional 'link' options:

$ tipc link stat show --help
Usage: tipc link stat show [ link { LINK | SUBSTRING | all } ]

With:
+ 'LINK'      : print the stats of the specific link 'LINK';
+ 'SUBSTRING' : print the stats of all the links having the 'SUBSTRING'
                in name;
+ 'all'       : print all the links' stats incl. the broadcast-receiver
                ones;

Also, a link's stats can be reset in the usual way by specifying the link
name in the command.

For example:

$ tipc l st sh l br
Link <broadcast-link>
  Window:50 packets
  RX packets:0 fragments:0/0 bundles:0/0
  TX packets:5011125 fragments:4968774/149643 bundles:38402/307061
  RX naks:781484 defs:0 dups:0
  TX naks:0 acks:0 retrans:330259
  Congestion link:50657  Send queue max:0 avg:0

Link <broadcast-link:1001001>
  Window:50 packets
  RX packets:95146 fragments:95040/1980 bundles:1/2
  TX packets:0 fragments:0/0 bundles:0/0
  RX naks:380938 defs:83962 dups:403
  TX naks:8362 acks:0 retrans:170662
  Congestion link:0  Send queue max:0 avg:0

Link <broadcast-link:1001002>
  Window:50 packets
  RX packets:0 fragments:0/0 bundles:0/0
  TX packets:0 fragments:0/0 bundles:0/0
  RX naks:400546 defs:0 dups:0
  TX naks:0 acks:0 retrans:159597
  Congestion link:0  Send queue max:0 avg:0

$ tipc l st sh l 1001002
Link <1001003:data0-1001002:data0>
  ACTIVE  MTU:1500  Priority:10  Tolerance:1500 ms  Window:50 packets
  RX packets:99546 fragments:0/0 bundles:33/877
  TX packets:629 fragments:0/0 bundles:35/828
  TX profile sample:8 packets average:390 octets
  0-64:75% -256:0% -1024:0% -4096:25% -16384:0% -32768:0% -66000:0%
  RX states:488714 probes:7397 naks:0 defs:4 dups:5
  TX states:27734 probes:18016 naks:5 acks:2305 retrans:0
  Congestion link:0  Send queue max:0 avg:0

Link <broadcast-link:1001002>
  Window:50 packets
  RX packets:0 fragments:0/0 bundles:0/0
  TX packets:0 fragments:0/0 bundles:0/0
  RX naks:400546 defs:0 dups:0
  TX naks:0 acks:0 retrans:159597
  Congestion link:0  Send queue max:0 avg:0

$ tipc l st re l broadcast-link:1001002

$ tipc l st sh l broadcast-link:1001002
Link <broadcast-link:1001002>
  Window:50 packets
  RX packets:0 fragments:0/0 bundles:0/0
  TX packets:0 fragments:0/0 bundles:0/0
  RX naks:0 defs:0 dups:0
  TX naks:0 acks:0 retrans:0
  Congestion link:0  Send queue max:0 avg:0
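
The selection rule described above ('LINK', 'SUBSTRING' or 'all') can be
sketched in a few lines of Python. This is only an illustration of the matching
behaviour visible in the example output; select_links() is a hypothetical name
and not the C code in tipc, and treating an exact link name as a special case
of a substring match is an assumption.

```python
def select_links(link_names, selector):
    """Return the link names whose stats would be shown for a selector.

    'all' selects every link; any other selector is treated as a substring
    (an exact link name is simply a substring that matches itself).
    """
    if selector == "all":
        return list(link_names)
    return [name for name in link_names if selector in name]


# Link names taken from the example output in this commit message.
LINKS = [
    "broadcast-link",
    "broadcast-link:1001001",
    "broadcast-link:1001002",
    "1001003:data0-1001002:data0",
]
```

With these names, the selector "br" picks the three broadcast links and
"1001002" picks the unicast link and the broadcast-receiver link for that peer,
which mirrors the two "tipc l st sh l ..." runs shown above.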

Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-27 02:07:22 +00:00
Alexander Aring 9f91f1b7b8 lwtunnel: add support for rpl segment routing
This patch adds support for rpl segment routing settings.
Example:

ip -n ns0 -6 route add 2001::3 encap rpl segs \
fe80::c8fe:beef:cafe:cafe,fe80::c8fe:beef:cafe:beef dev lowpan0

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-27 00:03:17 +00:00
Maciej Fijalkowski 42796dcd36 tc: mqprio: reject queues count/offset pair count higher than num_tc
Provide a sanity check that makes sure the queues count/offset
pair count does not exceed the actual number of TCs being created.

Example command that is invalid because there are 4 count/offset pairs
whereas num_tc is only 2.

 # tc qdisc add dev enp96s0f0 root mqprio num_tc 2 map 0 0 0 0 1 1 1 1 \
       queues 4@0 4@4 4@8 4@12 hw 1 mode channel

Store the parsed count/offset pair count onto a dedicated variable that
will be compared against opt.num_tc after all of the command line
arguments were parsed. Bail out if this count is higher than opt.num_tc
and let user know about it.

Drivers were swallowing such commands as they were iterating over
count/offset pairs where num_tc was used as a delimiter, so this is not
a big deal, but better catch such misconfiguration at the command line
argument parsing level.
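
The check described above can be sketched in Python as follows. This is a
minimal illustration, not the C code in tc: parse_queues() and
check_queue_pairs() are hypothetical names, and the error text is made up.

```python
def parse_queues(tokens):
    """Parse 'count@offset' tokens into (count, offset) integer pairs."""
    pairs = []
    for tok in tokens:
        count, _, offset = tok.partition("@")
        pairs.append((int(count), int(offset)))
    return pairs


def check_queue_pairs(num_tc, queue_tokens):
    """Reject a command line that lists more count/offset pairs than num_tc.

    The pair count is only compared after all tokens were parsed, as the
    commit message describes.
    """
    pairs = parse_queues(queue_tokens)
    if len(pairs) > num_tc:
        raise ValueError(
            f"{len(pairs)} count/offset pairs given, but num_tc is {num_tc}"
        )
    return pairs
```

For the invalid command above, check_queue_pairs(2, ["4@0", "4@4", "4@8",
"4@12"]) raises ValueError because four pairs are given for two TCs.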

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-18 14:57:15 +00:00
Dmitry Yakunin 7bd9188581 ss: add checks for bc filter support
As noted by David Ahern, if a bytecode filter is not supported by the
running kernel, the printed error message is not clear. This patch attempts to
detect such cases and print a correct message. This is done by providing a
checking function for new filter types. As an example, a check function for the
cgroup filter is implemented. It sends a correct lightweight request
(idiag_states = 0) with a zero cgroup condition to the kernel and checks the
returned errno. If the filter is not supported, EINVAL is returned. The result
of the check is cached to avoid extra checks if the same filter is specified
several times.
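
The probe-and-cache idea can be sketched in Python as below. This is an
illustration under stated assumptions, not the ss implementation:
filter_supported() and the probe callable are hypothetical, and the real code
sends a netlink request rather than calling a Python function.

```python
import errno

# Hypothetical cache: filter name -> True/False (supported or not).
_support_cache = {}


def filter_supported(name, probe):
    """Return whether a filter type is supported, probing at most once.

    `probe` stands in for the lightweight kernel request; it is expected to
    raise OSError with errno EINVAL when the filter type is unknown.
    """
    if name not in _support_cache:
        try:
            probe()
            _support_cache[name] = True
        except OSError as exc:
            if exc.errno != errno.EINVAL:
                raise  # any other error is not a "not supported" answer
            _support_cache[name] = False
    return _support_cache[name]
```

A second call with the same filter name returns the cached answer without
invoking the probe again.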

Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-13 14:28:38 +00:00
Dmitry Yakunin 14f4bda590 ss: add support for cgroup v2 information and filtering
This patch introduces two new features: obtaining cgroup information and
filtering sockets by cgroups. These features work based on cgroup v2 ID
field in the socket (kernel should be compiled with CONFIG_SOCK_CGROUP_DATA).

Cgroup information can be obtained by specifying the --cgroup flag and currently
contains only the pathname. For faster pathname lookups, a cgroup cache is
implemented. This cache is filled on ss startup, and missing entries are resolved
and saved on the fly.
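
A rough Python sketch of such a cache, assuming a dictionary in place of the
hash table; Cgroup2Cache and the resolve callable are hypothetical names used
only for illustration and do not correspond to the helpers added to ss.

```python
class Cgroup2Cache:
    """ID -> pathname cache that is pre-filled and falls back to a resolver."""

    def __init__(self, resolve, initial=None):
        self._resolve = resolve            # hypothetical slow id -> path lookup
        self._paths = dict(initial or {})  # entries known at startup

    def path_for(self, cg_id):
        """Return the pathname for cg_id, resolving and saving it on a miss."""
        path = self._paths.get(cg_id)
        if path is None:
            path = self._resolve(cg_id)
            self._paths[cg_id] = path
        return path
```

Entries present at startup are served directly; a missing ID triggers one call
to the resolver, after which it is served from the cache as well.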

The cgroup filter extends EXPRESSION and allows specifying a cgroup pathname
(relative or absolute) to obtain only the sockets attached to this cgroup.
Filter syntax: ss [ cgroup PATHNAME ]
Examples:
    ss -a cgroup /sys/fs/cgroup/unified (or ss -a cgroup .)
    ss -a cgroup /sys/fs/cgroup/unified/cgroup1 (or ss -a cgroup cgroup1)

v2:
  - style fixes (David Ahern)

Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-13 14:28:35 +00:00
Dmitry Yakunin d5e6ee0dac ss: introduce cgroup2 cache and helper functions
This patch prepares infrastructure for matching sockets by cgroups.
Two helper functions are added for converting between a cgroup v2 ID
and a pathname. The cgroup v2 cache is implemented as a hash table indexed by ID.
This cache is needed for faster lookups of a socket's cgroup.

v2:
  - style fixes (David Ahern)

Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-13 14:28:04 +00:00
Po Liu 965a5f6a1b iproute2-next: add gate action man page
This patch is to add the man page for the tc gate action.

Signed-off-by: Po Liu <Po.Liu@nxp.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-13 02:20:12 +00:00
Po Liu 07d5ee70b5 iproute2-next:tc:action: add a gate control action
Introduce an ingress frame gate control flow action.
The tc gate action works like this:
Assume there is a gate that lets the selected ingress frames pass during
some time slots and drops them during others. A tc filter selects the
ingress frames, and the tc gate action specifies in which time slots
these frames are passed to the device and in which time slots they are
dropped.
The tc gate action provides an entry list that tells how long the gate
stays open and how long it stays closed. The gate action also assigns a
start time that tells when the entry list starts. The driver then
repeats the gate entry list cyclically.
For the software simulation, the gate action requires the user to assign
a clock type.

Below is a configuration example in user space. The tc filter selects a
stream whose source IP address is 192.168.0.20, and the gate action has
two time slots: one lasts 200ms with the gate open, letting frames pass;
the other lasts 100ms with the gate closed, dropping frames.

 # tc qdisc add dev eth0 ingress
 # tc filter add dev eth0 parent ffff: protocol ip \
            flower src_ip 192.168.0.20 \
            action gate index 2 clockid CLOCK_TAI \
            sched-entry open 200000000ns -1 8000000b \
            sched-entry close 100000000ns

 # tc chain del dev eth0 ingress chain 0

"sched-entry" follows the taprio naming style. The gate state is "open"
or "close", followed by the interval in nanoseconds. The next value
(-1 here) is the internal priority value, which selects the ingress
queue the frames are put into; "-1" means wildcard. The last value is
optional and specifies the maximum number of MSDU octets that are
permitted to pass the gate during the specified time interval; frames
over the limit are dropped.
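
To make the cyclic entry list concrete, here is a small Python model of the
two-slot schedule from the example above. It is a sketch only: SchedEntry and
gate_state_at() are hypothetical names, and it assumes the cycle time equals
the sum of the entry intervals.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SchedEntry:
    state: str                        # "open" or "close"
    interval_ns: int                  # how long the gate keeps this state
    ipv: int = -1                     # internal priority value, -1 = wildcard
    max_octets: Optional[int] = None  # optional octet limit for the interval


def gate_state_at(entries, offset_ns):
    """Return the gate state at offset_ns after the schedule start time.

    The entry list is repeated cyclically, so the offset is first reduced
    modulo the cycle length (assumed: sum of all intervals).
    """
    cycle = sum(e.interval_ns for e in entries)
    pos = offset_ns % cycle
    for entry in entries:
        if pos < entry.interval_ns:
            return entry.state
        pos -= entry.interval_ns
    raise AssertionError("unreachable: pos is always inside the cycle")


# The two sched-entry arguments from the example command.
EXAMPLE = [
    SchedEntry("open", 200_000_000, -1, 8_000_000),
    SchedEntry("close", 100_000_000),
]
```

With this list the gate is open for the first 200 ms of every 300 ms cycle and
closed for the remaining 100 ms.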

The example below filters a stream whose destination MAC address is
10:00:80:00:00:00 and whose IP protocol is ICMP, followed by the gate
action. The gate action runs with a single close time slot, which means
the gate always stays closed. The total cycle time is 200000000ns. The
base-time is calculated by:

     1357000000000 + (N + 1) * cycletime

When the total value is a time in the future, it becomes the start time.
The cycletime in this case is 200000000ns.
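
The formula can be worked through with a short Python sketch. It is an
illustration only: gate_start_time() is a hypothetical name, N is taken to be
the smallest non-negative integer that moves the result into the future, and
returning the base-time unchanged when it already lies in the future is an
assumption.

```python
def gate_start_time(base_time_ns, cycletime_ns, now_ns):
    """Return base_time + (N + 1) * cycletime for the smallest N that makes
    the result later than now_ns."""
    if base_time_ns > now_ns:
        # Assumption: a base-time that is already in the future is used as is.
        return base_time_ns
    n = (now_ns - base_time_ns) // cycletime_ns
    return base_time_ns + (n + 1) * cycletime_ns


BASE = 1_357_000_000_000   # base-time from the example command
CYCLE = 200_000_000        # cycletime from the example command
```

For instance, if the current time is 450000000ns past the base-time, N is 2 and
the start time is the base-time plus 600000000ns.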

 # tc filter add dev eth0 parent ffff: protocol ip \
           flower skip_hw ip_proto icmp dst_mac 10:00:80:00:00:00 \
           action gate index 12 base-time 1357000000000ns \
           sched-entry CLOSE 200000000ns \
           clockid CLOCK_TAI

Signed-off-by: Po Liu <Po.Liu@nxp.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-13 02:19:46 +00:00
David Ahern 0e9b227e2d Update kernel headers and import tc_gate.h
Update kernel headers to commit:
    fb9f2e92864f ("net: dsa: tag_sja1105: appease sparse checks for ethertype accessors")
and import tc_act/tc_gate.h

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-13 02:18:15 +00:00
Jakub Kicinski ec04b6fc24 devlink: support kernel-side snapshot id allocation
Make ID argument optional and read the snapshot info
that kernel sends us.

$ devlink region new netdevsim/netdevsim1/dummy
netdevsim/netdevsim1/dummy: snapshot 0
$ devlink -jp region new netdevsim/netdevsim1/dummy
{
    "regions": {
        "netdevsim/netdevsim1/dummy": {
            "snapshot": [ 1 ]
        }
    }
}
$ devlink region show netdevsim/netdevsim1/dummy
netdevsim/netdevsim1/dummy: size 32768 snapshot [0 1]

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-05 17:10:27 +00:00
David Ahern 8c109059b5 Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-05 16:49:38 +00:00
David Ahern c1b21f5286 Import rpl.h and rpl_iptunnel.h uapi headers
Import rpl.h and rpl_iptunnel.h as of kernel commit:
    354d86141796 ("Merge branch 'net-reduce-dynamic-lockdep-keys'")

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-05 16:23:14 +00:00
Davide Caratti 3175bca718 tc: full JSON support for 'bpf' filter
example using eBPF:

 # tc filter add dev dummy0 ingress bpf \
 > direct-action obj ./bpf/filter.o sec tc-ingress
 # tc  -j filter show dev dummy0 ingress | jq
 [
   {
     "protocol": "all",
     "pref": 49152,
     "kind": "bpf",
     "chain": 0
   },
   {
     "protocol": "all",
     "pref": 49152,
     "kind": "bpf",
     "chain": 0,
     "options": {
       "handle": "0x1",
       "bpf_name": "filter.o:[tc-ingress]",
       "direct-action": true,
       "not_in_hw": true,
       "prog": {
         "id": 101,
         "tag": "a04f5eef06a7f555",
         "jited": 1
       }
     }
   }
 ]

v2:
 - use print_nl(), thanks to Andrea Claudi
 - use print_0xhex() for filter handle, thanks to Stephen Hemminger

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-05 16:19:06 +00:00
David Ahern ae57e82da0 Update kernel headers
Update kernel headers to commit:
    354d86141796 ("Merge branch 'net-reduce-dynamic-lockdep-keys'")

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-05 16:11:22 +00:00
Xin Long 4e578c78fe tc: f_flower: add options support for erspan
This patch is to add TCA_FLOWER_KEY_ENC_OPTS_ERSPAN's parse and
print to implement erspan options support in m_tunnel_key, like
Commit 56155d4df8 ("tc: f_flower: add geneve option match
support to flower") for geneve options support.

Option is expressed as version:index:dir:hwid, dir and hwid will
be parsed when version is 2, while index will be parsed when
version is 1. erspan doesn't support multiple options.

With this patch, users can add and dump erspan options like:

  # ip link add name erspan1 type erspan external
  # tc qdisc add dev erspan1 ingress
  # tc filter add dev erspan1 protocol ip parent ffff: \
      flower \
        enc_src_ip 10.0.99.192 \
        enc_dst_ip 10.0.99.193 \
        enc_key_id 11 \
        erspan_opts 1:2:0:0/1:255:0:0 \
        ip_proto udp \
        action mirred egress redirect dev eth1
  # tc -s filter show dev erspan1 parent ffff:

     filter protocol ip pref 49152 flower chain 0 handle 0x1
       eth_type ipv4
       ip_proto udp
       enc_dst_ip 10.0.99.193
       enc_src_ip 10.0.99.192
       enc_key_id 11
       erspan_opts 1:2:0:0/1:255:0:0
       not_in_hw
         action order 1: mirred (Egress Redirect to device eth1) stolen
         index 1 ref 1 bind 1
         Action statistics:
         Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
         backlog 0b 0p requeues 0

v1->v2:
  - no change.
v2->v3:
  - no change.
v3->v4:
  - keep the same format between input and output, json and non json.
  - print version, index, dir and hwid as uint.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-01 16:33:27 +00:00
Xin Long 93c8d5f72f tc: f_flower: add options support for vxlan
This patch is to add TCA_FLOWER_KEY_ENC_OPTS_VXLAN's parse and
print to implement vxlan options support in m_tunnel_key, like
Commit 56155d4df8 ("tc: f_flower: add geneve option match
support to flower") for geneve options support.

Option is expressed as a 32-bit number for gbp only, and vxlan
doesn't support multiple options.
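
A minimal Python sketch of parsing such a value, in the value/mask form used in
the example that follows. It is not the C parser: parse_u32() and
parse_vxlan_opts() are hypothetical names, and defaulting to an all-ones mask
when none is given is an assumption. Python's int(text, 0) accepts decimal and
0x-prefixed input much like a base-0 conversion in C, but it rejects C-style
octal such as "010".

```python
U32_MAX = 0xFFFFFFFF


def parse_u32(text):
    """Parse an unsigned 32-bit number, auto-detecting the base from a prefix."""
    value = int(text, 0)
    if not 0 <= value <= U32_MAX:
        raise ValueError(f"{text!r} does not fit in 32 bits")
    return value


def parse_vxlan_opts(text):
    """Parse 'GBP' or 'GBP/MASK' into a (gbp, mask) tuple."""
    value, sep, mask = text.partition("/")
    gbp = parse_u32(value)
    gbp_mask = parse_u32(mask) if sep else U32_MAX  # assumed default mask
    return gbp, gbp_mask
```

Both "65793" and "0x10101" parse to the same gbp value, since 0x10101 is 65793
in decimal.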

With this patch, users can add and dump vxlan options like:

  # ip link add name vxlan1 type vxlan dstport 0 external
  # tc qdisc add dev vxlan1 ingress
  # tc filter add dev vxlan1 protocol ip parent ffff: \
      flower \
        enc_src_ip 10.0.99.192 \
        enc_dst_ip 10.0.99.193 \
        enc_key_id 11 \
        vxlan_opts 65793/4008635966 \
        ip_proto udp \
        action mirred egress redirect dev eth1
  # tc -s filter show dev vxlan1 parent ffff:

     filter protocol ip pref 49152 flower chain 0 handle 0x1
       eth_type ipv4
       ip_proto udp
       enc_dst_ip 10.0.99.193
       enc_src_ip 10.0.99.192
       enc_key_id 11
       vxlan_opts 65793/4008635966
       not_in_hw
         action order 1: mirred (Egress Redirect to device eth1) stolen
         index 3 ref 1 bind 1
         Action statistics:
         Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
         backlog 0b 0p requeues 0

v1->v2:
  - get_u32 with base = 0 for gbp.
v2->v3:
  - implement proper JSON array for opts.
v3->v4:
  - keep the same format between input and output, json and non json.
  - print gbp as uint.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-01 16:33:22 +00:00
Xin Long 668fd9b25d tc: m_tunnel_key: add options support for erspan
This patch is to add TCA_TUNNEL_KEY_ENC_OPTS_ERSPAN's parse and
print to implement erspan options support in m_tunnel_key, like
Commit 6217917a38 ("tc: m_tunnel_key: Add tunnel option support
to act_tunnel_key") for geneve options support.

Option is expressed as version:index:dir:hwid, dir and hwid will
be parsed when version is 2, while index will be parsed when
version is 1. erspan doesn't support multiple options.

With this patch, users can add and dump erspan options like:

  # ip link add name erspan1 type erspan external
  # tc qdisc add dev eth0 ingress
  # tc filter add dev eth0 protocol ip parent ffff: \
      flower indev eth0 \
        ip_proto udp \
        action tunnel_key \
          set src_ip 10.0.99.192 \
          dst_ip 10.0.99.193 \
          dst_port 6081 \
          id 11 \
          erspan_opts 1:2:0:0 \
      action mirred egress redirect dev erspan1
  # tc -s filter show dev eth0 parent ffff:

     filter protocol ip pref 49151 flower chain 0 handle 0x1
       indev eth0
       eth_type ipv4
       ip_proto udp
       not_in_hw
         action order 1: tunnel_key  set
         src_ip 10.0.99.192
         dst_ip 10.0.99.193
         key_id 11
         dst_port 6081
         erspan_opts 1:2:0:0
         csum pipe
           index 2 ref 1 bind 1
         ...

v1->v2:
  - no change.
v2->v3:
  - no change.
v3->v4:
  - keep the same format between input and output, json and non json.
  - print version, index, dir and hwid as uint.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-01 16:33:18 +00:00
Xin Long f72c3ad00f tc: m_tunnel_key: add options support for vxlan
This patch is to add TCA_TUNNEL_KEY_ENC_OPTS_VXLAN's parse and
print to implement vxlan options support in m_tunnel_key, like
Commit 6217917a38 ("tc: m_tunnel_key: Add tunnel option support
to act_tunnel_key") for geneve options support.

Option is expressed as a 32-bit number for gbp only, and vxlan
doesn't support multiple options.

With this patch, users can add and dump vxlan options like:

  # ip link add name vxlan1 type vxlan dstport 0 external
  # tc qdisc add dev eth0 ingress
  # tc filter add dev eth0 protocol ip parent ffff: \
      flower indev eth0 \
        ip_proto udp \
        action tunnel_key \
          set src_ip 10.0.99.192 \
          dst_ip 10.0.99.193 \
          dst_port 6081 \
          id 11 \
          vxlan_opts 65793 \
      action mirred egress redirect dev vxlan1
  # tc -s filter show dev eth0 parent ffff:

     filter protocol ip pref 49152 flower chain 0 handle 0x1
       indev eth0
       eth_type ipv4
       ip_proto udp
       not_in_hw
         action order 1: tunnel_key  set
         src_ip 10.0.99.192
         dst_ip 10.0.99.193
         key_id 11
         dst_port 6081
         vxlan_opts 65793
         ...

v1->v2:
  - get_u32 with base = 0 for gbp.
  - use print_uint("0x%x") to print gbp.
v2->v3:
  - implement proper JSON array for opts.
v3->v4:
  - keep the same format between input and output, json and non json.
  - print gbp as uint.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-01 16:33:14 +00:00
Xin Long 39fa047938 iproute_lwtunnel: add options support for erspan metadata
This patch is to add LWTUNNEL_IP_OPTS_ERSPAN's parse and print to implement
erspan options support in iproute_lwtunnel.

Option is expressed as version:index:dir:hwid, dir and hwid will be parsed
when version is 2, while index will be parsed when version is 1. All of
these are numbers. erspan doesn't support multiple options.
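
The version-dependent parsing can be sketched in Python as follows. This is
an illustration, not the C code in iproute_lwtunnel: the function names and the
dictionary layout are hypothetical, and fields that are not parsed for a given
version are simply left at zero.

```python
def parse_erspan_opts(text):
    """Parse 'version:index:dir:hwid' into a dict of integers.

    index is only used for version 1; dir and hwid only for version 2.
    """
    fields = text.split(":")
    if len(fields) != 4:
        raise ValueError("expected version:index:dir:hwid")
    version, index, direction, hwid = (int(f, 0) for f in fields)
    if version == 1:
        return {"ver": 1, "index": index, "dir": 0, "hwid": 0}
    if version == 2:
        return {"ver": 2, "index": 0, "dir": direction, "hwid": hwid}
    raise ValueError(f"unsupported erspan version {version}")


def format_erspan_opts(opts):
    """Format the dict back into the same colon-separated form."""
    return "{ver}:{index}:{dir}:{hwid}".format(**opts)
```

Under these assumptions the input "2:123:1:2" formats back as "2:0:1:2": the
index is not parsed for version 2, which is consistent with the route dump
shown in the example below.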

With this patch, users can add and dump erspan options like:

  # ip netns add a
  # ip netns add b
  # ip -n a link add eth0 type veth peer name eth0 netns b
  # ip -n a link set eth0 up
  # ip -n b link set eth0 up
  # ip -n a addr add 10.1.0.1/24 dev eth0
  # ip -n b addr add 10.1.0.2/24 dev eth0
  # ip -n b link add erspan1 type erspan key 1 seq erspan 123 \
    local 10.1.0.2 remote 10.1.0.1
  # ip -n b addr add 1.1.1.1/24 dev erspan1
  # ip -n b link set erspan1 up
  # ip -n b route add 2.1.1.0/24 dev erspan1
  # ip -n a link add erspan1 type erspan key 1 seq local 10.1.0.1 external
  # ip -n a addr add 2.1.1.1/24 dev erspan1
  # ip -n a link set erspan1 up
  # ip -n a route add 1.1.1.0/24 encap ip id 1 \
    erspan_opts 2:123:1:2 dst 10.1.0.2 dev erspan1
  # ip -n a route show
  # ip netns exec a ping 1.1.1.1 -c 1

   1.1.1.0/24  encap ip id 1 src 0.0.0.0 dst 10.1.0.2 ttl 0 tos 0
     erspan_opts 2:0:1:2 dev erspan1 scope link

   PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
   64 bytes from 1.1.1.1: icmp_seq=1 ttl=64 time=0.124 ms

v1->v2:
  - improve the changelog.
  - use PRINT_ANY to support dumping with json format.
v2->v3:
  - implement proper JSON object for opts instead of just bunch of strings.
v3->v4:
  - keep the same format between input and output, json and non json.
  - print version, index, dir and hwid as uint.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-01 16:33:09 +00:00
Xin Long b1bc0f3892 iproute_lwtunnel: add options support for vxlan metadata
This patch is to add LWTUNNEL_IP_OPTS_VXLAN's parse and print to implement
vxlan options support in iproute_lwtunnel.

Option is expressed as a number for gbp only, and vxlan doesn't support
multiple options.

With this patch, users can add and dump vxlan options like:

  # ip netns add a
  # ip netns add b
  # ip -n a link add eth0 type veth peer name eth0 netns b
  # ip -n a link set eth0 up
  # ip -n b link set eth0 up
  # ip -n a addr add 10.1.0.1/24 dev eth0
  # ip -n b addr add 10.1.0.2/24 dev eth0
  # ip -n b link add vxlan1 type vxlan id 1 local 10.1.0.2 \
    remote 10.1.0.1 dev eth0 ttl 64 gbp
  # ip -n b addr add 1.1.1.1/24 dev vxlan1
  # ip -n b link set vxlan1 up
  # ip -n b route add 2.1.1.0/24 dev vxlan1
  # ip -n a link add vxlan1 type vxlan local 10.1.0.1 dev eth0 ttl 64 \
    gbp external
  # ip -n a addr add 2.1.1.1/24 dev vxlan1
  # ip -n a link set vxlan1 up
  # ip -n a route add 1.1.1.0/24 encap ip id 1 \
    vxlan_opts 1110 dst 10.1.0.2 dev vxlan1
  # ip -n a route show
  # ip netns exec a ping 1.1.1.1 -c 1

   1.1.1.0/24  encap ip id 1 src 0.0.0.0 dst 10.1.0.2 ttl 0 tos 0
     vxlan_opts 1110 dev vxlan1 scope link

   PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
   64 bytes from 1.1.1.1: icmp_seq=1 ttl=64 time=0.111 ms

v1->v2:
  - improve the changelog.
  - get_u32 with base = 0 for gbp.
  - use PRINT_ANY to support dumping with json format.
v2->v3:
  - implement proper JSON array for opts.
v3->v4:
  - keep the same format between input and output, json and non json.
  - print gbp as uint.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-01 16:33:03 +00:00
Xin Long ca7614d4c6 iproute_lwtunnel: add options support for geneve metadata
This patch is to add LWTUNNEL_IP(6)_OPTS and LWTUNNEL_IP_OPTS_GENEVE's
parse and print to implement geneve options support in iproute_lwtunnel.

Options are expressed as class:type:data and multiple options may be
listed using a comma delimiter, class and type are numbers and data
is a hex string.
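
A short Python sketch of this option format, for illustration only.
parse_geneve_opts() is a hypothetical name; accepting class and type in any
base via int(text, 0) is an assumption, and no length checks on the data are
modelled.

```python
def parse_geneve_opts(text):
    """Parse 'class:type:data[,class:type:data...]' into a list of tuples.

    class and type become integers; data is decoded from a hex string
    into bytes.
    """
    opts = []
    for item in text.split(","):
        fields = item.split(":")
        if len(fields) != 3:
            raise ValueError(f"expected class:type:data, got {item!r}")
        opt_class, opt_type, data = fields
        opts.append((int(opt_class, 0), int(opt_type, 0), bytes.fromhex(data)))
    return opts
```

Each "1:1:1212121234567890" item from the example below becomes class 1, type 1
and eight bytes of data.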

With this patch, users can add and dump geneve options like:

  # ip netns add a
  # ip netns add b
  # ip -n a link add eth0 type veth peer name eth0 netns b
  # ip -n a link set eth0 up; ip -n b link set eth0 up
  # ip -n a addr add 10.1.0.1/24 dev eth0
  # ip -n b addr add 10.1.0.2/24 dev eth0
  # ip -n b link add geneve1 type geneve id 1 remote 10.1.0.1 ttl 64
  # ip -n b addr add 1.1.1.1/24 dev geneve1
  # ip -n b link set geneve1 up
  # ip -n b route add 2.1.1.0/24 dev geneve1
  # ip -n a link add geneve1 type geneve external
  # ip -n a addr add 2.1.1.1/24 dev geneve1
  # ip -n a link set geneve1 up
  # ip -n a route add 1.1.1.0/24 encap ip id 1 geneve_opts \
    1:1:1212121234567890,1:1:1212121234567890,1:1:1212121234567890 \
    dst 10.1.0.2 dev geneve1
  # ip -n a route show
  # ip netns exec a ping 1.1.1.1 -c 1

   1.1.1.0/24  encap ip id 1 src 0.0.0.0 dst 10.1.0.2 ttl 0 tos 0
     geneve_opts 1:1:1212121234567890,1:1:1212121234567890 ...

   PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
   64 bytes from 1.1.1.1: icmp_seq=1 ttl=64 time=0.079 ms

v1->v2:
  - improve the changelog.
  - use PRINT_ANY to support dumping with json format.
v2->v3:
  - implement proper JSON array for opts instead of just bunch of strings.
v3->v4:
  - keep the same format between input and output, json and non json.
  - print class and type as uint and print data as hex string.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-05-01 16:31:58 +00:00
Petr Machata 081d6c310d tc: pedit: Support JSON dumping
The action pedit does not currently support dumping to JSON. Convert
print_pedit() to the print_* family of functions so that dumping is correct
both in plain and JSON mode. In plain mode, the output is character for
character the same as it was before. In JSON mode, this is an example dump:

$ tc filter add dev dummy0 ingress prio 125 flower \
         action pedit ex munge udp dport set 12345 \
	                 munge ip ttl add 1        \
			 munge offset 10 u8 clear
$ tc -j filter show dev dummy0 ingress | jq
[
  {
    "protocol": "all",
    "pref": 125,
    "kind": "flower",
    "chain": 0
  },
  {
    "protocol": "all",
    "pref": 125,
    "kind": "flower",
    "chain": 0,
    "options": {
      "handle": 1,
      "keys": {},
      "not_in_hw": true,
      "actions": [
        {
          "order": 1,
          "kind": "pedit",
          "control_action": {
            "type": "pass"
          },
          "nkeys": 3,
          "index": 1,
          "ref": 1,
          "bind": 1,
          "keys": [
            {
              "htype": "udp",
              "offset": 0,
              "cmd": "set",
              "val": "3039",
              "mask": "ffff0000"
            },
            {
              "htype": "ipv4",
              "offset": 8,
              "cmd": "add",
              "val": "1000000",
              "mask": "ffffff"
            },
            {
              "htype": "network",
              "offset": 8,
              "cmd": "set",
              "val": "0",
              "mask": "ffff00ff"
            }
          ]
        }
      ]
    }
  }
]
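
When consuming this dump, note that "val" and "mask" are strings. Reading them
as hexadecimal is an inference from the example (0x3039 is 12345, the dport set
in the command), not something the commit message states. A minimal Python
sketch under that assumption, with decode_pedit_key() as a hypothetical helper:

```python
import json


def decode_pedit_key(key):
    """Return a copy of one pedit key with val and mask as integers.

    Assumes both fields are hexadecimal strings without a 0x prefix.
    """
    decoded = dict(key)
    decoded["val"] = int(key["val"], 16)
    decoded["mask"] = int(key["mask"], 16)
    return decoded


# First entry of the "keys" array from the dump above.
sample = json.loads(
    '{"htype": "udp", "offset": 0, "cmd": "set",'
    ' "val": "3039", "mask": "ffff0000"}'
)
decoded = decode_pedit_key(sample)
```

Here decoded["val"] is 12345, matching "munge udp dport set 12345".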

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-30 02:43:23 +00:00
William Tu 846b6b2da8 erspan: Add type I version 0 support.
The Type I ERSPAN frame format is based on the barebones
IP + GRE(4-byte) encapsulation on top of the raw mirrored frame.
Both type I and II use 0x88BE as protocol type. Unlike type II
and III, no sequence number or key is required.

To create a type I erspan tunnel device:
$ ip link add dev erspan11 type erspan \
	local 172.16.1.100 remote 172.16.1.200 \
	erspan_ver 0

CC: Dmitriy Andreyevskiy <dandreye@cisco.com>
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-30 02:40:10 +00:00
Paolo Abeni 0c42c6b130 man: ip.8: add reference to mptcp man-page
While at it, additionally fix a mandoc warning in mptcp.8

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-29 17:36:14 +00:00
David Ahern d38f2a10dd Merge branch 'mptcp' into next
Paolo Abeni says:

====================

This introduces support for the MPTCP PM netlink interface, allowing admins
to configure several aspects of the MPTCP path manager. The subcommand is
documented with a newly added man-page.

This series also includes support for MPTCP subflow diag.

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-29 16:50:25 +00:00
Paolo Abeni 2d8b5fe93e man: mptcp man page
Describe the mptcp subcommands implemented so far.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-29 16:47:45 +00:00
Davide Caratti 712fdd98c0 ss: allow dumping MPTCP subflow information
[root@f31 packetdrill]# ss -tni

 ESTAB    0        0           192.168.82.247:8080           192.0.2.1:35273
          cubic wscale:7,8 [...] tcp-ulp-mptcp flags:Mec token:0000(id:0)/5f856c60(id:0) seq:b810457db34209a5 sfseq:1 ssnoff:0 maplen:190

Additionally extends ss manpage to describe the new entry layout.

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-29 16:44:55 +00:00
Paolo Abeni 7e0767cd86 add support for mptcp netlink interface
Implement basic commands to:
- manipulate MPTCP endpoints list
- manipulate MPTCP connection limits

Examples:
1. Allow multiple subflows per MPTCP connection:
   $ ip mptcp limits set subflows 2

2. Accept ADD_ADDR announcements from the peer (server):
   $ ip mptcp limits set add_addr_accepted 2

3. Add an IPv4 address to be announced for backup subflows:
   $ ip mptcp endpoint add 10.99.1.2 signal backup

4. Add an IPv6 address used as source for additional subflows:
   $ ip mptcp endpoint add 2001::2 subflow

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-29 16:43:18 +00:00
David Ahern 02ade5a8ea Update kernel headers and import mptcp.h
Update kernel headers to commit
    790ab249b55d ("net: ethernet: fec: Prevent MII event after MII_SPEED write")

and import mptcp.h

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-29 16:41:39 +00:00
David Ahern 60f1075c21 Merge branch 'macsec-offload' into next
Igor Russkikh says:

====================

From: Mark Starovoytov <mstarovoitov@marvell.com>

This series adds support for selecting the offloading mode of a MACsec
interface at link creation time.
Available modes are for now 'off', 'phy' and 'mac', 'off' being the default
when an interface is created.

First patch adds support for MAC offloading.

Last patch allows a user to select the offloading mode at link creation
time through a new attribute, `ip link add link ... offload`:

  # ip link add link enp1s0 type macsec encrypt on offload off
  # ip link add link enp1s0 type macsec encrypt on offload phy
  # ip link add link enp1s0 type macsec encrypt on offload mac

====================

Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-26 18:32:20 +00:00
Mark Starovoytov bcbeb35ca4 macsec: add support for specifying offload at link add time
This patch adds support for configuring offload mode upon MACsec
device creation.

If offload mode is not specified, then netlink attribute is not
added. Default behavior on the kernel side in this case is
backward-compatible (offloading is disabled by default).

Example:
$ ip link add link eth0 macsec0 type macsec port 11 encrypt on offload mac

Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-26 18:32:03 +00:00
Mark Starovoytov 998534c99e macsec: add support for MAC offload
This patch enables MAC HW offload usage in iproute, since MACSec
implementation supports it now.

Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-26 18:31:37 +00:00
Eran Ben Elisha 4aa0c9c9f8 devlink: Add devlink health auto_dump command support
Add support for configuring auto_dump attribute per reporter.
With this attribute, one can indicate whether the devlink kernel core
should execute automatic dump on error.

The change will be reflected in show, set and man commands.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-19 22:27:13 +00:00
David Ahern 59ba1dd011 Merge branch 'master' into next
Signed-off-by: David Ahern <dsahern@gmail.com>
2020-04-19 22:26:27 +00:00
360 changed files with 28232 additions and 4046 deletions


@ -1,6 +1,8 @@
# SPDX-License-Identifier: GPL-2.0 # SPDX-License-Identifier: GPL-2.0
# Top level Makefile for iproute2 # Top level Makefile for iproute2
-include config.mk
ifeq ("$(origin V)", "command line") ifeq ("$(origin V)", "command line")
VERBOSE = $(V) VERBOSE = $(V)
endif endif
@ -13,7 +15,6 @@ MAKEFLAGS += --no-print-directory
endif endif
PREFIX?=/usr PREFIX?=/usr
LIBDIR?=$(PREFIX)/lib
SBINDIR?=/sbin SBINDIR?=/sbin
CONFDIR?=/etc/iproute2 CONFDIR?=/etc/iproute2
NETNS_RUN_DIR?=/var/run/netns NETNS_RUN_DIR?=/var/run/netns
@ -40,9 +41,18 @@ DEFINES+=-DCONFDIR=\"$(CONFDIR)\" \
-DNETNS_RUN_DIR=\"$(NETNS_RUN_DIR)\" \ -DNETNS_RUN_DIR=\"$(NETNS_RUN_DIR)\" \
-DNETNS_ETC_DIR=\"$(NETNS_ETC_DIR)\" -DNETNS_ETC_DIR=\"$(NETNS_ETC_DIR)\"
#options for AX.25
ADDLIB+=ax25_ntop.o
#options for AX.25
ADDLIB+=rose_ntop.o
#options for mpls #options for mpls
ADDLIB+=mpls_ntop.o mpls_pton.o ADDLIB+=mpls_ntop.o mpls_pton.o
#options for NETROM
ADDLIB+=netrom_ntop.o
CC := gcc CC := gcc
HOSTCC ?= $(CC) HOSTCC ?= $(CC)
DEFINES += -D_GNU_SOURCE DEFINES += -D_GNU_SOURCE
@ -55,7 +65,7 @@ WFLAGS += -Wmissing-declarations -Wold-style-definition -Wformat=2
CFLAGS := $(WFLAGS) $(CCOPTS) -I../include -I../include/uapi $(DEFINES) $(CFLAGS) CFLAGS := $(WFLAGS) $(CCOPTS) -I../include -I../include/uapi $(DEFINES) $(CFLAGS)
YACCFLAGS = -d -t -v YACCFLAGS = -d -t -v
SUBDIRS=lib ip tc bridge misc netem genl tipc devlink rdma man SUBDIRS=lib ip tc bridge misc netem genl tipc devlink rdma dcb man vdpa
LIBNETLINK=../lib/libutil.a ../lib/libnetlink.a LIBNETLINK=../lib/libutil.a ../lib/libnetlink.a
LDLIBS += $(LIBNETLINK) LDLIBS += $(LIBNETLINK)
@ -63,7 +73,9 @@ LDLIBS += $(LIBNETLINK)
all: config.mk all: config.mk
@set -e; \ @set -e; \
for i in $(SUBDIRS); \ for i in $(SUBDIRS); \
do echo; echo $$i; $(MAKE) $(MFLAGS) -C $$i; done do echo; echo $$i; $(MAKE) -C $$i; done
.PHONY: clean clobber distclean check cscope version
help: help:
@echo "Make Targets:" @echo "Make Targets:"
@ -73,13 +85,15 @@ help:
@echo " install - install binaries on local machine" @echo " install - install binaries on local machine"
@echo " check - run tests" @echo " check - run tests"
@echo " cscope - build cscope database" @echo " cscope - build cscope database"
@echo " snapshot - generate version number header" @echo " version - update version"
@echo "" @echo ""
@echo "Make Arguments:" @echo "Make Arguments:"
@echo " V=[0|1] - set build verbosity level" @echo " V=[0|1] - set build verbosity level"
config.mk: config.mk:
sh configure $(KERNEL_INCLUDE) @if [ ! -f config.mk -o configure -nt config.mk ]; then \
sh configure $(KERNEL_INCLUDE); \
fi
install: all install: all
install -m 0755 -d $(DESTDIR)$(SBINDIR) install -m 0755 -d $(DESTDIR)$(SBINDIR)
@ -93,17 +107,17 @@ install: all
install -m 0644 bash-completion/devlink $(DESTDIR)$(BASH_COMPDIR) install -m 0644 bash-completion/devlink $(DESTDIR)$(BASH_COMPDIR)
install -m 0644 include/bpf_elf.h $(DESTDIR)$(HDRDIR) install -m 0644 include/bpf_elf.h $(DESTDIR)$(HDRDIR)
snapshot: version:
echo "static const char SNAPSHOT[] = \""`date +%y%m%d`"\";" \ echo "static const char version[] = \""`git describe --tags --long`"\";" \
> include/SNAPSHOT.h > include/version.h
clean: clean:
@for i in $(SUBDIRS) testsuite; \ @for i in $(SUBDIRS) testsuite; \
do $(MAKE) $(MFLAGS) -C $$i clean; done do $(MAKE) -C $$i clean; done
clobber: clobber:
touch config.mk touch config.mk
$(MAKE) $(MFLAGS) clean $(MAKE) clean
rm -f config.mk cscope.* rm -f config.mk cscope.*
distclean: clobber distclean: clobber

README

@ -28,17 +28,12 @@ The makefile will automatically build a config.mk file which
contains definitions of libraries that may or may not be available contains definitions of libraries that may or may not be available
on the system such as: ATM, ELF, MNL, and SELINUX. on the system such as: ATM, ELF, MNL, and SELINUX.
3. To make documentation, cd to doc/ directory , then 3. include/uapi
look at start of Makefile and set correct values for
PAGESIZE=a4 , ie: a4 , letter ... (string)
PAGESPERPAGE=2 , ie: 1 , 2 ... (numeric)
and make there. It assumes, that latex, dvips and psnup
are in your path.
4. This package includes matching sanitized kernel headers because This package includes matching sanitized kernel headers because
the build environment may not have up to date versions. See Makefile the build environment may not have up to date versions. See Makefile
if you have special requirements and need to point at different if you have special requirements and need to point at different
kernel include files. kernel include files.
Stephen Hemminger Stephen Hemminger
stephen@networkplumber.org stephen@networkplumber.org


@ -319,6 +319,57 @@ _devlink_port_split()
esac esac
} }
# Completion for devlink port param set
_devlink_port_param_set()
{
case $cword in
7)
COMPREPLY=( $( compgen -W "value" -- "$cur" ) )
return
;;
8)
# String argument
return
;;
9)
COMPREPLY=( $( compgen -W "cmode" -- "$cur" ) )
return
;;
10)
COMPREPLY=( $( compgen -W "runtime driverinit permanent" -- \
"$cur" ) )
return
;;
esac
}
# Completion for devlink port param
_devlink_port_param()
{
case "$cword" in
3)
COMPREPLY=( $( compgen -W "show set" -- "$cur" ) )
return
;;
4)
_devlink_direct_complete "port"
return
;;
5)
COMPREPLY=( $( compgen -W "name" -- "$cur" ) )
return
;;
6)
_devlink_direct_complete "param_name"
return
;;
esac
if [[ "${words[3]}" == "set" ]]; then
_devlink_port_param_set
fi
}
# Completion for devlink port # Completion for devlink port
_devlink_port() _devlink_port()
{ {
@ -331,6 +382,10 @@ _devlink_port()
_devlink_port_split _devlink_port_split
return return
;; ;;
param)
_devlink_port_param
return
;;
show|unsplit) show|unsplit)
if [[ $cword -eq 3 ]]; then if [[ $cword -eq 3 ]]; then
_devlink_direct_complete "port" _devlink_direct_complete "port"
@ -635,7 +690,7 @@ _devlink_health_reporter()
_devlink_health() _devlink_health()
{ {
case $command in case $command in
show|recover|diagnose|set) show|recover|diagnose|set|test)
_devlink_health_reporter 0 _devlink_health_reporter 0
if [[ $command == "set" ]]; then if [[ $command == "set" ]]; then
case $cword in case $cword in
@ -678,7 +733,7 @@ _devlink_trap_set_action()
COMPREPLY=( $( compgen -W "action" -- "$cur" ) ) COMPREPLY=( $( compgen -W "action" -- "$cur" ) )
;; ;;
$((7 + $i))) $((7 + $i)))
COMPREPLY=( $( compgen -W "trap drop" -- "$cur" ) ) COMPREPLY=( $( compgen -W "trap drop mirror" -- "$cur" ) )
;; ;;
esac esac
} }
@ -708,7 +763,7 @@ _devlink_trap_group_set()
case $prev in case $prev in
action) action)
COMPREPLY=( $( compgen -W "trap drop" -- "$cur" ) ) COMPREPLY=( $( compgen -W "trap drop mirror" -- "$cur" ) )
return return
;; ;;
policer) policer)


@ -10,6 +10,11 @@ void print_vlan_info(struct rtattr *tb, int ifindex);
int print_linkinfo(struct nlmsghdr *n, void *arg); int print_linkinfo(struct nlmsghdr *n, void *arg);
int print_mdb_mon(struct nlmsghdr *n, void *arg); int print_mdb_mon(struct nlmsghdr *n, void *arg);
int print_fdb(struct nlmsghdr *n, void *arg); int print_fdb(struct nlmsghdr *n, void *arg);
void print_stp_state(__u8 state);
int parse_stp_state(const char *arg);
int print_vlan_rtm(struct nlmsghdr *n, void *arg, bool monitor,
bool global_only);
void br_print_router_port_stats(struct rtattr *pattr);
int do_fdb(int argc, char **argv); int do_fdb(int argc, char **argv);
int do_mdb(int argc, char **argv); int do_mdb(int argc, char **argv);


@ -12,7 +12,7 @@
#include <string.h> #include <string.h>
#include <errno.h> #include <errno.h>
#include "SNAPSHOT.h" #include "version.h"
#include "utils.h" #include "utils.h"
#include "br_common.h" #include "br_common.h"
#include "namespace.h" #include "namespace.h"
@ -37,10 +37,10 @@ static void usage(void)
fprintf(stderr, fprintf(stderr,
"Usage: bridge [ OPTIONS ] OBJECT { COMMAND | help }\n" "Usage: bridge [ OPTIONS ] OBJECT { COMMAND | help }\n"
" bridge [ -force ] -batch filename\n" " bridge [ -force ] -batch filename\n"
"where OBJECT := { link | fdb | mdb | vlan | monitor }\n" "where OBJECT := { link | fdb | mdb | vlan | monitor }\n"
" OPTIONS := { -V[ersion] | -s[tatistics] | -d[etails] |\n" " OPTIONS := { -V[ersion] | -s[tatistics] | -d[etails] |\n"
" -o[neline] | -t[imestamp] | -n[etns] name |\n" " -o[neline] | -t[imestamp] | -n[etns] name |\n"
" -c[ompressvlans] -color -p[retty] -j[son] }\n"); " -c[ompressvlans] -color -p[retty] -j[son] }\n");
exit(-1); exit(-1);
} }
@ -77,20 +77,14 @@ static int do_cmd(const char *argv0, int argc, char **argv)
return -1; return -1;
} }
static int br_batch_cmd(int argc, char *argv[], void *data)
{
return do_cmd(argv[0], argc, argv);
}
static int batch(const char *name) static int batch(const char *name)
{ {
char *line = NULL; int ret;
size_t len = 0;
int ret = EXIT_SUCCESS;
if (name && strcmp(name, "-") != 0) {
if (freopen(name, "r", stdin) == NULL) {
fprintf(stderr,
"Cannot open file \"%s\" for reading: %s\n",
name, strerror(errno));
return EXIT_FAILURE;
}
}
if (rtnl_open(&rth, 0) < 0) { if (rtnl_open(&rth, 0) < 0) {
fprintf(stderr, "Cannot open rtnetlink\n"); fprintf(stderr, "Cannot open rtnetlink\n");
@ -99,25 +93,7 @@ static int batch(const char *name)
rtnl_set_strict_dump(&rth); rtnl_set_strict_dump(&rth);
cmdlineno = 0; ret = do_batch(name, force, br_batch_cmd, NULL);
while (getcmdline(&line, &len, stdin) != -1) {
char *largv[100];
int largc;
largc = makeargs(line, largv, 100);
if (largc == 0)
continue; /* blank line */
if (do_cmd(largv[0], largc, largv)) {
fprintf(stderr, "Command failed %s:%d\n",
name, cmdlineno);
ret = EXIT_FAILURE;
if (!force)
break;
}
}
if (line)
free(line);
rtnl_close(&rth); rtnl_close(&rth);
return ret; return ret;
@ -141,7 +117,7 @@ main(int argc, char **argv)
if (matches(opt, "-help") == 0) { if (matches(opt, "-help") == 0) {
usage(); usage();
} else if (matches(opt, "-Version") == 0) { } else if (matches(opt, "-Version") == 0) {
printf("bridge utility, 0.0\n"); printf("bridge utility, %s\n", version);
exit(0); exit(0);
} else if (matches(opt, "-stats") == 0 || } else if (matches(opt, "-stats") == 0 ||
matches(opt, "-statistics") == 0) { matches(opt, "-statistics") == 0) {
@ -173,9 +149,9 @@ main(int argc, char **argv)
NEXT_ARG(); NEXT_ARG();
if (netns_switch(argv[1])) if (netns_switch(argv[1]))
exit(-1); exit(-1);
} else if (matches_color(opt, &color)) {
} else if (matches(opt, "-compressvlans") == 0) { } else if (matches(opt, "-compressvlans") == 0) {
++compress_vlans; ++compress_vlans;
} else if (matches_color(opt, &color)) {
} else if (matches(opt, "-force") == 0) { } else if (matches(opt, "-force") == 0) {
++force; ++force;
} else if (matches(opt, "-json") == 0) { } else if (matches(opt, "-json") == 0) {
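
The batch() hunk above drops the open-coded read, split and dispatch loop and hands the work to a shared helper that takes a per-tool callback (br_batch_cmd). A minimal sketch of that callback pattern follows. The names run_batch, split_args and count_cmd are hypothetical and exist only for illustration; this is not the iproute2 do_batch() implementation, the tokenizer only splits on whitespace, and lines longer than the buffer are not handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_ARGS 100

/* Callback type: one parsed command line plus caller-owned state. */
typedef int (*batch_cmd_fn)(int argc, char *argv[], void *data);

/* Hypothetical tokenizer: split a line on blanks, in place. */
static int split_args(char *line, char *argv[], int max)
{
	int argc = 0;
	char *tok;

	for (tok = strtok(line, " \t\r\n"); tok && argc < max - 1;
	     tok = strtok(NULL, " \t\r\n"))
		argv[argc++] = tok;
	argv[argc] = NULL;
	return argc;
}

/*
 * Hypothetical stand-in for a shared batch helper: read commands from
 * 'in', dispatch each one to 'cmd', and stop at the first failure
 * unless 'force' is set. Returns 0 if every command succeeded and 1
 * otherwise.
 */
static int run_batch(FILE *in, bool force, batch_cmd_fn cmd, void *data)
{
	char line[1024];
	char *argv[MAX_ARGS];
	int lineno = 0, ret = 0;

	assert(in != NULL && cmd != NULL);

	while (fgets(line, sizeof(line), in)) {
		int argc;

		lineno++;
		argc = split_args(line, argv, MAX_ARGS);
		if (argc == 0)
			continue;	/* blank line */

		if (cmd(argc, argv, data)) {
			fprintf(stderr, "Command failed at line %d\n", lineno);
			ret = 1;
			if (!force)
				break;
		}
	}
	return ret;
}

/* Example callback: count commands, fail on the word "bad". */
static int count_cmd(int argc, char *argv[], void *data)
{
	(void)argc;
	if (strcmp(argv[0], "bad") == 0)
		return -1;
	++*(int *)data;
	return 0;
}
```

With this shape, each tool only supplies the callback that runs one command; file handling, line counting and the -force policy live in one place.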


@ -30,19 +30,21 @@
#include "rt_names.h" #include "rt_names.h"
#include "utils.h" #include "utils.h"
static unsigned int filter_index, filter_vlan, filter_state, filter_master; static unsigned int filter_index, filter_dynamic, filter_master,
filter_state, filter_vlan;
static void usage(void) static void usage(void)
{ {
fprintf(stderr, fprintf(stderr,
"Usage: bridge fdb { add | append | del | replace } ADDR dev DEV\n" "Usage: bridge fdb { add | append | del | replace } ADDR dev DEV\n"
" [ self ] [ master ] [ use ] [ router ] [ extern_learn ]\n" " [ self ] [ master ] [ use ] [ router ] [ extern_learn ]\n"
" [ sticky ] [ local | static | dynamic ] [ dst IPADDR ]\n" " [ sticky ] [ local | static | dynamic ] [ vlan VID ]\n"
" [ vlan VID ] [ port PORT] [ vni VNI ] [ via DEV ]\n" " { [ dst IPADDR ] [ port PORT] [ vni VNI ] | [ nhid NHID ] }\n"
" [ src_vni VNI ]\n" " [ via DEV ] [ src_vni VNI ]\n"
" bridge fdb [ show [ br BRDEV ] [ brport DEV ] [ vlan VID ] [ state STATE ] ]\n" " bridge fdb [ show [ br BRDEV ] [ brport DEV ] [ vlan VID ]\n"
" bridge fdb get ADDR [ br BRDEV ] { brport |dev } DEV [ vlan VID ]\n" " [ state STATE ] [ dynamic ] ]\n"
" [ vni VNI ]\n"); " bridge fdb get [ to ] LLADDR [ br BRDEV ] { brport | dev } DEV\n"
" [ vlan VID ] [ vni VNI ] [ self ] [ master ] [ dynamic ]\n");
exit(-1); exit(-1);
} }
@ -62,7 +64,10 @@ static const char *state_n2a(unsigned int s)
if (s & NUD_REACHABLE) if (s & NUD_REACHABLE)
return ""; return "";
sprintf(buf, "state=%#x", s); if (is_json_context())
sprintf(buf, "%#x", s);
else
sprintf(buf, "state=%#x", s);
return buf; return buf;
} }
@ -167,6 +172,9 @@ int print_fdb(struct nlmsghdr *n, void *arg)
if (filter_vlan && filter_vlan != vid) if (filter_vlan && filter_vlan != vid)
return 0; return 0;
if (filter_dynamic && (r->ndm_state & NUD_PERMANENT))
return 0;
open_json_object(NULL); open_json_object(NULL);
if (n->nlmsg_type == RTM_DELNEIGH) if (n->nlmsg_type == RTM_DELNEIGH)
print_bool(PRINT_ANY, "deleted", "Deleted ", true); print_bool(PRINT_ANY, "deleted", "Deleted ", true);
@ -184,10 +192,13 @@ int print_fdb(struct nlmsghdr *n, void *arg)
"mac", "%s ", lladdr); "mac", "%s ", lladdr);
} }
if (!filter_index && r->ndm_ifindex) if (!filter_index && r->ndm_ifindex) {
print_string(PRINT_FP, NULL, "dev ", NULL);
print_color_string(PRINT_ANY, COLOR_IFNAME, print_color_string(PRINT_ANY, COLOR_IFNAME,
"ifname", "dev %s ", "ifname", "%s ",
ll_index_to_name(r->ndm_ifindex)); ll_index_to_name(r->ndm_ifindex));
}
if (tb[NDA_DST]) { if (tb[NDA_DST]) {
int family = AF_INET; int family = AF_INET;
@ -200,9 +211,11 @@ int print_fdb(struct nlmsghdr *n, void *arg)
RTA_PAYLOAD(tb[NDA_DST]), RTA_PAYLOAD(tb[NDA_DST]),
RTA_DATA(tb[NDA_DST])); RTA_DATA(tb[NDA_DST]));
print_string(PRINT_FP, NULL, "dst ", NULL);
print_color_string(PRINT_ANY, print_color_string(PRINT_ANY,
ifa_family_color(family), ifa_family_color(family),
"dst", "dst %s ", dst); "dst", "%s ", dst);
} }
if (vid) if (vid)
@ -237,6 +250,10 @@ int print_fdb(struct nlmsghdr *n, void *arg)
ll_index_to_name(ifindex)); ll_index_to_name(ifindex));
} }
if (tb[NDA_NH_ID])
print_uint(PRINT_ANY, "nhid", "nhid %u ",
rta_getattr_u32(tb[NDA_NH_ID]));
if (tb[NDA_LINK_NETNSID]) if (tb[NDA_LINK_NETNSID])
print_uint(PRINT_ANY, print_uint(PRINT_ANY,
"linkNetNsId", "link-netnsid %d ", "linkNetNsId", "link-netnsid %d ",
@ -322,6 +339,8 @@ static int fdb_show(int argc, char **argv)
if (state_a2n(&state, *argv)) if (state_a2n(&state, *argv))
invarg("invalid state", *argv); invarg("invalid state", *argv);
filter_state |= state; filter_state |= state;
} else if (strcmp(*argv, "dynamic") == 0) {
filter_dynamic = 1;
} else { } else {
if (matches(*argv, "help") == 0) if (matches(*argv, "help") == 0)
usage(); usage();
@ -390,6 +409,7 @@ static int fdb_modify(int cmd, int flags, int argc, char **argv)
unsigned int via = 0; unsigned int via = 0;
char *endptr; char *endptr;
short vid = -1; short vid = -1;
__u32 nhid = 0;
while (argc > 0) { while (argc > 0) {
if (strcmp(*argv, "dev") == 0) { if (strcmp(*argv, "dev") == 0) {
@ -401,6 +421,10 @@ static int fdb_modify(int cmd, int flags, int argc, char **argv)
duparg2("dst", *argv); duparg2("dst", *argv);
get_addr(&dst, *argv, preferred_family); get_addr(&dst, *argv, preferred_family);
dst_ok = 1; dst_ok = 1;
} else if (strcmp(*argv, "nhid") == 0) {
NEXT_ARG();
if (get_u32(&nhid, *argv, 0))
invarg("\"id\" value is invalid\n", *argv);
} else if (strcmp(*argv, "port") == 0) { } else if (strcmp(*argv, "port") == 0) {
NEXT_ARG(); NEXT_ARG();
@ -475,6 +499,11 @@ static int fdb_modify(int cmd, int flags, int argc, char **argv)
return -1; return -1;
} }
if (nhid && (dst_ok || port || vni != ~0)) {
fprintf(stderr, "dst, port, vni are mutually exclusive with nhid\n");
return -1;
}
/* Assume self */ /* Assume self */
if (!(req.ndm.ndm_flags&(NTF_SELF|NTF_MASTER))) if (!(req.ndm.ndm_flags&(NTF_SELF|NTF_MASTER)))
req.ndm.ndm_flags |= NTF_SELF; req.ndm.ndm_flags |= NTF_SELF;
@ -496,6 +525,8 @@ static int fdb_modify(int cmd, int flags, int argc, char **argv)
if (vid >= 0) if (vid >= 0)
addattr16(&req.n, sizeof(req), NDA_VLAN, vid); addattr16(&req.n, sizeof(req), NDA_VLAN, vid);
if (nhid > 0)
addattr32(&req.n, sizeof(req), NDA_NH_ID, nhid);
if (port) { if (port) {
unsigned short dport; unsigned short dport;
@ -566,6 +597,8 @@ static int fdb_get(int argc, char **argv)
duparg2("vlan", *argv); duparg2("vlan", *argv);
NEXT_ARG(); NEXT_ARG();
vlan = atoi(*argv); vlan = atoi(*argv);
} else if (matches(*argv, "dynamic") == 0) {
filter_dynamic = 1;
} else { } else {
if (strcmp(*argv, "to") == 0) if (strcmp(*argv, "to") == 0)
NEXT_ARG(); NEXT_ARG();
@ -619,10 +652,16 @@ static int fdb_get(int argc, char **argv)
if (rtnl_talk(&rth, &req.n, &answer) < 0) if (rtnl_talk(&rth, &req.n, &answer) < 0)
return -2; return -2;
/*
* Initialize a json_writer and open an array object
* if -json was specified.
*/
new_json_obj(json);
if (print_fdb(answer, stdout) < 0) { if (print_fdb(answer, stdout) < 0) {
fprintf(stderr, "An error :-)\n"); fprintf(stderr, "An error :-)\n");
return -1; return -1;
} }
delete_json_obj();
return 0; return 0;
} }
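
The fdb_modify() hunks above add an nhid argument and reject it when dst, port or vni is also given. The check can be sketched in isolation as below. The struct fdb_args container, the VNI_UNSET name and fdb_args_validate() are hypothetical; the sketch assumes, as the `vni != ~0` test in the diff suggests, that an all-ones value means "no vni given" and that 0 means "no nhid given".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed sentinel for "vni not specified", mirroring "vni != ~0". */
#define VNI_UNSET (~0u)

/* Hypothetical container for the parsed command-line values. */
struct fdb_args {
	bool dst_ok;		/* a "dst IPADDR" argument was parsed */
	unsigned long port;	/* 0 means "port" was not given */
	uint32_t vni;		/* VNI_UNSET means "vni" was not given */
	uint32_t nhid;		/* 0 means "nhid" was not given */
};

/* Return 0 if the combination is acceptable, -1 otherwise. */
static int fdb_args_validate(const struct fdb_args *a)
{
	assert(a != NULL);

	/* A nexthop id replaces the explicit dst/port/vni tuple. */
	if (a->nhid && (a->dst_ok || a->port || a->vni != VNI_UNSET)) {
		fprintf(stderr,
			"dst, port, vni are mutually exclusive with nhid\n");
		return -1;
	}
	return 0;
}
```

Because 0 doubles as "not given", this scheme cannot express a request for nexthop id 0, which matches the `if (nhid > 0)` guard around addattr32() in the diff.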


@ -19,7 +19,7 @@
static unsigned int filter_index; static unsigned int filter_index;
static const char *port_states[] = { static const char *stp_states[] = {
[BR_STATE_DISABLED] = "disabled", [BR_STATE_DISABLED] = "disabled",
[BR_STATE_LISTENING] = "listening", [BR_STATE_LISTENING] = "listening",
[BR_STATE_LEARNING] = "learning", [BR_STATE_LEARNING] = "learning",
@ -68,22 +68,29 @@ static void print_link_flags(FILE *fp, unsigned int flags, unsigned int mdown)
close_json_array(PRINT_ANY, "> "); close_json_array(PRINT_ANY, "> ");
} }
static void print_portstate(__u8 state) void print_stp_state(__u8 state)
{ {
if (state <= BR_STATE_BLOCKING) if (state <= BR_STATE_BLOCKING)
print_string(PRINT_ANY, "state", print_string(PRINT_ANY, "state",
"state %s ", port_states[state]); "state %s ", stp_states[state]);
else else
print_uint(PRINT_ANY, "state", print_uint(PRINT_ANY, "state",
"state (%d) ", state); "state (%d) ", state);
} }
static void print_onoff(FILE *fp, const char *flag, __u8 val) int parse_stp_state(const char *arg)
{ {
if (is_json_context()) size_t nstates = ARRAY_SIZE(stp_states);
print_bool(PRINT_JSON, flag, NULL, val); int state;
else
fprintf(fp, "%s %s ", flag, val ? "on" : "off"); for (state = 0; state < nstates; state++)
if (strcmp(stp_states[state], arg) == 0)
break;
if (state == nstates)
state = -1;
return state;
} }
static void print_hwmode(__u16 mode) static void print_hwmode(__u16 mode)
@ -104,7 +111,7 @@ static void print_protinfo(FILE *fp, struct rtattr *attr)
parse_rtattr_nested(prtb, IFLA_BRPORT_MAX, attr); parse_rtattr_nested(prtb, IFLA_BRPORT_MAX, attr);
if (prtb[IFLA_BRPORT_STATE]) if (prtb[IFLA_BRPORT_STATE])
print_portstate(rta_getattr_u8(prtb[IFLA_BRPORT_STATE])); print_stp_state(rta_getattr_u8(prtb[IFLA_BRPORT_STATE]));
if (prtb[IFLA_BRPORT_PRIORITY]) if (prtb[IFLA_BRPORT_PRIORITY])
print_uint(PRINT_ANY, "priority", print_uint(PRINT_ANY, "priority",
@ -123,38 +130,38 @@ static void print_protinfo(FILE *fp, struct rtattr *attr)
fprintf(fp, "%s ", _SL_); fprintf(fp, "%s ", _SL_);
if (prtb[IFLA_BRPORT_MODE]) if (prtb[IFLA_BRPORT_MODE])
print_onoff(fp, "hairpin", print_on_off(PRINT_ANY, "hairpin", "hairpin %s ",
rta_getattr_u8(prtb[IFLA_BRPORT_MODE])); rta_getattr_u8(prtb[IFLA_BRPORT_MODE]));
if (prtb[IFLA_BRPORT_GUARD]) if (prtb[IFLA_BRPORT_GUARD])
print_onoff(fp, "guard", print_on_off(PRINT_ANY, "guard", "guard %s ",
rta_getattr_u8(prtb[IFLA_BRPORT_GUARD])); rta_getattr_u8(prtb[IFLA_BRPORT_GUARD]));
if (prtb[IFLA_BRPORT_PROTECT]) if (prtb[IFLA_BRPORT_PROTECT])
print_onoff(fp, "root_block", print_on_off(PRINT_ANY, "root_block", "root_block %s ",
rta_getattr_u8(prtb[IFLA_BRPORT_PROTECT])); rta_getattr_u8(prtb[IFLA_BRPORT_PROTECT]));
if (prtb[IFLA_BRPORT_FAST_LEAVE]) if (prtb[IFLA_BRPORT_FAST_LEAVE])
print_onoff(fp, "fastleave", print_on_off(PRINT_ANY, "fastleave", "fastleave %s ",
rta_getattr_u8(prtb[IFLA_BRPORT_FAST_LEAVE])); rta_getattr_u8(prtb[IFLA_BRPORT_FAST_LEAVE]));
if (prtb[IFLA_BRPORT_LEARNING]) if (prtb[IFLA_BRPORT_LEARNING])
print_onoff(fp, "learning", print_on_off(PRINT_ANY, "learning", "learning %s ",
rta_getattr_u8(prtb[IFLA_BRPORT_LEARNING])); rta_getattr_u8(prtb[IFLA_BRPORT_LEARNING]));
if (prtb[IFLA_BRPORT_LEARNING_SYNC]) if (prtb[IFLA_BRPORT_LEARNING_SYNC])
print_onoff(fp, "learning_sync", print_on_off(PRINT_ANY, "learning_sync", "learning_sync %s ",
rta_getattr_u8(prtb[IFLA_BRPORT_LEARNING_SYNC])); rta_getattr_u8(prtb[IFLA_BRPORT_LEARNING_SYNC]));
if (prtb[IFLA_BRPORT_UNICAST_FLOOD]) if (prtb[IFLA_BRPORT_UNICAST_FLOOD])
print_onoff(fp, "flood", print_on_off(PRINT_ANY, "flood", "flood %s ",
rta_getattr_u8(prtb[IFLA_BRPORT_UNICAST_FLOOD])); rta_getattr_u8(prtb[IFLA_BRPORT_UNICAST_FLOOD]));
if (prtb[IFLA_BRPORT_MCAST_FLOOD]) if (prtb[IFLA_BRPORT_MCAST_FLOOD])
print_onoff(fp, "mcast_flood", print_on_off(PRINT_ANY, "mcast_flood", "mcast_flood %s ",
rta_getattr_u8(prtb[IFLA_BRPORT_MCAST_FLOOD])); rta_getattr_u8(prtb[IFLA_BRPORT_MCAST_FLOOD]));
if (prtb[IFLA_BRPORT_MCAST_TO_UCAST]) if (prtb[IFLA_BRPORT_MCAST_TO_UCAST])
print_onoff(fp, "mcast_to_unicast", print_on_off(PRINT_ANY, "mcast_to_unicast", "mcast_to_unicast %s ",
rta_getattr_u8(prtb[IFLA_BRPORT_MCAST_TO_UCAST])); rta_getattr_u8(prtb[IFLA_BRPORT_MCAST_TO_UCAST]));
if (prtb[IFLA_BRPORT_NEIGH_SUPPRESS]) if (prtb[IFLA_BRPORT_NEIGH_SUPPRESS])
print_onoff(fp, "neigh_suppress", print_on_off(PRINT_ANY, "neigh_suppress", "neigh_suppress %s ",
rta_getattr_u8(prtb[IFLA_BRPORT_NEIGH_SUPPRESS])); rta_getattr_u8(prtb[IFLA_BRPORT_NEIGH_SUPPRESS]));
if (prtb[IFLA_BRPORT_VLAN_TUNNEL]) if (prtb[IFLA_BRPORT_VLAN_TUNNEL])
print_onoff(fp, "vlan_tunnel", print_on_off(PRINT_ANY, "vlan_tunnel", "vlan_tunnel %s ",
rta_getattr_u8(prtb[IFLA_BRPORT_VLAN_TUNNEL])); rta_getattr_u8(prtb[IFLA_BRPORT_VLAN_TUNNEL]));
if (prtb[IFLA_BRPORT_BACKUP_PORT]) { if (prtb[IFLA_BRPORT_BACKUP_PORT]) {
int ifidx; int ifidx;
@ -166,10 +173,10 @@ static void print_protinfo(FILE *fp, struct rtattr *attr)
} }
if (prtb[IFLA_BRPORT_ISOLATED]) if (prtb[IFLA_BRPORT_ISOLATED])
print_onoff(fp, "isolated", print_on_off(PRINT_ANY, "isolated", "isolated %s ",
rta_getattr_u8(prtb[IFLA_BRPORT_ISOLATED])); rta_getattr_u8(prtb[IFLA_BRPORT_ISOLATED]));
} else } else
print_portstate(rta_getattr_u8(attr)); print_stp_state(rta_getattr_u8(attr));
} }
@ -275,22 +282,6 @@ static void usage(void)
exit(-1); exit(-1);
} }
static bool on_off(char *arg, __s8 *attr, char *val)
{
if (strcmp(val, "on") == 0)
*attr = 1;
else if (strcmp(val, "off") == 0)
*attr = 0;
else {
fprintf(stderr,
"Error: argument of \"%s\" must be \"on\" or \"off\"\n",
arg);
return false;
}
return true;
}
static int brlink_modify(int argc, char **argv) static int brlink_modify(int argc, char **argv)
{ {
struct { struct {
@ -323,6 +314,7 @@ static int brlink_modify(int argc, char **argv)
__s16 mode = -1; __s16 mode = -1;
__u16 flags = 0; __u16 flags = 0;
struct rtattr *nest; struct rtattr *nest;
int ret;
while (argc > 0) { while (argc > 0) {
if (strcmp(*argv, "dev") == 0) { if (strcmp(*argv, "dev") == 0) {
@ -330,40 +322,49 @@ static int brlink_modify(int argc, char **argv)
d = *argv; d = *argv;
} else if (strcmp(*argv, "guard") == 0) { } else if (strcmp(*argv, "guard") == 0) {
NEXT_ARG(); NEXT_ARG();
if (!on_off("guard", &bpdu_guard, *argv)) bpdu_guard = parse_on_off("guard", *argv, &ret);
return -1; if (ret)
return ret;
} else if (strcmp(*argv, "hairpin") == 0) { } else if (strcmp(*argv, "hairpin") == 0) {
NEXT_ARG(); NEXT_ARG();
if (!on_off("hairpin", &hairpin, *argv)) hairpin = parse_on_off("hairpin", *argv, &ret);
return -1; if (ret)
return ret;
} else if (strcmp(*argv, "fastleave") == 0) { } else if (strcmp(*argv, "fastleave") == 0) {
NEXT_ARG(); NEXT_ARG();
if (!on_off("fastleave", &fast_leave, *argv)) fast_leave = parse_on_off("fastleave", *argv, &ret);
return -1; if (ret)
return ret;
} else if (strcmp(*argv, "root_block") == 0) { } else if (strcmp(*argv, "root_block") == 0) {
NEXT_ARG(); NEXT_ARG();
if (!on_off("root_block", &root_block, *argv)) root_block = parse_on_off("root_block", *argv, &ret);
return -1; if (ret)
return ret;
} else if (strcmp(*argv, "learning") == 0) { } else if (strcmp(*argv, "learning") == 0) {
NEXT_ARG(); NEXT_ARG();
if (!on_off("learning", &learning, *argv)) learning = parse_on_off("learning", *argv, &ret);
return -1; if (ret)
return ret;
} else if (strcmp(*argv, "learning_sync") == 0) { } else if (strcmp(*argv, "learning_sync") == 0) {
NEXT_ARG(); NEXT_ARG();
if (!on_off("learning_sync", &learning_sync, *argv)) learning_sync = parse_on_off("learning_sync", *argv, &ret);
return -1; if (ret)
return ret;
} else if (strcmp(*argv, "flood") == 0) { } else if (strcmp(*argv, "flood") == 0) {
NEXT_ARG(); NEXT_ARG();
if (!on_off("flood", &flood, *argv)) flood = parse_on_off("flood", *argv, &ret);
return -1; if (ret)
return ret;
} else if (strcmp(*argv, "mcast_flood") == 0) { } else if (strcmp(*argv, "mcast_flood") == 0) {
NEXT_ARG(); NEXT_ARG();
if (!on_off("mcast_flood", &mcast_flood, *argv)) mcast_flood = parse_on_off("mcast_flood", *argv, &ret);
return -1; if (ret)
return ret;
} else if (strcmp(*argv, "mcast_to_unicast") == 0) { } else if (strcmp(*argv, "mcast_to_unicast") == 0) {
NEXT_ARG(); NEXT_ARG();
if (!on_off("mcast_to_unicast", &mcast_to_unicast, *argv)) mcast_to_unicast = parse_on_off("mcast_to_unicast", *argv, &ret);
return -1; if (ret)
return ret;
} else if (strcmp(*argv, "cost") == 0) { } else if (strcmp(*argv, "cost") == 0) {
NEXT_ARG(); NEXT_ARG();
cost = atoi(*argv); cost = atoi(*argv);
@ -373,14 +374,11 @@ static int brlink_modify(int argc, char **argv)
} else if (strcmp(*argv, "state") == 0) { } else if (strcmp(*argv, "state") == 0) {
NEXT_ARG(); NEXT_ARG();
char *endptr; char *endptr;
size_t nstates = ARRAY_SIZE(port_states);
state = strtol(*argv, &endptr, 10); state = strtol(*argv, &endptr, 10);
if (!(**argv != '\0' && *endptr == '\0')) { if (!(**argv != '\0' && *endptr == '\0')) {
for (state = 0; state < nstates; state++) state = parse_stp_state(*argv);
if (strcasecmp(port_states[state], *argv) == 0) if (state == -1) {
break;
if (state == nstates) {
fprintf(stderr, fprintf(stderr,
"Error: invalid STP port state\n"); "Error: invalid STP port state\n");
return -1; return -1;
@ -404,18 +402,19 @@ static int brlink_modify(int argc, char **argv)
flags |= BRIDGE_FLAGS_MASTER; flags |= BRIDGE_FLAGS_MASTER;
} else if (strcmp(*argv, "neigh_suppress") == 0) { } else if (strcmp(*argv, "neigh_suppress") == 0) {
NEXT_ARG(); NEXT_ARG();
if (!on_off("neigh_suppress", &neigh_suppress, neigh_suppress = parse_on_off("neigh_suppress", *argv, &ret);
*argv)) if (ret)
return -1; return ret;
} else if (strcmp(*argv, "vlan_tunnel") == 0) { } else if (strcmp(*argv, "vlan_tunnel") == 0) {
NEXT_ARG(); NEXT_ARG();
if (!on_off("vlan_tunnel", &vlan_tunnel, vlan_tunnel = parse_on_off("vlan_tunnel", *argv, &ret);
*argv)) if (ret)
return -1; return ret;
} else if (strcmp(*argv, "isolated") == 0) { } else if (strcmp(*argv, "isolated") == 0) {
NEXT_ARG(); NEXT_ARG();
if (!on_off("isolated", &isolated, *argv)) isolated = parse_on_off("isolated", *argv, &ret);
return -1; if (ret)
return ret;
} else if (strcmp(*argv, "backup_port") == 0) { } else if (strcmp(*argv, "backup_port") == 0) {
NEXT_ARG(); NEXT_ARG();
backup_port_idx = ll_name_to_index(*argv); backup_port_idx = ll_name_to_index(*argv);


@ -16,9 +16,9 @@
#include <arpa/inet.h> #include <arpa/inet.h>
#include "libnetlink.h" #include "libnetlink.h"
#include "utils.h"
#include "br_common.h" #include "br_common.h"
#include "rt_names.h" #include "rt_names.h"
#include "utils.h"
#include "json_print.h" #include "json_print.h"
#ifndef MDBA_RTA #ifndef MDBA_RTA
@ -31,7 +31,7 @@ static unsigned int filter_index, filter_vlan;
static void usage(void) static void usage(void)
{ {
fprintf(stderr, fprintf(stderr,
"Usage: bridge mdb { add | del } dev DEV port PORT grp GROUP [permanent | temp] [vid VID]\n" "Usage: bridge mdb { add | del } dev DEV port PORT grp GROUP [src SOURCE] [permanent | temp] [vid VID]\n"
" bridge mdb {show} [ dev DEV ] [ vid VID ]\n"); " bridge mdb {show} [ dev DEV ] [ vid VID ]\n");
exit(-1); exit(-1);
} }
@ -41,20 +41,25 @@ static bool is_temp_mcast_rtr(__u8 type)
return type == MDB_RTR_TYPE_TEMP_QUERY || type == MDB_RTR_TYPE_TEMP; return type == MDB_RTR_TYPE_TEMP_QUERY || type == MDB_RTR_TYPE_TEMP;
} }
static const char *format_timer(__u32 ticks) static const char *format_timer(__u32 ticks, int align)
{ {
struct timeval tv; struct timeval tv;
static char tbuf[32]; static char tbuf[32];
__jiffies_to_tv(&tv, ticks); __jiffies_to_tv(&tv, ticks);
snprintf(tbuf, sizeof(tbuf), "%4lu.%.2lu", if (align)
(unsigned long)tv.tv_sec, snprintf(tbuf, sizeof(tbuf), "%4lu.%.2lu",
(unsigned long)tv.tv_usec / 10000); (unsigned long)tv.tv_sec,
(unsigned long)tv.tv_usec / 10000);
else
snprintf(tbuf, sizeof(tbuf), "%lu.%.2lu",
(unsigned long)tv.tv_sec,
(unsigned long)tv.tv_usec / 10000);
return tbuf; return tbuf;
} }
static void __print_router_port_stats(FILE *f, struct rtattr *pattr) void br_print_router_port_stats(struct rtattr *pattr)
{ {
struct rtattr *tb[MDBA_ROUTER_PATTR_MAX + 1]; struct rtattr *tb[MDBA_ROUTER_PATTR_MAX + 1];
@ -65,7 +70,7 @@ static void __print_router_port_stats(FILE *f, struct rtattr *pattr)
__u32 timer = rta_getattr_u32(tb[MDBA_ROUTER_PATTR_TIMER]); __u32 timer = rta_getattr_u32(tb[MDBA_ROUTER_PATTR_TIMER]);
print_string(PRINT_ANY, "timer", " %s", print_string(PRINT_ANY, "timer", " %s",
format_timer(timer)); format_timer(timer, 1));
} }
if (tb[MDBA_ROUTER_PATTR_TYPE]) { if (tb[MDBA_ROUTER_PATTR_TYPE]) {
@ -96,13 +101,13 @@ static void br_print_router_ports(FILE *f, struct rtattr *attr,
print_string(PRINT_JSON, "port", NULL, port_ifname); print_string(PRINT_JSON, "port", NULL, port_ifname);
if (show_stats) if (show_stats)
__print_router_port_stats(f, i); br_print_router_port_stats(i);
close_json_object(); close_json_object();
} else if (show_stats) { } else if (show_stats) {
fprintf(f, "router ports on %s: %s", fprintf(f, "router ports on %s: %s",
brifname, port_ifname); brifname, port_ifname);
__print_router_port_stats(f, i); br_print_router_port_stats(i);
fprintf(f, "\n"); fprintf(f, "\n");
} else { } else {
fprintf(f, "%s ", port_ifname); fprintf(f, "%s ", port_ifname);
@ -115,20 +120,53 @@ static void br_print_router_ports(FILE *f, struct rtattr *attr,
close_json_array(PRINT_JSON, NULL); close_json_array(PRINT_JSON, NULL);
} }
static void print_src_entry(struct rtattr *src_attr, int af, const char *sep)
{
struct rtattr *stb[MDBA_MDB_SRCATTR_MAX + 1];
SPRINT_BUF(abuf);
const char *addr;
__u32 timer_val;
parse_rtattr_nested(stb, MDBA_MDB_SRCATTR_MAX, src_attr);
if (!stb[MDBA_MDB_SRCATTR_ADDRESS] || !stb[MDBA_MDB_SRCATTR_TIMER])
return;
addr = inet_ntop(af, RTA_DATA(stb[MDBA_MDB_SRCATTR_ADDRESS]), abuf,
sizeof(abuf));
if (!addr)
return;
timer_val = rta_getattr_u32(stb[MDBA_MDB_SRCATTR_TIMER]);
open_json_object(NULL);
print_string(PRINT_FP, NULL, "%s", sep);
print_color_string(PRINT_ANY, ifa_family_color(af),
"address", "%s", addr);
print_string(PRINT_ANY, "timer", "/%s", format_timer(timer_val, 0));
close_json_object();
}
static void print_mdb_entry(FILE *f, int ifindex, const struct br_mdb_entry *e, static void print_mdb_entry(FILE *f, int ifindex, const struct br_mdb_entry *e,
struct nlmsghdr *n, struct rtattr **tb) struct nlmsghdr *n, struct rtattr **tb)
{ {
const void *grp, *src;
const char *addr;
SPRINT_BUF(abuf); SPRINT_BUF(abuf);
const char *dev; const char *dev;
const void *src;
int af; int af;
if (filter_vlan && e->vid != filter_vlan) if (filter_vlan && e->vid != filter_vlan)
return; return;
af = e->addr.proto == htons(ETH_P_IP) ? AF_INET : AF_INET6; if (!e->addr.proto) {
src = af == AF_INET ? (const void *)&e->addr.u.ip4 : af = AF_PACKET;
(const void *)&e->addr.u.ip6; grp = &e->addr.u.mac_addr;
} else if (e->addr.proto == htons(ETH_P_IP)) {
af = AF_INET;
grp = &e->addr.u.ip4;
} else {
af = AF_INET6;
grp = &e->addr.u.ip6;
}
dev = ll_index_to_name(ifindex); dev = ll_index_to_name(ifindex);
open_json_object(NULL); open_json_object(NULL);
@ -138,16 +176,64 @@ static void print_mdb_entry(FILE *f, int ifindex, const struct br_mdb_entry *e,
print_string(PRINT_ANY, "port", " port %s", print_string(PRINT_ANY, "port", " port %s",
ll_index_to_name(e->ifindex)); ll_index_to_name(e->ifindex));
print_color_string(PRINT_ANY, ifa_family_color(af), /* The ETH_ALEN argument is ignored for all cases but AF_PACKET */
"grp", " grp %s", addr = rt_addr_n2a_r(af, ETH_ALEN, grp, abuf, sizeof(abuf));
inet_ntop(af, src, abuf, sizeof(abuf))); if (!addr)
return;
print_color_string(PRINT_ANY, ifa_family_color(af),
"grp", " grp %s", addr);
if (tb && tb[MDBA_MDB_EATTR_SOURCE]) {
src = (const void *)RTA_DATA(tb[MDBA_MDB_EATTR_SOURCE]);
print_color_string(PRINT_ANY, ifa_family_color(af),
"src", " src %s",
inet_ntop(af, src, abuf, sizeof(abuf)));
}
print_string(PRINT_ANY, "state", " %s", print_string(PRINT_ANY, "state", " %s",
(e->state & MDB_PERMANENT) ? "permanent" : "temp"); (e->state & MDB_PERMANENT) ? "permanent" : "temp");
if (show_details && tb) {
if (tb[MDBA_MDB_EATTR_GROUP_MODE]) {
__u8 mode = rta_getattr_u8(tb[MDBA_MDB_EATTR_GROUP_MODE]);
print_string(PRINT_ANY, "filter_mode", " filter_mode %s",
mode == MCAST_INCLUDE ? "include" :
"exclude");
}
if (tb[MDBA_MDB_EATTR_SRC_LIST]) {
struct rtattr *i, *attr = tb[MDBA_MDB_EATTR_SRC_LIST];
const char *sep = " ";
int rem;
open_json_array(PRINT_ANY, is_json_context() ?
"source_list" :
" source_list");
rem = RTA_PAYLOAD(attr);
for (i = RTA_DATA(attr); RTA_OK(i, rem);
i = RTA_NEXT(i, rem)) {
print_src_entry(i, af, sep);
sep = ",";
}
close_json_array(PRINT_JSON, NULL);
}
if (tb[MDBA_MDB_EATTR_RTPROT]) {
__u8 rtprot = rta_getattr_u8(tb[MDBA_MDB_EATTR_RTPROT]);
SPRINT_BUF(rtb);
print_string(PRINT_ANY, "protocol", " proto %s ",
rtnl_rtprot_n2a(rtprot, rtb, sizeof(rtb)));
}
}
open_json_array(PRINT_JSON, "flags"); open_json_array(PRINT_JSON, "flags");
if (e->flags & MDB_FLAGS_OFFLOAD) if (e->flags & MDB_FLAGS_OFFLOAD)
print_string(PRINT_ANY, NULL, " %s", "offload"); print_string(PRINT_ANY, NULL, " %s", "offload");
if (e->flags & MDB_FLAGS_FAST_LEAVE)
print_string(PRINT_ANY, NULL, " %s", "fast_leave");
if (e->flags & MDB_FLAGS_STAR_EXCL)
print_string(PRINT_ANY, NULL, " %s", "added_by_star_ex");
if (e->flags & MDB_FLAGS_BLOCKED)
print_string(PRINT_ANY, NULL, " %s", "blocked");
close_json_array(PRINT_JSON, NULL); close_json_array(PRINT_JSON, NULL);
if (e->vid) if (e->vid)
@ -157,7 +243,7 @@ static void print_mdb_entry(FILE *f, int ifindex, const struct br_mdb_entry *e,
__u32 timer = rta_getattr_u32(tb[MDBA_MDB_EATTR_TIMER]); __u32 timer = rta_getattr_u32(tb[MDBA_MDB_EATTR_TIMER]);
print_string(PRINT_ANY, "timer", " %s", print_string(PRINT_ANY, "timer", " %s",
format_timer(timer)); format_timer(timer, 1));
} }
print_nl(); print_nl();
@ -175,8 +261,9 @@ static void br_print_mdb_entry(FILE *f, int ifindex, struct rtattr *attr,
rem = RTA_PAYLOAD(attr); rem = RTA_PAYLOAD(attr);
for (i = RTA_DATA(attr); RTA_OK(i, rem); i = RTA_NEXT(i, rem)) { for (i = RTA_DATA(attr); RTA_OK(i, rem); i = RTA_NEXT(i, rem)) {
e = RTA_DATA(i); e = RTA_DATA(i);
parse_rtattr(etb, MDBA_MDB_EATTR_MAX, MDB_RTA(RTA_DATA(i)), parse_rtattr_flags(etb, MDBA_MDB_EATTR_MAX, MDB_RTA(RTA_DATA(i)),
RTA_PAYLOAD(i) - RTA_ALIGN(sizeof(*e))); RTA_PAYLOAD(i) - RTA_ALIGN(sizeof(*e)),
NLA_F_NESTED);
print_mdb_entry(f, ifindex, e, n, etb); print_mdb_entry(f, ifindex, e, n, etb);
} }
} }
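Note on the hunk above: the entry attributes are now parsed with parse_rtattr_flags() and NLA_F_NESTED. The sketch below illustrates why a flag mask is likely needed when a nested attribute carries flag bits in the upper part of its type field. The SKETCH_* macros and both helper functions are hypothetical names introduced for illustration; the bit values are assumed to mirror NLA_F_NESTED, NLA_F_NET_BYTEORDER and NLA_TYPE_MASK from <linux/netlink.h>.

```c
#include <stdint.h>

/* Local copies of the netlink attribute flag bits (assumed to match
 * the definitions in <linux/netlink.h>). */
#define SKETCH_NLA_F_NESTED        (1 << 15)
#define SKETCH_NLA_F_NET_BYTEORDER (1 << 14)
#define SKETCH_NLA_TYPE_MASK \
	(~(SKETCH_NLA_F_NESTED | SKETCH_NLA_F_NET_BYTEORDER))

/* Hypothetical helper: mark an attribute type as a nested container,
 * as the sender side does with "nest->rta_type |= NLA_F_NESTED". */
static uint16_t sketch_mark_nested(uint16_t type)
{
	return (uint16_t)(type | SKETCH_NLA_F_NESTED);
}

/* Hypothetical helper: strip the flag bits so the bare type can be
 * used as an index into an attribute table. */
static uint16_t sketch_bare_type(uint16_t type)
{
	return (uint16_t)(type & SKETCH_NLA_TYPE_MASK);
}
```

Without the masking step, a flagged type such as 0x8002 would exceed the table's maximum index and the attribute would be skipped as unknown.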
@ -366,6 +453,25 @@ static int mdb_show(int argc, char **argv)
return 0; return 0;
} }
static int mdb_parse_grp(const char *grp, struct br_mdb_entry *e)
{
if (inet_pton(AF_INET, grp, &e->addr.u.ip4)) {
e->addr.proto = htons(ETH_P_IP);
return 0;
}
if (inet_pton(AF_INET6, grp, &e->addr.u.ip6)) {
e->addr.proto = htons(ETH_P_IPV6);
return 0;
}
if (ll_addr_a2n((char *)e->addr.u.mac_addr, sizeof(e->addr.u.mac_addr),
grp) == ETH_ALEN) {
e->addr.proto = 0;
return 0;
}
return -1;
}
static int mdb_modify(int cmd, int flags, int argc, char **argv) static int mdb_modify(int cmd, int flags, int argc, char **argv)
{ {
struct { struct {
@ -378,8 +484,8 @@ static int mdb_modify(int cmd, int flags, int argc, char **argv)
.n.nlmsg_type = cmd, .n.nlmsg_type = cmd,
.bpm.family = PF_BRIDGE, .bpm.family = PF_BRIDGE,
}; };
char *d = NULL, *p = NULL, *grp = NULL, *src = NULL;
struct br_mdb_entry entry = {}; struct br_mdb_entry entry = {};
char *d = NULL, *p = NULL, *grp = NULL;
short vid = 0; short vid = 0;
while (argc > 0) { while (argc > 0) {
@ -400,6 +506,9 @@ static int mdb_modify(int cmd, int flags, int argc, char **argv)
} else if (strcmp(*argv, "vid") == 0) { } else if (strcmp(*argv, "vid") == 0) {
NEXT_ARG(); NEXT_ARG();
vid = atoi(*argv); vid = atoi(*argv);
} else if (strcmp(*argv, "src") == 0) {
NEXT_ARG();
src = *argv;
} else { } else {
if (matches(*argv, "help") == 0) if (matches(*argv, "help") == 0)
usage(); usage();
@ -420,17 +529,31 @@ static int mdb_modify(int cmd, int flags, int argc, char **argv)
if (!entry.ifindex) if (!entry.ifindex)
return nodev(p); return nodev(p);
if (!inet_pton(AF_INET, grp, &entry.addr.u.ip4)) { if (mdb_parse_grp(grp, &entry)) {
if (!inet_pton(AF_INET6, grp, &entry.addr.u.ip6)) { fprintf(stderr, "Invalid address \"%s\"\n", grp);
fprintf(stderr, "Invalid address \"%s\"\n", grp); return -1;
return -1; }
} else
entry.addr.proto = htons(ETH_P_IPV6);
} else
entry.addr.proto = htons(ETH_P_IP);
entry.vid = vid; entry.vid = vid;
addattr_l(&req.n, sizeof(req), MDBA_SET_ENTRY, &entry, sizeof(entry)); addattr_l(&req.n, sizeof(req), MDBA_SET_ENTRY, &entry, sizeof(entry));
if (src) {
struct rtattr *nest = addattr_nest(&req.n, sizeof(req),
MDBA_SET_ENTRY_ATTRS);
struct in6_addr src_ip6;
__be32 src_ip4;
nest->rta_type |= NLA_F_NESTED;
if (!inet_pton(AF_INET, src, &src_ip4)) {
if (!inet_pton(AF_INET6, src, &src_ip6)) {
fprintf(stderr, "Invalid source address \"%s\"\n", src);
return -1;
}
addattr_l(&req.n, sizeof(req), MDBE_ATTR_SOURCE, &src_ip6, sizeof(src_ip6));
} else {
addattr32(&req.n, sizeof(req), MDBE_ATTR_SOURCE, src_ip4);
}
addattr_nest_end(&req.n, nest);
}
if (rtnl_talk(&rth, &req.n, NULL) < 0) if (rtnl_talk(&rth, &req.n, NULL) < 0)
return -1; return -1;
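Note on mdb_parse_grp() above: the group string is tried as IPv4 first, then IPv6, then as a link-layer (MAC) address. A minimal standalone sketch of that ordering follows. The enum and the function sketch_classify_group() are hypothetical names, and sscanf() is used only as a stand-in for iproute2's ll_addr_a2n(), which is not available outside the tree.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>

enum sketch_grp_kind {
	SKETCH_GRP_INVALID,
	SKETCH_GRP_IPV4,
	SKETCH_GRP_IPV6,
	SKETCH_GRP_MAC,
};

/* Classify a group address string using the same order as the diff:
 * IPv4, then IPv6, then a six-byte MAC address. */
static enum sketch_grp_kind sketch_classify_group(const char *grp)
{
	unsigned char buf[sizeof(struct in6_addr)];
	unsigned int m[6];
	char trailing;

	if (inet_pton(AF_INET, grp, buf) == 1)
		return SKETCH_GRP_IPV4;
	if (inet_pton(AF_INET6, grp, buf) == 1)
		return SKETCH_GRP_IPV6;
	/* Stand-in for ll_addr_a2n(): exactly six hex bytes separated by
	 * colons and nothing after them. */
	if (sscanf(grp, "%2x:%2x:%2x:%2x:%2x:%2x%c",
		   &m[0], &m[1], &m[2], &m[3], &m[4], &m[5],
		   &trailing) == 6)
		return SKETCH_GRP_MAC;
	return SKETCH_GRP_INVALID;
}
```

In the diff itself the MAC case is signalled by setting addr.proto to 0, which print_mdb_entry() later maps to AF_PACKET.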


@ -31,7 +31,7 @@ static int prefix_banner;
static void usage(void) static void usage(void)
{ {
fprintf(stderr, "Usage: bridge monitor [file | link | fdb | mdb | all]\n"); fprintf(stderr, "Usage: bridge monitor [file | link | fdb | mdb | vlan | all]\n");
exit(-1); exit(-1);
} }
@ -67,6 +67,12 @@ static int accept_msg(struct rtnl_ctrl_data *ctrl,
print_nlmsg_timestamp(fp, n); print_nlmsg_timestamp(fp, n);
return 0; return 0;
case RTM_NEWVLAN:
case RTM_DELVLAN:
if (prefix_banner)
fprintf(fp, "[VLAN]");
return print_vlan_rtm(n, arg, true, false);
default: default:
return 0; return 0;
} }
@ -79,6 +85,7 @@ int do_monitor(int argc, char **argv)
int llink = 0; int llink = 0;
int lneigh = 0; int lneigh = 0;
int lmdb = 0; int lmdb = 0;
int lvlan = 0;
rtnl_close(&rth); rtnl_close(&rth);
@ -95,8 +102,12 @@ int do_monitor(int argc, char **argv)
} else if (matches(*argv, "mdb") == 0) { } else if (matches(*argv, "mdb") == 0) {
lmdb = 1; lmdb = 1;
groups = 0; groups = 0;
} else if (matches(*argv, "vlan") == 0) {
lvlan = 1;
groups = 0;
} else if (strcmp(*argv, "all") == 0) { } else if (strcmp(*argv, "all") == 0) {
groups = ~RTMGRP_TC; groups = ~RTMGRP_TC;
lvlan = 1;
prefix_banner = 1; prefix_banner = 1;
} else if (matches(*argv, "help") == 0) { } else if (matches(*argv, "help") == 0) {
usage(); usage();
@ -134,6 +145,12 @@ int do_monitor(int argc, char **argv)
if (rtnl_open(&rth, groups) < 0) if (rtnl_open(&rth, groups) < 0)
exit(1); exit(1);
if (lvlan && rtnl_add_nl_group(&rth, RTNLGRP_BRVLAN) < 0) {
fprintf(stderr, "Failed to add bridge vlan group to list\n");
exit(1);
}
ll_init_map(&rth); ll_init_map(&rth);
if (rtnl_listen(&rth, accept_msg, stdout) < 0) if (rtnl_listen(&rth, accept_msg, stdout) < 0)


@ -9,6 +9,7 @@
#include <linux/if_bridge.h> #include <linux/if_bridge.h>
#include <linux/if_ether.h> #include <linux/if_ether.h>
#include <string.h> #include <string.h>
#include <errno.h>
#include "json_print.h" #include "json_print.h"
#include "libnetlink.h" #include "libnetlink.h"
@ -16,6 +17,7 @@
#include "utils.h" #include "utils.h"
static unsigned int filter_index, filter_vlan; static unsigned int filter_index, filter_vlan;
static int vlan_rtm_cur_ifidx = -1;
enum vlan_show_subject { enum vlan_show_subject {
VLAN_SHOW_VLAN, VLAN_SHOW_VLAN,
@ -33,8 +35,24 @@ static void usage(void)
"Usage: bridge vlan { add | del } vid VLAN_ID dev DEV [ tunnel_info id TUNNEL_ID ]\n" "Usage: bridge vlan { add | del } vid VLAN_ID dev DEV [ tunnel_info id TUNNEL_ID ]\n"
" [ pvid ] [ untagged ]\n" " [ pvid ] [ untagged ]\n"
" [ self ] [ master ]\n" " [ self ] [ master ]\n"
" bridge vlan { set } vid VLAN_ID dev DEV [ state STP_STATE ]\n"
" [ mcast_router MULTICAST_ROUTER ]\n"
" bridge vlan { show } [ dev DEV ] [ vid VLAN_ID ]\n" " bridge vlan { show } [ dev DEV ] [ vid VLAN_ID ]\n"
" bridge vlan { tunnelshow } [ dev DEV ] [ vid VLAN_ID ]\n"); " bridge vlan { tunnelshow } [ dev DEV ] [ vid VLAN_ID ]\n"
" bridge vlan global { set } vid VLAN_ID dev DEV\n"
" [ mcast_snooping MULTICAST_SNOOPING ]\n"
" [ mcast_querier MULTICAST_QUERIER ]\n"
" [ mcast_igmp_version IGMP_VERSION ]\n"
" [ mcast_mld_version MLD_VERSION ]\n"
" [ mcast_last_member_count LAST_MEMBER_COUNT ]\n"
" [ mcast_last_member_interval LAST_MEMBER_INTERVAL ]\n"
" [ mcast_startup_query_count STARTUP_QUERY_COUNT ]\n"
" [ mcast_startup_query_interval STARTUP_QUERY_INTERVAL ]\n"
" [ mcast_membership_interval MEMBERSHIP_INTERVAL ]\n"
" [ mcast_querier_interval QUERIER_INTERVAL ]\n"
" [ mcast_query_interval QUERY_INTERVAL ]\n"
" [ mcast_query_response_interval QUERY_RESPONSE_INTERVAL ]\n"
" bridge vlan global { show } [ dev DEV ] [ vid VLAN_ID ]\n");
exit(-1); exit(-1);
} }
@ -241,6 +259,277 @@ static int vlan_modify(int cmd, int argc, char **argv)
return 0; return 0;
} }
static int vlan_option_set(int argc, char **argv)
{
struct {
struct nlmsghdr n;
struct br_vlan_msg bvm;
char buf[1024];
} req = {
.n.nlmsg_len = NLMSG_LENGTH(sizeof(struct br_vlan_msg)),
.n.nlmsg_flags = NLM_F_REQUEST,
.n.nlmsg_type = RTM_NEWVLAN,
.bvm.family = PF_BRIDGE,
};
struct bridge_vlan_info vinfo = {};
struct rtattr *afspec;
char *d = NULL;
short vid = -1;
afspec = addattr_nest(&req.n, sizeof(req), BRIDGE_VLANDB_ENTRY);
afspec->rta_type |= NLA_F_NESTED;
while (argc > 0) {
if (strcmp(*argv, "dev") == 0) {
NEXT_ARG();
d = *argv;
req.bvm.ifindex = ll_name_to_index(d);
if (req.bvm.ifindex == 0) {
fprintf(stderr,
"Cannot find network device \"%s\"\n",
d);
return -1;
}
} else if (strcmp(*argv, "vid") == 0) {
short vid_end = -1;
char *p;
NEXT_ARG();
p = strchr(*argv, '-');
if (p) {
*p = '\0';
p++;
vid = atoi(*argv);
vid_end = atoi(p);
if (vid >= vid_end || vid_end >= 4096) {
fprintf(stderr, "Invalid VLAN range \"%hu-%hu\"\n",
vid, vid_end);
return -1;
}
} else {
vid = atoi(*argv);
}
if (vid >= 4096) {
fprintf(stderr, "Invalid VLAN ID \"%hu\"\n",
vid);
return -1;
}
vinfo.flags = BRIDGE_VLAN_INFO_ONLY_OPTS;
vinfo.vid = vid;
addattr_l(&req.n, sizeof(req), BRIDGE_VLANDB_ENTRY_INFO,
&vinfo, sizeof(vinfo));
if (vid_end != -1)
addattr16(&req.n, sizeof(req),
BRIDGE_VLANDB_ENTRY_RANGE, vid_end);
} else if (strcmp(*argv, "state") == 0) {
char *endptr;
int state;
NEXT_ARG();
state = strtol(*argv, &endptr, 10);
if (!(**argv != '\0' && *endptr == '\0'))
state = parse_stp_state(*argv);
if (state == -1) {
fprintf(stderr, "Error: invalid STP state\n");
return -1;
}
addattr8(&req.n, sizeof(req), BRIDGE_VLANDB_ENTRY_STATE,
state);
} else if (strcmp(*argv, "mcast_router") == 0) {
__u8 mcast_router;
NEXT_ARG();
if (get_u8(&mcast_router, *argv, 0))
invarg("invalid mcast_router", *argv);
addattr8(&req.n, sizeof(req),
BRIDGE_VLANDB_ENTRY_MCAST_ROUTER,
mcast_router);
} else {
if (matches(*argv, "help") == 0)
NEXT_ARG();
}
argc--; argv++;
}
addattr_nest_end(&req.n, afspec);
if (d == NULL || vid == -1) {
fprintf(stderr, "Device and VLAN ID are required arguments.\n");
return -1;
}
if (rtnl_talk(&rth, &req.n, NULL) < 0)
return -1;
return 0;
}
static int vlan_global_option_set(int argc, char **argv)
{
struct {
struct nlmsghdr n;
struct br_vlan_msg bvm;
char buf[1024];
} req = {
.n.nlmsg_len = NLMSG_LENGTH(sizeof(struct br_vlan_msg)),
.n.nlmsg_flags = NLM_F_REQUEST,
.n.nlmsg_type = RTM_NEWVLAN,
.bvm.family = PF_BRIDGE,
};
struct rtattr *afspec;
short vid_end = -1;
char *d = NULL;
short vid = -1;
__u64 val64;
__u32 val32;
__u8 val8;
afspec = addattr_nest(&req.n, sizeof(req),
BRIDGE_VLANDB_GLOBAL_OPTIONS);
afspec->rta_type |= NLA_F_NESTED;
while (argc > 0) {
if (strcmp(*argv, "dev") == 0) {
NEXT_ARG();
d = *argv;
req.bvm.ifindex = ll_name_to_index(d);
if (req.bvm.ifindex == 0) {
fprintf(stderr, "Cannot find network device \"%s\"\n",
d);
return -1;
}
} else if (strcmp(*argv, "vid") == 0) {
char *p;
NEXT_ARG();
p = strchr(*argv, '-');
if (p) {
*p = '\0';
p++;
vid = atoi(*argv);
vid_end = atoi(p);
if (vid >= vid_end || vid_end >= 4096) {
fprintf(stderr, "Invalid VLAN range \"%hu-%hu\"\n",
vid, vid_end);
return -1;
}
} else {
vid = atoi(*argv);
}
if (vid >= 4096) {
fprintf(stderr, "Invalid VLAN ID \"%hu\"\n",
vid);
return -1;
}
addattr16(&req.n, sizeof(req), BRIDGE_VLANDB_GOPTS_ID,
vid);
if (vid_end != -1)
addattr16(&req.n, sizeof(req),
BRIDGE_VLANDB_GOPTS_RANGE, vid_end);
} else if (strcmp(*argv, "mcast_snooping") == 0) {
NEXT_ARG();
if (get_u8(&val8, *argv, 0))
invarg("invalid mcast_snooping", *argv);
addattr8(&req.n, 1024,
BRIDGE_VLANDB_GOPTS_MCAST_SNOOPING, val8);
} else if (strcmp(*argv, "mcast_querier") == 0) {
NEXT_ARG();
if (get_u8(&val8, *argv, 0))
invarg("invalid mcast_querier", *argv);
addattr8(&req.n, 1024,
BRIDGE_VLANDB_GOPTS_MCAST_QUERIER, val8);
} else if (strcmp(*argv, "mcast_igmp_version") == 0) {
NEXT_ARG();
if (get_u8(&val8, *argv, 0))
invarg("invalid mcast_igmp_version", *argv);
addattr8(&req.n, 1024,
BRIDGE_VLANDB_GOPTS_MCAST_IGMP_VERSION, val8);
} else if (strcmp(*argv, "mcast_mld_version") == 0) {
NEXT_ARG();
if (get_u8(&val8, *argv, 0))
invarg("invalid mcast_mld_version", *argv);
addattr8(&req.n, 1024,
BRIDGE_VLANDB_GOPTS_MCAST_MLD_VERSION, val8);
} else if (strcmp(*argv, "mcast_last_member_count") == 0) {
NEXT_ARG();
if (get_u32(&val32, *argv, 0))
invarg("invalid mcast_last_member_count", *argv);
addattr32(&req.n, 1024,
BRIDGE_VLANDB_GOPTS_MCAST_LAST_MEMBER_CNT,
val32);
} else if (strcmp(*argv, "mcast_startup_query_count") == 0) {
NEXT_ARG();
if (get_u32(&val32, *argv, 0))
invarg("invalid mcast_startup_query_count",
*argv);
addattr32(&req.n, 1024,
BRIDGE_VLANDB_GOPTS_MCAST_STARTUP_QUERY_CNT,
val32);
} else if (strcmp(*argv, "mcast_last_member_interval") == 0) {
NEXT_ARG();
if (get_u64(&val64, *argv, 0))
invarg("invalid mcast_last_member_interval",
*argv);
addattr64(&req.n, 1024,
BRIDGE_VLANDB_GOPTS_MCAST_LAST_MEMBER_INTVL,
val64);
} else if (strcmp(*argv, "mcast_membership_interval") == 0) {
NEXT_ARG();
if (get_u64(&val64, *argv, 0))
invarg("invalid mcast_membership_interval",
*argv);
addattr64(&req.n, 1024,
BRIDGE_VLANDB_GOPTS_MCAST_MEMBERSHIP_INTVL,
val64);
} else if (strcmp(*argv, "mcast_querier_interval") == 0) {
NEXT_ARG();
if (get_u64(&val64, *argv, 0))
invarg("invalid mcast_querier_interval",
*argv);
addattr64(&req.n, 1024,
BRIDGE_VLANDB_GOPTS_MCAST_QUERIER_INTVL,
val64);
} else if (strcmp(*argv, "mcast_query_interval") == 0) {
NEXT_ARG();
if (get_u64(&val64, *argv, 0))
invarg("invalid mcast_query_interval",
*argv);
addattr64(&req.n, 1024,
BRIDGE_VLANDB_GOPTS_MCAST_QUERY_INTVL,
val64);
} else if (strcmp(*argv, "mcast_query_response_interval") == 0) {
NEXT_ARG();
if (get_u64(&val64, *argv, 0))
invarg("invalid mcast_query_response_interval",
*argv);
addattr64(&req.n, 1024,
BRIDGE_VLANDB_GOPTS_MCAST_QUERY_RESPONSE_INTVL,
val64);
} else if (strcmp(*argv, "mcast_startup_query_interval") == 0) {
NEXT_ARG();
if (get_u64(&val64, *argv, 0))
invarg("invalid mcast_startup_query_interval",
*argv);
addattr64(&req.n, 1024,
BRIDGE_VLANDB_GOPTS_MCAST_STARTUP_QUERY_INTVL,
val64);
} else {
if (strcmp(*argv, "help") == 0)
NEXT_ARG();
}
argc--; argv++;
}
addattr_nest_end(&req.n, afspec);
if (d == NULL || vid == -1) {
fprintf(stderr, "Device and VLAN ID are required arguments.\n");
return -1;
}
if (rtnl_talk(&rth, &req.n, NULL) < 0)
return -1;
return 0;
}
/* In order to use this function for both filtering and non-filtering cases /* In order to use this function for both filtering and non-filtering cases
* we need to make it a tristate: * we need to make it a tristate:
* return -1 - if filtering we've gone over so don't continue * return -1 - if filtering we've gone over so don't continue
@ -422,14 +711,8 @@ static void print_vlan_flags(__u16 flags)
close_json_array(PRINT_JSON, NULL); close_json_array(PRINT_JSON, NULL);
} }
static void print_one_vlan_stats(const struct bridge_vlan_xstats *vstats) static void __print_one_vlan_stats(const struct bridge_vlan_xstats *vstats)
{ {
open_json_object(NULL);
print_hu(PRINT_ANY, "vid", "%hu", vstats->vid);
print_vlan_flags(vstats->flags);
print_nl();
print_string(PRINT_FP, NULL, "%-" __stringify(IFNAMSIZ) "s ", ""); print_string(PRINT_FP, NULL, "%-" __stringify(IFNAMSIZ) "s ", "");
print_lluint(PRINT_ANY, "rx_bytes", "RX: %llu bytes", print_lluint(PRINT_ANY, "rx_bytes", "RX: %llu bytes",
vstats->rx_bytes); vstats->rx_bytes);
@ -441,6 +724,16 @@ static void print_one_vlan_stats(const struct bridge_vlan_xstats *vstats)
vstats->tx_bytes); vstats->tx_bytes);
print_lluint(PRINT_ANY, "tx_packets", " %llu packets\n", print_lluint(PRINT_ANY, "tx_packets", " %llu packets\n",
vstats->tx_packets); vstats->tx_packets);
}
static void print_one_vlan_stats(const struct bridge_vlan_xstats *vstats)
{
open_json_object(NULL);
print_hu(PRINT_ANY, "vid", "%hu", vstats->vid);
print_vlan_flags(vstats->flags);
print_nl();
__print_one_vlan_stats(vstats);
close_json_object(); close_json_object();
} }
@ -521,6 +814,288 @@ static int print_vlan_stats(struct nlmsghdr *n, void *arg)
return 0; return 0;
} }
static void print_vlan_router_ports(struct rtattr *rattr)
{
int rem = RTA_PAYLOAD(rattr);
struct rtattr *i;
print_string(PRINT_FP, NULL, "%-" __stringify(IFNAMSIZ) "s ", "");
open_json_array(PRINT_ANY, is_json_context() ? "router_ports" :
"router ports: ");
for (i = RTA_DATA(rattr); RTA_OK(i, rem); i = RTA_NEXT(i, rem)) {
uint32_t *port_ifindex = RTA_DATA(i);
const char *port_ifname = ll_index_to_name(*port_ifindex);
open_json_object(NULL);
if (show_stats && i != RTA_DATA(rattr)) {
print_nl();
/* start: IFNAMSIZ + 4 + strlen("router ports: ") */
print_string(PRINT_FP, NULL,
"%-" __stringify(IFNAMSIZ) "s "
" ",
"");
}
print_string(PRINT_ANY, "port", "%s ", port_ifname);
if (show_stats)
br_print_router_port_stats(i);
close_json_object();
}
close_json_array(PRINT_JSON, NULL);
print_nl();
}
static void print_vlan_global_opts(struct rtattr *a, int ifindex)
{
struct rtattr *vtb[BRIDGE_VLANDB_GOPTS_MAX + 1], *vattr;
__u16 vid, vrange = 0;
if ((a->rta_type & NLA_TYPE_MASK) != BRIDGE_VLANDB_GLOBAL_OPTIONS)
return;
parse_rtattr_flags(vtb, BRIDGE_VLANDB_GOPTS_MAX, RTA_DATA(a),
RTA_PAYLOAD(a), NLA_F_NESTED);
vid = rta_getattr_u16(vtb[BRIDGE_VLANDB_GOPTS_ID]);
if (vtb[BRIDGE_VLANDB_GOPTS_RANGE])
vrange = rta_getattr_u16(vtb[BRIDGE_VLANDB_GOPTS_RANGE]);
else
vrange = vid;
if (filter_vlan && (filter_vlan < vid || filter_vlan > vrange))
return;
if (vlan_rtm_cur_ifidx != ifindex) {
open_vlan_port(ifindex, VLAN_SHOW_VLAN);
open_json_object(NULL);
vlan_rtm_cur_ifidx = ifindex;
} else {
open_json_object(NULL);
print_string(PRINT_FP, NULL, "%-" __stringify(IFNAMSIZ) "s ", "");
}
print_range("vlan", vid, vrange);
print_nl();
print_string(PRINT_FP, NULL, "%-" __stringify(IFNAMSIZ) "s ", "");
if (vtb[BRIDGE_VLANDB_GOPTS_MCAST_SNOOPING]) {
vattr = vtb[BRIDGE_VLANDB_GOPTS_MCAST_SNOOPING];
print_uint(PRINT_ANY, "mcast_snooping", "mcast_snooping %u ",
rta_getattr_u8(vattr));
}
if (vtb[BRIDGE_VLANDB_GOPTS_MCAST_QUERIER]) {
vattr = vtb[BRIDGE_VLANDB_GOPTS_MCAST_QUERIER];
print_uint(PRINT_ANY, "mcast_querier", "mcast_querier %u ",
rta_getattr_u8(vattr));
}
if (vtb[BRIDGE_VLANDB_GOPTS_MCAST_IGMP_VERSION]) {
vattr = vtb[BRIDGE_VLANDB_GOPTS_MCAST_IGMP_VERSION];
print_uint(PRINT_ANY, "mcast_igmp_version",
"mcast_igmp_version %u ", rta_getattr_u8(vattr));
}
if (vtb[BRIDGE_VLANDB_GOPTS_MCAST_MLD_VERSION]) {
vattr = vtb[BRIDGE_VLANDB_GOPTS_MCAST_MLD_VERSION];
print_uint(PRINT_ANY, "mcast_mld_version",
"mcast_mld_version %u ", rta_getattr_u8(vattr));
}
if (vtb[BRIDGE_VLANDB_GOPTS_MCAST_LAST_MEMBER_CNT]) {
vattr = vtb[BRIDGE_VLANDB_GOPTS_MCAST_LAST_MEMBER_CNT];
print_uint(PRINT_ANY, "mcast_last_member_count",
"mcast_last_member_count %u ",
rta_getattr_u32(vattr));
}
if (vtb[BRIDGE_VLANDB_GOPTS_MCAST_LAST_MEMBER_INTVL]) {
vattr = vtb[BRIDGE_VLANDB_GOPTS_MCAST_LAST_MEMBER_INTVL];
print_lluint(PRINT_ANY, "mcast_last_member_interval",
"mcast_last_member_interval %llu ",
rta_getattr_u64(vattr));
}
if (vtb[BRIDGE_VLANDB_GOPTS_MCAST_STARTUP_QUERY_CNT]) {
vattr = vtb[BRIDGE_VLANDB_GOPTS_MCAST_STARTUP_QUERY_CNT];
print_uint(PRINT_ANY, "mcast_startup_query_count",
"mcast_startup_query_count %u ",
rta_getattr_u32(vattr));
}
if (vtb[BRIDGE_VLANDB_GOPTS_MCAST_STARTUP_QUERY_INTVL]) {
vattr = vtb[BRIDGE_VLANDB_GOPTS_MCAST_STARTUP_QUERY_INTVL];
print_lluint(PRINT_ANY, "mcast_startup_query_interval",
"mcast_startup_query_interval %llu ",
rta_getattr_u64(vattr));
}
if (vtb[BRIDGE_VLANDB_GOPTS_MCAST_MEMBERSHIP_INTVL]) {
vattr = vtb[BRIDGE_VLANDB_GOPTS_MCAST_MEMBERSHIP_INTVL];
print_lluint(PRINT_ANY, "mcast_membership_interval",
"mcast_membership_interval %llu ",
rta_getattr_u64(vattr));
}
if (vtb[BRIDGE_VLANDB_GOPTS_MCAST_QUERIER_INTVL]) {
vattr = vtb[BRIDGE_VLANDB_GOPTS_MCAST_QUERIER_INTVL];
print_lluint(PRINT_ANY, "mcast_querier_interval",
"mcast_querier_interval %llu ",
rta_getattr_u64(vattr));
}
if (vtb[BRIDGE_VLANDB_GOPTS_MCAST_QUERY_INTVL]) {
vattr = vtb[BRIDGE_VLANDB_GOPTS_MCAST_QUERY_INTVL];
print_lluint(PRINT_ANY, "mcast_query_interval",
"mcast_query_interval %llu ",
rta_getattr_u64(vattr));
}
if (vtb[BRIDGE_VLANDB_GOPTS_MCAST_QUERY_RESPONSE_INTVL]) {
vattr = vtb[BRIDGE_VLANDB_GOPTS_MCAST_QUERY_RESPONSE_INTVL];
print_lluint(PRINT_ANY, "mcast_query_response_interval",
"mcast_query_response_interval %llu ",
rta_getattr_u64(vattr));
}
print_nl();
if (vtb[BRIDGE_VLANDB_GOPTS_MCAST_ROUTER_PORTS]) {
vattr = RTA_DATA(vtb[BRIDGE_VLANDB_GOPTS_MCAST_ROUTER_PORTS]);
print_vlan_router_ports(vattr);
}
close_json_object();
}
static void print_vlan_opts(struct rtattr *a, int ifindex)
{
struct rtattr *vtb[BRIDGE_VLANDB_ENTRY_MAX + 1], *vattr;
struct bridge_vlan_xstats vstats;
struct bridge_vlan_info *vinfo;
__u16 vrange = 0;
__u8 state = 0;
if ((a->rta_type & NLA_TYPE_MASK) != BRIDGE_VLANDB_ENTRY)
return;
parse_rtattr_flags(vtb, BRIDGE_VLANDB_ENTRY_MAX, RTA_DATA(a),
RTA_PAYLOAD(a), NLA_F_NESTED);
vinfo = RTA_DATA(vtb[BRIDGE_VLANDB_ENTRY_INFO]);
memset(&vstats, 0, sizeof(vstats));
if (vtb[BRIDGE_VLANDB_ENTRY_RANGE])
vrange = rta_getattr_u16(vtb[BRIDGE_VLANDB_ENTRY_RANGE]);
else
vrange = vinfo->vid;
if (filter_vlan && (filter_vlan < vinfo->vid || filter_vlan > vrange))
return;
if (vtb[BRIDGE_VLANDB_ENTRY_STATE])
state = rta_getattr_u8(vtb[BRIDGE_VLANDB_ENTRY_STATE]);
if (vtb[BRIDGE_VLANDB_ENTRY_STATS]) {
struct rtattr *stb[BRIDGE_VLANDB_STATS_MAX+1];
struct rtattr *attr;
attr = vtb[BRIDGE_VLANDB_ENTRY_STATS];
parse_rtattr(stb, BRIDGE_VLANDB_STATS_MAX, RTA_DATA(attr),
RTA_PAYLOAD(attr));
if (stb[BRIDGE_VLANDB_STATS_RX_BYTES]) {
attr = stb[BRIDGE_VLANDB_STATS_RX_BYTES];
vstats.rx_bytes = rta_getattr_u64(attr);
}
if (stb[BRIDGE_VLANDB_STATS_RX_PACKETS]) {
attr = stb[BRIDGE_VLANDB_STATS_RX_PACKETS];
vstats.rx_packets = rta_getattr_u64(attr);
}
if (stb[BRIDGE_VLANDB_STATS_TX_PACKETS]) {
attr = stb[BRIDGE_VLANDB_STATS_TX_PACKETS];
vstats.tx_packets = rta_getattr_u64(attr);
}
if (stb[BRIDGE_VLANDB_STATS_TX_BYTES]) {
attr = stb[BRIDGE_VLANDB_STATS_TX_BYTES];
vstats.tx_bytes = rta_getattr_u64(attr);
}
}
if (vlan_rtm_cur_ifidx != ifindex) {
open_vlan_port(ifindex, VLAN_SHOW_VLAN);
open_json_object(NULL);
vlan_rtm_cur_ifidx = ifindex;
} else {
open_json_object(NULL);
print_string(PRINT_FP, NULL, "%-" __stringify(IFNAMSIZ) "s ", "");
}
print_range("vlan", vinfo->vid, vrange);
print_vlan_flags(vinfo->flags);
print_nl();
print_string(PRINT_FP, NULL, "%-" __stringify(IFNAMSIZ) "s ", "");
print_stp_state(state);
if (vtb[BRIDGE_VLANDB_ENTRY_MCAST_ROUTER]) {
vattr = vtb[BRIDGE_VLANDB_ENTRY_MCAST_ROUTER];
print_uint(PRINT_ANY, "mcast_router", "mcast_router %u ",
rta_getattr_u8(vattr));
}
print_nl();
if (show_stats)
__print_one_vlan_stats(&vstats);
close_json_object();
}
int print_vlan_rtm(struct nlmsghdr *n, void *arg, bool monitor, bool global_only)
{
struct br_vlan_msg *bvm = NLMSG_DATA(n);
int len = n->nlmsg_len;
struct rtattr *a;
int rem;
if (n->nlmsg_type != RTM_NEWVLAN && n->nlmsg_type != RTM_DELVLAN &&
n->nlmsg_type != RTM_GETVLAN) {
fprintf(stderr, "Unknown vlan rtm message: %08x %08x %08x\n",
n->nlmsg_len, n->nlmsg_type, n->nlmsg_flags);
return 0;
}
len -= NLMSG_LENGTH(sizeof(*bvm));
if (len < 0) {
fprintf(stderr, "BUG: wrong nlmsg len %d\n", len);
return -1;
}
if (bvm->family != AF_BRIDGE)
return 0;
if (filter_index && filter_index != bvm->ifindex)
return 0;
if (n->nlmsg_type == RTM_DELVLAN)
print_bool(PRINT_ANY, "deleted", "Deleted ", true);
if (monitor)
vlan_rtm_cur_ifidx = -1;
if (vlan_rtm_cur_ifidx != -1 && vlan_rtm_cur_ifidx != bvm->ifindex) {
close_vlan_port();
vlan_rtm_cur_ifidx = -1;
}
rem = len;
for (a = BRVLAN_RTA(bvm); RTA_OK(a, rem); a = RTA_NEXT(a, rem)) {
unsigned short rta_type = a->rta_type & NLA_TYPE_MASK;
/* skip unknown attributes */
if (rta_type > BRIDGE_VLANDB_MAX ||
(global_only && rta_type != BRIDGE_VLANDB_GLOBAL_OPTIONS))
continue;
switch (rta_type) {
case BRIDGE_VLANDB_ENTRY:
print_vlan_opts(a, bvm->ifindex);
break;
case BRIDGE_VLANDB_GLOBAL_OPTIONS:
print_vlan_global_opts(a, bvm->ifindex);
break;
}
}
return 0;
}
static int print_vlan_rtm_filter(struct nlmsghdr *n, void *arg)
{
return print_vlan_rtm(n, arg, false, false);
}
static int print_vlan_rtm_global_filter(struct nlmsghdr *n, void *arg)
{
return print_vlan_rtm(n, arg, false, true);
}
static int vlan_show(int argc, char **argv, int subject) static int vlan_show(int argc, char **argv, int subject)
{ {
char *filter_dev = NULL; char *filter_dev = NULL;
@ -549,6 +1124,34 @@ static int vlan_show(int argc, char **argv, int subject)
new_json_obj(json); new_json_obj(json);
/* if show_details is true then use the new bridge vlan dump format */
if (show_details && subject == VLAN_SHOW_VLAN) {
__u32 dump_flags = show_stats ? BRIDGE_VLANDB_DUMPF_STATS : 0;
if (rtnl_brvlandump_req(&rth, PF_BRIDGE, dump_flags) < 0) {
perror("Cannot send dump request");
exit(1);
}
if (!is_json_context()) {
printf("%-" __stringify(IFNAMSIZ) "s %-"
__stringify(VLAN_ID_LEN) "s", "port",
"vlan-id");
printf("\n");
}
ret = rtnl_dump_filter(&rth, print_vlan_rtm_filter, &subject);
if (ret < 0) {
fprintf(stderr, "Dump terminated\n");
exit(1);
}
if (vlan_rtm_cur_ifidx != -1)
close_vlan_port();
goto out;
}
if (!show_stats) { if (!show_stats) {
if (rtnl_linkdump_req_filter(&rth, PF_BRIDGE, if (rtnl_linkdump_req_filter(&rth, PF_BRIDGE,
(compress_vlans ? (compress_vlans ?
@ -602,6 +1205,62 @@ static int vlan_show(int argc, char **argv, int subject)
} }
} }
out:
delete_json_obj();
fflush(stdout);
return 0;
}
static int vlan_global_show(int argc, char **argv)
{
__u32 dump_flags = BRIDGE_VLANDB_DUMPF_GLOBAL;
int ret = 0, subject = VLAN_SHOW_VLAN;
char *filter_dev = NULL;
while (argc > 0) {
if (strcmp(*argv, "dev") == 0) {
NEXT_ARG();
if (filter_dev)
duparg("dev", *argv);
filter_dev = *argv;
} else if (strcmp(*argv, "vid") == 0) {
NEXT_ARG();
if (filter_vlan)
duparg("vid", *argv);
filter_vlan = atoi(*argv);
}
argc--; argv++;
}
if (filter_dev) {
filter_index = ll_name_to_index(filter_dev);
if (!filter_index)
return nodev(filter_dev);
}
new_json_obj(json);
if (rtnl_brvlandump_req(&rth, PF_BRIDGE, dump_flags) < 0) {
perror("Cannot send dump request");
exit(1);
}
if (!is_json_context()) {
printf("%-" __stringify(IFNAMSIZ) "s %-"
__stringify(VLAN_ID_LEN) "s", "port",
"vlan-id");
printf("\n");
}
ret = rtnl_dump_filter(&rth, print_vlan_rtm_global_filter, &subject);
if (ret < 0) {
fprintf(stderr, "Dump terminated\n");
exit(1);
}
if (vlan_rtm_cur_ifidx != -1)
close_vlan_port();
delete_json_obj(); delete_json_obj();
fflush(stdout); fflush(stdout);
return 0; return 0;
@ -651,6 +1310,24 @@ void print_vlan_info(struct rtattr *tb, int ifindex)
close_vlan_port(); close_vlan_port();
} }
static int vlan_global(int argc, char **argv)
{
if (argc > 0) {
if (strcmp(*argv, "show") == 0 ||
strcmp(*argv, "lst") == 0 ||
strcmp(*argv, "list") == 0)
return vlan_global_show(argc-1, argv+1);
else if (strcmp(*argv, "set") == 0)
return vlan_global_option_set(argc-1, argv+1);
else
usage();
} else {
return vlan_global_show(0, NULL);
}
return 0;
}
int do_vlan(int argc, char **argv) int do_vlan(int argc, char **argv)
{ {
ll_init_map(&rth); ll_init_map(&rth);
@ -667,6 +1344,10 @@ int do_vlan(int argc, char **argv)
if (matches(*argv, "tunnelshow") == 0) { if (matches(*argv, "tunnelshow") == 0) {
return vlan_show(argc-1, argv+1, VLAN_SHOW_TUNNELINFO); return vlan_show(argc-1, argv+1, VLAN_SHOW_TUNNELINFO);
} }
if (matches(*argv, "set") == 0)
return vlan_option_set(argc-1, argv+1);
if (strcmp(*argv, "global") == 0)
return vlan_global(argc-1, argv+1);
if (matches(*argv, "help") == 0) if (matches(*argv, "help") == 0)
usage(); usage();
} else { } else {
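Note on the "vid" handling in vlan_option_set() and vlan_global_option_set() above: both accept either a single VLAN ID or a "first-last" range and reject ranges where first >= last or last >= 4096. The sketch below shows the same checks in isolation. The function sketch_parse_vid_range() is a hypothetical name, and it uses strtol() rather than atoi(), so it is stricter than the code in the diff (it also rejects negative numbers and trailing garbage).

```c
#include <errno.h>
#include <stdlib.h>

/* Parse "N" or "N-M" into *vid and *vid_end.
 * *vid_end is set to -1 when no range was given.
 * Returns 0 on success, -1 on invalid input. */
static int sketch_parse_vid_range(const char *arg, int *vid, int *vid_end)
{
	char *end;
	long first, last;

	errno = 0;
	first = strtol(arg, &end, 10);
	if (errno || end == arg || first < 0 || first >= 4096)
		return -1;

	if (*end == '\0') {
		/* Single VLAN ID, no range. */
		*vid = (int)first;
		*vid_end = -1;
		return 0;
	}
	if (*end != '-')
		return -1;

	arg = end + 1;
	errno = 0;
	last = strtol(arg, &end, 10);
	if (errno || end == arg || *end != '\0')
		return -1;
	/* Same range rule as the diff: strictly increasing, below 4096. */
	if (first >= last || last >= 4096)
		return -1;

	*vid = (int)first;
	*vid_end = (int)last;
	return 0;
}
```

A caller would then emit the range attribute only when *vid_end is not -1, as the diff does with BRIDGE_VLANDB_ENTRY_RANGE and BRIDGE_VLANDB_GOPTS_RANGE.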

configure vendored

@ -1,8 +1,10 @@
#!/bin/sh #!/bin/sh
# SPDX-License-Identifier: GPL-2.0 # SPDX-License-Identifier: GPL-2.0
# This is not an autoconf generated configure # This is not an autoconf generated configure
#
INCLUDE=${1:-"$PWD/include"} INCLUDE="$PWD/include"
PREFIX="/usr"
LIBDIR="\${prefix}/lib"
# Output file which is input to Makefile # Output file which is input to Makefile
CONFIG=config.mk CONFIG=config.mk
@ -148,6 +150,15 @@ EOF
rm -f $TMPDIR/ipttest.c $TMPDIR/ipttest rm -f $TMPDIR/ipttest.c $TMPDIR/ipttest
} }
check_lib_dir()
{
LIBDIR=$(echo $LIBDIR | sed "s|\${prefix}|$PREFIX|")
echo -n "lib directory: "
echo "$LIBDIR"
echo "LIBDIR:=$LIBDIR" >> $CONFIG
}
check_ipt() check_ipt()
{ {
if ! grep TC_CONFIG_XT $CONFIG > /dev/null; then if ! grep TC_CONFIG_XT $CONFIG > /dev/null; then
@ -197,6 +208,31 @@ EOF
rm -f $TMPDIR/setnstest.c $TMPDIR/setnstest rm -f $TMPDIR/setnstest.c $TMPDIR/setnstest
} }
check_name_to_handle_at()
{
cat >$TMPDIR/name_to_handle_at_test.c <<EOF
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
int main(int argc, char **argv)
{
struct file_handle *fhp;
int mount_id, flags, dirfd;
char *pathname;
name_to_handle_at(dirfd, pathname, fhp, &mount_id, flags);
return 0;
}
EOF
if $CC -I$INCLUDE -o $TMPDIR/name_to_handle_at_test $TMPDIR/name_to_handle_at_test.c >/dev/null 2>&1; then
echo "yes"
echo "CFLAGS += -DHAVE_HANDLE_AT" >>$CONFIG
else
echo "no"
fi
rm -f $TMPDIR/name_to_handle_at_test.c $TMPDIR/name_to_handle_at_test
}
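The probe above only records its result as `-DHAVE_HANDLE_AT` in config.mk. How C code might consume that define can be sketched as follows; `path_mount_id` is a hypothetical helper invented for this illustration, and the fallback of simply returning -1 is an assumption, not what iproute2 does:

```c
#ifdef HAVE_HANDLE_AT
#define _GNU_SOURCE	/* needed for name_to_handle_at() and MAX_HANDLE_SZ */
#include <fcntl.h>
#endif
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: return the mount ID of `path`, or -1 when the call
 * fails or when the libc was built without name_to_handle_at().
 */
int path_mount_id(const char *path)
{
	assert(path != NULL);
#ifdef HAVE_HANDLE_AT
	struct file_handle *fhp;
	int mount_id = -1;

	/* The handle is variable-sized; reserve the maximum. */
	fhp = malloc(sizeof(*fhp) + MAX_HANDLE_SZ);
	if (!fhp)
		return -1;
	fhp->handle_bytes = MAX_HANDLE_SZ;

	if (name_to_handle_at(AT_FDCWD, path, fhp, &mount_id, 0) < 0)
		mount_id = -1;

	free(fhp);
	return mount_id;
#else
	(void)path;
	return -1;
#endif
}
```

Either way the caller sees a single function, so the rest of the code does not need its own `#ifdef`.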
check_ipset() check_ipset()
{ {
cat >$TMPDIR/ipsettest.c <<EOF cat >$TMPDIR/ipsettest.c <<EOF
@ -208,7 +244,7 @@ typedef unsigned short ip_set_id_t;
#include <linux/netfilter/xt_set.h> #include <linux/netfilter/xt_set.h>
struct xt_set_info info; struct xt_set_info info;
#if IPSET_PROTOCOL == 6 #if IPSET_PROTOCOL == 6 || IPSET_PROTOCOL == 7
int main(void) int main(void)
{ {
return IPSET_MAXNAMELEN; return IPSET_MAXNAMELEN;
@ -240,6 +276,111 @@ check_elf()
fi fi
} }
have_libbpf_basic()
{
cat >$TMPDIR/libbpf_test.c <<EOF
#include <bpf/libbpf.h>
int main(int argc, char **argv) {
bpf_program__set_autoload(NULL, false);
bpf_map__ifindex(NULL);
bpf_map__set_pin_path(NULL, NULL);
bpf_object__open_file(NULL, NULL);
return 0;
}
EOF
$CC -o $TMPDIR/libbpf_test $TMPDIR/libbpf_test.c $LIBBPF_CFLAGS $LIBBPF_LDLIBS >/dev/null 2>&1
local ret=$?
rm -f $TMPDIR/libbpf_test.c $TMPDIR/libbpf_test
return $ret
}
have_libbpf_sec_name()
{
cat >$TMPDIR/libbpf_sec_test.c <<EOF
#include <bpf/libbpf.h>
int main(int argc, char **argv) {
void *ptr;
bpf_program__section_name(NULL);
return 0;
}
EOF
$CC -o $TMPDIR/libbpf_sec_test $TMPDIR/libbpf_sec_test.c $LIBBPF_CFLAGS $LIBBPF_LDLIBS >/dev/null 2>&1
local ret=$?
rm -f $TMPDIR/libbpf_sec_test.c $TMPDIR/libbpf_sec_test
return $ret
}
check_force_libbpf_on()
{
# if LIBBPF_FORCE=on is set but there is no libbpf support, exit the config
# process to make sure we don't build without libbpf.
if [ "$LIBBPF_FORCE" = on ]; then
echo " LIBBPF_FORCE=on set, but couldn't find a usable libbpf"
exit 1
fi
}
check_libbpf()
{
# if set LIBBPF_FORCE=off, disable libbpf entirely
if [ "$LIBBPF_FORCE" = off ]; then
echo "no"
return
fi
if ! ${PKG_CONFIG} libbpf --exists && [ -z "$LIBBPF_DIR" ] ; then
echo "no"
check_force_libbpf_on
return
fi
if [ $(uname -m) = x86_64 ]; then
local LIBBPF_LIBDIR="${LIBBPF_DIR}/usr/lib64"
else
local LIBBPF_LIBDIR="${LIBBPF_DIR}/usr/lib"
fi
if [ -n "$LIBBPF_DIR" ]; then
LIBBPF_CFLAGS="-I${LIBBPF_DIR}/usr/include"
LIBBPF_LDLIBS="${LIBBPF_LIBDIR}/libbpf.a -lz -lelf"
LIBBPF_VERSION=$(PKG_CONFIG_LIBDIR=${LIBBPF_LIBDIR}/pkgconfig ${PKG_CONFIG} libbpf --modversion)
else
LIBBPF_CFLAGS=$(${PKG_CONFIG} libbpf --cflags)
LIBBPF_LDLIBS=$(${PKG_CONFIG} libbpf --libs)
LIBBPF_VERSION=$(${PKG_CONFIG} libbpf --modversion)
fi
if ! have_libbpf_basic; then
echo "no"
echo " libbpf version $LIBBPF_VERSION is too low, please update it to at least 0.1.0"
check_force_libbpf_on
return
else
echo "HAVE_LIBBPF:=y" >> $CONFIG
echo 'CFLAGS += -DHAVE_LIBBPF ' $LIBBPF_CFLAGS >> $CONFIG
echo "CFLAGS += -DLIBBPF_VERSION=\\\"$LIBBPF_VERSION\\\"" >> $CONFIG
echo 'LDLIBS += ' $LIBBPF_LDLIBS >> $CONFIG
if [ -z "$LIBBPF_DIR" ]; then
echo "CFLAGS += -DLIBBPF_DYNAMIC" >> $CONFIG
fi
fi
# bpf_program__title() is deprecated since libbpf 0.2.0, use
# bpf_program__section_name() instead if it is supported
if have_libbpf_sec_name; then
echo "HAVE_LIBBPF_SECTION_NAME:=y" >> $CONFIG
echo 'CFLAGS += -DHAVE_LIBBPF_SECTION_NAME ' >> $CONFIG
fi
echo "yes"
echo " libbpf version $LIBBPF_VERSION"
}
check_selinux() check_selinux()
# SELinux is a compile time option in the ss utility # SELinux is a compile time option in the ss utility
{ {
@ -351,6 +492,76 @@ endif
EOF EOF
} }
usage()
{
cat <<EOF
Usage: $0 [OPTIONS]
--include_dir <dir> Path to iproute2 include dir
--libdir <dir> Path to iproute2 lib dir
--libbpf_dir <dir> Path to libbpf DESTDIR
--libbpf_force <on|off> Enable/disable libbpf by force. Available options:
on: require link against libbpf, quit config if no libbpf support
off: disable libbpf probing
--prefix <dir> Path prefix of the lib files to install
-h | --help Show this usage info
EOF
exit $1
}
# Compat with the old INCLUDE path setting method.
if [ $# -eq 1 ] && [ "$(echo $1 | cut -c 1)" != '-' ]; then
INCLUDE="$1"
else
while [ "$#" -gt 0 ]; do
case "$1" in
--include_dir)
shift
INCLUDE="$1" ;;
--include_dir=*)
INCLUDE="${1#*=}" ;;
--libdir)
shift
LIBDIR="$1" ;;
--libdir=*)
LIBDIR="${1#*=}" ;;
--libbpf_dir)
shift
LIBBPF_DIR="$1" ;;
--libbpf_dir=*)
LIBBPF_DIR="${1#*=}" ;;
--libbpf_force)
shift
LIBBPF_FORCE="$1" ;;
--libbpf_force=*)
LIBBPF_FORCE="${1#*=}" ;;
--prefix)
shift
PREFIX="$1" ;;
--prefix=*)
PREFIX="${1#*=}" ;;
-h | --help)
usage 0 ;;
--*)
;;
*)
usage 1 ;;
esac
[ "$#" -gt 0 ] && shift
done
fi
[ -d "$INCLUDE" ] || usage 1
if [ "${LIBBPF_DIR-unused}" != "unused" ]; then
[ -d "$LIBBPF_DIR" ] || usage 1
fi
if [ "${LIBBPF_FORCE-unused}" != "unused" ]; then
if [ "$LIBBPF_FORCE" != 'on' ] && [ "$LIBBPF_FORCE" != 'off' ]; then
usage 1
fi
fi
[ -z "$PREFIX" ] && usage 1
[ -z "$LIBDIR" ] && usage 1
echo "# Generated config based on" $INCLUDE >$CONFIG echo "# Generated config based on" $INCLUDE >$CONFIG
quiet_config >> $CONFIG quiet_config >> $CONFIG
@ -374,6 +585,7 @@ if ! grep -q TC_CONFIG_NO_XT $CONFIG; then
fi fi
echo echo
check_lib_dir
if ! grep -q TC_CONFIG_NO_XT $CONFIG; then if ! grep -q TC_CONFIG_NO_XT $CONFIG; then
echo -n "iptables modules directory: " echo -n "iptables modules directory: "
check_ipt_lib_dir check_ipt_lib_dir
@ -382,9 +594,15 @@ fi
echo -n "libc has setns: " echo -n "libc has setns: "
check_setns check_setns
echo -n "libc has name_to_handle_at: "
check_name_to_handle_at
echo -n "SELinux support: " echo -n "SELinux support: "
check_selinux check_selinux
echo -n "libbpf support: "
check_libbpf
echo -n "ELF support: " echo -n "ELF support: "
check_elf check_elf

1
dcb/.gitignore vendored Normal file

@ -0,0 +1 @@
dcb

31
dcb/Makefile Normal file

@ -0,0 +1,31 @@
# SPDX-License-Identifier: GPL-2.0
include ../config.mk
TARGETS :=
ifeq ($(HAVE_MNL),y)
DCBOBJ = dcb.o \
dcb_app.o \
dcb_buffer.o \
dcb_dcbx.o \
dcb_ets.o \
dcb_maxrate.o \
dcb_pfc.o
TARGETS += dcb
LDLIBS += -lm
endif
all: $(TARGETS) $(LIBS)
dcb: $(DCBOBJ) $(LIBNETLINK)
$(QUIET_LINK)$(CC) $^ $(LDFLAGS) $(LDLIBS) -o $@
install: all
for i in $(TARGETS); \
do install -m 0755 $$i $(DESTDIR)$(SBINDIR); \
done
clean:
rm -f $(DCBOBJ) $(TARGETS)

611
dcb/dcb.c Normal file

@ -0,0 +1,611 @@
// SPDX-License-Identifier: GPL-2.0+
#include <inttypes.h>
#include <stdio.h>
#include <linux/dcbnl.h>
#include <libmnl/libmnl.h>
#include <getopt.h>
#include "dcb.h"
#include "mnl_utils.h"
#include "namespace.h"
#include "utils.h"
#include "version.h"
static int dcb_init(struct dcb *dcb)
{
dcb->buf = malloc(MNL_SOCKET_BUFFER_SIZE);
if (dcb->buf == NULL) {
perror("Netlink buffer allocation");
return -1;
}
dcb->nl = mnlu_socket_open(NETLINK_ROUTE);
if (dcb->nl == NULL) {
perror("Open netlink socket");
goto err_socket_open;
}
new_json_obj_plain(dcb->json_output);
return 0;
err_socket_open:
free(dcb->buf);
return -1;
}
static void dcb_fini(struct dcb *dcb)
{
delete_json_obj_plain();
mnl_socket_close(dcb->nl);
free(dcb->buf);
}
static struct dcb *dcb_alloc(void)
{
struct dcb *dcb;
dcb = calloc(1, sizeof(*dcb));
if (!dcb)
return NULL;
return dcb;
}
static void dcb_free(struct dcb *dcb)
{
free(dcb);
}
struct dcb_get_attribute {
struct dcb *dcb;
int attr;
void *payload;
__u16 payload_len;
};
static int dcb_get_attribute_attr_ieee_cb(const struct nlattr *attr, void *data)
{
struct dcb_get_attribute *ga = data;
if (mnl_attr_get_type(attr) != ga->attr)
return MNL_CB_OK;
ga->payload = mnl_attr_get_payload(attr);
ga->payload_len = mnl_attr_get_payload_len(attr);
return MNL_CB_STOP;
}
static int dcb_get_attribute_attr_cb(const struct nlattr *attr, void *data)
{
if (mnl_attr_get_type(attr) != DCB_ATTR_IEEE)
return MNL_CB_OK;
return mnl_attr_parse_nested(attr, dcb_get_attribute_attr_ieee_cb, data);
}
static int dcb_get_attribute_cb(const struct nlmsghdr *nlh, void *data)
{
return mnl_attr_parse(nlh, sizeof(struct dcbmsg), dcb_get_attribute_attr_cb, data);
}
static int dcb_get_attribute_bare_cb(const struct nlmsghdr *nlh, void *data)
{
/* Bare attributes (e.g. DCB_ATTR_DCBX) are not wrapped inside an IEEE
* container, so this does not have to go through unpacking in
* dcb_get_attribute_attr_cb().
*/
return mnl_attr_parse(nlh, sizeof(struct dcbmsg),
dcb_get_attribute_attr_ieee_cb, data);
}
struct dcb_set_attribute_response {
int response_attr;
};
static int dcb_set_attribute_attr_cb(const struct nlattr *attr, void *data)
{
struct dcb_set_attribute_response *resp = data;
uint16_t len;
uint8_t err;
if (mnl_attr_get_type(attr) != resp->response_attr)
return MNL_CB_OK;
len = mnl_attr_get_payload_len(attr);
if (len != 1) {
fprintf(stderr, "Response attribute expected to have size 1, not %d\n", len);
return MNL_CB_ERROR;
}
err = mnl_attr_get_u8(attr);
if (err) {
fprintf(stderr, "Error when attempting to set attribute: %s\n",
strerror(err));
return MNL_CB_ERROR;
}
return MNL_CB_STOP;
}
static int dcb_set_attribute_cb(const struct nlmsghdr *nlh, void *data)
{
return mnl_attr_parse(nlh, sizeof(struct dcbmsg), dcb_set_attribute_attr_cb, data);
}
static int dcb_talk(struct dcb *dcb, struct nlmsghdr *nlh, mnl_cb_t cb, void *data)
{
int ret;
ret = mnl_socket_sendto(dcb->nl, nlh, nlh->nlmsg_len);
if (ret < 0) {
perror("mnl_socket_sendto");
return -1;
}
return mnlu_socket_recv_run(dcb->nl, nlh->nlmsg_seq, dcb->buf, MNL_SOCKET_BUFFER_SIZE,
cb, data);
}
static struct nlmsghdr *dcb_prepare(struct dcb *dcb, const char *dev,
uint32_t nlmsg_type, uint8_t dcb_cmd)
{
struct dcbmsg dcbm = {
.cmd = dcb_cmd,
};
struct nlmsghdr *nlh;
nlh = mnlu_msg_prepare(dcb->buf, nlmsg_type, NLM_F_REQUEST, &dcbm, sizeof(dcbm));
mnl_attr_put_strz(nlh, DCB_ATTR_IFNAME, dev);
return nlh;
}
static int __dcb_get_attribute(struct dcb *dcb, int command,
const char *dev, int attr,
void **payload_p, __u16 *payload_len_p,
int (*get_attribute_cb)(const struct nlmsghdr *nlh,
void *data))
{
struct dcb_get_attribute ga;
struct nlmsghdr *nlh;
int ret;
nlh = dcb_prepare(dcb, dev, RTM_GETDCB, command);
ga = (struct dcb_get_attribute) {
.dcb = dcb,
.attr = attr,
.payload = NULL,
};
ret = dcb_talk(dcb, nlh, get_attribute_cb, &ga);
if (ret) {
perror("Attribute read");
return ret;
}
if (ga.payload == NULL) {
fprintf(stderr, "Attribute not found\n");
return -ENOENT;
}
*payload_p = ga.payload;
*payload_len_p = ga.payload_len;
return 0;
}
int dcb_get_attribute_va(struct dcb *dcb, const char *dev, int attr,
void **payload_p, __u16 *payload_len_p)
{
return __dcb_get_attribute(dcb, DCB_CMD_IEEE_GET, dev, attr,
payload_p, payload_len_p,
dcb_get_attribute_cb);
}
int dcb_get_attribute_bare(struct dcb *dcb, int cmd, const char *dev, int attr,
void **payload_p, __u16 *payload_len_p)
{
return __dcb_get_attribute(dcb, cmd, dev, attr,
payload_p, payload_len_p,
dcb_get_attribute_bare_cb);
}
int dcb_get_attribute(struct dcb *dcb, const char *dev, int attr, void *data, size_t data_len)
{
__u16 payload_len;
void *payload;
int ret;
ret = dcb_get_attribute_va(dcb, dev, attr, &payload, &payload_len);
if (ret)
return ret;
if (payload_len != data_len) {
fprintf(stderr, "Wrong len %d, expected %zd\n", payload_len, data_len);
return -EINVAL;
}
memcpy(data, payload, data_len);
return 0;
}
static int __dcb_set_attribute(struct dcb *dcb, int command, const char *dev,
int (*cb)(struct dcb *, struct nlmsghdr *, void *),
void *data, int response_attr)
{
struct dcb_set_attribute_response resp = {
.response_attr = response_attr,
};
struct nlmsghdr *nlh;
int ret;
nlh = dcb_prepare(dcb, dev, RTM_SETDCB, command);
ret = cb(dcb, nlh, data);
if (ret)
return ret;
ret = dcb_talk(dcb, nlh, dcb_set_attribute_cb, &resp);
if (ret) {
perror("Attribute write");
return ret;
}
return 0;
}
struct dcb_set_attribute_ieee_cb {
int (*cb)(struct dcb *dcb, struct nlmsghdr *nlh, void *data);
void *data;
};
static int dcb_set_attribute_ieee_cb(struct dcb *dcb, struct nlmsghdr *nlh, void *data)
{
struct dcb_set_attribute_ieee_cb *ieee_data = data;
struct nlattr *nest;
int ret;
nest = mnl_attr_nest_start(nlh, DCB_ATTR_IEEE);
ret = ieee_data->cb(dcb, nlh, ieee_data->data);
if (ret)
return ret;
mnl_attr_nest_end(nlh, nest);
return 0;
}
int dcb_set_attribute_va(struct dcb *dcb, int command, const char *dev,
int (*cb)(struct dcb *dcb, struct nlmsghdr *nlh, void *data),
void *data)
{
struct dcb_set_attribute_ieee_cb ieee_data = {
.cb = cb,
.data = data,
};
return __dcb_set_attribute(dcb, command, dev,
&dcb_set_attribute_ieee_cb, &ieee_data,
DCB_ATTR_IEEE);
}
struct dcb_set_attribute {
int attr;
const void *data;
size_t data_len;
};
static int dcb_set_attribute_put(struct dcb *dcb, struct nlmsghdr *nlh, void *data)
{
struct dcb_set_attribute *dsa = data;
mnl_attr_put(nlh, dsa->attr, dsa->data_len, dsa->data);
return 0;
}
int dcb_set_attribute(struct dcb *dcb, const char *dev, int attr, const void *data, size_t data_len)
{
struct dcb_set_attribute dsa = {
.attr = attr,
.data = data,
.data_len = data_len,
};
return dcb_set_attribute_va(dcb, DCB_CMD_IEEE_SET, dev,
&dcb_set_attribute_put, &dsa);
}
int dcb_set_attribute_bare(struct dcb *dcb, int command, const char *dev,
int attr, const void *data, size_t data_len,
int response_attr)
{
struct dcb_set_attribute dsa = {
.attr = attr,
.data = data,
.data_len = data_len,
};
return __dcb_set_attribute(dcb, command, dev,
&dcb_set_attribute_put, &dsa, response_attr);
}
void dcb_print_array_u8(const __u8 *array, size_t size)
{
SPRINT_BUF(b);
size_t i;
for (i = 0; i < size; i++) {
snprintf(b, sizeof(b), "%zd:%%d ", i);
print_uint(PRINT_ANY, NULL, b, array[i]);
}
}
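The array printers build their format string in two stages: `snprintf()` first bakes the index into the buffer, and the doubled `%%` leaves a live conversion for the later print call. A minimal sketch of the first stage; `build_indexed_format` is a hypothetical name, and it uses `%zu` for the `size_t` index where the source uses `%zd`:

```c
#include <assert.h>
#include <stdio.h>

/* Write a per-index format such as "3:%d " into buf. The "%%" in the
 * template comes out of snprintf() as a single literal '%', so the result
 * still contains one "%d" conversion for a later print call.
 * Returns what snprintf() returns: the length the result needs.
 */
int build_indexed_format(char *buf, size_t buflen, size_t index)
{
	assert(buf != NULL && buflen > 0);

	return snprintf(buf, buflen, "%zu:%%d ", index);
}
```

The resulting buffer is then handed to the printing helper as its format argument, which is why each array element appears as `index:value`.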
void dcb_print_array_u64(const __u64 *array, size_t size)
{
SPRINT_BUF(b);
size_t i;
for (i = 0; i < size; i++) {
snprintf(b, sizeof(b), "%zd:%%" PRIu64 " ", i);
print_u64(PRINT_ANY, NULL, b, array[i]);
}
}
void dcb_print_array_on_off(const __u8 *array, size_t size)
{
SPRINT_BUF(b);
size_t i;
for (i = 0; i < size; i++) {
snprintf(b, sizeof(b), "%zd:%%s ", i);
print_on_off(PRINT_ANY, NULL, b, array[i]);
}
}
void dcb_print_array_kw(const __u8 *array, size_t array_size,
const char *const kw[], size_t kw_size)
{
SPRINT_BUF(b);
size_t i;
for (i = 0; i < array_size; i++) {
__u8 emt = array[i];
snprintf(b, sizeof(b), "%zd:%%s ", i);
if (emt < kw_size && kw[emt])
print_string(PRINT_ANY, NULL, b, kw[emt]);
else
print_string(PRINT_ANY, NULL, b, "???");
}
}
void dcb_print_named_array(const char *json_name, const char *fp_name,
const __u8 *array, size_t size,
void (*print_array)(const __u8 *, size_t))
{
open_json_array(PRINT_JSON, json_name);
print_string(PRINT_FP, NULL, "%s ", fp_name);
print_array(array, size);
close_json_array(PRINT_JSON, json_name);
}
int dcb_parse_mapping(const char *what_key, __u32 key, __u32 max_key,
const char *what_value, __u64 value, __u64 max_value,
void (*set_array)(__u32 index, __u64 value, void *data),
void *set_array_data)
{
bool is_all = key == (__u32) -1;
if (!is_all && key > max_key) {
fprintf(stderr, "In %s:%s mapping, %s is expected to be 0..%d\n",
what_key, what_value, what_key, max_key);
return -EINVAL;
}
if (value > max_value) {
fprintf(stderr, "In %s:%s mapping, %s is expected to be 0..%llu\n",
what_key, what_value, what_value, max_value);
return -EINVAL;
}
if (is_all) {
for (key = 0; key <= max_key; key++)
set_array(key, value, set_array_data);
} else {
set_array(key, value, set_array_data);
}
return 0;
}
void dcb_set_u8(__u32 key, __u64 value, void *data)
{
__u8 *array = data;
array[key] = value;
}
void dcb_set_u32(__u32 key, __u64 value, void *data)
{
__u32 *array = data;
array[key] = value;
}
void dcb_set_u64(__u32 key, __u64 value, void *data)
{
__u64 *array = data;
array[key] = value;
}
int dcb_cmd_parse_dev(struct dcb *dcb, int argc, char **argv,
int (*and_then)(struct dcb *dcb, const char *dev,
int argc, char **argv),
void (*help)(void))
{
const char *dev;
if (!argc || matches(*argv, "help") == 0) {
help();
return 0;
} else if (matches(*argv, "dev") == 0) {
NEXT_ARG();
dev = *argv;
if (check_ifname(dev)) {
invarg("not a valid ifname", *argv);
return -EINVAL;
}
NEXT_ARG_FWD();
return and_then(dcb, dev, argc, argv);
} else {
fprintf(stderr, "Expected `dev DEV', not `%s'\n", *argv);
help();
return -EINVAL;
}
}
static void dcb_help(void)
{
fprintf(stderr,
"Usage: dcb [ OPTIONS ] OBJECT { COMMAND | help }\n"
" dcb [ -f | --force ] { -b | --batch } filename [ -n | --netns ] netnsname\n"
"where OBJECT := { app | buffer | dcbx | ets | maxrate | pfc }\n"
" OPTIONS := [ -V | --Version | -i | --iec | -j | --json\n"
" | -N | --Numeric | -p | --pretty\n"
" | -s | --statistics | -v | --verbose]\n");
}
static int dcb_cmd(struct dcb *dcb, int argc, char **argv)
{
if (!argc || matches(*argv, "help") == 0) {
dcb_help();
return 0;
} else if (matches(*argv, "app") == 0) {
return dcb_cmd_app(dcb, argc - 1, argv + 1);
} else if (matches(*argv, "buffer") == 0) {
return dcb_cmd_buffer(dcb, argc - 1, argv + 1);
} else if (matches(*argv, "dcbx") == 0) {
return dcb_cmd_dcbx(dcb, argc - 1, argv + 1);
} else if (matches(*argv, "ets") == 0) {
return dcb_cmd_ets(dcb, argc - 1, argv + 1);
} else if (matches(*argv, "maxrate") == 0) {
return dcb_cmd_maxrate(dcb, argc - 1, argv + 1);
} else if (matches(*argv, "pfc") == 0) {
return dcb_cmd_pfc(dcb, argc - 1, argv + 1);
}
fprintf(stderr, "Object \"%s\" is unknown\n", *argv);
return -ENOENT;
}
static int dcb_batch_cmd(int argc, char *argv[], void *data)
{
struct dcb *dcb = data;
return dcb_cmd(dcb, argc, argv);
}
static int dcb_batch(struct dcb *dcb, const char *name, bool force)
{
return do_batch(name, force, dcb_batch_cmd, dcb);
}
int main(int argc, char **argv)
{
static const struct option long_options[] = {
{ "Version", no_argument, NULL, 'V' },
{ "force", no_argument, NULL, 'f' },
{ "batch", required_argument, NULL, 'b' },
{ "iec", no_argument, NULL, 'i' },
{ "json", no_argument, NULL, 'j' },
{ "Numeric", no_argument, NULL, 'N' },
{ "pretty", no_argument, NULL, 'p' },
{ "statistics", no_argument, NULL, 's' },
{ "netns", required_argument, NULL, 'n' },
{ "help", no_argument, NULL, 'h' },
{ NULL, 0, NULL, 0 }
};
const char *batch_file = NULL;
bool force = false;
struct dcb *dcb;
int opt;
int err;
int ret;
dcb = dcb_alloc();
if (!dcb) {
fprintf(stderr, "Failed to allocate memory for dcb\n");
return EXIT_FAILURE;
}
while ((opt = getopt_long(argc, argv, "b:fhijn:psvNV",
long_options, NULL)) >= 0) {
switch (opt) {
case 'V':
printf("dcb utility, iproute2-%s\n", version);
ret = EXIT_SUCCESS;
goto dcb_free;
case 'f':
force = true;
break;
case 'b':
batch_file = optarg;
break;
case 'j':
dcb->json_output = true;
break;
case 'N':
dcb->numeric = true;
break;
case 'p':
pretty = true;
break;
case 's':
dcb->stats = true;
break;
case 'n':
if (netns_switch(optarg)) {
ret = EXIT_FAILURE;
goto dcb_free;
}
break;
case 'i':
dcb->use_iec = true;
break;
case 'h':
dcb_help();
ret = EXIT_SUCCESS;
goto dcb_free;
default:
fprintf(stderr, "Unknown option.\n");
dcb_help();
ret = EXIT_FAILURE;
goto dcb_free;
}
}
argc -= optind;
argv += optind;
err = dcb_init(dcb);
if (err) {
ret = EXIT_FAILURE;
goto dcb_free;
}
if (batch_file)
err = dcb_batch(dcb, batch_file, force);
else
err = dcb_cmd(dcb, argc, argv);
if (err) {
ret = EXIT_FAILURE;
goto dcb_fini;
}
ret = EXIT_SUCCESS;
dcb_fini:
dcb_fini(dcb);
dcb_free:
dcb_free(dcb);
return ret;
}

81
dcb/dcb.h Normal file

@ -0,0 +1,81 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef __DCB_H__
#define __DCB_H__ 1
#include <libmnl/libmnl.h>
#include <stdbool.h>
#include <stddef.h>
/* dcb.c */
struct dcb {
char *buf;
struct mnl_socket *nl;
bool json_output;
bool stats;
bool use_iec;
bool numeric;
};
int dcb_parse_mapping(const char *what_key, __u32 key, __u32 max_key,
const char *what_value, __u64 value, __u64 max_value,
void (*set_array)(__u32 index, __u64 value, void *data),
void *set_array_data);
int dcb_cmd_parse_dev(struct dcb *dcb, int argc, char **argv,
int (*and_then)(struct dcb *dcb, const char *dev,
int argc, char **argv),
void (*help)(void));
void dcb_set_u8(__u32 key, __u64 value, void *data);
void dcb_set_u32(__u32 key, __u64 value, void *data);
void dcb_set_u64(__u32 key, __u64 value, void *data);
int dcb_get_attribute(struct dcb *dcb, const char *dev, int attr,
void *data, size_t data_len);
int dcb_set_attribute(struct dcb *dcb, const char *dev, int attr,
const void *data, size_t data_len);
int dcb_get_attribute_va(struct dcb *dcb, const char *dev, int attr,
void **payload_p, __u16 *payload_len_p);
int dcb_set_attribute_va(struct dcb *dcb, int command, const char *dev,
int (*cb)(struct dcb *dcb, struct nlmsghdr *nlh, void *data),
void *data);
int dcb_get_attribute_bare(struct dcb *dcb, int cmd, const char *dev, int attr,
void **payload_p, __u16 *payload_len_p);
int dcb_set_attribute_bare(struct dcb *dcb, int command, const char *dev,
int attr, const void *data, size_t data_len,
int response_attr);
void dcb_print_named_array(const char *json_name, const char *fp_name,
const __u8 *array, size_t size,
void (*print_array)(const __u8 *, size_t));
void dcb_print_array_u8(const __u8 *array, size_t size);
void dcb_print_array_u64(const __u64 *array, size_t size);
void dcb_print_array_on_off(const __u8 *array, size_t size);
void dcb_print_array_kw(const __u8 *array, size_t array_size,
const char *const kw[], size_t kw_size);
/* dcb_app.c */
int dcb_cmd_app(struct dcb *dcb, int argc, char **argv);
/* dcb_buffer.c */
int dcb_cmd_buffer(struct dcb *dcb, int argc, char **argv);
/* dcb_dcbx.c */
int dcb_cmd_dcbx(struct dcb *dcb, int argc, char **argv);
/* dcb_ets.c */
int dcb_cmd_ets(struct dcb *dcb, int argc, char **argv);
/* dcb_maxrate.c */
int dcb_cmd_maxrate(struct dcb *dcb, int argc, char **argv);
/* dcb_pfc.c */
int dcb_cmd_pfc(struct dcb *dcb, int argc, char **argv);
#endif /* __DCB_H__ */

795
dcb/dcb_app.c Normal file

@ -0,0 +1,795 @@
// SPDX-License-Identifier: GPL-2.0+
#include <errno.h>
#include <inttypes.h>
#include <stdio.h>
#include <libmnl/libmnl.h>
#include <linux/dcbnl.h>
#include "dcb.h"
#include "utils.h"
#include "rt_names.h"
static void dcb_app_help_add(void)
{
fprintf(stderr,
"Usage: dcb app { add | del | replace } dev STRING\n"
" [ default-prio PRIO ]\n"
" [ ethtype-prio ET:PRIO ]\n"
" [ stream-port-prio PORT:PRIO ]\n"
" [ dgram-port-prio PORT:PRIO ]\n"
" [ port-prio PORT:PRIO ]\n"
" [ dscp-prio DSCP:PRIO ]\n"
"\n"
" where PRIO := { 0 .. 7 }\n"
" ET := { 0x600 .. 0xffff }\n"
" PORT := { 1 .. 65535 }\n"
" DSCP := { 0 .. 63 }\n"
"\n"
);
}
static void dcb_app_help_show_flush(void)
{
fprintf(stderr,
"Usage: dcb app { show | flush } dev STRING\n"
" [ default-prio ]\n"
" [ ethtype-prio ]\n"
" [ stream-port-prio ]\n"
" [ dgram-port-prio ]\n"
" [ port-prio ]\n"
" [ dscp-prio ]\n"
"\n"
);
}
static void dcb_app_help(void)
{
fprintf(stderr,
"Usage: dcb app help\n"
"\n"
);
dcb_app_help_show_flush();
dcb_app_help_add();
}
struct dcb_app_table {
struct dcb_app *apps;
size_t n_apps;
};
static void dcb_app_table_fini(struct dcb_app_table *tab)
{
free(tab->apps);
}
static int dcb_app_table_push(struct dcb_app_table *tab, struct dcb_app *app)
{
struct dcb_app *apps = realloc(tab->apps, (tab->n_apps + 1) * sizeof(*tab->apps));
if (apps == NULL) {
perror("Cannot allocate APP table");
return -ENOMEM;
}
tab->apps = apps;
tab->apps[tab->n_apps++] = *app;
return 0;
}
static void dcb_app_table_remove_existing(struct dcb_app_table *a,
const struct dcb_app_table *b)
{
size_t ia, ja;
size_t ib;
for (ia = 0, ja = 0; ia < a->n_apps; ia++) {
struct dcb_app *aa = &a->apps[ia];
bool found = false;
for (ib = 0; ib < b->n_apps; ib++) {
const struct dcb_app *ab = &b->apps[ib];
if (aa->selector == ab->selector &&
aa->protocol == ab->protocol &&
aa->priority == ab->priority) {
found = true;
break;
}
}
if (!found)
a->apps[ja++] = *aa;
}
a->n_apps = ja;
}
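dcb_app_table_remove_existing() compacts table A in place with a read index and a write index, so no second allocation is needed. The same technique on a simplified record, as a sketch; the `struct entry` layout and function names here are assumptions made for the example:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for an APP table entry. */
struct entry {
	unsigned char selector;
	unsigned short protocol;
	unsigned char priority;
};

bool entry_equal(const struct entry *a, const struct entry *b)
{
	return a->selector == b->selector &&
	       a->protocol == b->protocol &&
	       a->priority == b->priority;
}

/* Drop from a[] every entry that also occurs in b[], keeping the order of
 * the survivors. Returns the new number of entries in a[].
 */
size_t remove_existing(struct entry *a, size_t na,
		       const struct entry *b, size_t nb)
{
	size_t ia, ib, ja = 0;

	assert(a != NULL || na == 0);
	assert(b != NULL || nb == 0);

	for (ia = 0; ia < na; ia++) {
		bool found = false;

		for (ib = 0; ib < nb; ib++) {
			if (entry_equal(&a[ia], &b[ib])) {
				found = true;
				break;
			}
		}
		/* The write index never overtakes the read index. */
		if (!found)
			a[ja++] = a[ia];
	}
	return ja;
}
```

The nested loop is quadratic in the table sizes, which seems acceptable for tables this small.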
static void dcb_app_table_remove_replaced(struct dcb_app_table *a,
const struct dcb_app_table *b)
{
size_t ia, ja;
size_t ib;
for (ia = 0, ja = 0; ia < a->n_apps; ia++) {
struct dcb_app *aa = &a->apps[ia];
bool present = false;
bool found = false;
for (ib = 0; ib < b->n_apps; ib++) {
const struct dcb_app *ab = &b->apps[ib];
if (aa->selector == ab->selector &&
aa->protocol == ab->protocol)
present = true;
else
continue;
if (aa->priority == ab->priority) {
found = true;
break;
}
}
/* Entries that remain in A will be removed, so keep in the
* table only APP entries whose sel/pid is mentioned in B,
* but that do not have the full sel/pid/prio match.
*/
if (present && !found)
a->apps[ja++] = *aa;
}
a->n_apps = ja;
}
static int dcb_app_table_copy(struct dcb_app_table *a,
const struct dcb_app_table *b)
{
size_t i;
int ret;
for (i = 0; i < b->n_apps; i++) {
ret = dcb_app_table_push(a, &b->apps[i]);
if (ret != 0)
return ret;
}
return 0;
}
static int dcb_app_cmp(const struct dcb_app *a, const struct dcb_app *b)
{
if (a->protocol < b->protocol)
return -1;
if (a->protocol > b->protocol)
return 1;
return a->priority - b->priority;
}
static int dcb_app_cmp_cb(const void *a, const void *b)
{
return dcb_app_cmp(a, b);
}
static void dcb_app_table_sort(struct dcb_app_table *tab)
{
qsort(tab->apps, tab->n_apps, sizeof(*tab->apps), dcb_app_cmp_cb);
}
struct dcb_app_parse_mapping {
__u8 selector;
struct dcb_app_table *tab;
int err;
};
static void dcb_app_parse_mapping_cb(__u32 key, __u64 value, void *data)
{
struct dcb_app_parse_mapping *pm = data;
struct dcb_app app = {
.selector = pm->selector,
.priority = value,
.protocol = key,
};
if (pm->err)
return;
pm->err = dcb_app_table_push(pm->tab, &app);
}
static int dcb_app_parse_mapping_ethtype_prio(__u32 key, char *value, void *data)
{
__u8 prio;
if (key < 0x600) {
fprintf(stderr, "Protocol IDs < 0x600 are reserved for EtherType\n");
return -EINVAL;
}
if (get_u8(&prio, value, 0))
return -EINVAL;
return dcb_parse_mapping("ETHTYPE", key, 0xffff,
"PRIO", prio, IEEE_8021QAZ_MAX_TCS - 1,
dcb_app_parse_mapping_cb, data);
}
static int dcb_app_parse_dscp(__u32 *key, const char *arg)
{
if (parse_mapping_num_all(key, arg) == 0)
return 0;
if (rtnl_dsfield_a2n(key, arg) != 0)
return -1;
if (*key & 0x03) {
fprintf(stderr, "The value `%s' uses non-DSCP bits.\n", arg);
return -1;
}
/* Unshift the value to convert it from dsfield to DSCP. */
*key >>= 2;
return 0;
}
static int dcb_app_parse_mapping_dscp_prio(__u32 key, char *value, void *data)
{
__u8 prio;
if (get_u8(&prio, value, 0))
return -EINVAL;
return dcb_parse_mapping("DSCP", key, 63,
"PRIO", prio, IEEE_8021QAZ_MAX_TCS - 1,
dcb_app_parse_mapping_cb, data);
}
static int dcb_app_parse_mapping_port_prio(__u32 key, char *value, void *data)
{
__u8 prio;
if (key == 0) {
fprintf(stderr, "Port ID of 0 is invalid\n");
return -EINVAL;
}
if (get_u8(&prio, value, 0))
return -EINVAL;
return dcb_parse_mapping("PORT", key, 0xffff,
"PRIO", prio, IEEE_8021QAZ_MAX_TCS - 1,
dcb_app_parse_mapping_cb, data);
}
static int dcb_app_parse_default_prio(int *argcp, char ***argvp, struct dcb_app_table *tab)
{
int argc = *argcp;
char **argv = *argvp;
int ret = 0;
while (argc > 0) {
struct dcb_app app;
__u8 prio;
if (get_u8(&prio, *argv, 0)) {
ret = 1;
break;
}
app = (struct dcb_app){
.selector = IEEE_8021QAZ_APP_SEL_ETHERTYPE,
.protocol = 0,
.priority = prio,
};
ret = dcb_app_table_push(tab, &app);
if (ret != 0)
break;
argc--, argv++;
}
*argcp = argc;
*argvp = argv;
return ret;
}
static bool dcb_app_is_ethtype(const struct dcb_app *app)
{
return app->selector == IEEE_8021QAZ_APP_SEL_ETHERTYPE &&
app->protocol != 0;
}
static bool dcb_app_is_default(const struct dcb_app *app)
{
return app->selector == IEEE_8021QAZ_APP_SEL_ETHERTYPE &&
app->protocol == 0;
}
static bool dcb_app_is_dscp(const struct dcb_app *app)
{
return app->selector == IEEE_8021QAZ_APP_SEL_DSCP;
}
static bool dcb_app_is_stream_port(const struct dcb_app *app)
{
return app->selector == IEEE_8021QAZ_APP_SEL_STREAM;
}
static bool dcb_app_is_dgram_port(const struct dcb_app *app)
{
return app->selector == IEEE_8021QAZ_APP_SEL_DGRAM;
}
static bool dcb_app_is_port(const struct dcb_app *app)
{
return app->selector == IEEE_8021QAZ_APP_SEL_ANY;
}
static int dcb_app_print_key_dec(__u16 protocol)
{
return print_uint(PRINT_ANY, NULL, "%d:", protocol);
}
static int dcb_app_print_key_hex(__u16 protocol)
{
return print_uint(PRINT_ANY, NULL, "%x:", protocol);
}
static int dcb_app_print_key_dscp(__u16 protocol)
{
const char *name = rtnl_dsfield_get_name(protocol << 2);
if (!is_json_context() && name != NULL)
return print_string(PRINT_FP, NULL, "%s:", name);
return print_uint(PRINT_ANY, NULL, "%d:", protocol);
}
static void dcb_app_print_filtered(const struct dcb_app_table *tab,
bool (*filter)(const struct dcb_app *),
int (*print_key)(__u16 protocol),
const char *json_name,
const char *fp_name)
{
bool first = true;
size_t i;
for (i = 0; i < tab->n_apps; i++) {
struct dcb_app *app = &tab->apps[i];
if (!filter(app))
continue;
if (first) {
open_json_array(PRINT_JSON, json_name);
print_string(PRINT_FP, NULL, "%s ", fp_name);
first = false;
}
open_json_array(PRINT_JSON, NULL);
print_key(app->protocol);
print_uint(PRINT_ANY, NULL, "%d ", app->priority);
close_json_array(PRINT_JSON, NULL);
}
if (!first) {
close_json_array(PRINT_JSON, json_name);
print_nl();
}
}
static void dcb_app_print_ethtype_prio(const struct dcb_app_table *tab)
{
dcb_app_print_filtered(tab, dcb_app_is_ethtype, dcb_app_print_key_hex,
"ethtype_prio", "ethtype-prio");
}
static void dcb_app_print_dscp_prio(const struct dcb *dcb,
const struct dcb_app_table *tab)
{
dcb_app_print_filtered(tab, dcb_app_is_dscp,
dcb->numeric ? dcb_app_print_key_dec
: dcb_app_print_key_dscp,
"dscp_prio", "dscp-prio");
}
static void dcb_app_print_stream_port_prio(const struct dcb_app_table *tab)
{
dcb_app_print_filtered(tab, dcb_app_is_stream_port, dcb_app_print_key_dec,
"stream_port_prio", "stream-port-prio");
}
static void dcb_app_print_dgram_port_prio(const struct dcb_app_table *tab)
{
dcb_app_print_filtered(tab, dcb_app_is_dgram_port, dcb_app_print_key_dec,
"dgram_port_prio", "dgram-port-prio");
}
static void dcb_app_print_port_prio(const struct dcb_app_table *tab)
{
dcb_app_print_filtered(tab, dcb_app_is_port, dcb_app_print_key_dec,
"port_prio", "port-prio");
}
static void dcb_app_print_default_prio(const struct dcb_app_table *tab)
{
bool first = true;
size_t i;
for (i = 0; i < tab->n_apps; i++) {
if (!dcb_app_is_default(&tab->apps[i]))
continue;
if (first) {
open_json_array(PRINT_JSON, "default_prio");
print_string(PRINT_FP, NULL, "default-prio ", NULL);
first = false;
}
print_uint(PRINT_ANY, NULL, "%d ", tab->apps[i].priority);
}
if (!first) {
close_json_array(PRINT_JSON, "default_prio");
print_nl();
}
}
static void dcb_app_print(const struct dcb *dcb, const struct dcb_app_table *tab)
{
dcb_app_print_ethtype_prio(tab);
dcb_app_print_default_prio(tab);
dcb_app_print_dscp_prio(dcb, tab);
dcb_app_print_stream_port_prio(tab);
dcb_app_print_dgram_port_prio(tab);
dcb_app_print_port_prio(tab);
}
static int dcb_app_get_table_attr_cb(const struct nlattr *attr, void *data)
{
struct dcb_app_table *tab = data;
struct dcb_app *app;
int ret;
if (mnl_attr_get_type(attr) != DCB_ATTR_IEEE_APP) {
fprintf(stderr, "Unknown attribute in DCB_ATTR_IEEE_APP_TABLE: %d\n",
mnl_attr_get_type(attr));
return MNL_CB_OK;
}
if (mnl_attr_get_payload_len(attr) < sizeof(struct dcb_app)) {
fprintf(stderr, "DCB_ATTR_IEEE_APP payload expected to have size %zd, not %d\n",
sizeof(struct dcb_app), mnl_attr_get_payload_len(attr));
return MNL_CB_OK;
}
app = mnl_attr_get_payload(attr);
ret = dcb_app_table_push(tab, app);
if (ret != 0)
return MNL_CB_ERROR;
return MNL_CB_OK;
}
static int dcb_app_get(struct dcb *dcb, const char *dev, struct dcb_app_table *tab)
{
uint16_t payload_len;
void *payload;
int ret;
ret = dcb_get_attribute_va(dcb, dev, DCB_ATTR_IEEE_APP_TABLE, &payload, &payload_len);
if (ret != 0)
return ret;
ret = mnl_attr_parse_payload(payload, payload_len, dcb_app_get_table_attr_cb, tab);
if (ret != MNL_CB_OK)
return -EINVAL;
return 0;
}
struct dcb_app_add_del {
const struct dcb_app_table *tab;
bool (*filter)(const struct dcb_app *app);
};
static int dcb_app_add_del_cb(struct dcb *dcb, struct nlmsghdr *nlh, void *data)
{
struct dcb_app_add_del *add_del = data;
struct nlattr *nest;
size_t i;
nest = mnl_attr_nest_start(nlh, DCB_ATTR_IEEE_APP_TABLE);
for (i = 0; i < add_del->tab->n_apps; i++) {
const struct dcb_app *app = &add_del->tab->apps[i];
if (add_del->filter == NULL || add_del->filter(app))
mnl_attr_put(nlh, DCB_ATTR_IEEE_APP, sizeof(*app), app);
}
mnl_attr_nest_end(nlh, nest);
return 0;
}
static int dcb_app_add_del(struct dcb *dcb, const char *dev, int command,
const struct dcb_app_table *tab,
bool (*filter)(const struct dcb_app *))
{
struct dcb_app_add_del add_del = {
.tab = tab,
.filter = filter,
};
if (tab->n_apps == 0)
return 0;
return dcb_set_attribute_va(dcb, command, dev, dcb_app_add_del_cb, &add_del);
}
static int dcb_cmd_app_parse_add_del(struct dcb *dcb, const char *dev,
int argc, char **argv, struct dcb_app_table *tab)
{
struct dcb_app_parse_mapping pm = {
.tab = tab,
};
int ret;
if (!argc) {
dcb_app_help_add();
return 0;
}
do {
if (matches(*argv, "help") == 0) {
dcb_app_help_add();
return 0;
} else if (matches(*argv, "ethtype-prio") == 0) {
NEXT_ARG();
pm.selector = IEEE_8021QAZ_APP_SEL_ETHERTYPE;
ret = parse_mapping(&argc, &argv, false,
&dcb_app_parse_mapping_ethtype_prio,
&pm);
} else if (matches(*argv, "default-prio") == 0) {
NEXT_ARG();
ret = dcb_app_parse_default_prio(&argc, &argv, pm.tab);
if (ret != 0) {
fprintf(stderr, "Invalid default priority %s\n", *argv);
return ret;
}
} else if (matches(*argv, "dscp-prio") == 0) {
NEXT_ARG();
pm.selector = IEEE_8021QAZ_APP_SEL_DSCP;
ret = parse_mapping_gen(&argc, &argv,
&dcb_app_parse_dscp,
&dcb_app_parse_mapping_dscp_prio,
&pm);
} else if (matches(*argv, "stream-port-prio") == 0) {
NEXT_ARG();
pm.selector = IEEE_8021QAZ_APP_SEL_STREAM;
ret = parse_mapping(&argc, &argv, false,
&dcb_app_parse_mapping_port_prio,
&pm);
} else if (matches(*argv, "dgram-port-prio") == 0) {
NEXT_ARG();
pm.selector = IEEE_8021QAZ_APP_SEL_DGRAM;
ret = parse_mapping(&argc, &argv, false,
&dcb_app_parse_mapping_port_prio,
&pm);
} else if (matches(*argv, "port-prio") == 0) {
NEXT_ARG();
pm.selector = IEEE_8021QAZ_APP_SEL_ANY;
ret = parse_mapping(&argc, &argv, false,
&dcb_app_parse_mapping_port_prio,
&pm);
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_app_help_add();
return -EINVAL;
}
if (ret != 0) {
fprintf(stderr, "Invalid mapping %s\n", *argv);
return ret;
}
if (pm.err)
return pm.err;
} while (argc > 0);
return 0;
}
static int dcb_cmd_app_add(struct dcb *dcb, const char *dev, int argc, char **argv)
{
struct dcb_app_table tab = {};
int ret;
ret = dcb_cmd_app_parse_add_del(dcb, dev, argc, argv, &tab);
if (ret != 0)
return ret;
ret = dcb_app_add_del(dcb, dev, DCB_CMD_IEEE_SET, &tab, NULL);
dcb_app_table_fini(&tab);
return ret;
}
static int dcb_cmd_app_del(struct dcb *dcb, const char *dev, int argc, char **argv)
{
struct dcb_app_table tab = {};
int ret;
ret = dcb_cmd_app_parse_add_del(dcb, dev, argc, argv, &tab);
if (ret != 0)
return ret;
ret = dcb_app_add_del(dcb, dev, DCB_CMD_IEEE_DEL, &tab, NULL);
dcb_app_table_fini(&tab);
return ret;
}
static int dcb_cmd_app_show(struct dcb *dcb, const char *dev, int argc, char **argv)
{
struct dcb_app_table tab = {};
int ret;
ret = dcb_app_get(dcb, dev, &tab);
if (ret != 0)
return ret;
dcb_app_table_sort(&tab);
open_json_object(NULL);
if (!argc) {
dcb_app_print(dcb, &tab);
goto out;
}
do {
if (matches(*argv, "help") == 0) {
dcb_app_help_show_flush();
goto out;
} else if (matches(*argv, "ethtype-prio") == 0) {
dcb_app_print_ethtype_prio(&tab);
} else if (matches(*argv, "dscp-prio") == 0) {
dcb_app_print_dscp_prio(dcb, &tab);
} else if (matches(*argv, "stream-port-prio") == 0) {
dcb_app_print_stream_port_prio(&tab);
} else if (matches(*argv, "dgram-port-prio") == 0) {
dcb_app_print_dgram_port_prio(&tab);
} else if (matches(*argv, "port-prio") == 0) {
dcb_app_print_port_prio(&tab);
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_app_help_show_flush();
ret = -EINVAL;
goto out;
}
NEXT_ARG_FWD();
} while (argc > 0);
out:
close_json_object();
dcb_app_table_fini(&tab);
return ret;
}
static int dcb_cmd_app_flush(struct dcb *dcb, const char *dev, int argc, char **argv)
{
struct dcb_app_table tab = {};
int ret;
ret = dcb_app_get(dcb, dev, &tab);
if (ret != 0)
return ret;
if (!argc) {
ret = dcb_app_add_del(dcb, dev, DCB_CMD_IEEE_DEL, &tab, NULL);
goto out;
}
do {
if (matches(*argv, "help") == 0) {
dcb_app_help_show_flush();
goto out;
} else if (matches(*argv, "ethtype-prio") == 0) {
ret = dcb_app_add_del(dcb, dev, DCB_CMD_IEEE_DEL, &tab,
&dcb_app_is_ethtype);
if (ret != 0)
goto out;
} else if (matches(*argv, "default-prio") == 0) {
ret = dcb_app_add_del(dcb, dev, DCB_CMD_IEEE_DEL, &tab,
&dcb_app_is_default);
if (ret != 0)
goto out;
} else if (matches(*argv, "dscp-prio") == 0) {
ret = dcb_app_add_del(dcb, dev, DCB_CMD_IEEE_DEL, &tab,
&dcb_app_is_dscp);
if (ret != 0)
goto out;
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_app_help_show_flush();
ret = -EINVAL;
goto out;
}
NEXT_ARG_FWD();
} while (argc > 0);
out:
dcb_app_table_fini(&tab);
return ret;
}
static int dcb_cmd_app_replace(struct dcb *dcb, const char *dev, int argc, char **argv)
{
struct dcb_app_table orig = {};
struct dcb_app_table tab = {};
struct dcb_app_table new = {};
int ret;
ret = dcb_app_get(dcb, dev, &orig);
if (ret != 0)
return ret;
ret = dcb_cmd_app_parse_add_del(dcb, dev, argc, argv, &tab);
if (ret != 0)
goto out;
/* Attempts to add an existing entry would be rejected, so work on a
 * copy of tab and drop from it the entries already present in orig.
 */
ret = dcb_app_table_copy(&new, &tab);
if (ret != 0)
goto out;
dcb_app_table_remove_existing(&new, &orig);
ret = dcb_app_add_del(dcb, dev, DCB_CMD_IEEE_SET, &new, NULL);
if (ret != 0) {
fprintf(stderr, "Could not add new APP entries\n");
goto out;
}
/* Remove the obsolete entries. */
dcb_app_table_remove_replaced(&orig, &tab);
ret = dcb_app_add_del(dcb, dev, DCB_CMD_IEEE_DEL, &orig, NULL);
if (ret != 0) {
fprintf(stderr, "Could not remove replaced APP entries\n");
goto out;
}
out:
dcb_app_table_fini(&new);
dcb_app_table_fini(&tab);
dcb_app_table_fini(&orig);
return ret;
}
int dcb_cmd_app(struct dcb *dcb, int argc, char **argv)
{
if (!argc || matches(*argv, "help") == 0) {
dcb_app_help();
return 0;
} else if (matches(*argv, "show") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv,
dcb_cmd_app_show, dcb_app_help_show_flush);
} else if (matches(*argv, "flush") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv,
dcb_cmd_app_flush, dcb_app_help_show_flush);
} else if (matches(*argv, "add") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv,
dcb_cmd_app_add, dcb_app_help_add);
} else if (matches(*argv, "del") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv,
dcb_cmd_app_del, dcb_app_help_add);
} else if (matches(*argv, "replace") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv,
dcb_cmd_app_replace, dcb_app_help_add);
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_app_help();
return -EINVAL;
}
}
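
Review note (illustration only, not part of the patch): dcb_app_add_del_cb() above emits every table entry when the filter is NULL and only the entries accepted by the predicate otherwise. A minimal standalone sketch of that selection rule follows. `struct app`, `SEL_DSCP`, `app_is_dscp()` and `select_apps()` are hypothetical names introduced for this example; the struct only loosely mirrors struct dcb_app, and the selector value is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct dcb_app; not the uapi definition. */
struct app {
	uint8_t selector;
	uint8_t priority;
	uint16_t protocol;
};

/* Illustrative selector value; assumed, not taken from the patch. */
enum { SEL_DSCP = 5 };

static bool app_is_dscp(const struct app *app)
{
	return app->selector == SEL_DSCP;
}

/* Copy the entries accepted by filter into out; a NULL filter accepts
 * everything. out must have room for n entries. Returns the number of
 * entries copied.
 */
static size_t select_apps(const struct app *apps, size_t n,
			  bool (*filter)(const struct app *),
			  struct app *out)
{
	size_t i, count = 0;

	assert(apps != NULL && out != NULL);
	for (i = 0; i < n; i++) {
		if (filter == NULL || filter(&apps[i]))
			out[count++] = apps[i];
	}
	return count;
}
```

Passing NULL corresponds to the unfiltered add, del and flush paths; passing a predicate corresponds to a flush restricted to one class of entries.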

235
dcb/dcb_buffer.c Normal file

@ -0,0 +1,235 @@
// SPDX-License-Identifier: GPL-2.0+
#include <errno.h>
#include <inttypes.h>
#include <stdio.h>
#include <linux/dcbnl.h>
#include "dcb.h"
#include "utils.h"
static void dcb_buffer_help_set(void)
{
fprintf(stderr,
"Usage: dcb buffer set dev STRING\n"
" [ prio-buffer PRIO-MAP ]\n"
" [ buffer-size SIZE-MAP ]\n"
"\n"
" where PRIO-MAP := [ PRIO-MAP ] PRIO-MAPPING\n"
" PRIO-MAPPING := { all | PRIO }:BUFFER\n"
" SIZE-MAP := [ SIZE-MAP ] SIZE-MAPPING\n"
" SIZE-MAPPING := { all | BUFFER }:INTEGER\n"
" PRIO := { 0 .. 7 }\n"
" BUFFER := { 0 .. 7 }\n"
"\n"
);
}
static void dcb_buffer_help_show(void)
{
fprintf(stderr,
"Usage: dcb buffer show dev STRING\n"
" [ prio-buffer ] [ buffer-size ] [ total-size ]\n"
"\n"
);
}
static void dcb_buffer_help(void)
{
fprintf(stderr,
"Usage: dcb buffer help\n"
"\n"
);
dcb_buffer_help_show();
dcb_buffer_help_set();
}
static int dcb_buffer_parse_mapping_prio_buffer(__u32 key, char *value, void *data)
{
struct dcbnl_buffer *buffer = data;
__u8 buf;
if (get_u8(&buf, value, 0))
return -EINVAL;
return dcb_parse_mapping("PRIO", key, IEEE_8021Q_MAX_PRIORITIES - 1,
"BUFFER", buf, DCBX_MAX_BUFFERS - 1,
dcb_set_u8, buffer->prio2buffer);
}
static int dcb_buffer_parse_mapping_buffer_size(__u32 key, char *value, void *data)
{
struct dcbnl_buffer *buffer = data;
unsigned int size;
if (get_size(&size, value)) {
fprintf(stderr, "%u:%s: Illegal value for buffer size\n", key, value);
return -EINVAL;
}
return dcb_parse_mapping("BUFFER", key, DCBX_MAX_BUFFERS - 1,
"INTEGER", size, -1,
dcb_set_u32, buffer->buffer_size);
}
static void dcb_buffer_print_total_size(const struct dcbnl_buffer *buffer)
{
print_size(PRINT_ANY, "total_size", "total-size %s ", buffer->total_size);
}
static void dcb_buffer_print_prio_buffer(const struct dcbnl_buffer *buffer)
{
dcb_print_named_array("prio_buffer", "prio-buffer",
buffer->prio2buffer, ARRAY_SIZE(buffer->prio2buffer),
dcb_print_array_u8);
}
static void dcb_buffer_print_buffer_size(const struct dcbnl_buffer *buffer)
{
size_t size = ARRAY_SIZE(buffer->buffer_size);
SPRINT_BUF(b);
size_t i;
open_json_array(PRINT_JSON, "buffer_size");
print_string(PRINT_FP, NULL, "buffer-size ", NULL);
for (i = 0; i < size; i++) {
snprintf(b, sizeof(b), "%zu:%%s ", i);
print_size(PRINT_ANY, NULL, b, buffer->buffer_size[i]);
}
close_json_array(PRINT_JSON, "buffer_size");
}
static void dcb_buffer_print(const struct dcbnl_buffer *buffer)
{
dcb_buffer_print_prio_buffer(buffer);
print_nl();
dcb_buffer_print_buffer_size(buffer);
print_nl();
dcb_buffer_print_total_size(buffer);
print_nl();
}
static int dcb_buffer_get(struct dcb *dcb, const char *dev, struct dcbnl_buffer *buffer)
{
return dcb_get_attribute(dcb, dev, DCB_ATTR_DCB_BUFFER, buffer, sizeof(*buffer));
}
static int dcb_buffer_set(struct dcb *dcb, const char *dev, const struct dcbnl_buffer *buffer)
{
return dcb_set_attribute(dcb, dev, DCB_ATTR_DCB_BUFFER, buffer, sizeof(*buffer));
}
static int dcb_cmd_buffer_set(struct dcb *dcb, const char *dev, int argc, char **argv)
{
struct dcbnl_buffer buffer;
int ret;
if (!argc) {
dcb_buffer_help_set();
return 0;
}
ret = dcb_buffer_get(dcb, dev, &buffer);
if (ret)
return ret;
do {
if (matches(*argv, "help") == 0) {
dcb_buffer_help_set();
return 0;
} else if (matches(*argv, "prio-buffer") == 0) {
NEXT_ARG();
ret = parse_mapping(&argc, &argv, true,
&dcb_buffer_parse_mapping_prio_buffer, &buffer);
if (ret) {
fprintf(stderr, "Invalid priority mapping %s\n", *argv);
return ret;
}
continue;
} else if (matches(*argv, "buffer-size") == 0) {
NEXT_ARG();
ret = parse_mapping(&argc, &argv, true,
&dcb_buffer_parse_mapping_buffer_size, &buffer);
if (ret) {
fprintf(stderr, "Invalid buffer size mapping %s\n", *argv);
return ret;
}
continue;
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_buffer_help_set();
return -EINVAL;
}
NEXT_ARG_FWD();
} while (argc > 0);
return dcb_buffer_set(dcb, dev, &buffer);
}
static int dcb_cmd_buffer_show(struct dcb *dcb, const char *dev, int argc, char **argv)
{
struct dcbnl_buffer buffer;
int ret;
ret = dcb_buffer_get(dcb, dev, &buffer);
if (ret)
return ret;
open_json_object(NULL);
if (!argc) {
dcb_buffer_print(&buffer);
goto out;
}
do {
if (matches(*argv, "help") == 0) {
dcb_buffer_help_show();
return 0;
} else if (matches(*argv, "prio-buffer") == 0) {
dcb_buffer_print_prio_buffer(&buffer);
print_nl();
} else if (matches(*argv, "buffer-size") == 0) {
dcb_buffer_print_buffer_size(&buffer);
print_nl();
} else if (matches(*argv, "total-size") == 0) {
dcb_buffer_print_total_size(&buffer);
print_nl();
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_buffer_help_show();
return -EINVAL;
}
NEXT_ARG_FWD();
} while (argc > 0);
out:
close_json_object();
return 0;
}
int dcb_cmd_buffer(struct dcb *dcb, int argc, char **argv)
{
if (!argc || matches(*argv, "help") == 0) {
dcb_buffer_help();
return 0;
} else if (matches(*argv, "show") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv,
dcb_cmd_buffer_show, dcb_buffer_help_show);
} else if (matches(*argv, "set") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv,
dcb_cmd_buffer_set, dcb_buffer_help_set);
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_buffer_help();
return -EINVAL;
}
}
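
Review note (illustration only, not part of the patch): the usage text in dcb_buffer_help_set() describes each mapping token as `{ all | PRIO }:BUFFER`. The sketch below shows one way such a token could be applied to an eight-entry table. It is not the parse_mapping() helper used by the patch; `parse_num()`, `apply_prio_mapping()`, `N_PRIO` and `N_BUFFER` are hypothetical names, and the 0..7 bounds are taken from the usage text.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define N_PRIO 8	/* assumed: priorities 0..7, as in the usage text */
#define N_BUFFER 8	/* assumed: buffers 0..7, as in the usage text */

/* Parse one decimal number in [0, max] that spans all of s. */
static int parse_num(const char *s, unsigned long max, unsigned long *res)
{
	char *end;

	if (*s < '0' || *s > '9')
		return -EINVAL;
	errno = 0;
	*res = strtoul(s, &end, 10);
	if (errno || *end != '\0' || *res > max)
		return -EINVAL;
	return 0;
}

/* Apply one "{ all | PRIO }:BUFFER" token to prio2buffer[N_PRIO]. */
static int apply_prio_mapping(const char *token, uint8_t prio2buffer[N_PRIO])
{
	const char *colon;
	unsigned long prio, buf;
	char key[8];
	size_t klen;

	assert(token != NULL && prio2buffer != NULL);
	colon = strchr(token, ':');
	if (colon == NULL)
		return -EINVAL;
	klen = (size_t)(colon - token);
	if (klen == 0 || klen >= sizeof(key))
		return -EINVAL;
	memcpy(key, token, klen);
	key[klen] = '\0';

	if (parse_num(colon + 1, N_BUFFER - 1, &buf))
		return -EINVAL;

	/* "all" assigns the same buffer to every priority. */
	if (strcmp(key, "all") == 0) {
		memset(prio2buffer, (int)buf, N_PRIO);
		return 0;
	}
	if (parse_num(key, N_PRIO - 1, &prio))
		return -EINVAL;
	prio2buffer[prio] = (uint8_t)buf;
	return 0;
}
```

Tokens are applied left to right, so `all:0 3:1` would first map every priority to buffer 0 and then override priority 3.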

192
dcb/dcb_dcbx.c Normal file

@ -0,0 +1,192 @@
// SPDX-License-Identifier: GPL-2.0+
#include <errno.h>
#include <inttypes.h>
#include <stdio.h>
#include <linux/dcbnl.h>
#include "dcb.h"
#include "utils.h"
static void dcb_dcbx_help_set(void)
{
fprintf(stderr,
"Usage: dcb dcbx set dev STRING\n"
" [ host | lld-managed ]\n"
" [ cee | ieee ] [ static ]\n"
"\n"
);
}
static void dcb_dcbx_help_show(void)
{
fprintf(stderr,
"Usage: dcb dcbx show dev STRING\n"
"\n"
);
}
static void dcb_dcbx_help(void)
{
fprintf(stderr,
"Usage: dcb dcbx help\n"
"\n"
);
dcb_dcbx_help_show();
dcb_dcbx_help_set();
}
struct dcb_dcbx_flag {
__u8 value;
const char *key_fp;
const char *key_json;
};
static struct dcb_dcbx_flag dcb_dcbx_flags[] = {
{DCB_CAP_DCBX_HOST, "host"},
{DCB_CAP_DCBX_LLD_MANAGED, "lld-managed", "lld_managed"},
{DCB_CAP_DCBX_VER_CEE, "cee"},
{DCB_CAP_DCBX_VER_IEEE, "ieee"},
{DCB_CAP_DCBX_STATIC, "static"},
};
static void dcb_dcbx_print(__u8 dcbx)
{
int bit;
int i;
while ((bit = ffs(dcbx))) {
bool found = false;
bit--;
for (i = 0; i < ARRAY_SIZE(dcb_dcbx_flags); i++) {
struct dcb_dcbx_flag *flag = &dcb_dcbx_flags[i];
if (flag->value == 1 << bit) {
print_bool(PRINT_JSON, flag->key_json ?: flag->key_fp,
NULL, true);
print_string(PRINT_FP, NULL, "%s ", flag->key_fp);
found = true;
break;
}
}
if (!found)
fprintf(stderr, "Unknown DCBX bit %#x.\n", 1 << bit);
dcbx &= ~(1 << bit);
}
print_nl();
}
static int dcb_dcbx_get(struct dcb *dcb, const char *dev, __u8 *dcbx)
{
__u16 payload_len;
void *payload;
int err;
err = dcb_get_attribute_bare(dcb, DCB_CMD_IEEE_GET, dev, DCB_ATTR_DCBX,
&payload, &payload_len);
if (err != 0)
return err;
if (payload_len != 1) {
fprintf(stderr, "DCB_ATTR_DCBX payload has size %d, expected 1.\n",
payload_len);
return -EINVAL;
}
*dcbx = *(__u8 *) payload;
return 0;
}
static int dcb_dcbx_set(struct dcb *dcb, const char *dev, __u8 dcbx)
{
return dcb_set_attribute_bare(dcb, DCB_CMD_SDCBX, dev, DCB_ATTR_DCBX,
&dcbx, 1, DCB_ATTR_DCBX);
}
static int dcb_cmd_dcbx_set(struct dcb *dcb, const char *dev, int argc, char **argv)
{
__u8 dcbx = 0;
__u8 i;
if (!argc) {
dcb_dcbx_help_set();
return 0;
}
do {
if (matches(*argv, "help") == 0) {
dcb_dcbx_help_set();
return 0;
}
for (i = 0; i < ARRAY_SIZE(dcb_dcbx_flags); i++) {
struct dcb_dcbx_flag *flag = &dcb_dcbx_flags[i];
if (matches(*argv, flag->key_fp) == 0) {
dcbx |= flag->value;
NEXT_ARG_FWD();
goto next;
}
}
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_dcbx_help_set();
return -EINVAL;
next:
;
} while (argc > 0);
return dcb_dcbx_set(dcb, dev, dcbx);
}
static int dcb_cmd_dcbx_show(struct dcb *dcb, const char *dev, int argc, char **argv)
{
__u8 dcbx;
int ret;
ret = dcb_dcbx_get(dcb, dev, &dcbx);
if (ret != 0)
return ret;
while (argc > 0) {
if (matches(*argv, "help") == 0) {
dcb_dcbx_help_show();
return 0;
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_dcbx_help_show();
return -EINVAL;
}
NEXT_ARG_FWD();
}
open_json_object(NULL);
dcb_dcbx_print(dcbx);
close_json_object();
return 0;
}
int dcb_cmd_dcbx(struct dcb *dcb, int argc, char **argv)
{
if (!argc || matches(*argv, "help") == 0) {
dcb_dcbx_help();
return 0;
} else if (matches(*argv, "show") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv,
dcb_cmd_dcbx_show, dcb_dcbx_help_show);
} else if (matches(*argv, "set") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv,
dcb_cmd_dcbx_set, dcb_dcbx_help_set);
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_dcbx_help();
return -EINVAL;
}
}
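
Review note (illustration only, not part of the patch): dcb_dcbx_print() above walks the set bits of the DCBX byte from lowest to highest and looks each one up in a flag table. The sketch below shows the same walk in standalone form. It isolates the lowest set bit with the `x & -x` idiom instead of ffs(), so it does not depend on <strings.h>. `struct flag_name`, `flag_names` and `decode_flags()` are hypothetical, and the bit values are assumptions meant to mirror the DCB_CAP_DCBX_* constants.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag table; the bit values are assumptions rather than
 * definitions taken from <linux/dcbnl.h>.
 */
struct flag_name {
	uint8_t value;
	const char *name;
};

static const struct flag_name flag_names[] = {
	{ 0x01, "host" },
	{ 0x02, "lld-managed" },
	{ 0x04, "cee" },
	{ 0x08, "ieee" },
	{ 0x10, "static" },
};

/* Store the names of the set bits in names[], lowest bit first.
 * Unknown bits are skipped. Returns the number of names stored.
 */
static size_t decode_flags(uint8_t flags, const char *names[], size_t max)
{
	size_t count = 0;

	assert(names != NULL);
	while (flags != 0 && count < max) {
		/* Isolate the lowest set bit. */
		uint8_t mask = (uint8_t)(flags & (0u - flags));
		size_t i;

		for (i = 0; i < sizeof(flag_names) / sizeof(flag_names[0]); i++) {
			if (flag_names[i].value == mask) {
				names[count++] = flag_names[i].name;
				break;
			}
		}
		/* Clear the bit so the loop terminates. */
		flags &= (uint8_t)~mask;
	}
	return count;
}
```

Clearing each bit after it is handled is what guarantees termination, including for bits that have no entry in the table.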

435
dcb/dcb_ets.c Normal file

@ -0,0 +1,435 @@
// SPDX-License-Identifier: GPL-2.0+
#include <errno.h>
#include <stdio.h>
#include <linux/dcbnl.h>
#include "dcb.h"
#include "utils.h"
static void dcb_ets_help_set(void)
{
fprintf(stderr,
"Usage: dcb ets set dev STRING\n"
" [ willing { on | off } ]\n"
" [ { tc-tsa | reco-tc-tsa } TSA-MAP ]\n"
" [ { pg-bw | tc-bw | reco-tc-bw } BW-MAP ]\n"
" [ { prio-tc | reco-prio-tc } PRIO-MAP ]\n"
"\n"
" where TSA-MAP := [ TSA-MAP ] TSA-MAPPING\n"
" TSA-MAPPING := { all | TC }:{ strict | cbs | ets | vendor }\n"
" BW-MAP := [ BW-MAP ] BW-MAPPING\n"
" BW-MAPPING := { all | TC }:INTEGER\n"
" PRIO-MAP := [ PRIO-MAP ] PRIO-MAPPING\n"
" PRIO-MAPPING := { all | PRIO }:TC\n"
" TC := { 0 .. 7 }\n"
" PRIO := { 0 .. 7 }\n"
"\n"
);
}
static void dcb_ets_help_show(void)
{
fprintf(stderr,
"Usage: dcb ets show dev STRING\n"
" [ willing ] [ ets-cap ] [ cbs ] [ tc-tsa ]\n"
" [ reco-tc-tsa ] [ pg-bw ] [ tc-bw ] [ reco-tc-bw ]\n"
" [ prio-tc ] [ reco-prio-tc ]\n"
"\n"
);
}
static void dcb_ets_help(void)
{
fprintf(stderr,
"Usage: dcb ets help\n"
"\n"
);
dcb_ets_help_show();
dcb_ets_help_set();
}
static const char *const tsa_names[] = {
[IEEE_8021QAZ_TSA_STRICT] = "strict",
[IEEE_8021QAZ_TSA_CB_SHAPER] = "cbs",
[IEEE_8021QAZ_TSA_ETS] = "ets",
[IEEE_8021QAZ_TSA_VENDOR] = "vendor",
};
static int dcb_ets_parse_mapping_tc_tsa(__u32 key, char *value, void *data)
{
__u8 tsa;
int ret;
tsa = parse_one_of("TSA", value, tsa_names, ARRAY_SIZE(tsa_names), &ret);
if (ret)
return ret;
return dcb_parse_mapping("TC", key, IEEE_8021QAZ_MAX_TCS - 1,
"TSA", tsa, -1U,
dcb_set_u8, data);
}
static int dcb_ets_parse_mapping_tc_bw(__u32 key, char *value, void *data)
{
__u8 bw;
if (get_u8(&bw, value, 0))
return -EINVAL;
return dcb_parse_mapping("TC", key, IEEE_8021QAZ_MAX_TCS - 1,
"BW", bw, 100,
dcb_set_u8, data);
}
static int dcb_ets_parse_mapping_prio_tc(__u32 key, char *value, void *data)
{
__u8 tc;
if (get_u8(&tc, value, 0))
return -EINVAL;
return dcb_parse_mapping("PRIO", key, IEEE_8021Q_MAX_PRIORITIES - 1,
"TC", tc, IEEE_8021QAZ_MAX_TCS - 1,
dcb_set_u8, data);
}
static void dcb_print_array_tsa(const __u8 *array, size_t size)
{
dcb_print_array_kw(array, size, tsa_names, ARRAY_SIZE(tsa_names));
}
static void dcb_ets_print_willing(const struct ieee_ets *ets)
{
print_on_off(PRINT_ANY, "willing", "willing %s ", ets->willing);
}
static void dcb_ets_print_ets_cap(const struct ieee_ets *ets)
{
print_uint(PRINT_ANY, "ets_cap", "ets-cap %d ", ets->ets_cap);
}
static void dcb_ets_print_cbs(const struct ieee_ets *ets)
{
print_on_off(PRINT_ANY, "cbs", "cbs %s ", ets->cbs);
}
static void dcb_ets_print_tc_bw(const struct ieee_ets *ets)
{
dcb_print_named_array("tc_bw", "tc-bw",
ets->tc_tx_bw, ARRAY_SIZE(ets->tc_tx_bw),
dcb_print_array_u8);
}
static void dcb_ets_print_pg_bw(const struct ieee_ets *ets)
{
dcb_print_named_array("pg_bw", "pg-bw",
ets->tc_rx_bw, ARRAY_SIZE(ets->tc_rx_bw),
dcb_print_array_u8);
}
static void dcb_ets_print_tc_tsa(const struct ieee_ets *ets)
{
dcb_print_named_array("tc_tsa", "tc-tsa",
ets->tc_tsa, ARRAY_SIZE(ets->tc_tsa),
dcb_print_array_tsa);
}
static void dcb_ets_print_prio_tc(const struct ieee_ets *ets)
{
dcb_print_named_array("prio_tc", "prio-tc",
ets->prio_tc, ARRAY_SIZE(ets->prio_tc),
dcb_print_array_u8);
}
static void dcb_ets_print_reco_tc_bw(const struct ieee_ets *ets)
{
dcb_print_named_array("reco_tc_bw", "reco-tc-bw",
ets->tc_reco_bw, ARRAY_SIZE(ets->tc_reco_bw),
dcb_print_array_u8);
}
static void dcb_ets_print_reco_tc_tsa(const struct ieee_ets *ets)
{
dcb_print_named_array("reco_tc_tsa", "reco-tc-tsa",
ets->tc_reco_tsa, ARRAY_SIZE(ets->tc_reco_tsa),
dcb_print_array_tsa);
}
static void dcb_ets_print_reco_prio_tc(const struct ieee_ets *ets)
{
dcb_print_named_array("reco_prio_tc", "reco-prio-tc",
ets->reco_prio_tc, ARRAY_SIZE(ets->reco_prio_tc),
dcb_print_array_u8);
}
static void dcb_ets_print(const struct ieee_ets *ets)
{
dcb_ets_print_willing(ets);
dcb_ets_print_ets_cap(ets);
dcb_ets_print_cbs(ets);
print_nl();
dcb_ets_print_tc_bw(ets);
print_nl();
dcb_ets_print_pg_bw(ets);
print_nl();
dcb_ets_print_tc_tsa(ets);
print_nl();
dcb_ets_print_prio_tc(ets);
print_nl();
dcb_ets_print_reco_tc_bw(ets);
print_nl();
dcb_ets_print_reco_tc_tsa(ets);
print_nl();
dcb_ets_print_reco_prio_tc(ets);
print_nl();
}
static int dcb_ets_get(struct dcb *dcb, const char *dev, struct ieee_ets *ets)
{
return dcb_get_attribute(dcb, dev, DCB_ATTR_IEEE_ETS, ets, sizeof(*ets));
}
static int dcb_ets_validate_bw(const __u8 bw[], const __u8 tsa[], const char *what)
{
bool has_ets = false;
unsigned int total = 0;
unsigned int tc;
for (tc = 0; tc < IEEE_8021QAZ_MAX_TCS; tc++) {
if (tsa[tc] == IEEE_8021QAZ_TSA_ETS) {
has_ets = true;
break;
}
}
/* TC bandwidth is only intended for ETS, but 802.1Q-2018 only requires
* that the sum be 100, and individual entries 0..100. It explicitly
* notes that non-ETS TCs can have non-0 TC bandwidth during
* reconfiguration.
*/
for (tc = 0; tc < IEEE_8021QAZ_MAX_TCS; tc++) {
if (bw[tc] > 100) {
fprintf(stderr, "%d%% for TC %u of %s is not a valid bandwidth percentage, expected 0..100%%\n",
bw[tc], tc, what);
return -EINVAL;
}
total += bw[tc];
}
/* This is what 802.1Q-2018 requires. */
if (total == 100)
return 0;
/* But this requirement does not make sense for all-strict
 * configurations, where the only other sensible sum is 0: either BW
 * has not yet been reconfigured for the all-strict allocation, in
 * which case we expect a sum of 100, or it already has been, in which
 * case we accept 0.
 */
if (!has_ets && total == 0)
return 0;
fprintf(stderr, "Bandwidth percentages in %s sum to %u%%, expected %d%%\n",
what, total, has_ets ? 100 : 0);
return -EINVAL;
}
static int dcb_ets_set(struct dcb *dcb, const char *dev, const struct ieee_ets *ets)
{
/* Do not validate pg-bw, which is not standard and has unclear
* meaning.
*/
if (dcb_ets_validate_bw(ets->tc_tx_bw, ets->tc_tsa, "tc-bw") ||
dcb_ets_validate_bw(ets->tc_reco_bw, ets->tc_reco_tsa, "reco-tc-bw"))
return -EINVAL;
return dcb_set_attribute(dcb, dev, DCB_ATTR_IEEE_ETS, ets, sizeof(*ets));
}
static int dcb_cmd_ets_set(struct dcb *dcb, const char *dev, int argc, char **argv)
{
struct ieee_ets ets;
int ret;
if (!argc) {
dcb_ets_help_set();
return 0;
}
ret = dcb_ets_get(dcb, dev, &ets);
if (ret)
return ret;
do {
if (matches(*argv, "help") == 0) {
dcb_ets_help_set();
return 0;
} else if (matches(*argv, "willing") == 0) {
NEXT_ARG();
ets.willing = parse_on_off("willing", *argv, &ret);
if (ret)
return ret;
} else if (matches(*argv, "tc-tsa") == 0) {
NEXT_ARG();
ret = parse_mapping(&argc, &argv, true, &dcb_ets_parse_mapping_tc_tsa,
ets.tc_tsa);
if (ret) {
fprintf(stderr, "Invalid tc-tsa mapping %s\n", *argv);
return ret;
}
continue;
} else if (matches(*argv, "reco-tc-tsa") == 0) {
NEXT_ARG();
ret = parse_mapping(&argc, &argv, true, &dcb_ets_parse_mapping_tc_tsa,
ets.tc_reco_tsa);
if (ret) {
fprintf(stderr, "Invalid reco-tc-tsa mapping %s\n", *argv);
return ret;
}
continue;
} else if (matches(*argv, "tc-bw") == 0) {
NEXT_ARG();
ret = parse_mapping(&argc, &argv, true, &dcb_ets_parse_mapping_tc_bw,
ets.tc_tx_bw);
if (ret) {
fprintf(stderr, "Invalid tc-bw mapping %s\n", *argv);
return ret;
}
continue;
} else if (matches(*argv, "pg-bw") == 0) {
NEXT_ARG();
ret = parse_mapping(&argc, &argv, true, &dcb_ets_parse_mapping_tc_bw,
ets.tc_rx_bw);
if (ret) {
fprintf(stderr, "Invalid pg-bw mapping %s\n", *argv);
return ret;
}
continue;
} else if (matches(*argv, "reco-tc-bw") == 0) {
NEXT_ARG();
ret = parse_mapping(&argc, &argv, true, &dcb_ets_parse_mapping_tc_bw,
ets.tc_reco_bw);
if (ret) {
fprintf(stderr, "Invalid reco-tc-bw mapping %s\n", *argv);
return ret;
}
continue;
} else if (matches(*argv, "prio-tc") == 0) {
NEXT_ARG();
ret = parse_mapping(&argc, &argv, true, &dcb_ets_parse_mapping_prio_tc,
ets.prio_tc);
if (ret) {
fprintf(stderr, "Invalid prio-tc mapping %s\n", *argv);
return ret;
}
continue;
} else if (matches(*argv, "reco-prio-tc") == 0) {
NEXT_ARG();
ret = parse_mapping(&argc, &argv, true, &dcb_ets_parse_mapping_prio_tc,
ets.reco_prio_tc);
if (ret) {
fprintf(stderr, "Invalid reco-prio-tc mapping %s\n", *argv);
return ret;
}
continue;
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_ets_help_set();
return -EINVAL;
}
NEXT_ARG_FWD();
} while (argc > 0);
return dcb_ets_set(dcb, dev, &ets);
}
static int dcb_cmd_ets_show(struct dcb *dcb, const char *dev, int argc, char **argv)
{
struct ieee_ets ets;
int ret;
ret = dcb_ets_get(dcb, dev, &ets);
if (ret)
return ret;
open_json_object(NULL);
if (!argc) {
dcb_ets_print(&ets);
goto out;
}
do {
if (matches(*argv, "help") == 0) {
dcb_ets_help_show();
return 0;
} else if (matches(*argv, "willing") == 0) {
dcb_ets_print_willing(&ets);
print_nl();
} else if (matches(*argv, "ets-cap") == 0) {
dcb_ets_print_ets_cap(&ets);
print_nl();
} else if (matches(*argv, "cbs") == 0) {
dcb_ets_print_cbs(&ets);
print_nl();
} else if (matches(*argv, "tc-tsa") == 0) {
dcb_ets_print_tc_tsa(&ets);
print_nl();
} else if (matches(*argv, "reco-tc-tsa") == 0) {
dcb_ets_print_reco_tc_tsa(&ets);
print_nl();
} else if (matches(*argv, "tc-bw") == 0) {
dcb_ets_print_tc_bw(&ets);
print_nl();
} else if (matches(*argv, "pg-bw") == 0) {
dcb_ets_print_pg_bw(&ets);
print_nl();
} else if (matches(*argv, "reco-tc-bw") == 0) {
dcb_ets_print_reco_tc_bw(&ets);
print_nl();
} else if (matches(*argv, "prio-tc") == 0) {
dcb_ets_print_prio_tc(&ets);
print_nl();
} else if (matches(*argv, "reco-prio-tc") == 0) {
dcb_ets_print_reco_prio_tc(&ets);
print_nl();
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_ets_help_show();
return -EINVAL;
}
NEXT_ARG_FWD();
} while (argc > 0);
out:
close_json_object();
return 0;
}
int dcb_cmd_ets(struct dcb *dcb, int argc, char **argv)
{
if (!argc || matches(*argv, "help") == 0) {
dcb_ets_help();
return 0;
} else if (matches(*argv, "show") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv, dcb_cmd_ets_show, dcb_ets_help_show);
} else if (matches(*argv, "set") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv, dcb_cmd_ets_set, dcb_ets_help_set);
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_ets_help();
return -EINVAL;
}
}
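
Review note (illustration only, not part of the patch): the comments in dcb_ets_validate_bw() describe the acceptance rule as "each entry in 0..100, and the sum is 100, or 0 when no TC uses ETS". The sketch below restates that rule as a single predicate. `ets_bw_valid()`, `MAX_TCS`, `TSA_STRICT` and `TSA_ETS` are hypothetical names, and the numeric TSA values are assumed stand-ins for the IEEE_8021QAZ_TSA_* constants.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TCS 8	/* assumed: eight traffic classes, TC 0..7 */
#define TSA_STRICT 0	/* assumed stand-ins for IEEE_8021QAZ_TSA_* */
#define TSA_ETS 2

/* Return true if the per-TC bandwidth table is acceptable:
 * every entry is at most 100, and the entries sum to 100, or to 0
 * when no TC uses the ETS algorithm.
 */
static bool ets_bw_valid(const uint8_t bw[MAX_TCS], const uint8_t tsa[MAX_TCS])
{
	bool has_ets = false;
	unsigned int total = 0;
	unsigned int tc;

	assert(bw != NULL && tsa != NULL);
	for (tc = 0; tc < MAX_TCS; tc++) {
		if (bw[tc] > 100)
			return false;
		if (tsa[tc] == TSA_ETS)
			has_ets = true;
		total += bw[tc];
	}
	return total == 100 || (!has_ets && total == 0);
}
```

An all-strict table therefore passes with either a sum of 100 (bandwidth not yet reconfigured) or a sum of 0 (already reconfigured), while any table containing an ETS class must sum to exactly 100.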

182
dcb/dcb_maxrate.c Normal file

@ -0,0 +1,182 @@
// SPDX-License-Identifier: GPL-2.0+
#include <errno.h>
#include <inttypes.h>
#include <stdio.h>
#include <linux/dcbnl.h>
#include "dcb.h"
#include "utils.h"
static void dcb_maxrate_help_set(void)
{
fprintf(stderr,
"Usage: dcb maxrate set dev STRING\n"
" [ tc-maxrate RATE-MAP ]\n"
"\n"
" where RATE-MAP := [ RATE-MAP ] RATE-MAPPING\n"
" RATE-MAPPING := { all | TC }:RATE\n"
" TC := { 0 .. 7 }\n"
"\n"
);
}
static void dcb_maxrate_help_show(void)
{
fprintf(stderr,
"Usage: dcb [ -i ] maxrate show dev STRING\n"
" [ tc-maxrate ]\n"
"\n"
);
}
static void dcb_maxrate_help(void)
{
fprintf(stderr,
"Usage: dcb maxrate help\n"
"\n"
);
dcb_maxrate_help_show();
dcb_maxrate_help_set();
}
static int dcb_maxrate_parse_mapping_tc_maxrate(__u32 key, char *value, void *data)
{
__u64 rate;
if (get_rate64(&rate, value))
return -EINVAL;
return dcb_parse_mapping("TC", key, IEEE_8021QAZ_MAX_TCS - 1,
"RATE", rate, -1,
dcb_set_u64, data);
}
static void dcb_maxrate_print_tc_maxrate(struct dcb *dcb, const struct ieee_maxrate *maxrate)
{
size_t size = ARRAY_SIZE(maxrate->tc_maxrate);
SPRINT_BUF(b);
size_t i;
open_json_array(PRINT_JSON, "tc_maxrate");
print_string(PRINT_FP, NULL, "tc-maxrate ", NULL);
for (i = 0; i < size; i++) {
snprintf(b, sizeof(b), "%zu:%%s ", i);
print_rate(dcb->use_iec, PRINT_ANY, NULL, b, maxrate->tc_maxrate[i]);
}
close_json_array(PRINT_JSON, "tc_maxrate");
}
static void dcb_maxrate_print(struct dcb *dcb, const struct ieee_maxrate *maxrate)
{
dcb_maxrate_print_tc_maxrate(dcb, maxrate);
print_nl();
}
static int dcb_maxrate_get(struct dcb *dcb, const char *dev, struct ieee_maxrate *maxrate)
{
return dcb_get_attribute(dcb, dev, DCB_ATTR_IEEE_MAXRATE, maxrate, sizeof(*maxrate));
}
static int dcb_maxrate_set(struct dcb *dcb, const char *dev, const struct ieee_maxrate *maxrate)
{
return dcb_set_attribute(dcb, dev, DCB_ATTR_IEEE_MAXRATE, maxrate, sizeof(*maxrate));
}
static int dcb_cmd_maxrate_set(struct dcb *dcb, const char *dev, int argc, char **argv)
{
struct ieee_maxrate maxrate;
int ret;
if (!argc) {
dcb_maxrate_help_set();
return 0;
}
ret = dcb_maxrate_get(dcb, dev, &maxrate);
if (ret)
return ret;
do {
if (matches(*argv, "help") == 0) {
dcb_maxrate_help_set();
return 0;
} else if (matches(*argv, "tc-maxrate") == 0) {
NEXT_ARG();
ret = parse_mapping(&argc, &argv, true,
&dcb_maxrate_parse_mapping_tc_maxrate, &maxrate);
if (ret) {
fprintf(stderr, "Invalid mapping %s\n", *argv);
return ret;
}
continue;
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_maxrate_help_set();
return -EINVAL;
}
NEXT_ARG_FWD();
} while (argc > 0);
return dcb_maxrate_set(dcb, dev, &maxrate);
}
static int dcb_cmd_maxrate_show(struct dcb *dcb, const char *dev, int argc, char **argv)
{
struct ieee_maxrate maxrate;
int ret;
ret = dcb_maxrate_get(dcb, dev, &maxrate);
if (ret)
return ret;
open_json_object(NULL);
if (!argc) {
dcb_maxrate_print(dcb, &maxrate);
goto out;
}
do {
if (matches(*argv, "help") == 0) {
dcb_maxrate_help_show();
return 0;
} else if (matches(*argv, "tc-maxrate") == 0) {
dcb_maxrate_print_tc_maxrate(dcb, &maxrate);
print_nl();
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_maxrate_help_show();
return -EINVAL;
}
NEXT_ARG_FWD();
} while (argc > 0);
out:
close_json_object();
return 0;
}
int dcb_cmd_maxrate(struct dcb *dcb, int argc, char **argv)
{
if (!argc || matches(*argv, "help") == 0) {
dcb_maxrate_help();
return 0;
} else if (matches(*argv, "show") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv,
dcb_cmd_maxrate_show, dcb_maxrate_help_show);
} else if (matches(*argv, "set") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv,
dcb_cmd_maxrate_set, dcb_maxrate_help_set);
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_maxrate_help();
return -EINVAL;
}
}
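
Review note (illustration only, not part of the patch): dcb_maxrate_print_tc_maxrate() builds a per-index format string with snprintf(), escaping the value placeholder as `%%s` so that it survives into the generated format handed to print_rate(). The sketch below shows the same two-stage formatting with plain snprintf(). `format_indexed()` is a hypothetical helper, and the value is passed as an already formatted string.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Format "INDEX:VALUE " into out via an intermediate format string.
 * Returns the snprintf() result of the second stage.
 */
static int format_indexed(char *out, size_t outlen, size_t index,
			  const char *value)
{
	char fmt[32];

	assert(out != NULL && value != NULL);
	/* Stage 1: "%%s" is emitted as a literal "%s", e.g. "3:%s ". */
	snprintf(fmt, sizeof(fmt), "%zu:%%s ", index);
	/* Stage 2: the generated string is itself used as the format. */
	return snprintf(out, outlen, fmt, value);
}
```

Because the generated string is used as a format, the index must be the only thing substituted in the first stage; user-controlled text should never be spliced into it.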

286
dcb/dcb_pfc.c Normal file

@ -0,0 +1,286 @@
// SPDX-License-Identifier: GPL-2.0+
#include <errno.h>
#include <stdio.h>
#include <linux/dcbnl.h>
#include "dcb.h"
#include "utils.h"
static void dcb_pfc_help_set(void)
{
fprintf(stderr,
"Usage: dcb pfc set dev STRING\n"
" [ prio-pfc PFC-MAP ]\n"
" [ macsec-bypass { on | off } ]\n"
" [ delay INTEGER ]\n"
"\n"
" where PFC-MAP := [ PFC-MAP ] PFC-MAPPING\n"
" PFC-MAPPING := { all | PRIO }:PFC\n"
" PRIO := { 0 .. 7 }\n"
" PFC := { on | off }\n"
"\n"
);
}
static void dcb_pfc_help_show(void)
{
fprintf(stderr,
"Usage: dcb [ -s ] pfc show dev STRING\n"
" [ pfc-cap ] [ prio-pfc ] [ macsec-bypass ]\n"
" [ delay ] [ requests ] [ indications ]\n"
"\n"
);
}
static void dcb_pfc_help(void)
{
fprintf(stderr,
"Usage: dcb pfc help\n"
"\n"
);
dcb_pfc_help_show();
dcb_pfc_help_set();
}
static void dcb_pfc_to_array(__u8 array[IEEE_8021QAZ_MAX_TCS], __u8 pfc_en)
{
int i;
for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++)
array[i] = !!(pfc_en & (1 << i));
}
static void dcb_pfc_from_array(__u8 array[IEEE_8021QAZ_MAX_TCS], __u8 *pfc_en_p)
{
__u8 pfc_en = 0;
int i;
for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++) {
if (array[i])
pfc_en |= 1 << i;
}
*pfc_en_p = pfc_en;
}
static int dcb_pfc_parse_mapping_prio_pfc(__u32 key, char *value, void *data)
{
struct ieee_pfc *pfc = data;
__u8 pfc_en[IEEE_8021QAZ_MAX_TCS];
bool enabled;
int ret;
dcb_pfc_to_array(pfc_en, pfc->pfc_en);
enabled = parse_on_off("PFC", value, &ret);
if (ret)
return ret;
ret = dcb_parse_mapping("PRIO", key, IEEE_8021QAZ_MAX_TCS - 1,
"PFC", enabled, -1,
dcb_set_u8, pfc_en);
if (ret)
return ret;
dcb_pfc_from_array(pfc_en, &pfc->pfc_en);
return 0;
}
static void dcb_pfc_print_pfc_cap(const struct ieee_pfc *pfc)
{
print_uint(PRINT_ANY, "pfc_cap", "pfc-cap %d ", pfc->pfc_cap);
}
static void dcb_pfc_print_macsec_bypass(const struct ieee_pfc *pfc)
{
print_on_off(PRINT_ANY, "macsec_bypass", "macsec-bypass %s ", pfc->mbc);
}
static void dcb_pfc_print_delay(const struct ieee_pfc *pfc)
{
print_uint(PRINT_ANY, "delay", "delay %d ", pfc->delay);
}
static void dcb_pfc_print_prio_pfc(const struct ieee_pfc *pfc)
{
__u8 pfc_en[IEEE_8021QAZ_MAX_TCS];
dcb_pfc_to_array(pfc_en, pfc->pfc_en);
dcb_print_named_array("prio_pfc", "prio-pfc",
pfc_en, ARRAY_SIZE(pfc_en), &dcb_print_array_on_off);
}
static void dcb_pfc_print_requests(const struct ieee_pfc *pfc)
{
open_json_array(PRINT_JSON, "requests");
print_string(PRINT_FP, NULL, "requests ", NULL);
dcb_print_array_u64(pfc->requests, ARRAY_SIZE(pfc->requests));
close_json_array(PRINT_JSON, "requests");
}
static void dcb_pfc_print_indications(const struct ieee_pfc *pfc)
{
open_json_array(PRINT_JSON, "indications");
print_string(PRINT_FP, NULL, "indications ", NULL);
dcb_print_array_u64(pfc->indications, ARRAY_SIZE(pfc->indications));
close_json_array(PRINT_JSON, "indications");
}
static void dcb_pfc_print(const struct dcb *dcb, const struct ieee_pfc *pfc)
{
dcb_pfc_print_pfc_cap(pfc);
dcb_pfc_print_macsec_bypass(pfc);
dcb_pfc_print_delay(pfc);
print_nl();
dcb_pfc_print_prio_pfc(pfc);
print_nl();
if (dcb->stats) {
dcb_pfc_print_requests(pfc);
print_nl();
dcb_pfc_print_indications(pfc);
print_nl();
}
}
static int dcb_pfc_get(struct dcb *dcb, const char *dev, struct ieee_pfc *pfc)
{
return dcb_get_attribute(dcb, dev, DCB_ATTR_IEEE_PFC, pfc, sizeof(*pfc));
}
static int dcb_pfc_set(struct dcb *dcb, const char *dev, const struct ieee_pfc *pfc)
{
return dcb_set_attribute(dcb, dev, DCB_ATTR_IEEE_PFC, pfc, sizeof(*pfc));
}
static int dcb_cmd_pfc_set(struct dcb *dcb, const char *dev, int argc, char **argv)
{
struct ieee_pfc pfc;
int ret;
if (!argc) {
dcb_pfc_help_set();
return 0;
}
ret = dcb_pfc_get(dcb, dev, &pfc);
if (ret)
return ret;
do {
if (matches(*argv, "help") == 0) {
dcb_pfc_help_set();
return 0;
} else if (matches(*argv, "prio-pfc") == 0) {
NEXT_ARG();
ret = parse_mapping(&argc, &argv, true,
&dcb_pfc_parse_mapping_prio_pfc, &pfc);
if (ret) {
fprintf(stderr, "Invalid pfc mapping %s\n", *argv);
return ret;
}
continue;
} else if (matches(*argv, "macsec-bypass") == 0) {
NEXT_ARG();
pfc.mbc = parse_on_off("macsec-bypass", *argv, &ret);
if (ret)
return ret;
} else if (matches(*argv, "delay") == 0) {
NEXT_ARG();
/* Do not support the size notations for delay.
* Delay is specified in "bit times", not bits, so
* it is not applicable. At the same time it would
* be confusing that 10Kbit does not mean 10240,
* but 1280.
*/
if (get_u16(&pfc.delay, *argv, 0)) {
fprintf(stderr, "Invalid delay `%s', expected an integer 0..65535\n",
*argv);
return -EINVAL;
}
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_pfc_help_set();
return -EINVAL;
}
NEXT_ARG_FWD();
} while (argc > 0);
return dcb_pfc_set(dcb, dev, &pfc);
}
static int dcb_cmd_pfc_show(struct dcb *dcb, const char *dev, int argc, char **argv)
{
struct ieee_pfc pfc;
int ret;
ret = dcb_pfc_get(dcb, dev, &pfc);
if (ret)
return ret;
open_json_object(NULL);
if (!argc) {
dcb_pfc_print(dcb, &pfc);
goto out;
}
do {
if (matches(*argv, "help") == 0) {
dcb_pfc_help_show();
return 0;
} else if (matches(*argv, "prio-pfc") == 0) {
dcb_pfc_print_prio_pfc(&pfc);
print_nl();
} else if (matches(*argv, "pfc-cap") == 0) {
dcb_pfc_print_pfc_cap(&pfc);
print_nl();
} else if (matches(*argv, "macsec-bypass") == 0) {
dcb_pfc_print_macsec_bypass(&pfc);
print_nl();
} else if (matches(*argv, "delay") == 0) {
dcb_pfc_print_delay(&pfc);
print_nl();
} else if (matches(*argv, "requests") == 0) {
dcb_pfc_print_requests(&pfc);
print_nl();
} else if (matches(*argv, "indications") == 0) {
dcb_pfc_print_indications(&pfc);
print_nl();
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_pfc_help_show();
return -EINVAL;
}
NEXT_ARG_FWD();
} while (argc > 0);
out:
close_json_object();
return 0;
}
int dcb_cmd_pfc(struct dcb *dcb, int argc, char **argv)
{
if (!argc || matches(*argv, "help") == 0) {
dcb_pfc_help();
return 0;
} else if (matches(*argv, "show") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv,
dcb_cmd_pfc_show, dcb_pfc_help_show);
} else if (matches(*argv, "set") == 0) {
NEXT_ARG_FWD();
return dcb_cmd_parse_dev(dcb, argc, argv,
dcb_cmd_pfc_set, dcb_pfc_help_set);
} else {
fprintf(stderr, "What is \"%s\"?\n", *argv);
dcb_pfc_help();
return -EINVAL;
}
}
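The prio-pfc handling above keeps the per-priority enable flags in a single byte and expands them into an array only while one slot is being edited. A minimal stand-alone sketch of that round trip follows. It assumes `MAX_TCS` is 8 as a stand-in for `IEEE_8021QAZ_MAX_TCS`; the `pfc_*` helper names are hypothetical and are not part of dcb.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_TCS 8 /* assumption: stands in for IEEE_8021QAZ_MAX_TCS */

/* Expand a per-priority enable bitmask into one 0/1 flag per priority. */
void pfc_mask_to_array(uint8_t array[MAX_TCS], uint8_t mask)
{
	for (int i = 0; i < MAX_TCS; i++)
		array[i] = !!(mask & (1u << i));
}

/* Collapse the per-priority flags back into a bitmask. */
uint8_t pfc_array_to_mask(const uint8_t array[MAX_TCS])
{
	uint8_t mask = 0;

	for (int i = 0; i < MAX_TCS; i++)
		if (array[i])
			mask |= (uint8_t)(1u << i);
	return mask;
}

/* Hypothetical helper: flip one priority, roughly what "prio-pfc 3:on"
 * amounts to. Out-of-range priorities leave the mask unchanged.
 */
uint8_t pfc_set_prio(uint8_t mask, unsigned int prio, int enabled)
{
	uint8_t array[MAX_TCS];

	pfc_mask_to_array(array, mask);
	if (prio < MAX_TCS)
		array[prio] = enabled ? 1 : 0;
	return pfc_array_to_mask(array);
}

/* Expanding and collapsing must give the original mask back. */
void pfc_check_roundtrip(uint8_t mask)
{
	uint8_t array[MAX_TCS];

	pfc_mask_to_array(array, mask);
	assert(pfc_array_to_mask(array) == mask);
}
```

Going through an array lets a generic per-index setter (such as the `dcb_set_u8` callback passed to `dcb_parse_mapping` above) update one slot without knowing that the attribute is stored as a bitmask.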


@ -7,12 +7,13 @@ ifeq ($(HAVE_MNL),y)
DEVLINKOBJ = devlink.o mnlg.o DEVLINKOBJ = devlink.o mnlg.o
TARGETS += devlink TARGETS += devlink
LDLIBS += -lm
endif endif
all: $(TARGETS) $(LIBS) all: $(TARGETS) $(LIBS)
devlink: $(DEVLINKOBJ) devlink: $(DEVLINKOBJ) $(LIBNETLINK)
$(QUIET_LINK)$(CC) $^ $(LDFLAGS) $(LDLIBS) -o $@ $(QUIET_LINK)$(CC) $^ $(LDFLAGS) $(LDLIBS) -o $@
install: all install: all

File diff suppressed because it is too large


@ -14,11 +14,11 @@
#include <string.h> #include <string.h>
#include <errno.h> #include <errno.h>
#include <unistd.h> #include <unistd.h>
#include <time.h>
#include <libmnl/libmnl.h> #include <libmnl/libmnl.h>
#include <linux/genetlink.h> #include <linux/genetlink.h>
#include "libnetlink.h" #include "libnetlink.h"
#include "mnl_utils.h"
#include "utils.h" #include "utils.h"
#include "mnlg.h" #include "mnlg.h"
@ -28,97 +28,13 @@ struct mnlg_socket {
uint32_t id; uint32_t id;
uint8_t version; uint8_t version;
unsigned int seq; unsigned int seq;
unsigned int portid;
}; };
static struct nlmsghdr *__mnlg_msg_prepare(struct mnlg_socket *nlg, uint8_t cmd, int mnlg_socket_send(struct mnlu_gen_socket *nlg, const struct nlmsghdr *nlh)
uint16_t flags, uint32_t id,
uint8_t version)
{
struct nlmsghdr *nlh;
struct genlmsghdr *genl;
nlh = mnl_nlmsg_put_header(nlg->buf);
nlh->nlmsg_type = id;
nlh->nlmsg_flags = flags;
nlg->seq = time(NULL);
nlh->nlmsg_seq = nlg->seq;
genl = mnl_nlmsg_put_extra_header(nlh, sizeof(struct genlmsghdr));
genl->cmd = cmd;
genl->version = version;
return nlh;
}
struct nlmsghdr *mnlg_msg_prepare(struct mnlg_socket *nlg, uint8_t cmd,
uint16_t flags)
{
return __mnlg_msg_prepare(nlg, cmd, flags, nlg->id, nlg->version);
}
int mnlg_socket_send(struct mnlg_socket *nlg, const struct nlmsghdr *nlh)
{ {
return mnl_socket_sendto(nlg->nl, nlh, nlh->nlmsg_len); return mnl_socket_sendto(nlg->nl, nlh, nlh->nlmsg_len);
} }
static int mnlg_cb_noop(const struct nlmsghdr *nlh, void *data)
{
return MNL_CB_OK;
}
static int mnlg_cb_error(const struct nlmsghdr *nlh, void *data)
{
const struct nlmsgerr *err = mnl_nlmsg_get_payload(nlh);
/* Netlink subsystems return the errno value with different signedness */
if (err->error < 0)
errno = -err->error;
else
errno = err->error;
if (nl_dump_ext_ack(nlh, NULL))
return MNL_CB_ERROR;
return err->error == 0 ? MNL_CB_STOP : MNL_CB_ERROR;
}
static int mnlg_cb_stop(const struct nlmsghdr *nlh, void *data)
{
int len = *(int *)NLMSG_DATA(nlh);
if (len < 0) {
errno = -len;
nl_dump_ext_ack_done(nlh, len);
return MNL_CB_ERROR;
}
return MNL_CB_STOP;
}
static mnl_cb_t mnlg_cb_array[NLMSG_MIN_TYPE] = {
[NLMSG_NOOP] = mnlg_cb_noop,
[NLMSG_ERROR] = mnlg_cb_error,
[NLMSG_DONE] = mnlg_cb_stop,
[NLMSG_OVERRUN] = mnlg_cb_noop,
};
int mnlg_socket_recv_run(struct mnlg_socket *nlg, mnl_cb_t data_cb, void *data)
{
int err;
do {
err = mnl_socket_recvfrom(nlg->nl, nlg->buf,
MNL_SOCKET_BUFFER_SIZE);
if (err <= 0)
break;
err = mnl_cb_run2(nlg->buf, err, nlg->seq, nlg->portid,
data_cb, data, mnlg_cb_array,
ARRAY_SIZE(mnlg_cb_array));
} while (err > 0);
return err;
}
struct group_info { struct group_info {
bool found; bool found;
uint32_t id; uint32_t id;
@ -198,15 +114,17 @@ static int get_group_id_cb(const struct nlmsghdr *nlh, void *data)
return MNL_CB_OK; return MNL_CB_OK;
} }
int mnlg_socket_group_add(struct mnlg_socket *nlg, const char *group_name) int mnlg_socket_group_add(struct mnlu_gen_socket *nlg, const char *group_name)
{ {
struct nlmsghdr *nlh; struct nlmsghdr *nlh;
struct group_info group_info; struct group_info group_info;
int err; int err;
nlh = __mnlg_msg_prepare(nlg, CTRL_CMD_GETFAMILY, nlh = _mnlu_gen_socket_cmd_prepare(nlg, CTRL_CMD_GETFAMILY,
NLM_F_REQUEST | NLM_F_ACK, GENL_ID_CTRL, 1); NLM_F_REQUEST | NLM_F_ACK,
mnl_attr_put_u16(nlh, CTRL_ATTR_FAMILY_ID, nlg->id); GENL_ID_CTRL, 1);
mnl_attr_put_u16(nlh, CTRL_ATTR_FAMILY_ID, nlg->family);
err = mnlg_socket_send(nlg, nlh); err = mnlg_socket_send(nlg, nlh);
if (err < 0) if (err < 0)
@ -214,7 +132,7 @@ int mnlg_socket_group_add(struct mnlg_socket *nlg, const char *group_name)
group_info.found = false; group_info.found = false;
group_info.name = group_name; group_info.name = group_name;
err = mnlg_socket_recv_run(nlg, get_group_id_cb, &group_info); err = mnlu_gen_socket_recv_run(nlg, get_group_id_cb, &group_info);
if (err < 0) if (err < 0)
return err; return err;
@ -231,97 +149,7 @@ int mnlg_socket_group_add(struct mnlg_socket *nlg, const char *group_name)
return 0; return 0;
} }
static int get_family_id_attr_cb(const struct nlattr *attr, void *data) int mnlg_socket_get_fd(struct mnlu_gen_socket *nlg)
{
const struct nlattr **tb = data;
int type = mnl_attr_get_type(attr);
if (mnl_attr_type_valid(attr, CTRL_ATTR_MAX) < 0)
return MNL_CB_ERROR;
if (type == CTRL_ATTR_FAMILY_ID &&
mnl_attr_validate(attr, MNL_TYPE_U16) < 0)
return MNL_CB_ERROR;
tb[type] = attr;
return MNL_CB_OK;
}
static int get_family_id_cb(const struct nlmsghdr *nlh, void *data)
{
uint32_t *p_id = data;
struct nlattr *tb[CTRL_ATTR_MAX + 1] = {};
struct genlmsghdr *genl = mnl_nlmsg_get_payload(nlh);
mnl_attr_parse(nlh, sizeof(*genl), get_family_id_attr_cb, tb);
if (!tb[CTRL_ATTR_FAMILY_ID])
return MNL_CB_ERROR;
*p_id = mnl_attr_get_u16(tb[CTRL_ATTR_FAMILY_ID]);
return MNL_CB_OK;
}
struct mnlg_socket *mnlg_socket_open(const char *family_name, uint8_t version)
{
struct mnlg_socket *nlg;
struct nlmsghdr *nlh;
int one = 1;
int err;
nlg = malloc(sizeof(*nlg));
if (!nlg)
return NULL;
nlg->buf = malloc(MNL_SOCKET_BUFFER_SIZE);
if (!nlg->buf)
goto err_buf_alloc;
nlg->nl = mnl_socket_open(NETLINK_GENERIC);
if (!nlg->nl)
goto err_mnl_socket_open;
/* Older kernels may not support capped/extended ACK reporting */
mnl_socket_setsockopt(nlg->nl, NETLINK_CAP_ACK, &one, sizeof(one));
mnl_socket_setsockopt(nlg->nl, NETLINK_EXT_ACK, &one, sizeof(one));
err = mnl_socket_bind(nlg->nl, 0, MNL_SOCKET_AUTOPID);
if (err < 0)
goto err_mnl_socket_bind;
nlg->portid = mnl_socket_get_portid(nlg->nl);
nlh = __mnlg_msg_prepare(nlg, CTRL_CMD_GETFAMILY,
NLM_F_REQUEST | NLM_F_ACK, GENL_ID_CTRL, 1);
mnl_attr_put_strz(nlh, CTRL_ATTR_FAMILY_NAME, family_name);
err = mnlg_socket_send(nlg, nlh);
if (err < 0)
goto err_mnlg_socket_send;
err = mnlg_socket_recv_run(nlg, get_family_id_cb, &nlg->id);
if (err < 0)
goto err_mnlg_socket_recv_run;
nlg->version = version;
return nlg;
err_mnlg_socket_recv_run:
err_mnlg_socket_send:
err_mnl_socket_bind:
mnl_socket_close(nlg->nl);
err_mnl_socket_open:
free(nlg->buf);
err_buf_alloc:
free(nlg);
return NULL;
}
void mnlg_socket_close(struct mnlg_socket *nlg)
{
mnl_socket_close(nlg->nl);
free(nlg->buf);
free(nlg);
}
int mnlg_socket_get_fd(struct mnlg_socket *nlg)
{ {
return mnl_socket_get_fd(nlg->nl); return mnl_socket_get_fd(nlg->nl);
} }


@ -14,15 +14,10 @@
#include <libmnl/libmnl.h> #include <libmnl/libmnl.h>
struct mnlg_socket; struct mnlu_gen_socket;
struct nlmsghdr *mnlg_msg_prepare(struct mnlg_socket *nlg, uint8_t cmd, int mnlg_socket_send(struct mnlu_gen_socket *nlg, const struct nlmsghdr *nlh);
uint16_t flags); int mnlg_socket_group_add(struct mnlu_gen_socket *nlg, const char *group_name);
int mnlg_socket_send(struct mnlg_socket *nlg, const struct nlmsghdr *nlh); int mnlg_socket_get_fd(struct mnlu_gen_socket *nlg);
int mnlg_socket_recv_run(struct mnlg_socket *nlg, mnl_cb_t data_cb, void *data);
int mnlg_socket_group_add(struct mnlg_socket *nlg, const char *group_name);
struct mnlg_socket *mnlg_socket_open(const char *family_name, uint8_t version);
void mnlg_socket_close(struct mnlg_socket *nlg);
int mnlg_socket_get_fd(struct mnlg_socket *nlg);
#endif /* _MNLG_H_ */ #endif /* _MNLG_H_ */


@ -10,7 +10,7 @@ Where:
ACTION semantics ACTION semantics
- pass and ok are equivalent to accept - pass and ok are equivalent to accept
- continue allows to restart classification lookup - continue allows one to restart classification lookup
- drop drops packets - drop drops packets
- reclassify implies continue classification where we left off - reclassify implies continue classification where we left off


@ -14,8 +14,10 @@
13 dnrouted 13 dnrouted
14 xorp 14 xorp
15 ntk 15 ntk
16 dhcp 16 dhcp
18 keepalived
42 babel 42 babel
99 openr
186 bgp 186 bgp
187 isis 187 isis
188 ospf 188 ospf


@ -1,8 +1,18 @@
eBPF toy code examples (running in kernel) to familiarize yourself eBPF toy code examples (running in kernel) to familiarize yourself
with syntax and features: with syntax and features:
- bpf_shared.c -> Ingress/egress map sharing example - BTF defined map examples
- bpf_tailcall.c -> Using tail call chains
- bpf_cyclic.c -> Simple cycle as tail calls
- bpf_graft.c -> Demo on altering runtime behaviour - bpf_graft.c -> Demo on altering runtime behaviour
- bpf_map_in_map.c -> Using map in map example - bpf_shared.c -> Ingress/egress map sharing example
- bpf_map_in_map.c -> Using map in map example
- legacy struct bpf_elf_map defined map examples
- legacy/bpf_shared.c -> Ingress/egress map sharing example
- legacy/bpf_tailcall.c -> Using tail call chains
- legacy/bpf_cyclic.c -> Simple cycle as tail calls
- legacy/bpf_graft.c -> Demo on altering runtime behaviour
- legacy/bpf_map_in_map.c -> Using map in map example
Note: Users should define maps the new BTF way. The examples in the
legacy folder, which use struct bpf_elf_map defined maps, are not
recommended.


@ -33,13 +33,13 @@
* [...] * [...]
*/ */
struct bpf_elf_map __section_maps jmp_tc = { struct {
.type = BPF_MAP_TYPE_PROG_ARRAY, __uint(type, BPF_MAP_TYPE_PROG_ARRAY);
.size_key = sizeof(uint32_t), __uint(key_size, sizeof(uint32_t));
.size_value = sizeof(uint32_t), __uint(value_size, sizeof(uint32_t));
.pinning = PIN_GLOBAL_NS, __uint(max_entries, 1);
.max_elem = 1, __uint(pinning, LIBBPF_PIN_BY_NAME);
}; } jmp_tc __section(".maps");
__section("aaa") __section("aaa")
int cls_aaa(struct __sk_buff *skb) int cls_aaa(struct __sk_buff *skb)


@ -1,24 +1,23 @@
#include "../../include/bpf_api.h" #include "../../include/bpf_api.h"
#define MAP_INNER_ID 42 struct inner_map {
__uint(type, BPF_MAP_TYPE_ARRAY);
__uint(key_size, sizeof(uint32_t));
__uint(value_size, sizeof(uint32_t));
__uint(max_entries, 1);
} map_inner __section(".maps");
struct bpf_elf_map __section_maps map_inner = { struct {
.type = BPF_MAP_TYPE_ARRAY, __uint(type, BPF_MAP_TYPE_ARRAY_OF_MAPS);
.size_key = sizeof(uint32_t), __uint(key_size, sizeof(uint32_t));
.size_value = sizeof(uint32_t), __uint(value_size, sizeof(uint32_t));
.id = MAP_INNER_ID, __uint(max_entries, 1);
.inner_idx = 0, __uint(pinning, LIBBPF_PIN_BY_NAME);
.pinning = PIN_GLOBAL_NS, __array(values, struct inner_map);
.max_elem = 1, } map_outer __section(".maps") = {
}; .values = {
[0] = &map_inner,
struct bpf_elf_map __section_maps map_outer = { },
.type = BPF_MAP_TYPE_ARRAY_OF_MAPS,
.size_key = sizeof(uint32_t),
.size_value = sizeof(uint32_t),
.inner_id = MAP_INNER_ID,
.pinning = PIN_GLOBAL_NS,
.max_elem = 1,
}; };
__section("egress") __section("egress")


@ -18,13 +18,13 @@
* instance is being created. * instance is being created.
*/ */
struct bpf_elf_map __section_maps map_sh = { struct {
.type = BPF_MAP_TYPE_ARRAY, __uint(type, BPF_MAP_TYPE_ARRAY);
.size_key = sizeof(uint32_t), __uint(key_size, sizeof(uint32_t));
.size_value = sizeof(uint32_t), __uint(value_size, sizeof(uint32_t));
.pinning = PIN_OBJECT_NS, /* or PIN_GLOBAL_NS, or PIN_NONE */ __uint(max_entries, 1);
.max_elem = 1, __uint(pinning, LIBBPF_PIN_BY_NAME); /* or LIBBPF_PIN_NONE */
}; } map_sh __section(".maps");
__section("egress") __section("egress")
int emain(struct __sk_buff *skb) int emain(struct __sk_buff *skb)


@ -1,4 +1,4 @@
#include "../../include/bpf_api.h" #include "../../../include/bpf_api.h"
/* Cyclic dependency example to test the kernel's runtime upper /* Cyclic dependency example to test the kernel's runtime upper
* bound on loops. Also demonstrates on how to use direct-actions, * bound on loops. Also demonstrates on how to use direct-actions,


@ -0,0 +1,66 @@
#include "../../../include/bpf_api.h"
/* This example demonstrates how classifier run-time behaviour
* can be altered with tail calls. We start out with an empty
* jmp_tc array, then add section aaa to the array slot 0, and
* later on atomically replace it with section bbb. Note that
* as shown in other examples, the tc loader can prepopulate
* tail called sections, here we start out with an empty one
* on purpose to show it can also be done this way.
*
* tc filter add dev foo parent ffff: bpf obj graft.o
* tc exec bpf dbg
* [...]
* Socket Thread-20229 [001] ..s. 138993.003923: : fallthrough
* <idle>-0 [001] ..s. 138993.202265: : fallthrough
* Socket Thread-20229 [001] ..s. 138994.004149: : fallthrough
* [...]
*
* tc exec bpf graft m:globals/jmp_tc key 0 obj graft.o sec aaa
* tc exec bpf dbg
* [...]
* Socket Thread-19818 [002] ..s. 139012.053587: : aaa
* <idle>-0 [002] ..s. 139012.172359: : aaa
* Socket Thread-19818 [001] ..s. 139012.173556: : aaa
* [...]
*
* tc exec bpf graft m:globals/jmp_tc key 0 obj graft.o sec bbb
* tc exec bpf dbg
* [...]
* Socket Thread-19818 [002] ..s. 139022.102967: : bbb
* <idle>-0 [002] ..s. 139022.155640: : bbb
* Socket Thread-19818 [001] ..s. 139022.156730: : bbb
* [...]
*/
struct bpf_elf_map __section_maps jmp_tc = {
.type = BPF_MAP_TYPE_PROG_ARRAY,
.size_key = sizeof(uint32_t),
.size_value = sizeof(uint32_t),
.pinning = PIN_GLOBAL_NS,
.max_elem = 1,
};
__section("aaa")
int cls_aaa(struct __sk_buff *skb)
{
printt("aaa\n");
return TC_H_MAKE(1, 42);
}
__section("bbb")
int cls_bbb(struct __sk_buff *skb)
{
printt("bbb\n");
return TC_H_MAKE(1, 43);
}
__section_cls_entry
int cls_entry(struct __sk_buff *skb)
{
tail_call(skb, &jmp_tc, 0);
printt("fallthrough\n");
return BPF_H_DEFAULT;
}
BPF_LICENSE("GPL");


@ -0,0 +1,56 @@
#include "../../../include/bpf_api.h"
#define MAP_INNER_ID 42
struct bpf_elf_map __section_maps map_inner = {
.type = BPF_MAP_TYPE_ARRAY,
.size_key = sizeof(uint32_t),
.size_value = sizeof(uint32_t),
.id = MAP_INNER_ID,
.inner_idx = 0,
.pinning = PIN_GLOBAL_NS,
.max_elem = 1,
};
struct bpf_elf_map __section_maps map_outer = {
.type = BPF_MAP_TYPE_ARRAY_OF_MAPS,
.size_key = sizeof(uint32_t),
.size_value = sizeof(uint32_t),
.inner_id = MAP_INNER_ID,
.pinning = PIN_GLOBAL_NS,
.max_elem = 1,
};
__section("egress")
int emain(struct __sk_buff *skb)
{
struct bpf_elf_map *map_inner;
int key = 0, *val;
map_inner = map_lookup_elem(&map_outer, &key);
if (map_inner) {
val = map_lookup_elem(map_inner, &key);
if (val)
lock_xadd(val, 1);
}
return BPF_H_DEFAULT;
}
__section("ingress")
int imain(struct __sk_buff *skb)
{
struct bpf_elf_map *map_inner;
int key = 0, *val;
map_inner = map_lookup_elem(&map_outer, &key);
if (map_inner) {
val = map_lookup_elem(map_inner, &key);
if (val)
printt("map val: %d\n", *val);
}
return BPF_H_DEFAULT;
}
BPF_LICENSE("GPL");


@ -0,0 +1,53 @@
#include "../../../include/bpf_api.h"
/* Minimal, stand-alone toy map pinning example:
*
* clang -target bpf -O2 [...] -o bpf_shared.o -c bpf_shared.c
* tc filter add dev foo parent 1: bpf obj bpf_shared.o sec egress
* tc filter add dev foo parent ffff: bpf obj bpf_shared.o sec ingress
*
* Both classifiers will share the very same map instance in this example,
* so map content can be accessed from ingress *and* egress side!
*
* This example has a pinning of PIN_OBJECT_NS, so it's private and
* thus shared among various program sections within the object.
*
* A setting of PIN_GLOBAL_NS would place it into a global namespace,
* so that it can be shared among different object files. A setting
* of PIN_NONE (= 0) means no sharing, so for each tc invocation a new map
* instance is created.
*/
struct bpf_elf_map __section_maps map_sh = {
.type = BPF_MAP_TYPE_ARRAY,
.size_key = sizeof(uint32_t),
.size_value = sizeof(uint32_t),
.pinning = PIN_OBJECT_NS, /* or PIN_GLOBAL_NS, or PIN_NONE */
.max_elem = 1,
};
__section("egress")
int emain(struct __sk_buff *skb)
{
int key = 0, *val;
val = map_lookup_elem(&map_sh, &key);
if (val)
lock_xadd(val, 1);
return BPF_H_DEFAULT;
}
__section("ingress")
int imain(struct __sk_buff *skb)
{
int key = 0, *val;
val = map_lookup_elem(&map_sh, &key);
if (val)
printt("map val: %d\n", *val);
return BPF_H_DEFAULT;
}
BPF_LICENSE("GPL");


@ -1,5 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */ /* SPDX-License-Identifier: GPL-2.0 */
#include "../../include/bpf_api.h" #include "../../../include/bpf_api.h"
#define ENTRY_INIT 3 #define ENTRY_INIT 3
#define ENTRY_0 0 #define ENTRY_0 0


@ -28,13 +28,15 @@
static int usage(void) static int usage(void)
{ {
fprintf(stderr,"Usage: ctrl <CMD>\n" \ fprintf(stderr,"Usage: ctrl <CMD>\n" \
"CMD := get <PARMS> | list | monitor\n" \ "CMD := get <PARMS> | list | monitor | policy <PARMS>\n" \
"PARMS := name <name> | id <id>\n" \ "PARMS := name <name> | id <id>\n" \
"Examples:\n" \ "Examples:\n" \
"\tctrl ls\n" \ "\tctrl ls\n" \
"\tctrl monitor\n" \ "\tctrl monitor\n" \
"\tctrl get name foobar\n" \ "\tctrl get name foobar\n" \
"\tctrl get id 0xF\n"); "\tctrl get id 0xF\n"
"\tctrl policy name foobar\n"
"\tctrl policy id 0xF\n");
return -1; return -1;
} }
@ -123,7 +125,8 @@ static int print_ctrl(struct rtnl_ctrl_data *ctrl,
ghdr->cmd != CTRL_CMD_DELFAMILY && ghdr->cmd != CTRL_CMD_DELFAMILY &&
ghdr->cmd != CTRL_CMD_NEWFAMILY && ghdr->cmd != CTRL_CMD_NEWFAMILY &&
ghdr->cmd != CTRL_CMD_NEWMCAST_GRP && ghdr->cmd != CTRL_CMD_NEWMCAST_GRP &&
ghdr->cmd != CTRL_CMD_DELMCAST_GRP) { ghdr->cmd != CTRL_CMD_DELMCAST_GRP &&
ghdr->cmd != CTRL_CMD_GETPOLICY) {
fprintf(stderr, "Unknown controller command %d\n", ghdr->cmd); fprintf(stderr, "Unknown controller command %d\n", ghdr->cmd);
return 0; return 0;
} }
@ -136,7 +139,7 @@ static int print_ctrl(struct rtnl_ctrl_data *ctrl,
} }
attrs = (struct rtattr *) ((char *) ghdr + GENL_HDRLEN); attrs = (struct rtattr *) ((char *) ghdr + GENL_HDRLEN);
parse_rtattr(tb, CTRL_ATTR_MAX, attrs, len); parse_rtattr_flags(tb, CTRL_ATTR_MAX, attrs, len, NLA_F_NESTED);
if (tb[CTRL_ATTR_FAMILY_NAME]) { if (tb[CTRL_ATTR_FAMILY_NAME]) {
char *name = RTA_DATA(tb[CTRL_ATTR_FAMILY_NAME]); char *name = RTA_DATA(tb[CTRL_ATTR_FAMILY_NAME]);
@ -159,6 +162,36 @@ static int print_ctrl(struct rtnl_ctrl_data *ctrl,
__u32 *ma = RTA_DATA(tb[CTRL_ATTR_MAXATTR]); __u32 *ma = RTA_DATA(tb[CTRL_ATTR_MAXATTR]);
fprintf(fp, " max attribs: %d ",*ma); fprintf(fp, " max attribs: %d ",*ma);
} }
if (tb[CTRL_ATTR_OP_POLICY]) {
const struct rtattr *pos;
rtattr_for_each_nested(pos, tb[CTRL_ATTR_OP_POLICY]) {
struct rtattr *ptb[CTRL_ATTR_POLICY_DUMP_MAX + 1];
struct rtattr *pattrs = RTA_DATA(pos);
int plen = RTA_PAYLOAD(pos);
parse_rtattr_flags(ptb, CTRL_ATTR_POLICY_DUMP_MAX,
pattrs, plen, NLA_F_NESTED);
fprintf(fp, " op %d policies:",
pos->rta_type & ~NLA_F_NESTED);
if (ptb[CTRL_ATTR_POLICY_DO]) {
__u32 *v = RTA_DATA(ptb[CTRL_ATTR_POLICY_DO]);
fprintf(fp, " do=%d", *v);
}
if (ptb[CTRL_ATTR_POLICY_DUMP]) {
__u32 *v = RTA_DATA(ptb[CTRL_ATTR_POLICY_DUMP]);
fprintf(fp, " dump=%d", *v);
}
}
}
if (tb[CTRL_ATTR_POLICY])
nl_print_policy(tb[CTRL_ATTR_POLICY], fp);
/* end of family definitions .. */ /* end of family definitions .. */
fprintf(fp,"\n"); fprintf(fp,"\n");
if (tb[CTRL_ATTR_OPS]) { if (tb[CTRL_ATTR_OPS]) {
@ -235,7 +268,9 @@ static int ctrl_list(int cmd, int argc, char **argv)
exit(1); exit(1);
} }
if (cmd == CTRL_CMD_GETFAMILY) { if (cmd == CTRL_CMD_GETFAMILY || cmd == CTRL_CMD_GETPOLICY) {
req.g.cmd = cmd;
if (argc != 2) { if (argc != 2) {
fprintf(stderr, "Wrong number of params\n"); fprintf(stderr, "Wrong number of params\n");
return -1; return -1;
@ -260,7 +295,9 @@ static int ctrl_list(int cmd, int argc, char **argv)
fprintf(stderr, "Wrong params\n"); fprintf(stderr, "Wrong params\n");
goto ctrl_done; goto ctrl_done;
} }
}
if (cmd == CTRL_CMD_GETFAMILY) {
if (rtnl_talk(&rth, nlh, &answer) < 0) { if (rtnl_talk(&rth, nlh, &answer) < 0) {
fprintf(stderr, "Error talking to the kernel\n"); fprintf(stderr, "Error talking to the kernel\n");
goto ctrl_done; goto ctrl_done;
@ -273,7 +310,7 @@ static int ctrl_list(int cmd, int argc, char **argv)
} }
if (cmd == CTRL_CMD_UNSPEC) { if (cmd == CTRL_CMD_UNSPEC || cmd == CTRL_CMD_GETPOLICY) {
nlh->nlmsg_flags = NLM_F_ROOT|NLM_F_MATCH|NLM_F_REQUEST; nlh->nlmsg_flags = NLM_F_ROOT|NLM_F_MATCH|NLM_F_REQUEST;
nlh->nlmsg_seq = rth.dump = ++rth.seq; nlh->nlmsg_seq = rth.dump = ++rth.seq;
@ -324,6 +361,8 @@ static int parse_ctrl(struct genl_util *a, int argc, char **argv)
matches(*argv, "show") == 0 || matches(*argv, "show") == 0 ||
matches(*argv, "lst") == 0) matches(*argv, "lst") == 0)
return ctrl_list(CTRL_CMD_UNSPEC, argc-1, argv+1); return ctrl_list(CTRL_CMD_UNSPEC, argc-1, argv+1);
if (matches(*argv, "policy") == 0)
return ctrl_list(CTRL_CMD_GETPOLICY, argc-1, argv+1);
if (matches(*argv, "help") == 0) if (matches(*argv, "help") == 0)
return usage(); return usage();


@ -22,7 +22,7 @@
#include <errno.h> #include <errno.h>
#include <linux/netlink.h> #include <linux/netlink.h>
#include <linux/rtnetlink.h> /* until we put our own header */ #include <linux/rtnetlink.h> /* until we put our own header */
#include "SNAPSHOT.h" #include "version.h"
#include "utils.h" #include "utils.h"
#include "genl_utils.h" #include "genl_utils.h"
@ -118,7 +118,7 @@ int main(int argc, char **argv)
} else if (matches(argv[1], "-raw") == 0) { } else if (matches(argv[1], "-raw") == 0) {
++show_raw; ++show_raw;
} else if (matches(argv[1], "-Version") == 0) { } else if (matches(argv[1], "-Version") == 0) {
printf("genl utility, iproute2-ss%s\n", SNAPSHOT); printf("genl utility, iproute2-%s\n", version);
exit(0); exit(0);
} else if (matches(argv[1], "-help") == 0) { } else if (matches(argv[1], "-help") == 0) {
usage(); usage();


@ -2,11 +2,10 @@
#ifndef _TC_UTIL_H_ #ifndef _TC_UTIL_H_
#define _TC_UTIL_H_ 1 #define _TC_UTIL_H_ 1
#include <linux/genetlink.h>
#include "utils.h" #include "utils.h"
#include "linux/genetlink.h"
struct genl_util struct genl_util {
{
struct genl_util *next; struct genl_util *next;
char name[16]; char name[16];
int (*parse_genlopt)(struct genl_util *fu, int argc, char **argv); int (*parse_genlopt)(struct genl_util *fu, int argc, char **argv);


@ -1 +0,0 @@
static const char SNAPSHOT[] = "200602";


@ -19,6 +19,19 @@
#include "bpf_elf.h" #include "bpf_elf.h"
/** libbpf pin type. */
enum libbpf_pin_type {
LIBBPF_PIN_NONE,
/* PIN_BY_NAME: pin maps by name (in /sys/fs/bpf by default) */
LIBBPF_PIN_BY_NAME,
};
/** Type helper macros. */
#define __uint(name, val) int (*name)[val]
#define __type(name, val) typeof(val) *name
#define __array(name, val) typeof(val) *name[]
/** Misc macros. */ /** Misc macros. */
#ifndef __stringify #ifndef __stringify
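The `__uint(name, val)` helper added in this hunk stores an integer in a type rather than in a value: the field becomes a pointer to an `int` array of `val` elements, so the number survives in the type information even though the field itself is never assigned. The sketch below illustrates that encoding in plain C. `MAP_UINT`, `MAP_UINT_VALUE`, and `struct demo_map_def` are hypothetical names chosen for illustration, and recovering the value with `sizeof` is only a demonstration; a BTF-aware loader reads the array dimension from the type information instead.

```c
#include <stddef.h>

/* Same shape as the __uint() helper in the diff, under a hypothetical
 * name to keep the sketch free of reserved identifiers.
 */
#define MAP_UINT(name, val) int (*name)[val]

/* Demonstration only: recover the encoded number from the pointed-to
 * array type. sizeof does not evaluate its operand, so the null
 * pointer is never dereferenced.
 */
#define MAP_UINT_VALUE(def, field) \
	(sizeof(*((def *)0)->field) / sizeof(int))

/* A map definition in the new style: every attribute is a type. */
struct demo_map_def {
	MAP_UINT(type, 2);                        /* arbitrary example value */
	MAP_UINT(key_size, sizeof(unsigned int));
	MAP_UINT(max_entries, 1);
};

size_t demo_map_type(void)
{
	return MAP_UINT_VALUE(struct demo_map_def, type);
}

size_t demo_map_key_size(void)
{
	return MAP_UINT_VALUE(struct demo_map_def, key_size);
}

size_t demo_map_max_entries(void)
{
	return MAP_UINT_VALUE(struct demo_map_def, max_entries);
}
```

Because the attributes live in the type, a definition like this needs no initializer for its scalar settings, which is why the converted examples in this diff declare the map with an anonymous struct and only initialize `.values` where inner maps are listed.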


@ -274,12 +274,16 @@ int bpf_trace_pipe(void);
void bpf_print_ops(struct rtattr *bpf_ops, __u16 len); void bpf_print_ops(struct rtattr *bpf_ops, __u16 len);
int bpf_prog_load(enum bpf_prog_type type, const struct bpf_insn *insns, int bpf_prog_load_dev(enum bpf_prog_type type, const struct bpf_insn *insns,
size_t size_insns, const char *license, char *log, size_t size_insns, const char *license, __u32 ifindex,
size_t size_log); char *log, size_t size_log);
int bpf_program_load(enum bpf_prog_type type, const struct bpf_insn *insns,
size_t size_insns, const char *license, char *log,
size_t size_log);
int bpf_prog_attach_fd(int prog_fd, int target_fd, enum bpf_attach_type type); int bpf_prog_attach_fd(int prog_fd, int target_fd, enum bpf_attach_type type);
int bpf_prog_detach_fd(int target_fd, enum bpf_attach_type type); int bpf_prog_detach_fd(int target_fd, enum bpf_attach_type type);
int bpf_program_attach(int prog_fd, int target_fd, enum bpf_attach_type type);
int bpf_dump_prog_info(FILE *f, uint32_t id); int bpf_dump_prog_info(FILE *f, uint32_t id);
@ -287,6 +291,16 @@ int bpf_dump_prog_info(FILE *f, uint32_t id);
int bpf_send_map_fds(const char *path, const char *obj); int bpf_send_map_fds(const char *path, const char *obj);
int bpf_recv_map_fds(const char *path, int *fds, struct bpf_map_aux *aux, int bpf_recv_map_fds(const char *path, int *fds, struct bpf_map_aux *aux,
unsigned int entries); unsigned int entries);
#ifdef HAVE_LIBBPF
int iproute2_bpf_elf_ctx_init(struct bpf_cfg_in *cfg);
int iproute2_bpf_fetch_ancillary(void);
int iproute2_get_root_path(char *root_path, size_t len);
bool iproute2_is_pin_map(const char *libbpf_map_name, char *pathname);
bool iproute2_is_map_in_map(const char *libbpf_map_name, struct bpf_elf_map *imap,
struct bpf_elf_map *omap, char *omap_name);
int iproute2_find_map_name_by_id(unsigned int map_id, char *name);
int iproute2_load_libbpf(struct bpf_cfg_in *cfg);
#endif /* HAVE_LIBBPF */
#else #else
static inline int bpf_send_map_fds(const char *path, const char *obj) static inline int bpf_send_map_fds(const char *path, const char *obj)
{ {
@ -299,5 +313,15 @@ static inline int bpf_recv_map_fds(const char *path, int *fds,
{ {
return -1; return -1;
} }
#ifdef HAVE_LIBBPF
static inline int iproute2_load_libbpf(struct bpf_cfg_in *cfg)
{
fprintf(stderr, "No ELF library support compiled in.\n");
return -1;
}
#endif /* HAVE_LIBBPF */
#endif /* HAVE_ELF */ #endif /* HAVE_ELF */
const char *get_libbpf_version(void);
#endif /* __BPF_UTIL__ */ #endif /* __BPF_UTIL__ */

include/cg_map.h Normal file

@ -0,0 +1,6 @@
#ifndef __CG_MAP_H__
#define __CG_MAP_H__
const char *cg_id_to_path(__u64 id);
#endif /* __CG_MAP_H__ */


@ -15,6 +15,9 @@
#include "json_writer.h" #include "json_writer.h"
#include "color.h" #include "color.h"
#define _IS_JSON_CONTEXT(type) (is_json_context() && (type & PRINT_JSON || type & PRINT_ANY))
#define _IS_FP_CONTEXT(type) (!is_json_context() && (type & PRINT_FP || type & PRINT_ANY))
json_writer_t *get_json_writer(void); json_writer_t *get_json_writer(void);
/* /*
@ -65,9 +68,11 @@ void print_nl(void);
_PRINT_FUNC(int, int) _PRINT_FUNC(int, int)
_PRINT_FUNC(s64, int64_t) _PRINT_FUNC(s64, int64_t)
_PRINT_FUNC(bool, bool) _PRINT_FUNC(bool, bool)
_PRINT_FUNC(on_off, bool)
_PRINT_FUNC(null, const char*) _PRINT_FUNC(null, const char*)
_PRINT_FUNC(string, const char*) _PRINT_FUNC(string, const char*)
_PRINT_FUNC(uint, unsigned int) _PRINT_FUNC(uint, unsigned int)
_PRINT_FUNC(size, __u32)
_PRINT_FUNC(u64, uint64_t) _PRINT_FUNC(u64, uint64_t)
_PRINT_FUNC(hhu, unsigned char) _PRINT_FUNC(hhu, unsigned char)
_PRINT_FUNC(hu, unsigned short) _PRINT_FUNC(hu, unsigned short)
@ -76,6 +81,7 @@ _PRINT_FUNC(0xhex, unsigned long long)
_PRINT_FUNC(luint, unsigned long) _PRINT_FUNC(luint, unsigned long)
_PRINT_FUNC(lluint, unsigned long long) _PRINT_FUNC(lluint, unsigned long long)
_PRINT_FUNC(float, double) _PRINT_FUNC(float, double)
_PRINT_FUNC(tv, const struct timeval *)
#undef _PRINT_FUNC #undef _PRINT_FUNC
#define _PRINT_NAME_VALUE_FUNC(type_name, type, format_char) \ #define _PRINT_NAME_VALUE_FUNC(type_name, type, format_char) \
@ -85,4 +91,17 @@ _PRINT_NAME_VALUE_FUNC(uint, unsigned int, u);
_PRINT_NAME_VALUE_FUNC(string, const char*, s); _PRINT_NAME_VALUE_FUNC(string, const char*, s);
#undef _PRINT_NAME_VALUE_FUNC #undef _PRINT_NAME_VALUE_FUNC
int print_color_rate(bool use_iec, enum output_type t, enum color_attr color,
const char *key, const char *fmt, unsigned long long rate);
static inline int print_rate(bool use_iec, enum output_type t,
const char *key, const char *fmt,
unsigned long long rate)
{
return print_color_rate(use_iec, t, COLOR_NONE, key, fmt, rate);
}
/* A backdoor to the size formatter. Please use print_size() instead. */
char *sprint_size(__u32 sz, char *buf);
#endif /* _JSON_PRINT_H_ */ #endif /* _JSON_PRINT_H_ */
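The `_IS_JSON_CONTEXT` and `_IS_FP_CONTEXT` macros added to this header decide whether a print call applies to the current output mode. The sketch below reproduces that decision as two small functions. It assumes the `output_type` values `PRINT_FP = 1`, `PRINT_JSON = 2`, and `PRINT_ANY = 4`, which are not shown in this diff, and replaces `is_json_context()` with a hypothetical `json_mode` flag.

```c
#include <stdbool.h>

/* Assumed values; the enum itself is not part of this diff. */
enum output_type {
	PRINT_FP = 1,
	PRINT_JSON = 2,
	PRINT_ANY = 4,
};

/* Hypothetical stand-in for is_json_context(). */
static bool json_mode;

void set_json_mode(bool on)
{
	json_mode = on;
}

/* Mirrors _IS_JSON_CONTEXT(): emit only when JSON output is active and
 * the call is tagged for JSON or for any output.
 */
bool is_json_target(int type)
{
	return json_mode && ((type & PRINT_JSON) || (type & PRINT_ANY));
}

/* Mirrors _IS_FP_CONTEXT(): emit only when plain-text output is active
 * and the call is tagged for plain text or for any output.
 */
bool is_fp_target(int type)
{
	return !json_mode && ((type & PRINT_FP) || (type & PRINT_ANY));
}
```

This is the split that lets a caller pair a JSON-only call with a text-only call, as `dcb_pfc_print_requests()` does with `open_json_array(PRINT_JSON, ...)` followed by `print_string(PRINT_FP, ...)`: exactly one of the two takes effect in any given mode.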


@ -21,6 +21,7 @@ struct { \
}, \ }, \
} }
int genl_add_mcast_grp(struct rtnl_handle *grth, __u16 genl_family, const char *group);
int genl_resolve_family(struct rtnl_handle *grth, const char *family); int genl_resolve_family(struct rtnl_handle *grth, const char *family);
int genl_init_handle(struct rtnl_handle *grth, const char *family, int genl_init_handle(struct rtnl_handle *grth, const char *family,
int *genl_family); int *genl_family);


@ -69,6 +69,8 @@ int rtnl_neightbldump_req(struct rtnl_handle *rth, int family)
__attribute__((warn_unused_result)); __attribute__((warn_unused_result));
int rtnl_mdbdump_req(struct rtnl_handle *rth, int family) int rtnl_mdbdump_req(struct rtnl_handle *rth, int family)
__attribute__((warn_unused_result)); __attribute__((warn_unused_result));
int rtnl_brvlandump_req(struct rtnl_handle *rth, int family, __u32 dump_flags)
__attribute__((warn_unused_result));
int rtnl_netconfdump_req(struct rtnl_handle *rth, int family) int rtnl_netconfdump_req(struct rtnl_handle *rth, int family)
__attribute__((warn_unused_result)); __attribute__((warn_unused_result));
@ -97,6 +99,9 @@ int rtnl_dump_request_n(struct rtnl_handle *rth, struct nlmsghdr *n)
int rtnl_nexthopdump_req(struct rtnl_handle *rth, int family, int rtnl_nexthopdump_req(struct rtnl_handle *rth, int family,
req_filter_fn_t filter_fn) req_filter_fn_t filter_fn)
__attribute__((warn_unused_result)); __attribute__((warn_unused_result));
int rtnl_nexthop_bucket_dump_req(struct rtnl_handle *rth, int family,
req_filter_fn_t filter_fn)
__attribute__((warn_unused_result));
struct rtnl_ctrl_data { struct rtnl_ctrl_data {
int nsid; int nsid;
@ -104,6 +109,27 @@ struct rtnl_ctrl_data {
typedef int (*rtnl_filter_t)(struct nlmsghdr *n, void *); typedef int (*rtnl_filter_t)(struct nlmsghdr *n, void *);
/**
* rtnl error handler called from
* rtnl_dump_done()
* rtnl_dump_error()
*
* Return value is a bitmask of the following values:
* RTNL_LET_NLERR
* error handled as usual
* RTNL_SUPPRESS_NLMSG_DONE_NLERR
* error in nlmsg_type == NLMSG_DONE will be suppressed
* RTNL_SUPPRESS_NLMSG_ERROR_NLERR
* error in nlmsg_type == NLMSG_ERROR will be suppressed
* and nlmsg will be skipped
* RTNL_SUPPRESS_NLERR - suppress error in both previous cases
*/
#define RTNL_LET_NLERR 0x01
#define RTNL_SUPPRESS_NLMSG_DONE_NLERR 0x02
#define RTNL_SUPPRESS_NLMSG_ERROR_NLERR 0x04
#define RTNL_SUPPRESS_NLERR 0x06
typedef int (*rtnl_err_hndlr_t)(struct nlmsghdr *n, void *);
typedef int (*rtnl_listen_filter_t)(struct rtnl_ctrl_data *, typedef int (*rtnl_listen_filter_t)(struct rtnl_ctrl_data *,
struct nlmsghdr *n, void *); struct nlmsghdr *n, void *);
@ -113,6 +139,8 @@ typedef int (*nl_ext_ack_fn_t)(const char *errmsg, uint32_t off,
struct rtnl_dump_filter_arg { struct rtnl_dump_filter_arg {
rtnl_filter_t filter; rtnl_filter_t filter;
void *arg1; void *arg1;
rtnl_err_hndlr_t errhndlr;
void *arg2;
__u16 nc_flags; __u16 nc_flags;
}; };
@ -121,6 +149,15 @@ int rtnl_dump_filter_nc(struct rtnl_handle *rth,
void *arg, __u16 nc_flags); void *arg, __u16 nc_flags);
#define rtnl_dump_filter(rth, filter, arg) \ #define rtnl_dump_filter(rth, filter, arg) \
rtnl_dump_filter_nc(rth, filter, arg, 0) rtnl_dump_filter_nc(rth, filter, arg, 0)
int rtnl_dump_filter_errhndlr_nc(struct rtnl_handle *rth,
rtnl_filter_t filter,
void *arg1,
rtnl_err_hndlr_t errhndlr,
void *arg2,
__u16 nc_flags);
#define rtnl_dump_filter_errhndlr(rth, filter, farg, errhndlr, earg) \
rtnl_dump_filter_errhndlr_nc(rth, filter, farg, errhndlr, earg, 0)
int rtnl_talk(struct rtnl_handle *rtnl, struct nlmsghdr *n, int rtnl_talk(struct rtnl_handle *rtnl, struct nlmsghdr *n,
struct nlmsghdr **answer) struct nlmsghdr **answer)
__attribute__((warn_unused_result)); __attribute__((warn_unused_result));
@ -280,8 +317,20 @@ int rtnl_from_file(FILE *, rtnl_listen_filter_t handler,
((struct rtattr *)(((char *)(r)) + NLMSG_ALIGN(sizeof(struct if_stats_msg)))) ((struct rtattr *)(((char *)(r)) + NLMSG_ALIGN(sizeof(struct if_stats_msg))))
#endif #endif
#ifndef BRVLAN_RTA
#define BRVLAN_RTA(r) \
((struct rtattr *)(((char *)(r)) + NLMSG_ALIGN(sizeof(struct br_vlan_msg))))
#endif
/* User defined nlmsg_type which is used mostly for logging netlink /* User defined nlmsg_type which is used mostly for logging netlink
* messages from dump file */ * messages from dump file */
#define NLMSG_TSTAMP 15 #define NLMSG_TSTAMP 15
#define rtattr_for_each_nested(attr, nest) \
for ((attr) = (void *)RTA_DATA(nest); \
RTA_OK(attr, RTA_PAYLOAD(nest) - ((char *)(attr) - (char *)RTA_DATA((nest)))); \
(attr) = RTA_TAIL((attr)))
void nl_print_policy(const struct rtattr *attr, FILE *fp);
#endif /* __LIBNETLINK_H__ */ #endif /* __LIBNETLINK_H__ */
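As a rough illustration of the return-value contract documented for rtnl_err_hndlr_t in the hunk above, the sketch below shows a handler that suppresses only EOPNOTSUPP errors carried in NLMSG_ERROR messages and lets everything else be reported as usual. The names ex_ignore_eopnotsupp and ex_run_handler and the EX_ constants are hypothetical; the constants copy the values from the hunk so the sketch builds against <linux/netlink.h> alone, without libnetlink.h. It is an assumption about how a caller might use the hook, not code taken from iproute2.

```c
#include <errno.h>
#include <string.h>
#include <linux/netlink.h>

/* Values copied from the hunk above (hypothetical EX_ names). */
#define EX_RTNL_LET_NLERR                  0x01
#define EX_RTNL_SUPPRESS_NLMSG_ERROR_NLERR 0x04

/* Hypothetical handler with the rtnl_err_hndlr_t signature:
 * int (*)(struct nlmsghdr *n, void *arg). */
static int ex_ignore_eopnotsupp(struct nlmsghdr *n, void *arg)
{
	const struct nlmsgerr *err;

	(void)arg; /* unused in this sketch */

	/* Only inspect complete NLMSG_ERROR messages. */
	if (n->nlmsg_type != NLMSG_ERROR ||
	    n->nlmsg_len < NLMSG_LENGTH(sizeof(*err)))
		return EX_RTNL_LET_NLERR;

	err = (const struct nlmsgerr *)NLMSG_DATA(n);
	if (err->error == -EOPNOTSUPP)
		return EX_RTNL_SUPPRESS_NLMSG_ERROR_NLERR; /* skip this message */

	return EX_RTNL_LET_NLERR; /* error handled as usual */
}

/* Builds a fake NLMSG_ERROR message carrying -nl_errno and runs the
 * handler on it, so the behaviour can be checked without a socket. */
static int ex_run_handler(int nl_errno)
{
	union {
		struct nlmsghdr hdr;
		char buf[NLMSG_SPACE(sizeof(struct nlmsgerr))];
	} msg;
	struct nlmsgerr *err;

	memset(&msg, 0, sizeof(msg));
	msg.hdr.nlmsg_type = NLMSG_ERROR;
	msg.hdr.nlmsg_len = NLMSG_LENGTH(sizeof(*err));
	err = (struct nlmsgerr *)NLMSG_DATA(&msg.hdr);
	err->error = -nl_errno;

	return ex_ignore_eopnotsupp(&msg.hdr, NULL);
}
```

With libnetlink.h available, a handler of this shape would presumably be passed as the errhndlr argument of rtnl_dump_filter_errhndlr(), with earg forwarded to it as the second parameter.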

33
include/mnl_utils.h Normal file

@ -0,0 +1,33 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef __MNL_UTILS_H__
#define __MNL_UTILS_H__ 1
struct mnlu_gen_socket {
struct mnl_socket *nl;
char *buf;
uint32_t family;
unsigned int seq;
uint8_t version;
};
int mnlu_gen_socket_open(struct mnlu_gen_socket *nlg, const char *family_name,
uint8_t version);
void mnlu_gen_socket_close(struct mnlu_gen_socket *nlg);
struct nlmsghdr *
_mnlu_gen_socket_cmd_prepare(struct mnlu_gen_socket *nlg,
uint8_t cmd, uint16_t flags,
uint32_t id, uint8_t version);
struct nlmsghdr *mnlu_gen_socket_cmd_prepare(struct mnlu_gen_socket *nlg,
uint8_t cmd, uint16_t flags);
int mnlu_gen_socket_sndrcv(struct mnlu_gen_socket *nlg, const struct nlmsghdr *nlh,
mnl_cb_t data_cb, void *data);
struct mnl_socket *mnlu_socket_open(int bus);
struct nlmsghdr *mnlu_msg_prepare(void *buf, uint32_t nlmsg_type, uint16_t flags,
void *extra_header, size_t extra_header_size);
int mnlu_socket_recv_run(struct mnl_socket *nl, unsigned int seq, void *buf, size_t buf_size,
mnl_cb_t cb, void *data);
int mnlu_gen_socket_recv_run(struct mnlu_gen_socket *nlg, mnl_cb_t cb,
void *data);
#endif /* __MNL_UTILS_H__ */


@ -9,6 +9,7 @@ const char *rtnl_rtscope_n2a(int id, char *buf, int len);
const char *rtnl_rttable_n2a(__u32 id, char *buf, int len); const char *rtnl_rttable_n2a(__u32 id, char *buf, int len);
const char *rtnl_rtrealm_n2a(int id, char *buf, int len); const char *rtnl_rtrealm_n2a(int id, char *buf, int len);
const char *rtnl_dsfield_n2a(int id, char *buf, int len); const char *rtnl_dsfield_n2a(int id, char *buf, int len);
const char *rtnl_dsfield_get_name(int id);
const char *rtnl_group_n2a(int id, char *buf, int len); const char *rtnl_group_n2a(int id, char *buf, int len);
int rtnl_rtprot_a2n(__u32 *id, const char *arg); int rtnl_rtprot_a2n(__u32 *id, const char *arg);
@ -33,6 +34,9 @@ int ll_proto_a2n(unsigned short *id, const char *buf);
const char *nl_proto_n2a(int id, char *buf, int len); const char *nl_proto_n2a(int id, char *buf, int len);
int nl_proto_a2n(__u32 *id, const char *arg); int nl_proto_a2n(__u32 *id, const char *arg);
int protodown_reason_a2n(__u32 *id, const char *arg);
int protodown_reason_n2a(int id, char *buf, int len);
extern int numeric; extern int numeric;
#endif #endif

62
include/uapi/linux/amt.h Normal file

@ -0,0 +1,62 @@
/* SPDX-License-Identifier: GPL-2.0-only WITH Linux-syscall-note */
/*
* Copyright (c) 2021 Taehee Yoo <ap420073@gmail.com>
*/
#ifndef _AMT_H_
#define _AMT_H_
enum ifla_amt_mode {
/* AMT interface works as Gateway mode.
* The Gateway mode encapsulates IGMP/MLD traffic and decapsulates
* multicast traffic.
*/
AMT_MODE_GATEWAY = 0,
/* AMT interface works as Relay mode.
* The Relay mode encapsulates multicast traffic and decapsulates
* IGMP/MLD traffic.
*/
AMT_MODE_RELAY,
__AMT_MODE_MAX,
};
#define AMT_MODE_MAX (__AMT_MODE_MAX - 1)
enum {
IFLA_AMT_UNSPEC,
/* This attribute specify mode either Gateway or Relay. */
IFLA_AMT_MODE,
/* This attribute specify Relay port.
* AMT interface is created as Gateway mode, this attribute is used
* to specify relay(remote) port.
* AMT interface is created as Relay mode, this attribute is used
* as local port.
*/
IFLA_AMT_RELAY_PORT,
/* This attribute specify Gateway port.
* AMT interface is created as Gateway mode, this attribute is used
* as local port.
* AMT interface is created as Relay mode, this attribute is not used.
*/
IFLA_AMT_GATEWAY_PORT,
/* This attribute specify physical device */
IFLA_AMT_LINK,
/* This attribute specify local ip address */
IFLA_AMT_LOCAL_IP,
/* This attribute specify Relay ip address.
* So, this is not used by Relay.
*/
IFLA_AMT_REMOTE_IP,
/* This attribute specify Discovery ip address.
* When Gateway get started, it send discovery message to find the
* Relay's ip address.
* So, this is not used by Relay.
*/
IFLA_AMT_DISCOVERY_IP,
/* This attribute specify number of maximum tunnel. */
IFLA_AMT_MAX_TUNNELS,
__IFLA_AMT_MAX,
};
#define IFLA_AMT_MAX (__IFLA_AMT_MAX - 1)
#endif /* _AMT_H_ */


@ -5,7 +5,7 @@
/* /*
* See http://icawww1.epfl.ch/linux-atm/magic.html for the complete list of * See https://icawww1.epfl.ch/linux-atm/magic.html for the complete list of
* "magic" ioctl numbers. * "magic" ioctl numbers.
*/ */

File diff suppressed because it is too large


@ -43,7 +43,7 @@ struct btf_type {
* "size" tells the size of the type it is describing. * "size" tells the size of the type it is describing.
* *
* "type" is used by PTR, TYPEDEF, VOLATILE, CONST, RESTRICT, * "type" is used by PTR, TYPEDEF, VOLATILE, CONST, RESTRICT,
* FUNC, FUNC_PROTO and VAR. * FUNC, FUNC_PROTO, VAR and DECL_TAG.
* "type" is a type_id referring to another type. * "type" is a type_id referring to another type.
*/ */
union { union {
@ -52,28 +52,33 @@ struct btf_type {
}; };
}; };
#define BTF_INFO_KIND(info) (((info) >> 24) & 0x0f) #define BTF_INFO_KIND(info) (((info) >> 24) & 0x1f)
#define BTF_INFO_VLEN(info) ((info) & 0xffff) #define BTF_INFO_VLEN(info) ((info) & 0xffff)
#define BTF_INFO_KFLAG(info) ((info) >> 31) #define BTF_INFO_KFLAG(info) ((info) >> 31)
#define BTF_KIND_UNKN 0 /* Unknown */ enum {
#define BTF_KIND_INT 1 /* Integer */ BTF_KIND_UNKN = 0, /* Unknown */
#define BTF_KIND_PTR 2 /* Pointer */ BTF_KIND_INT = 1, /* Integer */
#define BTF_KIND_ARRAY 3 /* Array */ BTF_KIND_PTR = 2, /* Pointer */
#define BTF_KIND_STRUCT 4 /* Struct */ BTF_KIND_ARRAY = 3, /* Array */
#define BTF_KIND_UNION 5 /* Union */ BTF_KIND_STRUCT = 4, /* Struct */
#define BTF_KIND_ENUM 6 /* Enumeration */ BTF_KIND_UNION = 5, /* Union */
#define BTF_KIND_FWD 7 /* Forward */ BTF_KIND_ENUM = 6, /* Enumeration */
#define BTF_KIND_TYPEDEF 8 /* Typedef */ BTF_KIND_FWD = 7, /* Forward */
#define BTF_KIND_VOLATILE 9 /* Volatile */ BTF_KIND_TYPEDEF = 8, /* Typedef */
#define BTF_KIND_CONST 10 /* Const */ BTF_KIND_VOLATILE = 9, /* Volatile */
#define BTF_KIND_RESTRICT 11 /* Restrict */ BTF_KIND_CONST = 10, /* Const */
#define BTF_KIND_FUNC 12 /* Function */ BTF_KIND_RESTRICT = 11, /* Restrict */
#define BTF_KIND_FUNC_PROTO 13 /* Function Proto */ BTF_KIND_FUNC = 12, /* Function */
#define BTF_KIND_VAR 14 /* Variable */ BTF_KIND_FUNC_PROTO = 13, /* Function Proto */
#define BTF_KIND_DATASEC 15 /* Section */ BTF_KIND_VAR = 14, /* Variable */
#define BTF_KIND_MAX BTF_KIND_DATASEC BTF_KIND_DATASEC = 15, /* Section */
#define NR_BTF_KINDS (BTF_KIND_MAX + 1) BTF_KIND_FLOAT = 16, /* Floating point */
BTF_KIND_DECL_TAG = 17, /* Decl Tag */
NR_BTF_KINDS,
BTF_KIND_MAX = NR_BTF_KINDS - 1,
};
/* For some specific BTF_KIND, "struct btf_type" is immediately /* For some specific BTF_KIND, "struct btf_type" is immediately
* followed by extra data. * followed by extra data.
@ -169,4 +174,15 @@ struct btf_var_secinfo {
__u32 size; __u32 size;
}; };
/* BTF_KIND_DECL_TAG is followed by a single "struct btf_decl_tag" to describe
* additional information related to the tag applied location.
* If component_idx == -1, the tag is applied to a struct, union,
* variable or function. Otherwise, it is applied to a struct/union
* member or a func argument, and component_idx indicates which member
* or argument (0 ... vlen-1).
*/
struct btf_decl_tag {
__s32 component_idx;
};
#endif /* __LINUX_BTF_H__ */ #endif /* __LINUX_BTF_H__ */
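The mask change in BTF_INFO_KIND() above goes together with the two new kinds: values 16 (FLOAT) and 17 (DECL_TAG) need five bits, so the old 4-bit mask would fold them onto 0 and 1. The sketch below shows this with local copies of the macros; the EX_ names and the ex_btf_info() packing helper are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Local copies of the old and new accessors from the hunk above. */
#define EX_BTF_INFO_KIND_OLD(info) (((info) >> 24) & 0x0f)
#define EX_BTF_INFO_KIND(info)     (((info) >> 24) & 0x1f)
#define EX_BTF_INFO_VLEN(info)     ((info) & 0xffff)
#define EX_BTF_INFO_KFLAG(info)    ((info) >> 31)

/* Hypothetical helper: pack kind, vlen and kflag into a 32-bit info
 * word using the bit positions the accessors above read back. */
static uint32_t ex_btf_info(uint32_t kind, uint32_t vlen, uint32_t kflag)
{
	return ((kflag & 0x1) << 31) | ((kind & 0x1f) << 24) | (vlen & 0xffff);
}
```

Decoding a word built with kind 17 through the old mask yields 1 (BTF_KIND_INT), which is why a parser still using the 4-bit mask would misread the new kinds.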


@ -84,6 +84,7 @@ typedef __u32 can_err_mask_t;
/* CAN payload length and DLC definitions according to ISO 11898-1 */ /* CAN payload length and DLC definitions according to ISO 11898-1 */
#define CAN_MAX_DLC 8 #define CAN_MAX_DLC 8
#define CAN_MAX_RAW_DLC 15
#define CAN_MAX_DLEN 8 #define CAN_MAX_DLEN 8
/* CAN FD payload length and DLC definitions according to ISO 11898-7 */ /* CAN FD payload length and DLC definitions according to ISO 11898-7 */
@ -91,30 +92,39 @@ typedef __u32 can_err_mask_t;
#define CANFD_MAX_DLEN 64 #define CANFD_MAX_DLEN 64
/** /**
* struct can_frame - basic CAN frame structure * struct can_frame - Classical CAN frame structure (aka CAN 2.0B)
* @can_id: CAN ID of the frame and CAN_*_FLAG flags, see canid_t definition * @can_id: CAN ID of the frame and CAN_*_FLAG flags, see canid_t definition
* @can_dlc: frame payload length in byte (0 .. 8) aka data length code * @len: CAN frame payload length in byte (0 .. 8)
* N.B. the DLC field from ISO 11898-1 Chapter 8.4.2.3 has a 1:1 * @can_dlc: deprecated name for CAN frame payload length in byte (0 .. 8)
* mapping of the 'data length code' to the real payload length * @__pad: padding
* @__pad: padding * @__res0: reserved / padding
* @__res0: reserved / padding * @len8_dlc: optional DLC value (9 .. 15) at 8 byte payload length
* @__res1: reserved / padding * len8_dlc contains values from 9 .. 15 when the payload length is
* @data: CAN frame payload (up to 8 byte) * 8 bytes but the DLC value (see ISO 11898-1) is greater than 8.
* CAN_CTRLMODE_CC_LEN8_DLC flag has to be enabled in CAN driver.
* @data: CAN frame payload (up to 8 byte)
*/ */
struct can_frame { struct can_frame {
canid_t can_id; /* 32 bit CAN_ID + EFF/RTR/ERR flags */ canid_t can_id; /* 32 bit CAN_ID + EFF/RTR/ERR flags */
__u8 can_dlc; /* frame payload length in byte (0 .. CAN_MAX_DLEN) */ union {
__u8 __pad; /* padding */ /* CAN frame payload length in byte (0 .. CAN_MAX_DLEN)
__u8 __res0; /* reserved / padding */ * was previously named can_dlc so we need to carry that
__u8 __res1; /* reserved / padding */ * name for legacy support
__u8 data[CAN_MAX_DLEN] __attribute__((aligned(8))); */
__u8 len;
__u8 can_dlc; /* deprecated */
} __attribute__((packed)); /* disable padding added in some ABIs */
__u8 __pad; /* padding */
__u8 __res0; /* reserved / padding */
__u8 len8_dlc; /* optional DLC for 8 byte payload length (9 .. 15) */
__u8 data[CAN_MAX_DLEN] __attribute__((aligned(8)));
}; };
/* /*
* defined bits for canfd_frame.flags * defined bits for canfd_frame.flags
* *
* The use of struct canfd_frame implies the Extended Data Length (EDL) bit to * The use of struct canfd_frame implies the FD Frame (FDF) bit to
* be set in the CAN frame bitstream on the wire. The EDL bit switch turns * be set in the CAN frame bitstream on the wire. The FDF bit switch turns
* the CAN controllers bitstream processor into the CAN FD mode which creates * the CAN controllers bitstream processor into the CAN FD mode which creates
* two new options within the CAN FD frame specification: * two new options within the CAN FD frame specification:
* *
@ -125,9 +135,18 @@ struct can_frame {
* controller only the CANFD_BRS bit is relevant for real CAN controllers when * controller only the CANFD_BRS bit is relevant for real CAN controllers when
* building a CAN FD frame for transmission. Setting the CANFD_ESI bit can make * building a CAN FD frame for transmission. Setting the CANFD_ESI bit can make
* sense for virtual CAN interfaces to test applications with echoed frames. * sense for virtual CAN interfaces to test applications with echoed frames.
*
* The struct can_frame and struct canfd_frame intentionally share the same
* layout to be able to write CAN frame content into a CAN FD frame structure.
* When this is done the former differentiation via CAN_MTU / CANFD_MTU gets
* lost. CANFD_FDF allows programmers to mark CAN FD frames in the case of
* using struct canfd_frame for mixed CAN / CAN FD content (dual use).
* N.B. the Kernel APIs do NOT provide mixed CAN / CAN FD content inside of
* struct canfd_frame therefore the CANFD_FDF flag is disregarded by Linux.
*/ */
#define CANFD_BRS 0x01 /* bit rate switch (second bitrate for payload data) */ #define CANFD_BRS 0x01 /* bit rate switch (second bitrate for payload data) */
#define CANFD_ESI 0x02 /* error state indicator of the transmitting node */ #define CANFD_ESI 0x02 /* error state indicator of the transmitting node */
#define CANFD_FDF 0x04 /* mark CAN FD for dual use of struct canfd_frame */
/** /**
* struct canfd_frame - CAN flexible data rate frame structure * struct canfd_frame - CAN flexible data rate frame structure


@ -100,6 +100,9 @@ struct can_ctrlmode {
#define CAN_CTRLMODE_FD 0x20 /* CAN FD mode */ #define CAN_CTRLMODE_FD 0x20 /* CAN FD mode */
#define CAN_CTRLMODE_PRESUME_ACK 0x40 /* Ignore missing CAN ACKs */ #define CAN_CTRLMODE_PRESUME_ACK 0x40 /* Ignore missing CAN ACKs */
#define CAN_CTRLMODE_FD_NON_ISO 0x80 /* CAN FD in non-ISO mode */ #define CAN_CTRLMODE_FD_NON_ISO 0x80 /* CAN FD in non-ISO mode */
#define CAN_CTRLMODE_CC_LEN8_DLC 0x100 /* Classic CAN DLC option */
#define CAN_CTRLMODE_TDC_AUTO 0x200 /* CAN transceiver automatically calculates TDCV */
#define CAN_CTRLMODE_TDC_MANUAL 0x400 /* TDCV is manually set up by user */
/* /*
* CAN device statistics * CAN device statistics
@ -133,10 +136,35 @@ enum {
IFLA_CAN_BITRATE_CONST, IFLA_CAN_BITRATE_CONST,
IFLA_CAN_DATA_BITRATE_CONST, IFLA_CAN_DATA_BITRATE_CONST,
IFLA_CAN_BITRATE_MAX, IFLA_CAN_BITRATE_MAX,
__IFLA_CAN_MAX IFLA_CAN_TDC,
/* add new constants above here */
__IFLA_CAN_MAX,
IFLA_CAN_MAX = __IFLA_CAN_MAX - 1
}; };
#define IFLA_CAN_MAX (__IFLA_CAN_MAX - 1) /*
* CAN FD Transmitter Delay Compensation (TDC)
*
* Please refer to struct can_tdc_const and can_tdc in
* include/linux/can/bittiming.h for further details.
*/
enum {
IFLA_CAN_TDC_UNSPEC,
IFLA_CAN_TDC_TDCV_MIN, /* u32 */
IFLA_CAN_TDC_TDCV_MAX, /* u32 */
IFLA_CAN_TDC_TDCO_MIN, /* u32 */
IFLA_CAN_TDC_TDCO_MAX, /* u32 */
IFLA_CAN_TDC_TDCF_MIN, /* u32 */
IFLA_CAN_TDC_TDCF_MAX, /* u32 */
IFLA_CAN_TDC_TDCV, /* u32 */
IFLA_CAN_TDC_TDCO, /* u32 */
IFLA_CAN_TDC_TDCF, /* u32 */
/* add new constants above here */
__IFLA_CAN_TDC,
IFLA_CAN_TDC_MAX = __IFLA_CAN_TDC - 1
};
/* u16 termination range: 1..65535 Ohms */ /* u16 termination range: 1..65535 Ohms */
#define CAN_TERMINATION_DISABLED 0 #define CAN_TERMINATION_DISABLED 0


@ -28,4 +28,9 @@
#define _BITUL(x) (_UL(1) << (x)) #define _BITUL(x) (_UL(1) << (x))
#define _BITULL(x) (_ULL(1) << (x)) #define _BITULL(x) (_ULL(1) << (x))
#define __ALIGN_KERNEL(x, a) __ALIGN_KERNEL_MASK(x, (typeof(x))(a) - 1)
#define __ALIGN_KERNEL_MASK(x, mask) (((x) + (mask)) & ~(mask))
#define __KERNEL_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
#endif /* _LINUX_CONST_H */ #endif /* _LINUX_CONST_H */
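The three helper macros added above can be tried out with local copies, as sketched below. The EX_ names and the two wrapper functions are hypothetical; __typeof__ is the GNU spelling of typeof and is assumed to be available (gcc or clang). Note that the mask-based alignment only gives the expected result when the alignment is a power of two, while the round-up division works for any positive divisor.

```c
#include <stdint.h>

/* Local copies of the macros from the hunk above, renamed so they do
 * not clash with <linux/const.h>. */
#define EX_ALIGN_MASK(x, mask) (((x) + (mask)) & ~(mask))
#define EX_ALIGN(x, a)         EX_ALIGN_MASK(x, (__typeof__(x))(a) - 1)
#define EX_DIV_ROUND_UP(n, d)  (((n) + (d) - 1) / (d))

/* Round x up to the next multiple of a (a must be a power of two). */
static uint32_t ex_align_up(uint32_t x, uint32_t a)
{
	return EX_ALIGN(x, a);
}

/* Integer division of n by d, rounded up (d must be non-zero). */
static uint32_t ex_div_round_up(uint32_t n, uint32_t d)
{
	return EX_DIV_ROUND_UP(n, d);
}
```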

769
include/uapi/linux/dcbnl.h Normal file

@ -0,0 +1,769 @@
/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
/*
* Copyright (c) 2008-2011, Intel Corporation.
*
* This program is free software; you can redistribute it and/or modify it
* under the terms and conditions of the GNU General Public License,
* version 2, as published by the Free Software Foundation.
*
* This program is distributed in the hope it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
* more details.
*
* You should have received a copy of the GNU General Public License along with
* this program; if not, write to the Free Software Foundation, Inc., 59 Temple
* Place - Suite 330, Boston, MA 02111-1307 USA.
*
* Author: Lucy Liu <lucy.liu@intel.com>
*/
#ifndef __LINUX_DCBNL_H__
#define __LINUX_DCBNL_H__
#include <linux/types.h>
/* IEEE 802.1Qaz std supported values */
#define IEEE_8021QAZ_MAX_TCS 8
#define IEEE_8021QAZ_TSA_STRICT 0
#define IEEE_8021QAZ_TSA_CB_SHAPER 1
#define IEEE_8021QAZ_TSA_ETS 2
#define IEEE_8021QAZ_TSA_VENDOR 255
/* This structure contains the IEEE 802.1Qaz ETS managed object
*
* @willing: willing bit in ETS configuration TLV
* @ets_cap: indicates supported capacity of ets feature
* @cbs: credit based shaper ets algorithm supported
* @tc_tx_bw: tc tx bandwidth indexed by traffic class
* @tc_rx_bw: tc rx bandwidth indexed by traffic class
* @tc_tsa: TSA Assignment table, indexed by traffic class
* @prio_tc: priority assignment table mapping 8021Qp to traffic class
* @tc_reco_bw: recommended tc bandwidth indexed by traffic class for TLV
* @tc_reco_tsa: recommended tc bandwidth indexed by traffic class for TLV
* @reco_prio_tc: recommended tc tx bandwidth indexed by traffic class for TLV
*
* Recommended values are used to set fields in the ETS recommendation TLV
* with hardware offloaded LLDP.
*
* ----
* TSA Assignment 8 bit identifiers
* 0 strict priority
* 1 credit-based shaper
* 2 enhanced transmission selection
* 3-254 reserved
* 255 vendor specific
*/
struct ieee_ets {
__u8 willing;
__u8 ets_cap;
__u8 cbs;
__u8 tc_tx_bw[IEEE_8021QAZ_MAX_TCS];
__u8 tc_rx_bw[IEEE_8021QAZ_MAX_TCS];
__u8 tc_tsa[IEEE_8021QAZ_MAX_TCS];
__u8 prio_tc[IEEE_8021QAZ_MAX_TCS];
__u8 tc_reco_bw[IEEE_8021QAZ_MAX_TCS];
__u8 tc_reco_tsa[IEEE_8021QAZ_MAX_TCS];
__u8 reco_prio_tc[IEEE_8021QAZ_MAX_TCS];
};
/* This structure contains rate limit extension to the IEEE 802.1Qaz ETS
* managed object.
* Values are 64 bits long and specified in Kbps to enable usage over both
* slow and very fast networks.
*
* @tc_maxrate: maximal tc tx bandwidth indexed by traffic class
*/
struct ieee_maxrate {
__u64 tc_maxrate[IEEE_8021QAZ_MAX_TCS];
};
enum dcbnl_cndd_states {
DCB_CNDD_RESET = 0,
DCB_CNDD_EDGE,
DCB_CNDD_INTERIOR,
DCB_CNDD_INTERIOR_READY,
};
/* This structure contains the IEEE 802.1Qau QCN managed object.
*
*@rpg_enable: enable QCN RP
*@rppp_max_rps: maximum number of RPs allowed for this CNPV on this port
*@rpg_time_reset: time between rate increases if no CNMs received.
* given in u-seconds
*@rpg_byte_reset: transmitted data between rate increases if no CNMs received.
* given in Bytes
*@rpg_threshold: The number of times rpByteStage or rpTimeStage can count
* before RP rate control state machine advances states
*@rpg_max_rate: the maximum rate, in Mbits per second,
* at which an RP can transmit
*@rpg_ai_rate: The rate, in Mbits per second,
* used to increase rpTargetRate in the RPR_ACTIVE_INCREASE
*@rpg_hai_rate: The rate, in Mbits per second,
* used to increase rpTargetRate in the RPR_HYPER_INCREASE state
*@rpg_gd: Upon CNM receive, flow rate is limited to (Fb/Gd)*CurrentRate.
* rpgGd is given as log2(Gd), where Gd may only be powers of 2
*@rpg_min_dec_fac: The minimum factor by which the current transmit rate
* can be changed by reception of a CNM.
* value is given as percentage (1-100)
*@rpg_min_rate: The minimum value, in bits per second, for rate to limit
*@cndd_state_machine: The state of the congestion notification domain
* defense state machine, as defined by IEEE 802.3Qau
* section 32.1.1. In the interior ready state,
* the QCN capable hardware may add CN-TAG TLV to the
* outgoing traffic, to specifically identify outgoing
* flows.
*/
struct ieee_qcn {
__u8 rpg_enable[IEEE_8021QAZ_MAX_TCS];
__u32 rppp_max_rps[IEEE_8021QAZ_MAX_TCS];
__u32 rpg_time_reset[IEEE_8021QAZ_MAX_TCS];
__u32 rpg_byte_reset[IEEE_8021QAZ_MAX_TCS];
__u32 rpg_threshold[IEEE_8021QAZ_MAX_TCS];
__u32 rpg_max_rate[IEEE_8021QAZ_MAX_TCS];
__u32 rpg_ai_rate[IEEE_8021QAZ_MAX_TCS];
__u32 rpg_hai_rate[IEEE_8021QAZ_MAX_TCS];
__u32 rpg_gd[IEEE_8021QAZ_MAX_TCS];
__u32 rpg_min_dec_fac[IEEE_8021QAZ_MAX_TCS];
__u32 rpg_min_rate[IEEE_8021QAZ_MAX_TCS];
__u32 cndd_state_machine[IEEE_8021QAZ_MAX_TCS];
};
/* This structure contains the IEEE 802.1Qau QCN statistics.
*
*@rppp_rp_centiseconds: the number of RP-centiseconds accumulated
* by RPs at this priority level on this Port
*@rppp_created_rps: number of active RPs(flows) that react to CNMs
*/
struct ieee_qcn_stats {
__u64 rppp_rp_centiseconds[IEEE_8021QAZ_MAX_TCS];
__u32 rppp_created_rps[IEEE_8021QAZ_MAX_TCS];
};
/* This structure contains the IEEE 802.1Qaz PFC managed object
*
* @pfc_cap: Indicates the number of traffic classes on the local device
* that may simultaneously have PFC enabled.
* @pfc_en: bitmap indicating pfc enabled traffic classes
* @mbc: enable macsec bypass capability
* @delay: the allowance made for a round-trip propagation delay of the
* link in bits.
* @requests: count of the sent pfc frames
* @indications: count of the received pfc frames
*/
struct ieee_pfc {
__u8 pfc_cap;
__u8 pfc_en;
__u8 mbc;
__u16 delay;
__u64 requests[IEEE_8021QAZ_MAX_TCS];
__u64 indications[IEEE_8021QAZ_MAX_TCS];
};
#define IEEE_8021Q_MAX_PRIORITIES 8
#define DCBX_MAX_BUFFERS 8
struct dcbnl_buffer {
/* priority to buffer mapping */
__u8 prio2buffer[IEEE_8021Q_MAX_PRIORITIES];
/* buffer size in Bytes */
__u32 buffer_size[DCBX_MAX_BUFFERS];
__u32 total_size;
};
/* CEE DCBX std supported values */
#define CEE_DCBX_MAX_PGS 8
#define CEE_DCBX_MAX_PRIO 8
/**
* struct cee_pg - CEE Priority-Group managed object
*
* @willing: willing bit in the PG tlv
* @error: error bit in the PG tlv
* @pg_en: enable bit of the PG feature
* @tcs_supported: number of traffic classes supported
* @pg_bw: bandwidth percentage for each priority group
* @prio_pg: priority to PG mapping indexed by priority
*/
struct cee_pg {
__u8 willing;
__u8 error;
__u8 pg_en;
__u8 tcs_supported;
__u8 pg_bw[CEE_DCBX_MAX_PGS];
__u8 prio_pg[CEE_DCBX_MAX_PGS];
};
/**
* struct cee_pfc - CEE PFC managed object
*
* @willing: willing bit in the PFC tlv
* @error: error bit in the PFC tlv
* @pfc_en: bitmap indicating pfc enabled traffic classes
* @tcs_supported: number of traffic classes supported
*/
struct cee_pfc {
__u8 willing;
__u8 error;
__u8 pfc_en;
__u8 tcs_supported;
};
/* IEEE 802.1Qaz std supported values */
#define IEEE_8021QAZ_APP_SEL_ETHERTYPE 1
#define IEEE_8021QAZ_APP_SEL_STREAM 2
#define IEEE_8021QAZ_APP_SEL_DGRAM 3
#define IEEE_8021QAZ_APP_SEL_ANY 4
#define IEEE_8021QAZ_APP_SEL_DSCP 5
/* This structure contains the IEEE 802.1Qaz APP managed object. This
* object is also used for the CEE std as well.
*
* @selector: protocol identifier type
* @protocol: protocol of type indicated
* @priority: 3-bit unsigned integer indicating priority for IEEE
* 8-bit 802.1p user priority bitmap for CEE
*
* ----
* Selector field values for IEEE 802.1Qaz
* 0 Reserved
* 1 Ethertype
* 2 Well known port number over TCP or SCTP
* 3 Well known port number over UDP or DCCP
* 4 Well known port number over TCP, SCTP, UDP, or DCCP
* 5 Differentiated Services Code Point (DSCP) value
* 6-7 Reserved
*
* Selector field values for CEE
* 0 Ethertype
* 1 Well known port number over TCP or UDP
* 2-3 Reserved
*/
struct dcb_app {
__u8 selector;
__u8 priority;
__u16 protocol;
};
/**
* struct dcb_peer_app_info - APP feature information sent by the peer
*
* @willing: willing bit in the peer APP tlv
* @error: error bit in the peer APP tlv
*
* In addition to this information the full peer APP tlv also contains
* a table of 'app_count' APP objects defined above.
*/
struct dcb_peer_app_info {
__u8 willing;
__u8 error;
};
struct dcbmsg {
__u8 dcb_family;
__u8 cmd;
__u16 dcb_pad;
};
/**
* enum dcbnl_commands - supported DCB commands
*
* @DCB_CMD_UNDEFINED: unspecified command to catch errors
* @DCB_CMD_GSTATE: request the state of DCB in the device
* @DCB_CMD_SSTATE: set the state of DCB in the device
* @DCB_CMD_PGTX_GCFG: request the priority group configuration for Tx
* @DCB_CMD_PGTX_SCFG: set the priority group configuration for Tx
* @DCB_CMD_PGRX_GCFG: request the priority group configuration for Rx
* @DCB_CMD_PGRX_SCFG: set the priority group configuration for Rx
* @DCB_CMD_PFC_GCFG: request the priority flow control configuration
* @DCB_CMD_PFC_SCFG: set the priority flow control configuration
* @DCB_CMD_SET_ALL: apply all changes to the underlying device
* @DCB_CMD_GPERM_HWADDR: get the permanent MAC address of the underlying
* device. Only useful when using bonding.
* @DCB_CMD_GCAP: request the DCB capabilities of the device
* @DCB_CMD_GNUMTCS: get the number of traffic classes currently supported
* @DCB_CMD_SNUMTCS: set the number of traffic classes
* @DCB_CMD_GBCN: get backward congestion notification configuration
* @DCB_CMD_SBCN: set backward congestion notification configuration.
* @DCB_CMD_GAPP: get application protocol configuration
* @DCB_CMD_SAPP: set application protocol configuration
* @DCB_CMD_IEEE_SET: set IEEE 802.1Qaz configuration
* @DCB_CMD_IEEE_GET: get IEEE 802.1Qaz configuration
* @DCB_CMD_GDCBX: get DCBX engine configuration
* @DCB_CMD_SDCBX: set DCBX engine configuration
* @DCB_CMD_GFEATCFG: get DCBX features flags
* @DCB_CMD_SFEATCFG: set DCBX features negotiation flags
* @DCB_CMD_CEE_GET: get CEE aggregated configuration
* @DCB_CMD_IEEE_DEL: delete IEEE 802.1Qaz configuration
*/
enum dcbnl_commands {
DCB_CMD_UNDEFINED,
DCB_CMD_GSTATE,
DCB_CMD_SSTATE,
DCB_CMD_PGTX_GCFG,
DCB_CMD_PGTX_SCFG,
DCB_CMD_PGRX_GCFG,
DCB_CMD_PGRX_SCFG,
DCB_CMD_PFC_GCFG,
DCB_CMD_PFC_SCFG,
DCB_CMD_SET_ALL,
DCB_CMD_GPERM_HWADDR,
DCB_CMD_GCAP,
DCB_CMD_GNUMTCS,
DCB_CMD_SNUMTCS,
DCB_CMD_PFC_GSTATE,
DCB_CMD_PFC_SSTATE,
DCB_CMD_BCN_GCFG,
DCB_CMD_BCN_SCFG,
DCB_CMD_GAPP,
DCB_CMD_SAPP,
DCB_CMD_IEEE_SET,
DCB_CMD_IEEE_GET,
DCB_CMD_GDCBX,
DCB_CMD_SDCBX,
DCB_CMD_GFEATCFG,
DCB_CMD_SFEATCFG,
DCB_CMD_CEE_GET,
DCB_CMD_IEEE_DEL,
__DCB_CMD_ENUM_MAX,
DCB_CMD_MAX = __DCB_CMD_ENUM_MAX - 1,
};
/**
* enum dcbnl_attrs - DCB top-level netlink attributes
*
* @DCB_ATTR_UNDEFINED: unspecified attribute to catch errors
* @DCB_ATTR_IFNAME: interface name of the underlying device (NLA_STRING)
* @DCB_ATTR_STATE: enable state of DCB in the device (NLA_U8)
* @DCB_ATTR_PFC_STATE: enable state of PFC in the device (NLA_U8)
* @DCB_ATTR_PFC_CFG: priority flow control configuration (NLA_NESTED)
* @DCB_ATTR_NUM_TC: number of traffic classes supported in the device (NLA_U8)
* @DCB_ATTR_PG_CFG: priority group configuration (NLA_NESTED)
* @DCB_ATTR_SET_ALL: bool to commit changes to hardware or not (NLA_U8)
* @DCB_ATTR_PERM_HWADDR: MAC address of the physical device (NLA_NESTED)
* @DCB_ATTR_CAP: DCB capabilities of the device (NLA_NESTED)
* @DCB_ATTR_NUMTCS: number of traffic classes supported (NLA_NESTED)
* @DCB_ATTR_BCN: backward congestion notification configuration (NLA_NESTED)
* @DCB_ATTR_IEEE: IEEE 802.1Qaz supported attributes (NLA_NESTED)
* @DCB_ATTR_DCBX: DCBX engine configuration in the device (NLA_U8)
* @DCB_ATTR_FEATCFG: DCBX features flags (NLA_NESTED)
* @DCB_ATTR_CEE: CEE std supported attributes (NLA_NESTED)
*/
enum dcbnl_attrs {
DCB_ATTR_UNDEFINED,
DCB_ATTR_IFNAME,
DCB_ATTR_STATE,
DCB_ATTR_PFC_STATE,
DCB_ATTR_PFC_CFG,
DCB_ATTR_NUM_TC,
DCB_ATTR_PG_CFG,
DCB_ATTR_SET_ALL,
DCB_ATTR_PERM_HWADDR,
DCB_ATTR_CAP,
DCB_ATTR_NUMTCS,
DCB_ATTR_BCN,
DCB_ATTR_APP,
/* IEEE std attributes */
DCB_ATTR_IEEE,
DCB_ATTR_DCBX,
DCB_ATTR_FEATCFG,
/* CEE nested attributes */
DCB_ATTR_CEE,
__DCB_ATTR_ENUM_MAX,
DCB_ATTR_MAX = __DCB_ATTR_ENUM_MAX - 1,
};
/**
* enum ieee_attrs - IEEE 802.1Qaz get/set attributes
*
* @DCB_ATTR_IEEE_UNSPEC: unspecified
* @DCB_ATTR_IEEE_ETS: negotiated ETS configuration
* @DCB_ATTR_IEEE_PFC: negotiated PFC configuration
* @DCB_ATTR_IEEE_APP_TABLE: negotiated APP configuration
* @DCB_ATTR_IEEE_PEER_ETS: peer ETS configuration - get only
* @DCB_ATTR_IEEE_PEER_PFC: peer PFC configuration - get only
* @DCB_ATTR_IEEE_PEER_APP: peer APP tlv - get only
*/
enum ieee_attrs {
DCB_ATTR_IEEE_UNSPEC,
DCB_ATTR_IEEE_ETS,
DCB_ATTR_IEEE_PFC,
DCB_ATTR_IEEE_APP_TABLE,
DCB_ATTR_IEEE_PEER_ETS,
DCB_ATTR_IEEE_PEER_PFC,
DCB_ATTR_IEEE_PEER_APP,
DCB_ATTR_IEEE_MAXRATE,
DCB_ATTR_IEEE_QCN,
DCB_ATTR_IEEE_QCN_STATS,
DCB_ATTR_DCB_BUFFER,
__DCB_ATTR_IEEE_MAX
};
#define DCB_ATTR_IEEE_MAX (__DCB_ATTR_IEEE_MAX - 1)
enum ieee_attrs_app {
DCB_ATTR_IEEE_APP_UNSPEC,
DCB_ATTR_IEEE_APP,
__DCB_ATTR_IEEE_APP_MAX
};
#define DCB_ATTR_IEEE_APP_MAX (__DCB_ATTR_IEEE_APP_MAX - 1)
/**
* enum cee_attrs - CEE DCBX get attributes.
*
* @DCB_ATTR_CEE_UNSPEC: unspecified
* @DCB_ATTR_CEE_PEER_PG: peer PG configuration - get only
* @DCB_ATTR_CEE_PEER_PFC: peer PFC configuration - get only
* @DCB_ATTR_CEE_PEER_APP_TABLE: peer APP tlv - get only
* @DCB_ATTR_CEE_TX_PG: TX PG configuration (DCB_CMD_PGTX_GCFG)
* @DCB_ATTR_CEE_RX_PG: RX PG configuration (DCB_CMD_PGRX_GCFG)
* @DCB_ATTR_CEE_PFC: PFC configuration (DCB_CMD_PFC_GCFG)
* @DCB_ATTR_CEE_APP_TABLE: APP configuration (multi DCB_CMD_GAPP)
* @DCB_ATTR_CEE_FEAT: DCBX features flags (DCB_CMD_GFEATCFG)
*
* An aggregated collection of the cee std negotiated parameters.
*/
enum cee_attrs {
DCB_ATTR_CEE_UNSPEC,
DCB_ATTR_CEE_PEER_PG,
DCB_ATTR_CEE_PEER_PFC,
DCB_ATTR_CEE_PEER_APP_TABLE,
DCB_ATTR_CEE_TX_PG,
DCB_ATTR_CEE_RX_PG,
DCB_ATTR_CEE_PFC,
DCB_ATTR_CEE_APP_TABLE,
DCB_ATTR_CEE_FEAT,
__DCB_ATTR_CEE_MAX
};
#define DCB_ATTR_CEE_MAX (__DCB_ATTR_CEE_MAX - 1)
enum peer_app_attr {
DCB_ATTR_CEE_PEER_APP_UNSPEC,
DCB_ATTR_CEE_PEER_APP_INFO,
DCB_ATTR_CEE_PEER_APP,
__DCB_ATTR_CEE_PEER_APP_MAX
};
#define DCB_ATTR_CEE_PEER_APP_MAX (__DCB_ATTR_CEE_PEER_APP_MAX - 1)
enum cee_attrs_app {
DCB_ATTR_CEE_APP_UNSPEC,
DCB_ATTR_CEE_APP,
__DCB_ATTR_CEE_APP_MAX
};
#define DCB_ATTR_CEE_APP_MAX (__DCB_ATTR_CEE_APP_MAX - 1)
/**
* enum dcbnl_pfc_up_attrs - DCB Priority Flow Control user priority nested attrs
*
* @DCB_PFC_UP_ATTR_UNDEFINED: unspecified attribute to catch errors
* @DCB_PFC_UP_ATTR_0: Priority Flow Control value for User Priority 0 (NLA_U8)
* @DCB_PFC_UP_ATTR_1: Priority Flow Control value for User Priority 1 (NLA_U8)
* @DCB_PFC_UP_ATTR_2: Priority Flow Control value for User Priority 2 (NLA_U8)
* @DCB_PFC_UP_ATTR_3: Priority Flow Control value for User Priority 3 (NLA_U8)
* @DCB_PFC_UP_ATTR_4: Priority Flow Control value for User Priority 4 (NLA_U8)
* @DCB_PFC_UP_ATTR_5: Priority Flow Control value for User Priority 5 (NLA_U8)
* @DCB_PFC_UP_ATTR_6: Priority Flow Control value for User Priority 6 (NLA_U8)
* @DCB_PFC_UP_ATTR_7: Priority Flow Control value for User Priority 7 (NLA_U8)
* @DCB_PFC_UP_ATTR_MAX: highest attribute number currently defined
* @DCB_PFC_UP_ATTR_ALL: apply to all priority flow control attrs (NLA_FLAG)
*
*/
enum dcbnl_pfc_up_attrs {
DCB_PFC_UP_ATTR_UNDEFINED,
DCB_PFC_UP_ATTR_0,
DCB_PFC_UP_ATTR_1,
DCB_PFC_UP_ATTR_2,
DCB_PFC_UP_ATTR_3,
DCB_PFC_UP_ATTR_4,
DCB_PFC_UP_ATTR_5,
DCB_PFC_UP_ATTR_6,
DCB_PFC_UP_ATTR_7,
DCB_PFC_UP_ATTR_ALL,
__DCB_PFC_UP_ATTR_ENUM_MAX,
DCB_PFC_UP_ATTR_MAX = __DCB_PFC_UP_ATTR_ENUM_MAX - 1,
};
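The `__..._ENUM_MAX` / `..._MAX = __..._ENUM_MAX - 1` idiom above means a table indexed by attribute type needs `MAX + 1` slots, with slot 0 reserved for the `UNDEFINED` entry. A minimal sketch, assuming a hypothetical pair of helpers (`pfc_attr_for_prio` and `pfc_fill_table` are illustrative names, not part of the header) and a local copy of the enum so it builds standalone:

```c
#include <stdint.h>
#include <string.h>

/* Local copy of the enum shown above, so the sketch compiles on its own. */
enum dcbnl_pfc_up_attrs {
	DCB_PFC_UP_ATTR_UNDEFINED,
	DCB_PFC_UP_ATTR_0,
	DCB_PFC_UP_ATTR_1,
	DCB_PFC_UP_ATTR_2,
	DCB_PFC_UP_ATTR_3,
	DCB_PFC_UP_ATTR_4,
	DCB_PFC_UP_ATTR_5,
	DCB_PFC_UP_ATTR_6,
	DCB_PFC_UP_ATTR_7,
	DCB_PFC_UP_ATTR_ALL,
	__DCB_PFC_UP_ATTR_ENUM_MAX,
	DCB_PFC_UP_ATTR_MAX = __DCB_PFC_UP_ATTR_ENUM_MAX - 1,
};

/* Hypothetical helper: map user priority 0-7 to its attribute type.
 * Returns -1 for an out-of-range priority. */
static int pfc_attr_for_prio(unsigned int prio)
{
	if (prio > 7)
		return -1;
	return DCB_PFC_UP_ATTR_0 + (int)prio;
}

/* Hypothetical helper: expand a per-priority bitmask into a table indexed
 * by attribute type. The table has MAX + 1 entries because index 0 is the
 * UNDEFINED attribute and stays unused. */
static void pfc_fill_table(uint8_t enabled_mask,
			   uint8_t table[DCB_PFC_UP_ATTR_MAX + 1])
{
	unsigned int prio;

	memset(table, 0, DCB_PFC_UP_ATTR_MAX + 1);
	for (prio = 0; prio < 8; prio++)
		table[pfc_attr_for_prio(prio)] = (enabled_mask >> prio) & 1;
}
```

The same sizing rule applies to the other `*_MAX` constants in this file.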
/**
* enum dcbnl_pg_attrs - DCB Priority Group attributes
*
* @DCB_PG_ATTR_UNDEFINED: unspecified attribute to catch errors
* @DCB_PG_ATTR_TC_0: Priority Group Traffic Class 0 configuration (NLA_NESTED)
* @DCB_PG_ATTR_TC_1: Priority Group Traffic Class 1 configuration (NLA_NESTED)
* @DCB_PG_ATTR_TC_2: Priority Group Traffic Class 2 configuration (NLA_NESTED)
* @DCB_PG_ATTR_TC_3: Priority Group Traffic Class 3 configuration (NLA_NESTED)
* @DCB_PG_ATTR_TC_4: Priority Group Traffic Class 4 configuration (NLA_NESTED)
* @DCB_PG_ATTR_TC_5: Priority Group Traffic Class 5 configuration (NLA_NESTED)
* @DCB_PG_ATTR_TC_6: Priority Group Traffic Class 6 configuration (NLA_NESTED)
* @DCB_PG_ATTR_TC_7: Priority Group Traffic Class 7 configuration (NLA_NESTED)
* @DCB_PG_ATTR_TC_MAX: highest attribute number currently defined
* @DCB_PG_ATTR_TC_ALL: apply to all traffic classes (NLA_NESTED)
* @DCB_PG_ATTR_BW_ID_0: Percent of link bandwidth for Priority Group 0 (NLA_U8)
* @DCB_PG_ATTR_BW_ID_1: Percent of link bandwidth for Priority Group 1 (NLA_U8)
* @DCB_PG_ATTR_BW_ID_2: Percent of link bandwidth for Priority Group 2 (NLA_U8)
* @DCB_PG_ATTR_BW_ID_3: Percent of link bandwidth for Priority Group 3 (NLA_U8)
* @DCB_PG_ATTR_BW_ID_4: Percent of link bandwidth for Priority Group 4 (NLA_U8)
* @DCB_PG_ATTR_BW_ID_5: Percent of link bandwidth for Priority Group 5 (NLA_U8)
* @DCB_PG_ATTR_BW_ID_6: Percent of link bandwidth for Priority Group 6 (NLA_U8)
* @DCB_PG_ATTR_BW_ID_7: Percent of link bandwidth for Priority Group 7 (NLA_U8)
* @DCB_PG_ATTR_BW_ID_MAX: highest attribute number currently defined
* @DCB_PG_ATTR_BW_ID_ALL: apply to all priority groups (NLA_FLAG)
*
*/
enum dcbnl_pg_attrs {
DCB_PG_ATTR_UNDEFINED,
DCB_PG_ATTR_TC_0,
DCB_PG_ATTR_TC_1,
DCB_PG_ATTR_TC_2,
DCB_PG_ATTR_TC_3,
DCB_PG_ATTR_TC_4,
DCB_PG_ATTR_TC_5,
DCB_PG_ATTR_TC_6,
DCB_PG_ATTR_TC_7,
DCB_PG_ATTR_TC_MAX,
DCB_PG_ATTR_TC_ALL,
DCB_PG_ATTR_BW_ID_0,
DCB_PG_ATTR_BW_ID_1,
DCB_PG_ATTR_BW_ID_2,
DCB_PG_ATTR_BW_ID_3,
DCB_PG_ATTR_BW_ID_4,
DCB_PG_ATTR_BW_ID_5,
DCB_PG_ATTR_BW_ID_6,
DCB_PG_ATTR_BW_ID_7,
DCB_PG_ATTR_BW_ID_MAX,
DCB_PG_ATTR_BW_ID_ALL,
__DCB_PG_ATTR_ENUM_MAX,
DCB_PG_ATTR_MAX = __DCB_PG_ATTR_ENUM_MAX - 1,
};
/**
* enum dcbnl_tc_attrs - DCB Traffic Class attributes
*
* @DCB_TC_ATTR_PARAM_UNDEFINED: unspecified attribute to catch errors
* @DCB_TC_ATTR_PARAM_PGID: (NLA_U8) Priority group the traffic class belongs to
* Valid values are: 0-7
* @DCB_TC_ATTR_PARAM_UP_MAPPING: (NLA_U8) Traffic class to user priority map
* Some devices may not support changing the
* user priority map of a TC.
* @DCB_TC_ATTR_PARAM_STRICT_PRIO: (NLA_U8) Strict priority setting
* 0 - none
* 1 - group strict
* 2 - link strict
* @DCB_TC_ATTR_PARAM_BW_PCT: optional - (NLA_U8) If supported by the device and
* not configured to use link strict priority,
* this is the percentage of bandwidth of the
* priority group this traffic class belongs to
* @DCB_TC_ATTR_PARAM_ALL: (NLA_FLAG) all traffic class parameters
*
*/
enum dcbnl_tc_attrs {
DCB_TC_ATTR_PARAM_UNDEFINED,
DCB_TC_ATTR_PARAM_PGID,
DCB_TC_ATTR_PARAM_UP_MAPPING,
DCB_TC_ATTR_PARAM_STRICT_PRIO,
DCB_TC_ATTR_PARAM_BW_PCT,
DCB_TC_ATTR_PARAM_ALL,
__DCB_TC_ATTR_PARAM_ENUM_MAX,
DCB_TC_ATTR_PARAM_MAX = __DCB_TC_ATTR_PARAM_ENUM_MAX - 1,
};
/**
* enum dcbnl_cap_attrs - DCB Capability attributes
*
* @DCB_CAP_ATTR_UNDEFINED: unspecified attribute to catch errors
* @DCB_CAP_ATTR_ALL: (NLA_FLAG) all capability parameters
* @DCB_CAP_ATTR_PG: (NLA_U8) device supports Priority Groups
* @DCB_CAP_ATTR_PFC: (NLA_U8) device supports Priority Flow Control
* @DCB_CAP_ATTR_UP2TC: (NLA_U8) device supports user priority to
* traffic class mapping
* @DCB_CAP_ATTR_PG_TCS: (NLA_U8) bitmap where each bit represents a
* number of traffic classes the device
* can be configured to use for Priority Groups
* @DCB_CAP_ATTR_PFC_TCS: (NLA_U8) bitmap where each bit represents a
* number of traffic classes the device can be
* configured to use for Priority Flow Control
* @DCB_CAP_ATTR_GSP: (NLA_U8) device supports group strict priority
* @DCB_CAP_ATTR_BCN: (NLA_U8) device supports Backwards Congestion
* Notification
* @DCB_CAP_ATTR_DCBX: (NLA_U8) device supports DCBX engine
*
*/
enum dcbnl_cap_attrs {
DCB_CAP_ATTR_UNDEFINED,
DCB_CAP_ATTR_ALL,
DCB_CAP_ATTR_PG,
DCB_CAP_ATTR_PFC,
DCB_CAP_ATTR_UP2TC,
DCB_CAP_ATTR_PG_TCS,
DCB_CAP_ATTR_PFC_TCS,
DCB_CAP_ATTR_GSP,
DCB_CAP_ATTR_BCN,
DCB_CAP_ATTR_DCBX,
__DCB_CAP_ATTR_ENUM_MAX,
DCB_CAP_ATTR_MAX = __DCB_CAP_ATTR_ENUM_MAX - 1,
};
/**
* DCBX capability flags
*
* @DCB_CAP_DCBX_HOST: DCBX negotiation is performed by the host LLDP agent.
* 'set' routines are used to configure the device with
* the negotiated parameters
*
* @DCB_CAP_DCBX_LLD_MANAGED: DCBX negotiation is not performed in the host but
* by another entity
* 'get' routines are used to retrieve the
* negotiated parameters
* 'set' routines can be used to set the initial
* negotiation configuration
*
* @DCB_CAP_DCBX_VER_CEE: for a non-host DCBX engine, indicates the engine
* supports the CEE protocol flavor
*
* @DCB_CAP_DCBX_VER_IEEE: for a non-host DCBX engine, indicates the engine
* supports the IEEE protocol flavor
*
* @DCB_CAP_DCBX_STATIC: for a non-host DCBX engine, indicates the engine
* supports static configuration (i.e. no actual
* negotiation is performed; negotiated parameters equal
* the initial configuration)
*
*/
#define DCB_CAP_DCBX_HOST 0x01
#define DCB_CAP_DCBX_LLD_MANAGED 0x02
#define DCB_CAP_DCBX_VER_CEE 0x04
#define DCB_CAP_DCBX_VER_IEEE 0x08
#define DCB_CAP_DCBX_STATIC 0x10
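The DCBX capability flags above are independent bits in a single byte, so a value can carry several of them at once. A minimal sketch of decoding such a byte, assuming a hypothetical helper `dcbx_flags_to_str` and short display names chosen only for illustration (neither comes from the header):

```c
#include <string.h>

/* Values copied from the header above so the sketch builds standalone. */
#define DCB_CAP_DCBX_HOST        0x01
#define DCB_CAP_DCBX_LLD_MANAGED 0x02
#define DCB_CAP_DCBX_VER_CEE     0x04
#define DCB_CAP_DCBX_VER_IEEE    0x08
#define DCB_CAP_DCBX_STATIC      0x10

/* Hypothetical helper: write a comma-separated list of the flags set in
 * 'flags' into 'buf' and return how many flags were recognised. */
static int dcbx_flags_to_str(unsigned char flags, char *buf, size_t len)
{
	static const struct {
		unsigned char bit;
		const char *name;	/* illustrative labels only */
	} names[] = {
		{ DCB_CAP_DCBX_HOST,        "host" },
		{ DCB_CAP_DCBX_LLD_MANAGED, "lld-managed" },
		{ DCB_CAP_DCBX_VER_CEE,     "cee" },
		{ DCB_CAP_DCBX_VER_IEEE,    "ieee" },
		{ DCB_CAP_DCBX_STATIC,      "static" },
	};
	size_t i;
	int n = 0;

	if (len == 0)
		return 0;
	buf[0] = '\0';
	for (i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
		if (!(flags & names[i].bit))
			continue;
		if (n > 0)
			strncat(buf, ",", len - strlen(buf) - 1);
		strncat(buf, names[i].name, len - strlen(buf) - 1);
		n++;
	}
	return n;
}
```

For instance, a device whose firmware runs the IEEE flavour of DCBX would be expected to report `DCB_CAP_DCBX_LLD_MANAGED | DCB_CAP_DCBX_VER_IEEE`.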
/**
* enum dcbnl_numtcs_attrs - number of traffic classes
*
* @DCB_NUMTCS_ATTR_UNDEFINED: unspecified attribute to catch errors
* @DCB_NUMTCS_ATTR_ALL: (NLA_FLAG) all traffic class attributes
* @DCB_NUMTCS_ATTR_PG: (NLA_U8) number of traffic classes used for
* priority groups
* @DCB_NUMTCS_ATTR_PFC: (NLA_U8) number of traffic classes which can
* support priority flow control
*/
enum dcbnl_numtcs_attrs {
DCB_NUMTCS_ATTR_UNDEFINED,
DCB_NUMTCS_ATTR_ALL,
DCB_NUMTCS_ATTR_PG,
DCB_NUMTCS_ATTR_PFC,
__DCB_NUMTCS_ATTR_ENUM_MAX,
DCB_NUMTCS_ATTR_MAX = __DCB_NUMTCS_ATTR_ENUM_MAX - 1,
};
enum dcbnl_bcn_attrs{
DCB_BCN_ATTR_UNDEFINED = 0,
DCB_BCN_ATTR_RP_0,
DCB_BCN_ATTR_RP_1,
DCB_BCN_ATTR_RP_2,
DCB_BCN_ATTR_RP_3,
DCB_BCN_ATTR_RP_4,
DCB_BCN_ATTR_RP_5,
DCB_BCN_ATTR_RP_6,
DCB_BCN_ATTR_RP_7,
DCB_BCN_ATTR_RP_ALL,
DCB_BCN_ATTR_BCNA_0,
DCB_BCN_ATTR_BCNA_1,
DCB_BCN_ATTR_ALPHA,
DCB_BCN_ATTR_BETA,
DCB_BCN_ATTR_GD,
DCB_BCN_ATTR_GI,
DCB_BCN_ATTR_TMAX,
DCB_BCN_ATTR_TD,
DCB_BCN_ATTR_RMIN,
DCB_BCN_ATTR_W,
DCB_BCN_ATTR_RD,
DCB_BCN_ATTR_RU,
DCB_BCN_ATTR_WRTT,
DCB_BCN_ATTR_RI,
DCB_BCN_ATTR_C,
DCB_BCN_ATTR_ALL,
__DCB_BCN_ATTR_ENUM_MAX,
DCB_BCN_ATTR_MAX = __DCB_BCN_ATTR_ENUM_MAX - 1,
};
/**
* enum dcb_general_attr_values - general DCB attribute values
*
* @DCB_ATTR_VALUE_UNDEFINED: value used to indicate an attribute is not supported
*
*/
enum dcb_general_attr_values {
DCB_ATTR_VALUE_UNDEFINED = 0xff
};
#define DCB_APP_IDTYPE_ETHTYPE 0x00
#define DCB_APP_IDTYPE_PORTNUM 0x01
enum dcbnl_app_attrs {
DCB_APP_ATTR_UNDEFINED,
DCB_APP_ATTR_IDTYPE,
DCB_APP_ATTR_ID,
DCB_APP_ATTR_PRIORITY,
__DCB_APP_ATTR_ENUM_MAX,
DCB_APP_ATTR_MAX = __DCB_APP_ATTR_ENUM_MAX - 1,
};
/**
* enum dcbnl_featcfg_attrs - features configuration flags
*
* @DCB_FEATCFG_ATTR_UNDEFINED: unspecified attribute to catch errors
* @DCB_FEATCFG_ATTR_ALL: (NLA_FLAG) all features configuration attributes
* @DCB_FEATCFG_ATTR_PG: (NLA_U8) configuration flags for priority groups
* @DCB_FEATCFG_ATTR_PFC: (NLA_U8) configuration flags for priority
* flow control
* @DCB_FEATCFG_ATTR_APP: (NLA_U8) configuration flags for application TLV
*
*/
#define DCB_FEATCFG_ERROR 0x01 /* error in feature resolution */
#define DCB_FEATCFG_ENABLE 0x02 /* enable feature */
#define DCB_FEATCFG_WILLING 0x04 /* feature is willing */
#define DCB_FEATCFG_ADVERTISE 0x08 /* advertise feature */
enum dcbnl_featcfg_attrs {
DCB_FEATCFG_ATTR_UNDEFINED,
DCB_FEATCFG_ATTR_ALL,
DCB_FEATCFG_ATTR_PG,
DCB_FEATCFG_ATTR_PFC,
DCB_FEATCFG_ATTR_APP,
__DCB_FEATCFG_ATTR_ENUM_MAX,
DCB_FEATCFG_ATTR_MAX = __DCB_FEATCFG_ATTR_ENUM_MAX - 1,
};
#endif /* __LINUX_DCBNL_H__ */


@ -13,6 +13,8 @@
#ifndef _LINUX_DEVLINK_H_ #ifndef _LINUX_DEVLINK_H_
#define _LINUX_DEVLINK_H_ #define _LINUX_DEVLINK_H_
#include <linux/const.h>
#define DEVLINK_GENL_NAME "devlink" #define DEVLINK_GENL_NAME "devlink"
#define DEVLINK_GENL_VERSION 0x1 #define DEVLINK_GENL_VERSION 0x1
#define DEVLINK_GENL_MCGRP_CONFIG_NAME "config" #define DEVLINK_GENL_MCGRP_CONFIG_NAME "config"
@ -122,6 +124,13 @@ enum devlink_command {
DEVLINK_CMD_TRAP_POLICER_NEW, DEVLINK_CMD_TRAP_POLICER_NEW,
DEVLINK_CMD_TRAP_POLICER_DEL, DEVLINK_CMD_TRAP_POLICER_DEL,
DEVLINK_CMD_HEALTH_REPORTER_TEST,
DEVLINK_CMD_RATE_GET, /* can dump */
DEVLINK_CMD_RATE_SET,
DEVLINK_CMD_RATE_NEW,
DEVLINK_CMD_RATE_DEL,
/* add new commands above here */ /* add new commands above here */
__DEVLINK_CMD_MAX, __DEVLINK_CMD_MAX,
DEVLINK_CMD_MAX = __DEVLINK_CMD_MAX - 1 DEVLINK_CMD_MAX = __DEVLINK_CMD_MAX - 1
@ -193,6 +202,18 @@ enum devlink_port_flavour {
* port that faces the PCI VF. * port that faces the PCI VF.
*/ */
DEVLINK_PORT_FLAVOUR_VIRTUAL, /* Any virtual port facing the user. */ DEVLINK_PORT_FLAVOUR_VIRTUAL, /* Any virtual port facing the user. */
DEVLINK_PORT_FLAVOUR_UNUSED, /* Port which exists in the switch, but
* is not used in any way.
*/
DEVLINK_PORT_FLAVOUR_PCI_SF, /* Represents eswitch port
* for the PCI SF. It is an internal
* port that faces the PCI SF.
*/
};
enum devlink_rate_type {
DEVLINK_RATE_TYPE_LEAF,
DEVLINK_RATE_TYPE_NODE,
}; };
enum devlink_param_cmode { enum devlink_param_cmode {
@ -228,15 +249,40 @@ enum {
DEVLINK_ATTR_STATS_MAX = __DEVLINK_ATTR_STATS_MAX - 1 DEVLINK_ATTR_STATS_MAX = __DEVLINK_ATTR_STATS_MAX - 1
}; };
/* Specify what sections of a flash component can be overwritten when
* performing an update. Overwriting of firmware binary sections is always
* implicitly assumed to be allowed.
*
* Each section must be documented in
* Documentation/networking/devlink/devlink-flash.rst
*
*/
enum {
DEVLINK_FLASH_OVERWRITE_SETTINGS_BIT,
DEVLINK_FLASH_OVERWRITE_IDENTIFIERS_BIT,
__DEVLINK_FLASH_OVERWRITE_MAX_BIT,
DEVLINK_FLASH_OVERWRITE_MAX_BIT = __DEVLINK_FLASH_OVERWRITE_MAX_BIT - 1
};
#define DEVLINK_FLASH_OVERWRITE_SETTINGS _BITUL(DEVLINK_FLASH_OVERWRITE_SETTINGS_BIT)
#define DEVLINK_FLASH_OVERWRITE_IDENTIFIERS _BITUL(DEVLINK_FLASH_OVERWRITE_IDENTIFIERS_BIT)
#define DEVLINK_SUPPORTED_FLASH_OVERWRITE_SECTIONS \
(_BITUL(__DEVLINK_FLASH_OVERWRITE_MAX_BIT) - 1)
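Because `__DEVLINK_FLASH_OVERWRITE_MAX_BIT` is one past the last defined bit, `_BITUL(max) - 1` yields a mask with every defined section bit set. A minimal sketch of using that mask to reject unknown bits, assuming a hypothetical validator `flash_overwrite_mask_valid` and a local `_BITUL` fallback in place of `<linux/const.h>`:

```c
#include <stdint.h>

/* Fallback with the same meaning as _BITUL() in <linux/const.h>,
 * defined here only so the sketch builds without kernel headers. */
#ifndef _BITUL
#define _BITUL(x) (1UL << (x))
#endif

/* Local copy of the definitions shown above. */
enum {
	DEVLINK_FLASH_OVERWRITE_SETTINGS_BIT,
	DEVLINK_FLASH_OVERWRITE_IDENTIFIERS_BIT,
	__DEVLINK_FLASH_OVERWRITE_MAX_BIT,
	DEVLINK_FLASH_OVERWRITE_MAX_BIT = __DEVLINK_FLASH_OVERWRITE_MAX_BIT - 1
};

#define DEVLINK_FLASH_OVERWRITE_SETTINGS \
	_BITUL(DEVLINK_FLASH_OVERWRITE_SETTINGS_BIT)
#define DEVLINK_FLASH_OVERWRITE_IDENTIFIERS \
	_BITUL(DEVLINK_FLASH_OVERWRITE_IDENTIFIERS_BIT)
#define DEVLINK_SUPPORTED_FLASH_OVERWRITE_SECTIONS \
	(_BITUL(__DEVLINK_FLASH_OVERWRITE_MAX_BIT) - 1)

/* Hypothetical validator: non-zero when 'mask' only contains section
 * bits that this copy of the header knows about. */
static int flash_overwrite_mask_valid(uint32_t mask)
{
	return (mask & ~DEVLINK_SUPPORTED_FLASH_OVERWRITE_SECTIONS) == 0;
}
```

With the two bits defined here the supported mask works out to 0x3, and it grows automatically when a new `_BIT` entry is added above `__DEVLINK_FLASH_OVERWRITE_MAX_BIT`.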
/** /**
* enum devlink_trap_action - Packet trap action. * enum devlink_trap_action - Packet trap action.
* @DEVLINK_TRAP_ACTION_DROP: Packet is dropped by the device and a copy is not * @DEVLINK_TRAP_ACTION_DROP: Packet is dropped by the device and a copy is not
* sent to the CPU. * sent to the CPU.
* @DEVLINK_TRAP_ACTION_TRAP: The sole copy of the packet is sent to the CPU. * @DEVLINK_TRAP_ACTION_TRAP: The sole copy of the packet is sent to the CPU.
* @DEVLINK_TRAP_ACTION_MIRROR: Packet is forwarded by the device and a copy is
* sent to the CPU.
*/ */
enum devlink_trap_action { enum devlink_trap_action {
DEVLINK_TRAP_ACTION_DROP, DEVLINK_TRAP_ACTION_DROP,
DEVLINK_TRAP_ACTION_TRAP, DEVLINK_TRAP_ACTION_TRAP,
DEVLINK_TRAP_ACTION_MIRROR,
}; };
/** /**
@ -250,10 +296,16 @@ enum devlink_trap_action {
* control plane for resolution. Trapped packets * control plane for resolution. Trapped packets
* are processed by devlink and injected to * are processed by devlink and injected to
* the kernel's Rx path. * the kernel's Rx path.
* @DEVLINK_TRAP_TYPE_CONTROL: Packet was trapped because it is required for
* the correct functioning of the control plane.
* For example, an ARP request packet. Trapped
* packets are injected to the kernel's Rx path,
* but not reported to drop monitor.
*/ */
enum devlink_trap_type { enum devlink_trap_type {
DEVLINK_TRAP_TYPE_DROP, DEVLINK_TRAP_TYPE_DROP,
DEVLINK_TRAP_TYPE_EXCEPTION, DEVLINK_TRAP_TYPE_EXCEPTION,
DEVLINK_TRAP_TYPE_CONTROL,
}; };
enum { enum {
@ -263,6 +315,29 @@ enum {
DEVLINK_ATTR_TRAP_METADATA_TYPE_FA_COOKIE, DEVLINK_ATTR_TRAP_METADATA_TYPE_FA_COOKIE,
}; };
enum devlink_reload_action {
DEVLINK_RELOAD_ACTION_UNSPEC,
DEVLINK_RELOAD_ACTION_DRIVER_REINIT, /* Driver entities re-instantiation */
DEVLINK_RELOAD_ACTION_FW_ACTIVATE, /* FW activate */
/* Add new reload actions above */
__DEVLINK_RELOAD_ACTION_MAX,
DEVLINK_RELOAD_ACTION_MAX = __DEVLINK_RELOAD_ACTION_MAX - 1
};
enum devlink_reload_limit {
DEVLINK_RELOAD_LIMIT_UNSPEC, /* unspecified, no constraints */
DEVLINK_RELOAD_LIMIT_NO_RESET, /* No reset allowed, no down time allowed,
* no link flap and no configuration is lost.
*/
/* Add new reload limit above */
__DEVLINK_RELOAD_LIMIT_MAX,
DEVLINK_RELOAD_LIMIT_MAX = __DEVLINK_RELOAD_LIMIT_MAX - 1
};
#define DEVLINK_RELOAD_LIMITS_VALID_MASK (_BITUL(__DEVLINK_RELOAD_LIMIT_MAX) - 1)
enum devlink_attr { enum devlink_attr {
/* don't change the order or add anything between, this is ABI! */ /* don't change the order or add anything between, this is ABI! */
DEVLINK_ATTR_UNSPEC, DEVLINK_ATTR_UNSPEC,
@ -442,6 +517,42 @@ enum devlink_attr {
DEVLINK_ATTR_TRAP_POLICER_RATE, /* u64 */ DEVLINK_ATTR_TRAP_POLICER_RATE, /* u64 */
DEVLINK_ATTR_TRAP_POLICER_BURST, /* u64 */ DEVLINK_ATTR_TRAP_POLICER_BURST, /* u64 */
DEVLINK_ATTR_PORT_FUNCTION, /* nested */
DEVLINK_ATTR_INFO_BOARD_SERIAL_NUMBER, /* string */
DEVLINK_ATTR_PORT_LANES, /* u32 */
DEVLINK_ATTR_PORT_SPLITTABLE, /* u8 */
DEVLINK_ATTR_PORT_EXTERNAL, /* u8 */
DEVLINK_ATTR_PORT_CONTROLLER_NUMBER, /* u32 */
DEVLINK_ATTR_FLASH_UPDATE_STATUS_TIMEOUT, /* u64 */
DEVLINK_ATTR_FLASH_UPDATE_OVERWRITE_MASK, /* bitfield32 */
DEVLINK_ATTR_RELOAD_ACTION, /* u8 */
DEVLINK_ATTR_RELOAD_ACTIONS_PERFORMED, /* bitfield32 */
DEVLINK_ATTR_RELOAD_LIMITS, /* bitfield32 */
DEVLINK_ATTR_DEV_STATS, /* nested */
DEVLINK_ATTR_RELOAD_STATS, /* nested */
DEVLINK_ATTR_RELOAD_STATS_ENTRY, /* nested */
DEVLINK_ATTR_RELOAD_STATS_LIMIT, /* u8 */
DEVLINK_ATTR_RELOAD_STATS_VALUE, /* u32 */
DEVLINK_ATTR_REMOTE_RELOAD_STATS, /* nested */
DEVLINK_ATTR_RELOAD_ACTION_INFO, /* nested */
DEVLINK_ATTR_RELOAD_ACTION_STATS, /* nested */
DEVLINK_ATTR_PORT_PCI_SF_NUMBER, /* u32 */
DEVLINK_ATTR_RATE_TYPE, /* u16 */
DEVLINK_ATTR_RATE_TX_SHARE, /* u64 */
DEVLINK_ATTR_RATE_TX_MAX, /* u64 */
DEVLINK_ATTR_RATE_NODE_NAME, /* string */
DEVLINK_ATTR_RATE_PARENT_NODE_NAME, /* string */
DEVLINK_ATTR_REGION_MAX_SNAPSHOTS, /* u32 */
/* add new attributes above here, update the policy in devlink.c */ /* add new attributes above here, update the policy in devlink.c */
__DEVLINK_ATTR_MAX, __DEVLINK_ATTR_MAX,
@ -488,4 +599,32 @@ enum devlink_resource_unit {
DEVLINK_RESOURCE_UNIT_ENTRY, DEVLINK_RESOURCE_UNIT_ENTRY,
}; };
enum devlink_port_function_attr {
DEVLINK_PORT_FUNCTION_ATTR_UNSPEC,
DEVLINK_PORT_FUNCTION_ATTR_HW_ADDR, /* binary */
DEVLINK_PORT_FN_ATTR_STATE, /* u8 */
DEVLINK_PORT_FN_ATTR_OPSTATE, /* u8 */
__DEVLINK_PORT_FUNCTION_ATTR_MAX,
DEVLINK_PORT_FUNCTION_ATTR_MAX = __DEVLINK_PORT_FUNCTION_ATTR_MAX - 1
};
enum devlink_port_fn_state {
DEVLINK_PORT_FN_STATE_INACTIVE,
DEVLINK_PORT_FN_STATE_ACTIVE,
};
/**
* enum devlink_port_fn_opstate - indicates operational state of the function
* @DEVLINK_PORT_FN_OPSTATE_ATTACHED: Driver is attached to the function.
* For graceful tear down of the function, after inactivation of the
* function, user should wait for operational state to turn DETACHED.
* @DEVLINK_PORT_FN_OPSTATE_DETACHED: Driver is detached from the function.
* It is safe to delete the port.
*/
enum devlink_port_fn_opstate {
DEVLINK_PORT_FN_OPSTATE_DETACHED,
DEVLINK_PORT_FN_OPSTATE_ATTACHED,
};
#endif /* _LINUX_DEVLINK_H_ */ #endif /* _LINUX_DEVLINK_H_ */


@ -48,6 +48,7 @@ enum {
CTRL_CMD_NEWMCAST_GRP, CTRL_CMD_NEWMCAST_GRP,
CTRL_CMD_DELMCAST_GRP, CTRL_CMD_DELMCAST_GRP,
CTRL_CMD_GETMCAST_GRP, /* unused */ CTRL_CMD_GETMCAST_GRP, /* unused */
CTRL_CMD_GETPOLICY,
__CTRL_CMD_MAX, __CTRL_CMD_MAX,
}; };
@ -62,6 +63,9 @@ enum {
CTRL_ATTR_MAXATTR, CTRL_ATTR_MAXATTR,
CTRL_ATTR_OPS, CTRL_ATTR_OPS,
CTRL_ATTR_MCAST_GROUPS, CTRL_ATTR_MCAST_GROUPS,
CTRL_ATTR_POLICY,
CTRL_ATTR_OP_POLICY,
CTRL_ATTR_OP,
__CTRL_ATTR_MAX, __CTRL_ATTR_MAX,
}; };
@ -83,6 +87,15 @@ enum {
__CTRL_ATTR_MCAST_GRP_MAX, __CTRL_ATTR_MCAST_GRP_MAX,
}; };
enum {
CTRL_ATTR_POLICY_UNSPEC,
CTRL_ATTR_POLICY_DO,
CTRL_ATTR_POLICY_DUMP,
__CTRL_ATTR_POLICY_DUMP_MAX,
CTRL_ATTR_POLICY_DUMP_MAX = __CTRL_ATTR_POLICY_DUMP_MAX - 1
};
#define CTRL_ATTR_MCAST_GRP_MAX (__CTRL_ATTR_MCAST_GRP_MAX - 1) #define CTRL_ATTR_MCAST_GRP_MAX (__CTRL_ATTR_MCAST_GRP_MAX - 1)


@ -68,6 +68,7 @@ struct icmp6hdr {
#define icmp6_mtu icmp6_dataun.un_data32[0] #define icmp6_mtu icmp6_dataun.un_data32[0]
#define icmp6_unused icmp6_dataun.un_data32[0] #define icmp6_unused icmp6_dataun.un_data32[0]
#define icmp6_maxdelay icmp6_dataun.un_data16[0] #define icmp6_maxdelay icmp6_dataun.un_data16[0]
#define icmp6_datagram_len icmp6_dataun.un_data8[0]
#define icmp6_router icmp6_dataun.u_nd_advt.router #define icmp6_router icmp6_dataun.u_nd_advt.router
#define icmp6_solicited icmp6_dataun.u_nd_advt.solicited #define icmp6_solicited icmp6_dataun.u_nd_advt.solicited
#define icmp6_override icmp6_dataun.u_nd_advt.override #define icmp6_override icmp6_dataun.u_nd_advt.override
@ -137,7 +138,11 @@ struct icmp6hdr {
#define ICMPV6_HDR_FIELD 0 #define ICMPV6_HDR_FIELD 0
#define ICMPV6_UNK_NEXTHDR 1 #define ICMPV6_UNK_NEXTHDR 1
#define ICMPV6_UNK_OPTION 2 #define ICMPV6_UNK_OPTION 2
#define ICMPV6_HDR_INCOMP 3
/* Codes for EXT_ECHO (PROBE) */
#define ICMPV6_EXT_ECHO_REQUEST 160
#define ICMPV6_EXT_ECHO_REPLY 161
/* /*
* constants for (set|get)sockopt * constants for (set|get)sockopt
*/ */


@ -176,6 +176,7 @@ enum {
enum { enum {
IF_LINK_MODE_DEFAULT, IF_LINK_MODE_DEFAULT,
IF_LINK_MODE_DORMANT, /* limit upward transition to dormant */ IF_LINK_MODE_DORMANT, /* limit upward transition to dormant */
IF_LINK_MODE_TESTING, /* limit upward transition to testing */
}; };
/* /*


@ -24,6 +24,22 @@ struct sockaddr_alg {
__u8 salg_name[64]; __u8 salg_name[64];
}; };
/*
* Linux v4.12 and later removed the 64-byte limit on salg_name[]; it's now an
* arbitrary-length field. We had to keep the original struct above for source
* compatibility with existing userspace programs, though. Use the new struct
* below if support for very long algorithm names is needed. To do this,
* allocate 'sizeof(struct sockaddr_alg_new) + strlen(algname) + 1' bytes, and
* copy algname (including the null terminator) into salg_name.
*/
struct sockaddr_alg_new {
__u16 salg_family;
__u8 salg_type[14];
__u32 salg_feat;
__u32 salg_mask;
__u8 salg_name[];
};
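The allocation rule in the comment above (`sizeof(struct sockaddr_alg_new) + strlen(algname) + 1`) can be sketched as follows. The helper name `alg_addr_alloc` and its parameters are assumptions for illustration, and the struct is re-declared locally with `<stdint.h>` types so the example builds without kernel headers:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Local copy of the struct shown above, using stdint types. */
struct sockaddr_alg_new {
	uint16_t salg_family;
	uint8_t  salg_type[14];
	uint32_t salg_feat;
	uint32_t salg_mask;
	uint8_t  salg_name[];	/* flexible array: name of any length */
};

/* Hypothetical helper: allocate an address large enough for 'name'.
 * On success the total size to pass to bind() is stored in *out_len
 * (if non-NULL) and the caller must free() the result.
 * Returns NULL if 'type' does not fit or allocation fails. */
static struct sockaddr_alg_new *alg_addr_alloc(uint16_t family,
					       const char *type,
					       const char *name,
					       size_t *out_len)
{
	struct sockaddr_alg_new *sa;
	size_t name_len = strlen(name) + 1;	/* include the terminator */
	size_t total = sizeof(struct sockaddr_alg_new) + name_len;

	/* salg_type is still a fixed 14-byte field and must stay terminated */
	if (strlen(type) >= sizeof(sa->salg_type))
		return NULL;

	sa = calloc(1, total);
	if (!sa)
		return NULL;

	sa->salg_family = family;
	memcpy(sa->salg_type, type, strlen(type));
	memcpy(sa->salg_name, name, name_len);
	if (out_len)
		*out_len = total;
	return sa;
}
```

A caller would typically pass `AF_ALG` as the family and hand the returned pointer and length to `bind()`.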
struct af_alg_iv { struct af_alg_iv {
__u32 ivlen; __u32 ivlen;
__u8 iv[0]; __u8 iv[0];
@ -35,6 +51,7 @@ struct af_alg_iv {
#define ALG_SET_OP 3 #define ALG_SET_OP 3
#define ALG_SET_AEAD_ASSOCLEN 4 #define ALG_SET_AEAD_ASSOCLEN 4
#define ALG_SET_AEAD_AUTHSIZE 5 #define ALG_SET_AEAD_AUTHSIZE 5
#define ALG_SET_DRBG_ENTROPY 6
/* Operations */ /* Operations */
#define ALG_OP_DECRYPT 0 #define ALG_OP_DECRYPT 0


@ -54,6 +54,7 @@
#define ARPHRD_X25 271 /* CCITT X.25 */ #define ARPHRD_X25 271 /* CCITT X.25 */
#define ARPHRD_HWX25 272 /* Boards with X.25 in firmware */ #define ARPHRD_HWX25 272 /* Boards with X.25 in firmware */
#define ARPHRD_CAN 280 /* Controller Area Network */ #define ARPHRD_CAN 280 /* Controller Area Network */
#define ARPHRD_MCTP 290
#define ARPHRD_PPP 512 #define ARPHRD_PPP 512
#define ARPHRD_CISCO 513 /* Cisco HDLC */ #define ARPHRD_CISCO 513 /* Cisco HDLC */
#define ARPHRD_HDLC ARPHRD_CISCO #define ARPHRD_HDLC ARPHRD_CISCO


@ -94,6 +94,7 @@
#define BOND_XMIT_POLICY_LAYER23 2 /* layer 2+3 (IP ^ MAC) */ #define BOND_XMIT_POLICY_LAYER23 2 /* layer 2+3 (IP ^ MAC) */
#define BOND_XMIT_POLICY_ENCAP23 3 /* encapsulated layer 2+3 */ #define BOND_XMIT_POLICY_ENCAP23 3 /* encapsulated layer 2+3 */
#define BOND_XMIT_POLICY_ENCAP34 4 /* encapsulated layer 3+4 */ #define BOND_XMIT_POLICY_ENCAP34 4 /* encapsulated layer 3+4 */
#define BOND_XMIT_POLICY_VLAN_SRCMAC 5 /* vlan + source MAC */
/* 802.3ad port state definitions (43.4.2.2 in the 802.3ad standard) */ /* 802.3ad port state definitions (43.4.2.2 in the 802.3ad standard) */
#define LACP_STATE_LACP_ACTIVITY 0x1 #define LACP_STATE_LACP_ACTIVITY 0x1
@ -152,14 +153,3 @@ enum {
#define BOND_3AD_STAT_MAX (__BOND_3AD_STAT_MAX - 1) #define BOND_3AD_STAT_MAX (__BOND_3AD_STAT_MAX - 1)
#endif /* _LINUX_IF_BONDING_H */ #endif /* _LINUX_IF_BONDING_H */
/*
* Local variables:
* version-control: t
* kept-new-versions: 5
* c-indent-level: 8
* c-basic-offset: 8
* tab-width: 8
* End:
*/


@ -120,6 +120,8 @@ enum {
IFLA_BRIDGE_MODE, IFLA_BRIDGE_MODE,
IFLA_BRIDGE_VLAN_INFO, IFLA_BRIDGE_VLAN_INFO,
IFLA_BRIDGE_VLAN_TUNNEL_INFO, IFLA_BRIDGE_VLAN_TUNNEL_INFO,
IFLA_BRIDGE_MRP,
IFLA_BRIDGE_CFM,
__IFLA_BRIDGE_MAX, __IFLA_BRIDGE_MAX,
}; };
#define IFLA_BRIDGE_MAX (__IFLA_BRIDGE_MAX - 1) #define IFLA_BRIDGE_MAX (__IFLA_BRIDGE_MAX - 1)
@ -157,6 +159,300 @@ struct bridge_vlan_xstats {
__u32 pad2; __u32 pad2;
}; };
enum {
IFLA_BRIDGE_MRP_UNSPEC,
IFLA_BRIDGE_MRP_INSTANCE,
IFLA_BRIDGE_MRP_PORT_STATE,
IFLA_BRIDGE_MRP_PORT_ROLE,
IFLA_BRIDGE_MRP_RING_STATE,
IFLA_BRIDGE_MRP_RING_ROLE,
IFLA_BRIDGE_MRP_START_TEST,
IFLA_BRIDGE_MRP_INFO,
IFLA_BRIDGE_MRP_IN_ROLE,
IFLA_BRIDGE_MRP_IN_STATE,
IFLA_BRIDGE_MRP_START_IN_TEST,
__IFLA_BRIDGE_MRP_MAX,
};
#define IFLA_BRIDGE_MRP_MAX (__IFLA_BRIDGE_MRP_MAX - 1)
enum {
IFLA_BRIDGE_MRP_INSTANCE_UNSPEC,
IFLA_BRIDGE_MRP_INSTANCE_RING_ID,
IFLA_BRIDGE_MRP_INSTANCE_P_IFINDEX,
IFLA_BRIDGE_MRP_INSTANCE_S_IFINDEX,
IFLA_BRIDGE_MRP_INSTANCE_PRIO,
__IFLA_BRIDGE_MRP_INSTANCE_MAX,
};
#define IFLA_BRIDGE_MRP_INSTANCE_MAX (__IFLA_BRIDGE_MRP_INSTANCE_MAX - 1)
enum {
IFLA_BRIDGE_MRP_PORT_STATE_UNSPEC,
IFLA_BRIDGE_MRP_PORT_STATE_STATE,
__IFLA_BRIDGE_MRP_PORT_STATE_MAX,
};
#define IFLA_BRIDGE_MRP_PORT_STATE_MAX (__IFLA_BRIDGE_MRP_PORT_STATE_MAX - 1)
enum {
IFLA_BRIDGE_MRP_PORT_ROLE_UNSPEC,
IFLA_BRIDGE_MRP_PORT_ROLE_ROLE,
__IFLA_BRIDGE_MRP_PORT_ROLE_MAX,
};
#define IFLA_BRIDGE_MRP_PORT_ROLE_MAX (__IFLA_BRIDGE_MRP_PORT_ROLE_MAX - 1)
enum {
IFLA_BRIDGE_MRP_RING_STATE_UNSPEC,
IFLA_BRIDGE_MRP_RING_STATE_RING_ID,
IFLA_BRIDGE_MRP_RING_STATE_STATE,
__IFLA_BRIDGE_MRP_RING_STATE_MAX,
};
#define IFLA_BRIDGE_MRP_RING_STATE_MAX (__IFLA_BRIDGE_MRP_RING_STATE_MAX - 1)
enum {
IFLA_BRIDGE_MRP_RING_ROLE_UNSPEC,
IFLA_BRIDGE_MRP_RING_ROLE_RING_ID,
IFLA_BRIDGE_MRP_RING_ROLE_ROLE,
__IFLA_BRIDGE_MRP_RING_ROLE_MAX,
};
#define IFLA_BRIDGE_MRP_RING_ROLE_MAX (__IFLA_BRIDGE_MRP_RING_ROLE_MAX - 1)
enum {
IFLA_BRIDGE_MRP_START_TEST_UNSPEC,
IFLA_BRIDGE_MRP_START_TEST_RING_ID,
IFLA_BRIDGE_MRP_START_TEST_INTERVAL,
IFLA_BRIDGE_MRP_START_TEST_MAX_MISS,
IFLA_BRIDGE_MRP_START_TEST_PERIOD,
IFLA_BRIDGE_MRP_START_TEST_MONITOR,
__IFLA_BRIDGE_MRP_START_TEST_MAX,
};
#define IFLA_BRIDGE_MRP_START_TEST_MAX (__IFLA_BRIDGE_MRP_START_TEST_MAX - 1)
enum {
IFLA_BRIDGE_MRP_INFO_UNSPEC,
IFLA_BRIDGE_MRP_INFO_RING_ID,
IFLA_BRIDGE_MRP_INFO_P_IFINDEX,
IFLA_BRIDGE_MRP_INFO_S_IFINDEX,
IFLA_BRIDGE_MRP_INFO_PRIO,
IFLA_BRIDGE_MRP_INFO_RING_STATE,
IFLA_BRIDGE_MRP_INFO_RING_ROLE,
IFLA_BRIDGE_MRP_INFO_TEST_INTERVAL,
IFLA_BRIDGE_MRP_INFO_TEST_MAX_MISS,
IFLA_BRIDGE_MRP_INFO_TEST_MONITOR,
IFLA_BRIDGE_MRP_INFO_I_IFINDEX,
IFLA_BRIDGE_MRP_INFO_IN_STATE,
IFLA_BRIDGE_MRP_INFO_IN_ROLE,
IFLA_BRIDGE_MRP_INFO_IN_TEST_INTERVAL,
IFLA_BRIDGE_MRP_INFO_IN_TEST_MAX_MISS,
__IFLA_BRIDGE_MRP_INFO_MAX,
};
#define IFLA_BRIDGE_MRP_INFO_MAX (__IFLA_BRIDGE_MRP_INFO_MAX - 1)
enum {
IFLA_BRIDGE_MRP_IN_STATE_UNSPEC,
IFLA_BRIDGE_MRP_IN_STATE_IN_ID,
IFLA_BRIDGE_MRP_IN_STATE_STATE,
__IFLA_BRIDGE_MRP_IN_STATE_MAX,
};
#define IFLA_BRIDGE_MRP_IN_STATE_MAX (__IFLA_BRIDGE_MRP_IN_STATE_MAX - 1)
enum {
IFLA_BRIDGE_MRP_IN_ROLE_UNSPEC,
IFLA_BRIDGE_MRP_IN_ROLE_RING_ID,
IFLA_BRIDGE_MRP_IN_ROLE_IN_ID,
IFLA_BRIDGE_MRP_IN_ROLE_ROLE,
IFLA_BRIDGE_MRP_IN_ROLE_I_IFINDEX,
__IFLA_BRIDGE_MRP_IN_ROLE_MAX,
};
#define IFLA_BRIDGE_MRP_IN_ROLE_MAX (__IFLA_BRIDGE_MRP_IN_ROLE_MAX - 1)
enum {
IFLA_BRIDGE_MRP_START_IN_TEST_UNSPEC,
IFLA_BRIDGE_MRP_START_IN_TEST_IN_ID,
IFLA_BRIDGE_MRP_START_IN_TEST_INTERVAL,
IFLA_BRIDGE_MRP_START_IN_TEST_MAX_MISS,
IFLA_BRIDGE_MRP_START_IN_TEST_PERIOD,
__IFLA_BRIDGE_MRP_START_IN_TEST_MAX,
};
#define IFLA_BRIDGE_MRP_START_IN_TEST_MAX (__IFLA_BRIDGE_MRP_START_IN_TEST_MAX - 1)
struct br_mrp_instance {
__u32 ring_id;
__u32 p_ifindex;
__u32 s_ifindex;
__u16 prio;
};
struct br_mrp_ring_state {
__u32 ring_id;
__u32 ring_state;
};
struct br_mrp_ring_role {
__u32 ring_id;
__u32 ring_role;
};
struct br_mrp_start_test {
__u32 ring_id;
__u32 interval;
__u32 max_miss;
__u32 period;
__u32 monitor;
};
struct br_mrp_in_state {
__u32 in_state;
__u16 in_id;
};
struct br_mrp_in_role {
__u32 ring_id;
__u32 in_role;
__u32 i_ifindex;
__u16 in_id;
};
struct br_mrp_start_in_test {
__u32 interval;
__u32 max_miss;
__u32 period;
__u16 in_id;
};
enum {
IFLA_BRIDGE_CFM_UNSPEC,
IFLA_BRIDGE_CFM_MEP_CREATE,
IFLA_BRIDGE_CFM_MEP_DELETE,
IFLA_BRIDGE_CFM_MEP_CONFIG,
IFLA_BRIDGE_CFM_CC_CONFIG,
IFLA_BRIDGE_CFM_CC_PEER_MEP_ADD,
IFLA_BRIDGE_CFM_CC_PEER_MEP_REMOVE,
IFLA_BRIDGE_CFM_CC_RDI,
IFLA_BRIDGE_CFM_CC_CCM_TX,
IFLA_BRIDGE_CFM_MEP_CREATE_INFO,
IFLA_BRIDGE_CFM_MEP_CONFIG_INFO,
IFLA_BRIDGE_CFM_CC_CONFIG_INFO,
IFLA_BRIDGE_CFM_CC_RDI_INFO,
IFLA_BRIDGE_CFM_CC_CCM_TX_INFO,
IFLA_BRIDGE_CFM_CC_PEER_MEP_INFO,
IFLA_BRIDGE_CFM_MEP_STATUS_INFO,
IFLA_BRIDGE_CFM_CC_PEER_STATUS_INFO,
__IFLA_BRIDGE_CFM_MAX,
};
#define IFLA_BRIDGE_CFM_MAX (__IFLA_BRIDGE_CFM_MAX - 1)
enum {
IFLA_BRIDGE_CFM_MEP_CREATE_UNSPEC,
IFLA_BRIDGE_CFM_MEP_CREATE_INSTANCE,
IFLA_BRIDGE_CFM_MEP_CREATE_DOMAIN,
IFLA_BRIDGE_CFM_MEP_CREATE_DIRECTION,
IFLA_BRIDGE_CFM_MEP_CREATE_IFINDEX,
__IFLA_BRIDGE_CFM_MEP_CREATE_MAX,
};
#define IFLA_BRIDGE_CFM_MEP_CREATE_MAX (__IFLA_BRIDGE_CFM_MEP_CREATE_MAX - 1)
enum {
IFLA_BRIDGE_CFM_MEP_DELETE_UNSPEC,
IFLA_BRIDGE_CFM_MEP_DELETE_INSTANCE,
__IFLA_BRIDGE_CFM_MEP_DELETE_MAX,
};
#define IFLA_BRIDGE_CFM_MEP_DELETE_MAX (__IFLA_BRIDGE_CFM_MEP_DELETE_MAX - 1)
enum {
IFLA_BRIDGE_CFM_MEP_CONFIG_UNSPEC,
IFLA_BRIDGE_CFM_MEP_CONFIG_INSTANCE,
IFLA_BRIDGE_CFM_MEP_CONFIG_UNICAST_MAC,
IFLA_BRIDGE_CFM_MEP_CONFIG_MDLEVEL,
IFLA_BRIDGE_CFM_MEP_CONFIG_MEPID,
__IFLA_BRIDGE_CFM_MEP_CONFIG_MAX,
};
#define IFLA_BRIDGE_CFM_MEP_CONFIG_MAX (__IFLA_BRIDGE_CFM_MEP_CONFIG_MAX - 1)
enum {
IFLA_BRIDGE_CFM_CC_CONFIG_UNSPEC,
IFLA_BRIDGE_CFM_CC_CONFIG_INSTANCE,
IFLA_BRIDGE_CFM_CC_CONFIG_ENABLE,
IFLA_BRIDGE_CFM_CC_CONFIG_EXP_INTERVAL,
IFLA_BRIDGE_CFM_CC_CONFIG_EXP_MAID,
__IFLA_BRIDGE_CFM_CC_CONFIG_MAX,
};
#define IFLA_BRIDGE_CFM_CC_CONFIG_MAX (__IFLA_BRIDGE_CFM_CC_CONFIG_MAX - 1)
enum {
IFLA_BRIDGE_CFM_CC_PEER_MEP_UNSPEC,
IFLA_BRIDGE_CFM_CC_PEER_MEP_INSTANCE,
IFLA_BRIDGE_CFM_CC_PEER_MEPID,
__IFLA_BRIDGE_CFM_CC_PEER_MEP_MAX,
};
#define IFLA_BRIDGE_CFM_CC_PEER_MEP_MAX (__IFLA_BRIDGE_CFM_CC_PEER_MEP_MAX - 1)
enum {
IFLA_BRIDGE_CFM_CC_RDI_UNSPEC,
IFLA_BRIDGE_CFM_CC_RDI_INSTANCE,
IFLA_BRIDGE_CFM_CC_RDI_RDI,
__IFLA_BRIDGE_CFM_CC_RDI_MAX,
};
#define IFLA_BRIDGE_CFM_CC_RDI_MAX (__IFLA_BRIDGE_CFM_CC_RDI_MAX - 1)
enum {
IFLA_BRIDGE_CFM_CC_CCM_TX_UNSPEC,
IFLA_BRIDGE_CFM_CC_CCM_TX_INSTANCE,
IFLA_BRIDGE_CFM_CC_CCM_TX_DMAC,
IFLA_BRIDGE_CFM_CC_CCM_TX_SEQ_NO_UPDATE,
IFLA_BRIDGE_CFM_CC_CCM_TX_PERIOD,
IFLA_BRIDGE_CFM_CC_CCM_TX_IF_TLV,
IFLA_BRIDGE_CFM_CC_CCM_TX_IF_TLV_VALUE,
IFLA_BRIDGE_CFM_CC_CCM_TX_PORT_TLV,
IFLA_BRIDGE_CFM_CC_CCM_TX_PORT_TLV_VALUE,
__IFLA_BRIDGE_CFM_CC_CCM_TX_MAX,
};
#define IFLA_BRIDGE_CFM_CC_CCM_TX_MAX (__IFLA_BRIDGE_CFM_CC_CCM_TX_MAX - 1)
enum {
IFLA_BRIDGE_CFM_MEP_STATUS_UNSPEC,
IFLA_BRIDGE_CFM_MEP_STATUS_INSTANCE,
IFLA_BRIDGE_CFM_MEP_STATUS_OPCODE_UNEXP_SEEN,
IFLA_BRIDGE_CFM_MEP_STATUS_VERSION_UNEXP_SEEN,
IFLA_BRIDGE_CFM_MEP_STATUS_RX_LEVEL_LOW_SEEN,
__IFLA_BRIDGE_CFM_MEP_STATUS_MAX,
};
#define IFLA_BRIDGE_CFM_MEP_STATUS_MAX (__IFLA_BRIDGE_CFM_MEP_STATUS_MAX - 1)
enum {
IFLA_BRIDGE_CFM_CC_PEER_STATUS_UNSPEC,
IFLA_BRIDGE_CFM_CC_PEER_STATUS_INSTANCE,
IFLA_BRIDGE_CFM_CC_PEER_STATUS_PEER_MEPID,
IFLA_BRIDGE_CFM_CC_PEER_STATUS_CCM_DEFECT,
IFLA_BRIDGE_CFM_CC_PEER_STATUS_RDI,
IFLA_BRIDGE_CFM_CC_PEER_STATUS_PORT_TLV_VALUE,
IFLA_BRIDGE_CFM_CC_PEER_STATUS_IF_TLV_VALUE,
IFLA_BRIDGE_CFM_CC_PEER_STATUS_SEEN,
IFLA_BRIDGE_CFM_CC_PEER_STATUS_TLV_SEEN,
IFLA_BRIDGE_CFM_CC_PEER_STATUS_SEQ_UNEXP_SEEN,
__IFLA_BRIDGE_CFM_CC_PEER_STATUS_MAX,
};
#define IFLA_BRIDGE_CFM_CC_PEER_STATUS_MAX (__IFLA_BRIDGE_CFM_CC_PEER_STATUS_MAX - 1)
struct bridge_stp_xstats { struct bridge_stp_xstats {
__u64 transition_blk; __u64 transition_blk;
__u64 transition_fwd; __u64 transition_fwd;
@ -183,16 +479,22 @@ enum {
/* flags used in BRIDGE_VLANDB_DUMP_FLAGS attribute to affect dumps */ /* flags used in BRIDGE_VLANDB_DUMP_FLAGS attribute to affect dumps */
#define BRIDGE_VLANDB_DUMPF_STATS (1 << 0) /* Include stats in the dump */ #define BRIDGE_VLANDB_DUMPF_STATS (1 << 0) /* Include stats in the dump */
#define BRIDGE_VLANDB_DUMPF_GLOBAL (1 << 1) /* Dump global vlan options only */
/* Bridge vlan RTM attributes /* Bridge vlan RTM attributes
* [BRIDGE_VLANDB_ENTRY] = { * [BRIDGE_VLANDB_ENTRY] = {
* [BRIDGE_VLANDB_ENTRY_INFO] * [BRIDGE_VLANDB_ENTRY_INFO]
* ... * ...
* } * }
* [BRIDGE_VLANDB_GLOBAL_OPTIONS] = {
* [BRIDGE_VLANDB_GOPTS_ID]
* ...
* }
*/ */
enum { enum {
BRIDGE_VLANDB_UNSPEC, BRIDGE_VLANDB_UNSPEC,
BRIDGE_VLANDB_ENTRY, BRIDGE_VLANDB_ENTRY,
BRIDGE_VLANDB_GLOBAL_OPTIONS,
__BRIDGE_VLANDB_MAX, __BRIDGE_VLANDB_MAX,
}; };
#define BRIDGE_VLANDB_MAX (__BRIDGE_VLANDB_MAX - 1) #define BRIDGE_VLANDB_MAX (__BRIDGE_VLANDB_MAX - 1)
@ -204,6 +506,7 @@ enum {
BRIDGE_VLANDB_ENTRY_STATE, BRIDGE_VLANDB_ENTRY_STATE,
BRIDGE_VLANDB_ENTRY_TUNNEL_INFO, BRIDGE_VLANDB_ENTRY_TUNNEL_INFO,
BRIDGE_VLANDB_ENTRY_STATS, BRIDGE_VLANDB_ENTRY_STATS,
BRIDGE_VLANDB_ENTRY_MCAST_ROUTER,
__BRIDGE_VLANDB_ENTRY_MAX, __BRIDGE_VLANDB_ENTRY_MAX,
}; };
#define BRIDGE_VLANDB_ENTRY_MAX (__BRIDGE_VLANDB_ENTRY_MAX - 1) #define BRIDGE_VLANDB_ENTRY_MAX (__BRIDGE_VLANDB_ENTRY_MAX - 1)
@ -242,6 +545,29 @@ enum {
}; };
#define BRIDGE_VLANDB_STATS_MAX (__BRIDGE_VLANDB_STATS_MAX - 1) #define BRIDGE_VLANDB_STATS_MAX (__BRIDGE_VLANDB_STATS_MAX - 1)
enum {
BRIDGE_VLANDB_GOPTS_UNSPEC,
BRIDGE_VLANDB_GOPTS_ID,
BRIDGE_VLANDB_GOPTS_RANGE,
BRIDGE_VLANDB_GOPTS_MCAST_SNOOPING,
BRIDGE_VLANDB_GOPTS_MCAST_IGMP_VERSION,
BRIDGE_VLANDB_GOPTS_MCAST_MLD_VERSION,
BRIDGE_VLANDB_GOPTS_MCAST_LAST_MEMBER_CNT,
BRIDGE_VLANDB_GOPTS_MCAST_STARTUP_QUERY_CNT,
BRIDGE_VLANDB_GOPTS_MCAST_LAST_MEMBER_INTVL,
BRIDGE_VLANDB_GOPTS_PAD,
BRIDGE_VLANDB_GOPTS_MCAST_MEMBERSHIP_INTVL,
BRIDGE_VLANDB_GOPTS_MCAST_QUERIER_INTVL,
BRIDGE_VLANDB_GOPTS_MCAST_QUERY_INTVL,
BRIDGE_VLANDB_GOPTS_MCAST_QUERY_RESPONSE_INTVL,
BRIDGE_VLANDB_GOPTS_MCAST_STARTUP_QUERY_INTVL,
BRIDGE_VLANDB_GOPTS_MCAST_QUERIER,
BRIDGE_VLANDB_GOPTS_MCAST_ROUTER_PORTS,
BRIDGE_VLANDB_GOPTS_MCAST_QUERIER_STATE,
__BRIDGE_VLANDB_GOPTS_MAX
};
#define BRIDGE_VLANDB_GOPTS_MAX (__BRIDGE_VLANDB_GOPTS_MAX - 1)
/* Bridge multicast database attributes /* Bridge multicast database attributes
* [MDBA_MDB] = { * [MDBA_MDB] = {
* [MDBA_MDB_ENTRY] = { * [MDBA_MDB_ENTRY] = {
@ -284,10 +610,33 @@ enum {
enum { enum {
MDBA_MDB_EATTR_UNSPEC, MDBA_MDB_EATTR_UNSPEC,
MDBA_MDB_EATTR_TIMER, MDBA_MDB_EATTR_TIMER,
MDBA_MDB_EATTR_SRC_LIST,
MDBA_MDB_EATTR_GROUP_MODE,
MDBA_MDB_EATTR_SOURCE,
MDBA_MDB_EATTR_RTPROT,
__MDBA_MDB_EATTR_MAX __MDBA_MDB_EATTR_MAX
}; };
#define MDBA_MDB_EATTR_MAX (__MDBA_MDB_EATTR_MAX - 1) #define MDBA_MDB_EATTR_MAX (__MDBA_MDB_EATTR_MAX - 1)
/* per mdb entry source */
enum {
MDBA_MDB_SRCLIST_UNSPEC,
MDBA_MDB_SRCLIST_ENTRY,
__MDBA_MDB_SRCLIST_MAX
};
#define MDBA_MDB_SRCLIST_MAX (__MDBA_MDB_SRCLIST_MAX - 1)
/* per mdb entry per source attributes
* these are embedded in MDBA_MDB_SRCLIST_ENTRY
*/
enum {
MDBA_MDB_SRCATTR_UNSPEC,
MDBA_MDB_SRCATTR_ADDRESS,
MDBA_MDB_SRCATTR_TIMER,
__MDBA_MDB_SRCATTR_MAX
};
#define MDBA_MDB_SRCATTR_MAX (__MDBA_MDB_SRCATTR_MAX - 1)
/* multicast router types */ /* multicast router types */
enum { enum {
MDB_RTR_TYPE_DISABLED, MDB_RTR_TYPE_DISABLED,
@ -308,6 +657,9 @@ enum {
MDBA_ROUTER_PATTR_UNSPEC, MDBA_ROUTER_PATTR_UNSPEC,
MDBA_ROUTER_PATTR_TIMER, MDBA_ROUTER_PATTR_TIMER,
MDBA_ROUTER_PATTR_TYPE, MDBA_ROUTER_PATTR_TYPE,
MDBA_ROUTER_PATTR_INET_TIMER,
MDBA_ROUTER_PATTR_INET6_TIMER,
MDBA_ROUTER_PATTR_VID,
__MDBA_ROUTER_PATTR_MAX __MDBA_ROUTER_PATTR_MAX
}; };
#define MDBA_ROUTER_PATTR_MAX (__MDBA_ROUTER_PATTR_MAX - 1) #define MDBA_ROUTER_PATTR_MAX (__MDBA_ROUTER_PATTR_MAX - 1)
@ -324,12 +676,15 @@ struct br_mdb_entry {
__u8 state; __u8 state;
#define MDB_FLAGS_OFFLOAD (1 << 0) #define MDB_FLAGS_OFFLOAD (1 << 0)
#define MDB_FLAGS_FAST_LEAVE (1 << 1) #define MDB_FLAGS_FAST_LEAVE (1 << 1)
#define MDB_FLAGS_STAR_EXCL (1 << 2)
#define MDB_FLAGS_BLOCKED (1 << 3)
__u8 flags; __u8 flags;
__u16 vid; __u16 vid;
struct { struct {
union { union {
__be32 ip4; __be32 ip4;
struct in6_addr ip6; struct in6_addr ip6;
unsigned char mac_addr[ETH_ALEN];
} u; } u;
__be16 proto; __be16 proto;
} addr; } addr;
@ -338,10 +693,23 @@ struct br_mdb_entry {
enum { enum {
MDBA_SET_ENTRY_UNSPEC, MDBA_SET_ENTRY_UNSPEC,
MDBA_SET_ENTRY, MDBA_SET_ENTRY,
MDBA_SET_ENTRY_ATTRS,
__MDBA_SET_ENTRY_MAX, __MDBA_SET_ENTRY_MAX,
}; };
#define MDBA_SET_ENTRY_MAX (__MDBA_SET_ENTRY_MAX - 1) #define MDBA_SET_ENTRY_MAX (__MDBA_SET_ENTRY_MAX - 1)
/* [MDBA_SET_ENTRY_ATTRS] = {
* [MDBE_ATTR_xxx]
* ...
* }
*/
enum {
MDBE_ATTR_UNSPEC,
MDBE_ATTR_SOURCE,
__MDBE_ATTR_MAX,
};
#define MDBE_ATTR_MAX (__MDBE_ATTR_MAX - 1)
/* Embedded inside LINK_XSTATS_TYPE_BRIDGE */ /* Embedded inside LINK_XSTATS_TYPE_BRIDGE */
enum { enum {
BRIDGE_XSTATS_UNSPEC, BRIDGE_XSTATS_UNSPEC,
@ -383,12 +751,14 @@ struct br_mcast_stats {
/* bridge boolean options /* bridge boolean options
* BR_BOOLOPT_NO_LL_LEARN - disable learning from link-local packets * BR_BOOLOPT_NO_LL_LEARN - disable learning from link-local packets
* BR_BOOLOPT_MCAST_VLAN_SNOOPING - control vlan multicast snooping
* *
* IMPORTANT: if adding a new option do not forget to handle * IMPORTANT: if adding a new option do not forget to handle
* it in br_boolopt_toggle/get and bridge sysfs * it in br_boolopt_toggle/get and bridge sysfs
*/ */
enum br_boolopt_id { enum br_boolopt_id {
BR_BOOLOPT_NO_LL_LEARN, BR_BOOLOPT_NO_LL_LEARN,
BR_BOOLOPT_MCAST_VLAN_SNOOPING,
BR_BOOLOPT_MAX BR_BOOLOPT_MAX
}; };
@ -401,4 +771,17 @@ struct br_boolopt_multi {
__u32 optval; __u32 optval;
__u32 optmask; __u32 optmask;
}; };
enum {
BRIDGE_QUERIER_UNSPEC,
BRIDGE_QUERIER_IP_ADDRESS,
BRIDGE_QUERIER_IP_PORT,
BRIDGE_QUERIER_IP_OTHER_TIMER,
BRIDGE_QUERIER_PAD,
BRIDGE_QUERIER_IPV6_ADDRESS,
BRIDGE_QUERIER_IPV6_PORT,
BRIDGE_QUERIER_IPV6_OTHER_TIMER,
__BRIDGE_QUERIER_MAX
};
#define BRIDGE_QUERIER_MAX (__BRIDGE_QUERIER_MAX - 1)
#endif /* _LINUX_IF_BRIDGE_H */ #endif /* _LINUX_IF_BRIDGE_H */
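
Note on the enum layout used throughout this header: each attribute enum ends in a `__..._MAX` sentinel, and the matching `..._MAX` macro is the highest valid index. A minimal sketch of how userspace might size and bounds-check a table from that pair follows. The `DEMO_*` names and `demo_attr_store()` are hypothetical, made up for illustration, and not part of the header.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical attribute set laid out like the enums in this header. */
enum {
	DEMO_ATTR_UNSPEC,
	DEMO_ATTR_ID,
	DEMO_ATTR_RANGE,
	__DEMO_ATTR_MAX
};
#define DEMO_ATTR_MAX (__DEMO_ATTR_MAX - 1)

/* One slot per attribute type, indexable directly by DEMO_ATTR_*. */
struct demo_attr_table {
	const void *tb[DEMO_ATTR_MAX + 1];
};

/*
 * Store a payload pointer under its attribute type.
 * Types outside 1..DEMO_ATTR_MAX are skipped, so a table built against
 * an older header does not index past its end when newer attributes show up.
 * Returns 1 if stored, 0 if skipped.
 */
static int demo_attr_store(struct demo_attr_table *t, int type,
			   const void *payload)
{
	assert(t != NULL);
	if (type <= DEMO_ATTR_UNSPEC || type > DEMO_ATTR_MAX)
		return 0;
	t->tb[type] = payload;
	return 1;
}
```

Because new attributes are appended just before the sentinel, existing values keep their numbers and only the `..._MAX` macro grows.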


@ -86,18 +86,21 @@
* over Ethernet * over Ethernet
*/ */
#define ETH_P_PAE 0x888E /* Port Access Entity (IEEE 802.1X) */ #define ETH_P_PAE 0x888E /* Port Access Entity (IEEE 802.1X) */
#define ETH_P_REALTEK 0x8899 /* Multiple proprietary protocols */
#define ETH_P_AOE 0x88A2 /* ATA over Ethernet */ #define ETH_P_AOE 0x88A2 /* ATA over Ethernet */
#define ETH_P_8021AD 0x88A8 /* 802.1ad Service VLAN */ #define ETH_P_8021AD 0x88A8 /* 802.1ad Service VLAN */
#define ETH_P_802_EX1 0x88B5 /* 802.1 Local Experimental 1. */ #define ETH_P_802_EX1 0x88B5 /* 802.1 Local Experimental 1. */
#define ETH_P_PREAUTH 0x88C7 /* 802.11 Preauthentication */ #define ETH_P_PREAUTH 0x88C7 /* 802.11 Preauthentication */
#define ETH_P_TIPC 0x88CA /* TIPC */ #define ETH_P_TIPC 0x88CA /* TIPC */
#define ETH_P_LLDP 0x88CC /* Link Layer Discovery Protocol */ #define ETH_P_LLDP 0x88CC /* Link Layer Discovery Protocol */
#define ETH_P_MRP 0x88E3 /* Media Redundancy Protocol */
#define ETH_P_MACSEC 0x88E5 /* 802.1ae MACsec */ #define ETH_P_MACSEC 0x88E5 /* 802.1ae MACsec */
#define ETH_P_8021AH 0x88E7 /* 802.1ah Backbone Service Tag */ #define ETH_P_8021AH 0x88E7 /* 802.1ah Backbone Service Tag */
#define ETH_P_MVRP 0x88F5 /* 802.1Q MVRP */ #define ETH_P_MVRP 0x88F5 /* 802.1Q MVRP */
#define ETH_P_1588 0x88F7 /* IEEE 1588 Timesync */ #define ETH_P_1588 0x88F7 /* IEEE 1588 Timesync */
#define ETH_P_NCSI 0x88F8 /* NCSI protocol */ #define ETH_P_NCSI 0x88F8 /* NCSI protocol */
#define ETH_P_PRP 0x88FB /* IEC 62439-3 PRP/HSRv0 */ #define ETH_P_PRP 0x88FB /* IEC 62439-3 PRP/HSRv0 */
#define ETH_P_CFM 0x8902 /* Connectivity Fault Management */
#define ETH_P_FCOE 0x8906 /* Fibre Channel over Ethernet */ #define ETH_P_FCOE 0x8906 /* Fibre Channel over Ethernet */
#define ETH_P_IBOE 0x8915 /* Infiniband over Ethernet */ #define ETH_P_IBOE 0x8915 /* Infiniband over Ethernet */
#define ETH_P_TDLS 0x890D /* TDLS */ #define ETH_P_TDLS 0x890D /* TDLS */
@ -114,7 +117,7 @@
#define ETH_P_IFE 0xED3E /* ForCES inter-FE LFB type */ #define ETH_P_IFE 0xED3E /* ForCES inter-FE LFB type */
#define ETH_P_AF_IUCV 0xFBFB /* IBM af_iucv [ NOT AN OFFICIALLY REGISTERED ID ] */ #define ETH_P_AF_IUCV 0xFBFB /* IBM af_iucv [ NOT AN OFFICIALLY REGISTERED ID ] */
#define ETH_P_802_3_MIN 0x0600 /* If the value in the ethernet type is less than this value #define ETH_P_802_3_MIN 0x0600 /* If the value in the ethernet type is more than this value
* then the frame is Ethernet II. Else it is 802.3 */ * then the frame is Ethernet II. Else it is 802.3 */
/* /*
@ -149,6 +152,9 @@
#define ETH_P_MAP 0x00F9 /* Qualcomm multiplexing and #define ETH_P_MAP 0x00F9 /* Qualcomm multiplexing and
* aggregation protocol * aggregation protocol
*/ */
#define ETH_P_MCTP 0x00FA /* Management component transport
* protocol packets
*/
/* /*
* This is an Ethernet frame header. * This is an Ethernet frame header.
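
Note on the `ETH_P_802_3_MIN` comment changed in this file: the 16-bit field after the source MAC is an EtherType when it is at or above 0x0600 and an 802.3 length otherwise. A minimal sketch of that test, assuming the field has already been converted to host byte order, is below. The `demo_*` names are hypothetical, and the sketch treats 0x0600 itself as Ethernet II, which is slightly more precise than the comment's "more than".

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_ETH_P_802_3_MIN 0x0600

/* type_or_len: the 16-bit field after the source MAC, in host byte order. */
static int demo_is_ethernet_ii(uint16_t type_or_len)
{
	return type_or_len >= DEMO_ETH_P_802_3_MIN;
}

/* For 802.3 frames the same field carries the payload length. */
static unsigned int demo_802_3_length(uint16_t type_or_len)
{
	assert(!demo_is_ethernet_ii(type_or_len));
	return type_or_len;
}
```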


@ -7,24 +7,23 @@
/* This struct should be in sync with struct rtnl_link_stats64 */ /* This struct should be in sync with struct rtnl_link_stats64 */
struct rtnl_link_stats { struct rtnl_link_stats {
__u32 rx_packets; /* total packets received */ __u32 rx_packets;
__u32 tx_packets; /* total packets transmitted */ __u32 tx_packets;
__u32 rx_bytes; /* total bytes received */ __u32 rx_bytes;
__u32 tx_bytes; /* total bytes transmitted */ __u32 tx_bytes;
__u32 rx_errors; /* bad packets received */ __u32 rx_errors;
__u32 tx_errors; /* packet transmit problems */ __u32 tx_errors;
__u32 rx_dropped; /* no space in linux buffers */ __u32 rx_dropped;
__u32 tx_dropped; /* no space available in linux */ __u32 tx_dropped;
__u32 multicast; /* multicast packets received */ __u32 multicast;
__u32 collisions; __u32 collisions;
/* detailed rx_errors: */ /* detailed rx_errors: */
__u32 rx_length_errors; __u32 rx_length_errors;
__u32 rx_over_errors; /* receiver ring buff overflow */ __u32 rx_over_errors;
__u32 rx_crc_errors; /* recved pkt with crc error */ __u32 rx_crc_errors;
__u32 rx_frame_errors; /* recv'd frame alignment error */ __u32 rx_frame_errors;
__u32 rx_fifo_errors; /* recv'r fifo overrun */ __u32 rx_fifo_errors;
__u32 rx_missed_errors; /* receiver missed packet */ __u32 rx_missed_errors;
/* detailed tx_errors */ /* detailed tx_errors */
__u32 tx_aborted_errors; __u32 tx_aborted_errors;
@ -37,29 +36,201 @@ struct rtnl_link_stats {
__u32 rx_compressed; __u32 rx_compressed;
__u32 tx_compressed; __u32 tx_compressed;
__u32 rx_nohandler; /* dropped, no handler found */ __u32 rx_nohandler;
}; };
/* The main device statistics structure */ /**
* struct rtnl_link_stats64 - The main device statistics structure.
*
* @rx_packets: Number of good packets received by the interface.
* For hardware interfaces counts all good packets received from the device
* by the host, including packets which host had to drop at various stages
* of processing (even in the driver).
*
* @tx_packets: Number of packets successfully transmitted.
* For hardware interfaces counts packets which host was able to successfully
* hand over to the device, which does not necessarily mean that packets
* had been successfully transmitted out of the device, only that device
* acknowledged it copied them out of host memory.
*
* @rx_bytes: Number of good received bytes, corresponding to @rx_packets.
*
* For IEEE 802.3 devices should count the length of Ethernet Frames
* excluding the FCS.
*
* @tx_bytes: Number of good transmitted bytes, corresponding to @tx_packets.
*
* For IEEE 802.3 devices should count the length of Ethernet Frames
* excluding the FCS.
*
* @rx_errors: Total number of bad packets received on this network device.
* This counter must include events counted by @rx_length_errors,
* @rx_crc_errors, @rx_frame_errors and other errors not otherwise
* counted.
*
* @tx_errors: Total number of transmit problems.
* This counter must include events counter by @tx_aborted_errors,
* @tx_carrier_errors, @tx_fifo_errors, @tx_heartbeat_errors,
* @tx_window_errors and other errors not otherwise counted.
*
* @rx_dropped: Number of packets received but not processed,
* e.g. due to lack of resources or unsupported protocol.
* For hardware interfaces this counter may include packets discarded
* due to L2 address filtering but should not include packets dropped
* by the device due to buffer exhaustion which are counted separately in
* @rx_missed_errors (since procfs folds those two counters together).
*
* @tx_dropped: Number of packets dropped on their way to transmission,
* e.g. due to lack of resources.
*
* @multicast: Multicast packets received.
* For hardware interfaces this statistic is commonly calculated
* at the device level (unlike @rx_packets) and therefore may include
* packets which did not reach the host.
*
* For IEEE 802.3 devices this counter may be equivalent to:
*
* - 30.3.1.1.21 aMulticastFramesReceivedOK
*
* @collisions: Number of collisions during packet transmissions.
*
* @rx_length_errors: Number of packets dropped due to invalid length.
* Part of aggregate "frame" errors in `/proc/net/dev`.
*
* For IEEE 802.3 devices this counter should be equivalent to a sum
* of the following attributes:
*
* - 30.3.1.1.23 aInRangeLengthErrors
* - 30.3.1.1.24 aOutOfRangeLengthField
* - 30.3.1.1.25 aFrameTooLongErrors
*
* @rx_over_errors: Receiver FIFO overflow event counter.
*
* Historically the count of overflow events. Such events may be
* reported in the receive descriptors or via interrupts, and may
* not correspond one-to-one with dropped packets.
*
* The recommended interpretation for high speed interfaces is -
* number of packets dropped because they did not fit into buffers
* provided by the host, e.g. packets larger than MTU or next buffer
* in the ring was not available for a scatter transfer.
*
* Part of aggregate "frame" errors in `/proc/net/dev`.
*
* This statistics was historically used interchangeably with
* @rx_fifo_errors.
*
* This statistic corresponds to hardware events and is not commonly used
* on software devices.
*
* @rx_crc_errors: Number of packets received with a CRC error.
* Part of aggregate "frame" errors in `/proc/net/dev`.
*
* For IEEE 802.3 devices this counter must be equivalent to:
*
* - 30.3.1.1.6 aFrameCheckSequenceErrors
*
* @rx_frame_errors: Receiver frame alignment errors.
* Part of aggregate "frame" errors in `/proc/net/dev`.
*
* For IEEE 802.3 devices this counter should be equivalent to:
*
* - 30.3.1.1.7 aAlignmentErrors
*
* @rx_fifo_errors: Receiver FIFO error counter.
*
* Historically the count of overflow events. Those events may be
* reported in the receive descriptors or via interrupts, and may
* not correspond one-to-one with dropped packets.
*
* This statistics was used interchangeably with @rx_over_errors.
* Not recommended for use in drivers for high speed interfaces.
*
* This statistic is used on software devices, e.g. to count software
* packet queue overflow (can) or sequencing errors (GRE).
*
* @rx_missed_errors: Count of packets missed by the host.
* Folded into the "drop" counter in `/proc/net/dev`.
*
* Counts number of packets dropped by the device due to lack
* of buffer space. This usually indicates that the host interface
* is slower than the network interface, or host is not keeping up
* with the receive packet rate.
*
* This statistic corresponds to hardware events and is not used
* on software devices.
*
* @tx_aborted_errors:
* Part of aggregate "carrier" errors in `/proc/net/dev`.
* For IEEE 802.3 devices capable of half-duplex operation this counter
* must be equivalent to:
*
* - 30.3.1.1.11 aFramesAbortedDueToXSColls
*
* High speed interfaces may use this counter as a general device
* discard counter.
*
* @tx_carrier_errors: Number of frame transmission errors due to loss
* of carrier during transmission.
* Part of aggregate "carrier" errors in `/proc/net/dev`.
*
* For IEEE 802.3 devices this counter must be equivalent to:
*
* - 30.3.1.1.13 aCarrierSenseErrors
*
* @tx_fifo_errors: Number of frame transmission errors due to device
* FIFO underrun / underflow. This condition occurs when the device
* begins transmission of a frame but is unable to deliver the
* entire frame to the transmitter in time for transmission.
* Part of aggregate "carrier" errors in `/proc/net/dev`.
*
* @tx_heartbeat_errors: Number of Heartbeat / SQE Test errors for
* old half-duplex Ethernet.
* Part of aggregate "carrier" errors in `/proc/net/dev`.
*
* For IEEE 802.3 devices possibly equivalent to:
*
* - 30.3.2.1.4 aSQETestErrors
*
* @tx_window_errors: Number of frame transmission errors due
* to late collisions (for Ethernet - after the first 64B of transmission).
* Part of aggregate "carrier" errors in `/proc/net/dev`.
*
* For IEEE 802.3 devices this counter must be equivalent to:
*
* - 30.3.1.1.10 aLateCollisions
*
* @rx_compressed: Number of correctly received compressed packets.
* This counters is only meaningful for interfaces which support
* packet compression (e.g. CSLIP, PPP).
*
* @tx_compressed: Number of transmitted compressed packets.
* This counters is only meaningful for interfaces which support
* packet compression (e.g. CSLIP, PPP).
*
* @rx_nohandler: Number of packets received on the interface
* but dropped by the networking stack because the device is
* not designated to receive packets (e.g. backup link in a bond).
*/
struct rtnl_link_stats64 { struct rtnl_link_stats64 {
__u64 rx_packets; /* total packets received */ __u64 rx_packets;
__u64 tx_packets; /* total packets transmitted */ __u64 tx_packets;
__u64 rx_bytes; /* total bytes received */ __u64 rx_bytes;
__u64 tx_bytes; /* total bytes transmitted */ __u64 tx_bytes;
__u64 rx_errors; /* bad packets received */ __u64 rx_errors;
__u64 tx_errors; /* packet transmit problems */ __u64 tx_errors;
__u64 rx_dropped; /* no space in linux buffers */ __u64 rx_dropped;
__u64 tx_dropped; /* no space available in linux */ __u64 tx_dropped;
__u64 multicast; /* multicast packets received */ __u64 multicast;
__u64 collisions; __u64 collisions;
/* detailed rx_errors: */ /* detailed rx_errors: */
__u64 rx_length_errors; __u64 rx_length_errors;
__u64 rx_over_errors; /* receiver ring buff overflow */ __u64 rx_over_errors;
__u64 rx_crc_errors; /* recved pkt with crc error */ __u64 rx_crc_errors;
__u64 rx_frame_errors; /* recv'd frame alignment error */ __u64 rx_frame_errors;
__u64 rx_fifo_errors; /* recv'r fifo overrun */ __u64 rx_fifo_errors;
__u64 rx_missed_errors; /* receiver missed packet */ __u64 rx_missed_errors;
/* detailed tx_errors */ /* detailed tx_errors */
__u64 tx_aborted_errors; __u64 tx_aborted_errors;
@ -71,8 +242,7 @@ struct rtnl_link_stats64 {
/* for cslip etc */ /* for cslip etc */
__u64 rx_compressed; __u64 rx_compressed;
__u64 tx_compressed; __u64 tx_compressed;
__u64 rx_nohandler;
__u64 rx_nohandler; /* dropped, no handler found */
}; };
/* The struct should be in sync with struct ifmap */ /* The struct should be in sync with struct ifmap */
@ -170,12 +340,29 @@ enum {
IFLA_PROP_LIST, IFLA_PROP_LIST,
IFLA_ALT_IFNAME, /* Alternative ifname */ IFLA_ALT_IFNAME, /* Alternative ifname */
IFLA_PERM_ADDRESS, IFLA_PERM_ADDRESS,
IFLA_PROTO_DOWN_REASON,
/* device (sysfs) name as parent, used instead
* of IFLA_LINK where there's no parent netdev
*/
IFLA_PARENT_DEV_NAME,
IFLA_PARENT_DEV_BUS_NAME,
__IFLA_MAX __IFLA_MAX
}; };
#define IFLA_MAX (__IFLA_MAX - 1) #define IFLA_MAX (__IFLA_MAX - 1)
enum {
IFLA_PROTO_DOWN_REASON_UNSPEC,
IFLA_PROTO_DOWN_REASON_MASK, /* u32, mask for reason bits */
IFLA_PROTO_DOWN_REASON_VALUE, /* u32, reason bit value */
__IFLA_PROTO_DOWN_REASON_CNT,
IFLA_PROTO_DOWN_REASON_MAX = __IFLA_PROTO_DOWN_REASON_CNT - 1
};
/* backwards compatibility for userspace */ /* backwards compatibility for userspace */
#define IFLA_RTA(r) ((struct rtattr*)(((char*)(r)) + NLMSG_ALIGN(sizeof(struct ifinfomsg)))) #define IFLA_RTA(r) ((struct rtattr*)(((char*)(r)) + NLMSG_ALIGN(sizeof(struct ifinfomsg))))
#define IFLA_PAYLOAD(n) NLMSG_PAYLOAD(n,sizeof(struct ifinfomsg)) #define IFLA_PAYLOAD(n) NLMSG_PAYLOAD(n,sizeof(struct ifinfomsg))
@ -228,6 +415,7 @@ enum {
IFLA_INET6_ICMP6STATS, /* statistics (icmpv6) */ IFLA_INET6_ICMP6STATS, /* statistics (icmpv6) */
IFLA_INET6_TOKEN, /* device token */ IFLA_INET6_TOKEN, /* device token */
IFLA_INET6_ADDR_GEN_MODE, /* implicit address generator mode */ IFLA_INET6_ADDR_GEN_MODE, /* implicit address generator mode */
IFLA_INET6_RA_MTU, /* mtu carried in the RA message */
__IFLA_INET6_MAX __IFLA_INET6_MAX
}; };
@ -290,6 +478,7 @@ enum {
IFLA_BR_MCAST_MLD_VERSION, IFLA_BR_MCAST_MLD_VERSION,
IFLA_BR_VLAN_STATS_PER_PORT, IFLA_BR_VLAN_STATS_PER_PORT,
IFLA_BR_MULTI_BOOLOPT, IFLA_BR_MULTI_BOOLOPT,
IFLA_BR_MCAST_QUERIER_STATE,
__IFLA_BR_MAX, __IFLA_BR_MAX,
}; };
@ -341,6 +530,10 @@ enum {
IFLA_BRPORT_NEIGH_SUPPRESS, IFLA_BRPORT_NEIGH_SUPPRESS,
IFLA_BRPORT_ISOLATED, IFLA_BRPORT_ISOLATED,
IFLA_BRPORT_BACKUP_PORT, IFLA_BRPORT_BACKUP_PORT,
IFLA_BRPORT_MRP_RING_OPEN,
IFLA_BRPORT_MRP_IN_OPEN,
IFLA_BRPORT_MCAST_EHT_HOSTS_LIMIT,
IFLA_BRPORT_MCAST_EHT_HOSTS_CNT,
__IFLA_BRPORT_MAX __IFLA_BRPORT_MAX
}; };
#define IFLA_BRPORT_MAX (__IFLA_BRPORT_MAX - 1) #define IFLA_BRPORT_MAX (__IFLA_BRPORT_MAX - 1)
@ -405,6 +598,8 @@ enum {
IFLA_MACVLAN_MACADDR, IFLA_MACVLAN_MACADDR,
IFLA_MACVLAN_MACADDR_DATA, IFLA_MACVLAN_MACADDR_DATA,
IFLA_MACVLAN_MACADDR_COUNT, IFLA_MACVLAN_MACADDR_COUNT,
IFLA_MACVLAN_BC_QUEUE_LEN,
IFLA_MACVLAN_BC_QUEUE_LEN_USED,
__IFLA_MACVLAN_MAX, __IFLA_MACVLAN_MAX,
}; };
@ -426,6 +621,7 @@ enum macvlan_macaddr_mode {
}; };
#define MACVLAN_FLAG_NOPROMISC 1 #define MACVLAN_FLAG_NOPROMISC 1
#define MACVLAN_FLAG_NODST 2 /* skip dst macvlan if matching src macvlan */
/* VRF section */ /* VRF section */
enum { enum {
@ -659,6 +855,7 @@ enum {
IFLA_BOND_AD_ACTOR_SYSTEM, IFLA_BOND_AD_ACTOR_SYSTEM,
IFLA_BOND_TLB_DYNAMIC_LB, IFLA_BOND_TLB_DYNAMIC_LB,
IFLA_BOND_PEER_NOTIF_DELAY, IFLA_BOND_PEER_NOTIF_DELAY,
IFLA_BOND_AD_LACP_ACTIVE,
__IFLA_BOND_MAX, __IFLA_BOND_MAX,
}; };
@ -903,7 +1100,14 @@ enum {
#define IFLA_IPOIB_MAX (__IFLA_IPOIB_MAX - 1) #define IFLA_IPOIB_MAX (__IFLA_IPOIB_MAX - 1)
/* HSR section */ /* HSR/PRP section, both uses same interface */
/* Different redundancy protocols for hsr device */
enum {
HSR_PROTOCOL_HSR,
HSR_PROTOCOL_PRP,
HSR_PROTOCOL_MAX,
};
enum { enum {
IFLA_HSR_UNSPEC, IFLA_HSR_UNSPEC,
@ -913,6 +1117,9 @@ enum {
IFLA_HSR_SUPERVISION_ADDR, /* Supervision frame multicast addr */ IFLA_HSR_SUPERVISION_ADDR, /* Supervision frame multicast addr */
IFLA_HSR_SEQ_NR, IFLA_HSR_SEQ_NR,
IFLA_HSR_VERSION, /* HSR version */ IFLA_HSR_VERSION, /* HSR version */
IFLA_HSR_PROTOCOL, /* Indicate different protocol than
* HSR. For example PRP.
*/
__IFLA_HSR_MAX, __IFLA_HSR_MAX,
}; };
@ -1037,6 +1244,8 @@ enum {
#define RMNET_FLAGS_INGRESS_MAP_COMMANDS (1U << 1) #define RMNET_FLAGS_INGRESS_MAP_COMMANDS (1U << 1)
#define RMNET_FLAGS_INGRESS_MAP_CKSUMV4 (1U << 2) #define RMNET_FLAGS_INGRESS_MAP_CKSUMV4 (1U << 2)
#define RMNET_FLAGS_EGRESS_MAP_CKSUMV4 (1U << 3) #define RMNET_FLAGS_EGRESS_MAP_CKSUMV4 (1U << 3)
#define RMNET_FLAGS_INGRESS_MAP_CKSUMV5 (1U << 4)
#define RMNET_FLAGS_EGRESS_MAP_CKSUMV5 (1U << 5)
enum { enum {
IFLA_RMNET_UNSPEC, IFLA_RMNET_UNSPEC,
@ -1052,4 +1261,14 @@ struct ifla_rmnet_flags {
__u32 mask; __u32 mask;
}; };
/* MCTP section */
enum {
IFLA_MCTP_UNSPEC,
IFLA_MCTP_NET,
__IFLA_MCTP_MAX,
};
#define IFLA_MCTP_MAX (__IFLA_MCTP_MAX - 1)
#endif /* _LINUX_IF_LINK_H */ #endif /* _LINUX_IF_LINK_H */
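
Note on the new kernel-doc for `struct rtnl_link_stats64`: it states which counters are folded into the "frame" and "drop" columns of `/proc/net/dev`. A minimal sketch of those two sums, as the comment describes them, is below. `struct demo_rx_stats` is a hypothetical cut-down copy of the receive fields; real code would read the full structure from the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of the rx counters; field names follow the header. */
struct demo_rx_stats {
	uint64_t rx_dropped;
	uint64_t rx_length_errors;
	uint64_t rx_over_errors;
	uint64_t rx_crc_errors;
	uint64_t rx_frame_errors;
	uint64_t rx_missed_errors;
};

/* "frame" column: the four counters documented as part of that aggregate. */
static uint64_t demo_frame_column(const struct demo_rx_stats *s)
{
	assert(s != 0);
	return s->rx_length_errors + s->rx_over_errors +
	       s->rx_crc_errors + s->rx_frame_errors;
}

/* "drop" column: rx_dropped with rx_missed_errors folded in. */
static uint64_t demo_drop_column(const struct demo_rx_stats *s)
{
	assert(s != 0);
	return s->rx_dropped + s->rx_missed_errors;
}
```

This is why the comment asks drivers to keep buffer-exhaustion drops in `rx_missed_errors` rather than `rx_dropped`: the two are only distinguishable through netlink, not through procfs.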


@ -2,6 +2,7 @@
#ifndef __LINUX_IF_PACKET_H #ifndef __LINUX_IF_PACKET_H
#define __LINUX_IF_PACKET_H #define __LINUX_IF_PACKET_H
#include <asm/byteorder.h>
#include <linux/types.h> #include <linux/types.h>
struct sockaddr_pkt { struct sockaddr_pkt {
@ -296,6 +297,17 @@ struct packet_mreq {
unsigned char mr_address[8]; unsigned char mr_address[8];
}; };
struct fanout_args {
#if defined(__LITTLE_ENDIAN_BITFIELD)
__u16 id;
__u16 type_flags;
#else
__u16 type_flags;
__u16 id;
#endif
__u32 max_num_members;
};
#define PACKET_MR_MULTICAST 0 #define PACKET_MR_MULTICAST 0
#define PACKET_MR_PROMISC 1 #define PACKET_MR_PROMISC 1
#define PACKET_MR_ALLMULTI 2 #define PACKET_MR_ALLMULTI 2
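
Note on `struct fanout_args` added above: the two 16-bit fields swap order with endianness so that, read as one native 32-bit word, `id` sits in the low half and `type_flags` in the high half. A minimal sketch that checks this is below. It uses a local copy of the layout and the GCC/Clang `__BYTE_ORDER__` macro in place of `__LITTLE_ENDIAN_BITFIELD`; both are assumptions of the sketch, not what the header does.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local copy of the layout for illustration only. */
struct demo_fanout_args {
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
	uint16_t id;
	uint16_t type_flags;
#else
	uint16_t type_flags;
	uint16_t id;
#endif
	uint32_t max_num_members;
};

/* Read the first 32 bits of the struct as one native integer. */
static uint32_t demo_fanout_word(const struct demo_fanout_args *a)
{
	uint32_t word;

	assert(a != 0);
	memcpy(&word, a, sizeof(word));
	return word;
}
```

With this layout the first word equals `id | (type_flags << 16)` on either byte order, so the structure can extend a plain integer argument without changing the meaning of its first four bytes.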


@ -123,6 +123,7 @@ struct in_addr {
#define IP_CHECKSUM 23 #define IP_CHECKSUM 23
#define IP_BIND_ADDRESS_NO_PORT 24 #define IP_BIND_ADDRESS_NO_PORT 24
#define IP_RECVFRAGSIZE 25 #define IP_RECVFRAGSIZE 25
#define IP_RECVERR_RFC4884 26
/* IP_MTU_DISCOVER values */ /* IP_MTU_DISCOVER values */
#define IP_PMTUDISC_DONT 0 /* Never send DF frames */ #define IP_PMTUDISC_DONT 0 /* Never send DF frames */
@ -134,7 +135,7 @@ struct in_addr {
* this socket to prevent accepting spoofed ones. * this socket to prevent accepting spoofed ones.
*/ */
#define IP_PMTUDISC_INTERFACE 4 #define IP_PMTUDISC_INTERFACE 4
/* weaker version of IP_PMTUDISC_INTERFACE, which allos packets to get /* weaker version of IP_PMTUDISC_INTERFACE, which allows packets to get
* fragmented if they exeed the interface mtu * fragmented if they exeed the interface mtu
*/ */
#define IP_PMTUDISC_OMIT 5 #define IP_PMTUDISC_OMIT 5
@ -187,11 +188,22 @@ struct ip_mreq_source {
}; };
struct ip_msfilter { struct ip_msfilter {
__be32 imsf_multiaddr; union {
__be32 imsf_interface; struct {
__u32 imsf_fmode; __be32 imsf_multiaddr_aux;
__u32 imsf_numsrc; __be32 imsf_interface_aux;
__be32 imsf_slist[1]; __u32 imsf_fmode_aux;
__u32 imsf_numsrc_aux;
__be32 imsf_slist[1];
};
struct {
__be32 imsf_multiaddr;
__be32 imsf_interface;
__u32 imsf_fmode;
__u32 imsf_numsrc;
__be32 imsf_slist_flex[];
};
};
}; };
#define IP_MSFILTER_SIZE(numsrc) \ #define IP_MSFILTER_SIZE(numsrc) \
@ -210,11 +222,22 @@ struct group_source_req {
}; };
struct group_filter { struct group_filter {
__u32 gf_interface; /* interface index */ union {
struct __kernel_sockaddr_storage gf_group; /* multicast address */ struct {
__u32 gf_fmode; /* filter mode */ __u32 gf_interface_aux; /* interface index */
__u32 gf_numsrc; /* number of sources */ struct __kernel_sockaddr_storage gf_group_aux; /* multicast address */
struct __kernel_sockaddr_storage gf_slist[1]; /* interface index */ __u32 gf_fmode_aux; /* filter mode */
__u32 gf_numsrc_aux; /* number of sources */
struct __kernel_sockaddr_storage gf_slist[1]; /* interface index */
};
struct {
__u32 gf_interface; /* interface index */
struct __kernel_sockaddr_storage gf_group; /* multicast address */
__u32 gf_fmode; /* filter mode */
__u32 gf_numsrc; /* number of sources */
struct __kernel_sockaddr_storage gf_slist_flex[]; /* interface index */
};
};
}; };
#define GROUP_FILTER_SIZE(numsrc) \ #define GROUP_FILTER_SIZE(numsrc) \
@ -288,6 +311,9 @@ struct sockaddr_in {
/* Address indicating an error return. */ /* Address indicating an error return. */
#define INADDR_NONE ((unsigned long int) 0xffffffff) #define INADDR_NONE ((unsigned long int) 0xffffffff)
/* Dummy address for src of ICMP replies if no real address is set (RFC7600). */
#define INADDR_DUMMY ((unsigned long int) 0xc0000008)
/* Network number for local host loopback. */ /* Network number for local host loopback. */
#define IN_LOOPBACKNET 127 #define IN_LOOPBACKNET 127
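
Note on `INADDR_DUMMY` added above: like the other `INADDR_*` constants it is written in host byte order, and 0xc0000008 corresponds to 192.0.0.8. A minimal sketch that prints the dotted-quad form is below. `DEMO_INADDR_DUMMY` and `demo_format_ipv4()` are hypothetical names; real code would normally convert with `htonl()` and use `inet_ntop()`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define DEMO_INADDR_DUMMY 0xc0000008UL

/* Format a host-order IPv4 address; returns the snprintf() result. */
static int demo_format_ipv4(uint32_t addr, char *buf, size_t len)
{
	assert(buf != NULL);
	return snprintf(buf, len, "%u.%u.%u.%u",
			(unsigned int)((addr >> 24) & 0xffu),
			(unsigned int)((addr >> 16) & 0xffu),
			(unsigned int)((addr >> 8) & 0xffu),
			(unsigned int)(addr & 0xffu));
}
```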


@ -145,6 +145,7 @@ struct in6_flowlabel_req {
#define IPV6_TLV_PADN 1 #define IPV6_TLV_PADN 1
#define IPV6_TLV_ROUTERALERT 5 #define IPV6_TLV_ROUTERALERT 5
#define IPV6_TLV_CALIPSO 7 /* RFC 5570 */ #define IPV6_TLV_CALIPSO 7 /* RFC 5570 */
#define IPV6_TLV_IOAM 49 /* TEMPORARY IANA allocation for IOAM */
#define IPV6_TLV_JUMBO 194 #define IPV6_TLV_JUMBO 194
#define IPV6_TLV_HAO 201 /* home address option */ #define IPV6_TLV_HAO 201 /* home address option */
@ -179,6 +180,7 @@ struct in6_flowlabel_req {
#define IPV6_LEAVE_ANYCAST 28 #define IPV6_LEAVE_ANYCAST 28
#define IPV6_MULTICAST_ALL 29 #define IPV6_MULTICAST_ALL 29
#define IPV6_ROUTER_ALERT_ISOLATE 30 #define IPV6_ROUTER_ALERT_ISOLATE 30
#define IPV6_RECVERR_RFC4884 31
/* IPV6_MTU_DISCOVER values */ /* IPV6_MTU_DISCOVER values */
#define IPV6_PMTUDISC_DONT 0 #define IPV6_PMTUDISC_DONT 0


@ -65,6 +65,7 @@ enum {
INET_DIAG_REQ_NONE, INET_DIAG_REQ_NONE,
INET_DIAG_REQ_BYTECODE, INET_DIAG_REQ_BYTECODE,
INET_DIAG_REQ_SK_BPF_STORAGES, INET_DIAG_REQ_SK_BPF_STORAGES,
INET_DIAG_REQ_PROTOCOL,
__INET_DIAG_REQ_MAX, __INET_DIAG_REQ_MAX,
}; };
@ -96,6 +97,7 @@ enum {
INET_DIAG_BC_MARK_COND, INET_DIAG_BC_MARK_COND,
INET_DIAG_BC_S_EQ, INET_DIAG_BC_S_EQ,
INET_DIAG_BC_D_EQ, INET_DIAG_BC_D_EQ,
INET_DIAG_BC_CGROUP_COND, /* u64 cgroup v2 ID */
}; };
struct inet_diag_hostcond { struct inet_diag_hostcond {
@ -157,6 +159,8 @@ enum {
INET_DIAG_MD5SIG, INET_DIAG_MD5SIG,
INET_DIAG_ULP_INFO, INET_DIAG_ULP_INFO,
INET_DIAG_SK_BPF_STORAGES, INET_DIAG_SK_BPF_STORAGES,
INET_DIAG_CGROUP_ID,
INET_DIAG_SOCKOPT,
__INET_DIAG_MAX, __INET_DIAG_MAX,
}; };
@ -180,6 +184,23 @@ struct inet_diag_meminfo {
__u32 idiag_tmem; __u32 idiag_tmem;
}; };
/* INET_DIAG_SOCKOPT */
struct inet_diag_sockopt {
__u8 recverr:1,
is_icsk:1,
freebind:1,
hdrincl:1,
mc_loop:1,
transparent:1,
mc_all:1,
nodefrag:1;
__u8 bind_address_no_port:1,
recverr_rfc4884:1,
defer_connect:1,
unused:5;
};
/* INET_DIAG_VEGASINFO */ /* INET_DIAG_VEGASINFO */
struct tcpvegas_info { struct tcpvegas_info {

include/uapi/linux/ioam6.h Normal file

@ -0,0 +1,133 @@
/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
/*
* IPv6 IOAM implementation
*
* Author:
* Justin Iurman <justin.iurman@uliege.be>
*/
#ifndef _LINUX_IOAM6_H
#define _LINUX_IOAM6_H
#include <asm/byteorder.h>
#include <linux/types.h>
#define IOAM6_U16_UNAVAILABLE U16_MAX
#define IOAM6_U32_UNAVAILABLE U32_MAX
#define IOAM6_U64_UNAVAILABLE U64_MAX
#define IOAM6_DEFAULT_ID (IOAM6_U32_UNAVAILABLE >> 8)
#define IOAM6_DEFAULT_ID_WIDE (IOAM6_U64_UNAVAILABLE >> 8)
#define IOAM6_DEFAULT_IF_ID IOAM6_U16_UNAVAILABLE
#define IOAM6_DEFAULT_IF_ID_WIDE IOAM6_U32_UNAVAILABLE
/*
* IPv6 IOAM Option Header
*/
struct ioam6_hdr {
__u8 opt_type;
__u8 opt_len;
__u8 :8; /* reserved */
#define IOAM6_TYPE_PREALLOC 0
__u8 type;
} __attribute__((packed));
/*
* IOAM Trace Header
*/
struct ioam6_trace_hdr {
__be16 namespace_id;
#if defined(__LITTLE_ENDIAN_BITFIELD)
__u8 :1, /* unused */
:1, /* unused */
overflow:1,
nodelen:5;
__u8 remlen:7,
:1; /* unused */
union {
__be32 type_be32;
struct {
__u32 bit7:1,
bit6:1,
bit5:1,
bit4:1,
bit3:1,
bit2:1,
bit1:1,
bit0:1,
bit15:1, /* unused */
bit14:1, /* unused */
bit13:1, /* unused */
bit12:1, /* unused */
bit11:1,
bit10:1,
bit9:1,
bit8:1,
bit23:1, /* reserved */
bit22:1,
bit21:1, /* unused */
bit20:1, /* unused */
bit19:1, /* unused */
bit18:1, /* unused */
bit17:1, /* unused */
bit16:1, /* unused */
:8; /* reserved */
} type;
};
#elif defined(__BIG_ENDIAN_BITFIELD)
__u8 nodelen:5,
overflow:1,
:1, /* unused */
:1; /* unused */
__u8 :1, /* unused */
remlen:7;
union {
__be32 type_be32;
struct {
__u32 bit0:1,
bit1:1,
bit2:1,
bit3:1,
bit4:1,
bit5:1,
bit6:1,
bit7:1,
bit8:1,
bit9:1,
bit10:1,
bit11:1,
bit12:1, /* unused */
bit13:1, /* unused */
bit14:1, /* unused */
bit15:1, /* unused */
bit16:1, /* unused */
bit17:1, /* unused */
bit18:1, /* unused */
bit19:1, /* unused */
bit20:1, /* unused */
bit21:1, /* unused */
bit22:1,
bit23:1, /* reserved */
:8; /* reserved */
} type;
};
#else
#error "Please fix <asm/byteorder.h>"
#endif
#define IOAM6_TRACE_DATA_SIZE_MAX 244
__u8 data[0];
} __attribute__((packed));
#endif /* _LINUX_IOAM6_H */
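
Note on the `IOAM6_DEFAULT_*` macros in this new file: the ID defaults are the "unavailable" all-ones values shifted right by 8, which clears the top byte. A minimal sketch of the resulting values is below. `U16_MAX`, `U32_MAX` and `U64_MAX` are not standard C macros, so the sketch substitutes the `<stdint.h>` limits; the `DEMO_*` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins built on <stdint.h> limits, mirroring the header's macros. */
#define DEMO_U16_UNAVAILABLE     UINT16_MAX
#define DEMO_U32_UNAVAILABLE     UINT32_MAX
#define DEMO_U64_UNAVAILABLE     UINT64_MAX

#define DEMO_DEFAULT_ID          (DEMO_U32_UNAVAILABLE >> 8)
#define DEMO_DEFAULT_ID_WIDE     (DEMO_U64_UNAVAILABLE >> 8)
#define DEMO_DEFAULT_IF_ID       DEMO_U16_UNAVAILABLE
#define DEMO_DEFAULT_IF_ID_WIDE  DEMO_U32_UNAVAILABLE

/* Runtime check of the values the macros expand to. */
static void demo_check_defaults(void)
{
	assert(DEMO_DEFAULT_ID == 0x00ffffffu);              /* low 24 bits set */
	assert(DEMO_DEFAULT_ID_WIDE == 0x00ffffffffffffffULL); /* low 56 bits set */
	assert(DEMO_DEFAULT_IF_ID == 0xffffu);
	assert(DEMO_DEFAULT_IF_ID_WIDE == 0xffffffffu);
}
```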


@ -0,0 +1,52 @@
/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
/*
* IPv6 IOAM Generic Netlink API
*
* Author:
* Justin Iurman <justin.iurman@uliege.be>
*/
#ifndef _LINUX_IOAM6_GENL_H
#define _LINUX_IOAM6_GENL_H
#define IOAM6_GENL_NAME "IOAM6"
#define IOAM6_GENL_VERSION 0x1
enum {
IOAM6_ATTR_UNSPEC,
IOAM6_ATTR_NS_ID, /* u16 */
IOAM6_ATTR_NS_DATA, /* u32 */
IOAM6_ATTR_NS_DATA_WIDE,/* u64 */
#define IOAM6_MAX_SCHEMA_DATA_LEN (255 * 4)
IOAM6_ATTR_SC_ID, /* u32 */
IOAM6_ATTR_SC_DATA, /* Binary */
IOAM6_ATTR_SC_NONE, /* Flag */
IOAM6_ATTR_PAD,
__IOAM6_ATTR_MAX,
};
#define IOAM6_ATTR_MAX (__IOAM6_ATTR_MAX - 1)
enum {
IOAM6_CMD_UNSPEC,
IOAM6_CMD_ADD_NAMESPACE,
IOAM6_CMD_DEL_NAMESPACE,
IOAM6_CMD_DUMP_NAMESPACES,
IOAM6_CMD_ADD_SCHEMA,
IOAM6_CMD_DEL_SCHEMA,
IOAM6_CMD_DUMP_SCHEMAS,
IOAM6_CMD_NS_SET_SCHEMA,
__IOAM6_CMD_MAX,
};
#define IOAM6_CMD_MAX (__IOAM6_CMD_MAX - 1)
#endif /* _LINUX_IOAM6_GENL_H */


@ -0,0 +1,49 @@
/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
/*
* IPv6 IOAM Lightweight Tunnel API
*
* Author:
* Justin Iurman <justin.iurman@uliege.be>
*/
#ifndef _LINUX_IOAM6_IPTUNNEL_H
#define _LINUX_IOAM6_IPTUNNEL_H
/* Encap modes:
* - inline: direct insertion
* - encap: ip6ip6 encapsulation
* - auto: __inline__ for local packets, encap for in-transit packets
*/
enum {
__IOAM6_IPTUNNEL_MODE_MIN,
IOAM6_IPTUNNEL_MODE_INLINE,
IOAM6_IPTUNNEL_MODE_ENCAP,
IOAM6_IPTUNNEL_MODE_AUTO,
__IOAM6_IPTUNNEL_MODE_MAX,
};
#define IOAM6_IPTUNNEL_MODE_MIN (__IOAM6_IPTUNNEL_MODE_MIN + 1)
#define IOAM6_IPTUNNEL_MODE_MAX (__IOAM6_IPTUNNEL_MODE_MAX - 1)
enum {
IOAM6_IPTUNNEL_UNSPEC,
/* Encap mode */
IOAM6_IPTUNNEL_MODE, /* u8 */
/* Tunnel dst address.
* For encap,auto modes.
*/
IOAM6_IPTUNNEL_DST, /* struct in6_addr */
/* IOAM Trace Header */
IOAM6_IPTUNNEL_TRACE, /* struct ioam6_trace_hdr */
__IOAM6_IPTUNNEL_MAX,
};
#define IOAM6_IPTUNNEL_MAX (__IOAM6_IPTUNNEL_MAX - 1)
#endif /* _LINUX_IOAM6_IPTUNNEL_H */
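
Note on the encap mode enum in this new file: it is bracketed by `__..._MIN` and `__..._MAX` sentinels, so the valid modes are the closed range `MODE_MIN..MODE_MAX`. A minimal sketch of a range check, plus the "dst is for encap and auto modes" rule from the comment, is below. The `DEMO_*` names and both helpers are hypothetical, not part of the header.

```c
#include <assert.h>

/* Hypothetical mirror of the mode enum and its MIN/MAX macros. */
enum {
	__DEMO_MODE_MIN,
	DEMO_MODE_INLINE,
	DEMO_MODE_ENCAP,
	DEMO_MODE_AUTO,
	__DEMO_MODE_MAX
};
#define DEMO_MODE_MIN (__DEMO_MODE_MIN + 1)
#define DEMO_MODE_MAX (__DEMO_MODE_MAX - 1)

/* 1 if mode names one of the real modes, 0 for sentinels or garbage. */
static int demo_mode_valid(int mode)
{
	return mode >= DEMO_MODE_MIN && mode <= DEMO_MODE_MAX;
}

/* The tunnel dst attribute applies to the encap and auto modes only. */
static int demo_mode_uses_dst(int mode)
{
	assert(demo_mode_valid(mode));
	return mode == DEMO_MODE_ENCAP || mode == DEMO_MODE_AUTO;
}
```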


@ -169,6 +169,7 @@ enum
IPV4_DEVCONF_DROP_UNICAST_IN_L2_MULTICAST, IPV4_DEVCONF_DROP_UNICAST_IN_L2_MULTICAST,
IPV4_DEVCONF_DROP_GRATUITOUS_ARP, IPV4_DEVCONF_DROP_GRATUITOUS_ARP,
IPV4_DEVCONF_BC_FORWARDING, IPV4_DEVCONF_BC_FORWARDING,
IPV4_DEVCONF_ARP_EVICT_NOCARRIER,
__IPV4_DEVCONF_MAX __IPV4_DEVCONF_MAX
}; };

@ -3,13 +3,6 @@
#define _LINUX_KERNEL_H #define _LINUX_KERNEL_H
#include <linux/sysinfo.h> #include <linux/sysinfo.h>
#include <linux/const.h>
/*
* 'kernel.h' contains some often-used function prototypes etc
*/
#define __ALIGN_KERNEL(x, a) __ALIGN_KERNEL_MASK(x, (typeof(x))(a) - 1)
#define __ALIGN_KERNEL_MASK(x, mask) (((x) + (mask)) & ~(mask))
#define __KERNEL_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
#endif /* _LINUX_KERNEL_H */ #endif /* _LINUX_KERNEL_H */
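
Review note: this hunk drops the three helper macros and adds an include of `<linux/const.h>`, which suggests they are now expected to come from there. For reference, a small sketch of what they compute, using local `DEMO_` copies so it does not depend on either header. `__typeof__` is a GCC/Clang extension (an assumption about the compiler), and the alignment argument is assumed to be a power of two.

```c
#include <assert.h>

/* Local copies of the macros removed in the hunk above. */
#define DEMO_ALIGN_MASK(x, mask) (((x) + (mask)) & ~(mask))
#define DEMO_ALIGN(x, a)         DEMO_ALIGN_MASK(x, (__typeof__(x))(a) - 1)
#define DEMO_DIV_ROUND_UP(n, d)  (((n) + (d) - 1) / (d))

/* Round len up to the next multiple of 4 (4 is a power of two). */
static unsigned int demo_align4(unsigned int len)
{
	return DEMO_ALIGN(len, 4u);
}

/* Number of 4-byte words needed to hold len bytes. */
static unsigned int demo_words(unsigned int len)
{
	return DEMO_DIV_ROUND_UP(len, 4u);
}
```

For example, `demo_align4(5)` yields 8 and `demo_words(5)` yields 2.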

@ -108,7 +108,7 @@ enum {
L2TP_ATTR_VLAN_ID, /* u16 (not used) */ L2TP_ATTR_VLAN_ID, /* u16 (not used) */
L2TP_ATTR_COOKIE, /* 0, 4 or 8 bytes */ L2TP_ATTR_COOKIE, /* 0, 4 or 8 bytes */
L2TP_ATTR_PEER_COOKIE, /* 0, 4 or 8 bytes */ L2TP_ATTR_PEER_COOKIE, /* 0, 4 or 8 bytes */
-L2TP_ATTR_DEBUG, /* u32, enum l2tp_debug_flags */
+L2TP_ATTR_DEBUG, /* u32, enum l2tp_debug_flags (not used) */
L2TP_ATTR_RECV_SEQ, /* u8 */ L2TP_ATTR_RECV_SEQ, /* u8 */
L2TP_ATTR_SEND_SEQ, /* u8 */ L2TP_ATTR_SEND_SEQ, /* u8 */
L2TP_ATTR_LNS_MODE, /* u8 */ L2TP_ATTR_LNS_MODE, /* u8 */
@ -144,6 +144,8 @@ enum {
L2TP_ATTR_RX_OOS_PACKETS, /* u64 */ L2TP_ATTR_RX_OOS_PACKETS, /* u64 */
L2TP_ATTR_RX_ERRORS, /* u64 */ L2TP_ATTR_RX_ERRORS, /* u64 */
L2TP_ATTR_STATS_PAD, L2TP_ATTR_STATS_PAD,
L2TP_ATTR_RX_COOKIE_DISCARDS, /* u64 */
L2TP_ATTR_RX_INVALID, /* u64 */
__L2TP_ATTR_STATS_MAX, __L2TP_ATTR_STATS_MAX,
}; };
@ -177,7 +179,9 @@ enum l2tp_seqmode {
}; };
/** /**
-* enum l2tp_debug_flags - debug message categories for L2TP tunnels/sessions
+* enum l2tp_debug_flags - debug message categories for L2TP tunnels/sessions.
+*
+* Unused.
 *
* @L2TP_MSG_DEBUG: verbose debug (if compiled in) * @L2TP_MSG_DEBUG: verbose debug (if compiled in)
* @L2TP_MSG_CONTROL: userspace - kernel interface * @L2TP_MSG_CONTROL: userspace - kernel interface

@ -14,6 +14,7 @@ enum lwtunnel_encap_types {
LWTUNNEL_ENCAP_BPF, LWTUNNEL_ENCAP_BPF,
LWTUNNEL_ENCAP_SEG6_LOCAL, LWTUNNEL_ENCAP_SEG6_LOCAL,
LWTUNNEL_ENCAP_RPL, LWTUNNEL_ENCAP_RPL,
LWTUNNEL_ENCAP_IOAM6,
__LWTUNNEL_ENCAP_MAX, __LWTUNNEL_ENCAP_MAX,
}; };

@ -94,7 +94,9 @@
#define BALLOON_KVM_MAGIC 0x13661366 #define BALLOON_KVM_MAGIC 0x13661366
#define ZSMALLOC_MAGIC 0x58295829 #define ZSMALLOC_MAGIC 0x58295829
#define DMA_BUF_MAGIC 0x444d4142 /* "DMAB" */ #define DMA_BUF_MAGIC 0x444d4142 /* "DMAB" */
#define DEVMEM_MAGIC 0x454d444d /* "DMEM" */
#define Z3FOLD_MAGIC 0x33 #define Z3FOLD_MAGIC 0x33
#define PPC_CMM_MAGIC 0xc7571590 #define PPC_CMM_MAGIC 0xc7571590
#define SECRETMEM_MAGIC 0x5345434d /* "SECM" */
#endif /* __LINUX_MAGIC_H__ */ #endif /* __LINUX_MAGIC_H__ */
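
Review note: several of these magic numbers are four ASCII characters packed into a 32-bit value. The sketch below decodes a value most significant byte first, using the constants copied from the hunk above. `magic_to_ascii` is a hypothetical helper, not something this header provides.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Values copied from the hunk above. */
#define DMA_BUF_MAGIC   0x444d4142
#define DEVMEM_MAGIC    0x454d444d
#define SECRETMEM_MAGIC 0x5345434d

/* Hypothetical helper: write the four bytes of magic, most significant
 * byte first, into out as a NUL-terminated string. */
static void magic_to_ascii(uint32_t magic, char out[5])
{
	for (int i = 0; i < 4; i++)
		out[i] = (char)((magic >> (24 - 8 * i)) & 0xff);
	out[4] = '\0';
}
```

Decoded this way, `DMA_BUF_MAGIC` reads "DMAB" and `SECRETMEM_MAGIC` reads "SECM", matching their comments. `DEVMEM_MAGIC` reads "EMDM" under the same byte order, so its "DMEM" comment is best taken as a label and not as a byte-order statement.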

include/uapi/linux/mptcp.h Normal file

@ -0,0 +1,229 @@
/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
#ifndef _MPTCP_H
#define _MPTCP_H
#include <linux/const.h>
#include <linux/types.h>
#include <linux/in.h> /* for sockaddr_in */
#include <linux/in6.h> /* for sockaddr_in6 */
#include <linux/socket.h> /* for sockaddr_storage and sa_family */
#include <sys/socket.h> /* for struct sockaddr */
#define MPTCP_SUBFLOW_FLAG_MCAP_REM _BITUL(0)
#define MPTCP_SUBFLOW_FLAG_MCAP_LOC _BITUL(1)
#define MPTCP_SUBFLOW_FLAG_JOIN_REM _BITUL(2)
#define MPTCP_SUBFLOW_FLAG_JOIN_LOC _BITUL(3)
#define MPTCP_SUBFLOW_FLAG_BKUP_REM _BITUL(4)
#define MPTCP_SUBFLOW_FLAG_BKUP_LOC _BITUL(5)
#define MPTCP_SUBFLOW_FLAG_FULLY_ESTABLISHED _BITUL(6)
#define MPTCP_SUBFLOW_FLAG_CONNECTED _BITUL(7)
#define MPTCP_SUBFLOW_FLAG_MAPVALID _BITUL(8)
enum {
MPTCP_SUBFLOW_ATTR_UNSPEC,
MPTCP_SUBFLOW_ATTR_TOKEN_REM,
MPTCP_SUBFLOW_ATTR_TOKEN_LOC,
MPTCP_SUBFLOW_ATTR_RELWRITE_SEQ,
MPTCP_SUBFLOW_ATTR_MAP_SEQ,
MPTCP_SUBFLOW_ATTR_MAP_SFSEQ,
MPTCP_SUBFLOW_ATTR_SSN_OFFSET,
MPTCP_SUBFLOW_ATTR_MAP_DATALEN,
MPTCP_SUBFLOW_ATTR_FLAGS,
MPTCP_SUBFLOW_ATTR_ID_REM,
MPTCP_SUBFLOW_ATTR_ID_LOC,
MPTCP_SUBFLOW_ATTR_PAD,
__MPTCP_SUBFLOW_ATTR_MAX
};
#define MPTCP_SUBFLOW_ATTR_MAX (__MPTCP_SUBFLOW_ATTR_MAX - 1)
/* netlink interface */
#define MPTCP_PM_NAME "mptcp_pm"
#define MPTCP_PM_CMD_GRP_NAME "mptcp_pm_cmds"
#define MPTCP_PM_EV_GRP_NAME "mptcp_pm_events"
#define MPTCP_PM_VER 0x1
/*
* ATTR types defined for MPTCP
*/
enum {
MPTCP_PM_ATTR_UNSPEC,
MPTCP_PM_ATTR_ADDR, /* nested address */
MPTCP_PM_ATTR_RCV_ADD_ADDRS, /* u32 */
MPTCP_PM_ATTR_SUBFLOWS, /* u32 */
__MPTCP_PM_ATTR_MAX
};
#define MPTCP_PM_ATTR_MAX (__MPTCP_PM_ATTR_MAX - 1)
enum {
MPTCP_PM_ADDR_ATTR_UNSPEC,
MPTCP_PM_ADDR_ATTR_FAMILY, /* u16 */
MPTCP_PM_ADDR_ATTR_ID, /* u8 */
MPTCP_PM_ADDR_ATTR_ADDR4, /* struct in_addr */
MPTCP_PM_ADDR_ATTR_ADDR6, /* struct in6_addr */
MPTCP_PM_ADDR_ATTR_PORT, /* u16 */
MPTCP_PM_ADDR_ATTR_FLAGS, /* u32 */
MPTCP_PM_ADDR_ATTR_IF_IDX, /* s32 */
__MPTCP_PM_ADDR_ATTR_MAX
};
#define MPTCP_PM_ADDR_ATTR_MAX (__MPTCP_PM_ADDR_ATTR_MAX - 1)
#define MPTCP_PM_ADDR_FLAG_SIGNAL (1 << 0)
#define MPTCP_PM_ADDR_FLAG_SUBFLOW (1 << 1)
#define MPTCP_PM_ADDR_FLAG_BACKUP (1 << 2)
#define MPTCP_PM_ADDR_FLAG_FULLMESH (1 << 3)
enum {
MPTCP_PM_CMD_UNSPEC,
MPTCP_PM_CMD_ADD_ADDR,
MPTCP_PM_CMD_DEL_ADDR,
MPTCP_PM_CMD_GET_ADDR,
MPTCP_PM_CMD_FLUSH_ADDRS,
MPTCP_PM_CMD_SET_LIMITS,
MPTCP_PM_CMD_GET_LIMITS,
MPTCP_PM_CMD_SET_FLAGS,
__MPTCP_PM_CMD_AFTER_LAST
};
#define MPTCP_INFO_FLAG_FALLBACK _BITUL(0)
#define MPTCP_INFO_FLAG_REMOTE_KEY_RECEIVED _BITUL(1)
struct mptcp_info {
__u8 mptcpi_subflows;
__u8 mptcpi_add_addr_signal;
__u8 mptcpi_add_addr_accepted;
__u8 mptcpi_subflows_max;
__u8 mptcpi_add_addr_signal_max;
__u8 mptcpi_add_addr_accepted_max;
__u32 mptcpi_flags;
__u32 mptcpi_token;
__u64 mptcpi_write_seq;
__u64 mptcpi_snd_una;
__u64 mptcpi_rcv_nxt;
__u8 mptcpi_local_addr_used;
__u8 mptcpi_local_addr_max;
__u8 mptcpi_csum_enabled;
};
/*
* MPTCP_EVENT_CREATED: token, family, saddr4 | saddr6, daddr4 | daddr6,
* sport, dport
* A new MPTCP connection has been created. This is a good time to allocate
* memory and send ADD_ADDR if needed. Depending on the traffic-patterns
* it can take a long time until the MPTCP_EVENT_ESTABLISHED is sent.
*
* MPTCP_EVENT_ESTABLISHED: token, family, saddr4 | saddr6, daddr4 | daddr6,
* sport, dport
* A MPTCP connection is established (can start new subflows).
*
* MPTCP_EVENT_CLOSED: token
* A MPTCP connection has stopped.
*
* MPTCP_EVENT_ANNOUNCED: token, rem_id, family, daddr4 | daddr6 [, dport]
* A new address has been announced by the peer.
*
* MPTCP_EVENT_REMOVED: token, rem_id
* An address has been lost by the peer.
*
* MPTCP_EVENT_SUB_ESTABLISHED: token, family, saddr4 | saddr6,
* daddr4 | daddr6, sport, dport, backup,
* if_idx [, error]
* A new subflow has been established. 'error' should not be set.
*
* MPTCP_EVENT_SUB_CLOSED: token, family, saddr4 | saddr6, daddr4 | daddr6,
* sport, dport, backup, if_idx [, error]
* A subflow has been closed. An error (copy of sk_err) could be set if an
* error has been detected for this subflow.
*
* MPTCP_EVENT_SUB_PRIORITY: token, family, saddr4 | saddr6, daddr4 | daddr6,
* sport, dport, backup, if_idx [, error]
* The priority of a subflow has changed. 'error' should not be set.
*/
enum mptcp_event_type {
MPTCP_EVENT_UNSPEC = 0,
MPTCP_EVENT_CREATED = 1,
MPTCP_EVENT_ESTABLISHED = 2,
MPTCP_EVENT_CLOSED = 3,
MPTCP_EVENT_ANNOUNCED = 6,
MPTCP_EVENT_REMOVED = 7,
MPTCP_EVENT_SUB_ESTABLISHED = 10,
MPTCP_EVENT_SUB_CLOSED = 11,
MPTCP_EVENT_SUB_PRIORITY = 13,
};
enum mptcp_event_attr {
MPTCP_ATTR_UNSPEC = 0,
MPTCP_ATTR_TOKEN, /* u32 */
MPTCP_ATTR_FAMILY, /* u16 */
MPTCP_ATTR_LOC_ID, /* u8 */
MPTCP_ATTR_REM_ID, /* u8 */
MPTCP_ATTR_SADDR4, /* be32 */
MPTCP_ATTR_SADDR6, /* struct in6_addr */
MPTCP_ATTR_DADDR4, /* be32 */
MPTCP_ATTR_DADDR6, /* struct in6_addr */
MPTCP_ATTR_SPORT, /* be16 */
MPTCP_ATTR_DPORT, /* be16 */
MPTCP_ATTR_BACKUP, /* u8 */
MPTCP_ATTR_ERROR, /* u8 */
MPTCP_ATTR_FLAGS, /* u16 */
MPTCP_ATTR_TIMEOUT, /* u32 */
MPTCP_ATTR_IF_IDX, /* s32 */
MPTCP_ATTR_RESET_REASON,/* u32 */
MPTCP_ATTR_RESET_FLAGS, /* u32 */
__MPTCP_ATTR_AFTER_LAST
};
#define MPTCP_ATTR_MAX (__MPTCP_ATTR_AFTER_LAST - 1)
/* MPTCP Reset reason codes, rfc8684 */
#define MPTCP_RST_EUNSPEC 0
#define MPTCP_RST_EMPTCP 1
#define MPTCP_RST_ERESOURCE 2
#define MPTCP_RST_EPROHIBIT 3
#define MPTCP_RST_EWQ2BIG 4
#define MPTCP_RST_EBADPERF 5
#define MPTCP_RST_EMIDDLEBOX 6
struct mptcp_subflow_data {
__u32 size_subflow_data; /* size of this structure in userspace */
__u32 num_subflows; /* must be 0, set by kernel */
__u32 size_kernel; /* must be 0, set by kernel */
__u32 size_user; /* size of one element in data[] */
} __attribute__((aligned(8)));
struct mptcp_subflow_addrs {
union {
__kernel_sa_family_t sa_family;
struct sockaddr sa_local;
struct sockaddr_in sin_local;
struct sockaddr_in6 sin6_local;
struct __kernel_sockaddr_storage ss_local;
};
union {
struct sockaddr sa_remote;
struct sockaddr_in sin_remote;
struct sockaddr_in6 sin6_remote;
struct __kernel_sockaddr_storage ss_remote;
};
};
/* MPTCP socket options */
#define MPTCP_INFO 1
#define MPTCP_TCPINFO 2
#define MPTCP_SUBFLOW_ADDRS 3
#endif /* _MPTCP_H */
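
Review note: the values in `enum mptcp_event_type` are sparse (3 jumps to 6, 7 to 10, 11 to 13), so indexing a name table by the raw value would leave holes. A minimal sketch of a lookup that tolerates the gaps follows. The enum is copied from the header above, and `mptcp_event_name` is a hypothetical helper.

```c
#include <assert.h>
#include <string.h>

/* Copied from the header above so the sketch is self-contained. */
enum mptcp_event_type {
	MPTCP_EVENT_UNSPEC = 0,
	MPTCP_EVENT_CREATED = 1,
	MPTCP_EVENT_ESTABLISHED = 2,
	MPTCP_EVENT_CLOSED = 3,
	MPTCP_EVENT_ANNOUNCED = 6,
	MPTCP_EVENT_REMOVED = 7,
	MPTCP_EVENT_SUB_ESTABLISHED = 10,
	MPTCP_EVENT_SUB_CLOSED = 11,
	MPTCP_EVENT_SUB_PRIORITY = 13,
};

/* Hypothetical helper: map an event type to a printable name.
 * Unassigned values such as 4, 5, 8, 9 and 12 fall through to the default. */
static const char *mptcp_event_name(int type)
{
	switch (type) {
	case MPTCP_EVENT_CREATED:         return "CREATED";
	case MPTCP_EVENT_ESTABLISHED:     return "ESTABLISHED";
	case MPTCP_EVENT_CLOSED:          return "CLOSED";
	case MPTCP_EVENT_ANNOUNCED:       return "ANNOUNCED";
	case MPTCP_EVENT_REMOVED:         return "REMOVED";
	case MPTCP_EVENT_SUB_ESTABLISHED: return "SUB_ESTABLISHED";
	case MPTCP_EVENT_SUB_CLOSED:      return "SUB_CLOSED";
	case MPTCP_EVENT_SUB_PRIORITY:    return "SUB_PRIORITY";
	default:                          return "UNKNOWN";
	}
}
```

A `switch` keeps the mapping correct even if further values are assigned in the gaps later.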

@ -29,6 +29,9 @@ enum {
NDA_LINK_NETNSID, NDA_LINK_NETNSID,
NDA_SRC_VNI, NDA_SRC_VNI,
NDA_PROTOCOL, /* Originator of entry */ NDA_PROTOCOL, /* Originator of entry */
NDA_NH_ID,
NDA_FDB_EXT_ATTRS,
NDA_FLAGS_EXT,
__NDA_MAX __NDA_MAX
}; };
@ -38,14 +41,16 @@ enum {
* Neighbor Cache Entry Flags * Neighbor Cache Entry Flags
*/ */
-#define NTF_USE 0x01
-#define NTF_SELF 0x02
-#define NTF_MASTER 0x04
-#define NTF_PROXY 0x08 /* == ATF_PUBL */
-#define NTF_EXT_LEARNED 0x10
-#define NTF_OFFLOADED 0x20
-#define NTF_STICKY 0x40
-#define NTF_ROUTER 0x80
+#define NTF_USE (1 << 0)
+#define NTF_SELF (1 << 1)
+#define NTF_MASTER (1 << 2)
+#define NTF_PROXY (1 << 3) /* == ATF_PUBL */
+#define NTF_EXT_LEARNED (1 << 4)
+#define NTF_OFFLOADED (1 << 5)
+#define NTF_STICKY (1 << 6)
+#define NTF_ROUTER (1 << 7)
+/* Extended flags under NDA_FLAGS_EXT: */
+#define NTF_EXT_MANAGED (1 << 0)
/* /*
* Neighbor Cache Entry States. * Neighbor Cache Entry States.
@ -63,9 +68,22 @@ enum {
#define NUD_PERMANENT 0x80 #define NUD_PERMANENT 0x80
#define NUD_NONE 0x00 #define NUD_NONE 0x00
-/* NUD_NOARP & NUD_PERMANENT are pseudostates, they never change
-and make no address resolution or NUD.
-NUD_PERMANENT also cannot be deleted by garbage collectors.
+/* NUD_NOARP & NUD_PERMANENT are pseudostates, they never change and make no
+* address resolution or NUD.
+*
+* NUD_PERMANENT also cannot be deleted by garbage collectors. This holds true
+* for dynamic entries with NTF_EXT_LEARNED flag as well. However, upon carrier
+* down event, NUD_PERMANENT entries are not flushed whereas NTF_EXT_LEARNED
+* flagged entries explicitly are (which is also consistent with the routing
+* subsystem).
+*
+* When NTF_EXT_LEARNED is set for a bridge fdb entry the different cache entry
+* states don't make sense and thus are ignored. Such entries don't age and
+* can roam.
+*
+* NTF_EXT_MANAGED flagged neighbor entries are managed by the kernel on behalf
+* of a user space control plane, and automatically refreshed so that (if
+* possible) they remain in NUD_REACHABLE state.
*/ */
struct nda_cacheinfo { struct nda_cacheinfo {
@ -171,4 +189,27 @@ enum {
}; };
#define NDTA_MAX (__NDTA_MAX - 1) #define NDTA_MAX (__NDTA_MAX - 1)
/* FDB activity notification bits used in NFEA_ACTIVITY_NOTIFY:
* - FDB_NOTIFY_BIT - notify on activity/expire for any entry
* - FDB_NOTIFY_INACTIVE_BIT - mark as inactive to avoid multiple notifications
*/
enum {
FDB_NOTIFY_BIT = (1 << 0),
FDB_NOTIFY_INACTIVE_BIT = (1 << 1)
};
/* embedded into NDA_FDB_EXT_ATTRS:
* [NDA_FDB_EXT_ATTRS] = {
* [NFEA_ACTIVITY_NOTIFY]
* ...
* }
*/
enum {
NFEA_UNSPEC,
NFEA_ACTIVITY_NOTIFY,
NFEA_DONT_REFRESH,
__NFEA_MAX
};
#define NFEA_MAX (__NFEA_MAX - 1)
#endif #endif
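
Review note: the switch from hex literals to shifts in the NTF_* hunk should be value-preserving, and `NTF_EXT_MANAGED` reuses bit 0 only because it lives under a different attribute (`NDA_FLAGS_EXT`). The sketch below checks the first point at compile time and illustrates the second. `struct demo_neigh_flags` and its field widths are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* New-style definitions copied from the hunk above. */
#define NTF_USE         (1 << 0)
#define NTF_SELF        (1 << 1)
#define NTF_MASTER      (1 << 2)
#define NTF_PROXY       (1 << 3)
#define NTF_EXT_LEARNED (1 << 4)
#define NTF_OFFLOADED   (1 << 5)
#define NTF_STICKY      (1 << 6)
#define NTF_ROUTER      (1 << 7)
#define NTF_EXT_MANAGED (1 << 0) /* carried under NDA_FLAGS_EXT */

/* The shift forms must equal the old hex literals. */
_Static_assert(NTF_USE == 0x01 && NTF_SELF == 0x02 && NTF_MASTER == 0x04 &&
	       NTF_PROXY == 0x08 && NTF_EXT_LEARNED == 0x10 &&
	       NTF_OFFLOADED == 0x20 && NTF_STICKY == 0x40 &&
	       NTF_ROUTER == 0x80, "NTF_* values changed");

/* Hypothetical container: base flags and extended flags are kept apart,
 * so bit 0 means NTF_USE in one field and NTF_EXT_MANAGED in the other. */
struct demo_neigh_flags {
	uint32_t base; /* NTF_* */
	uint32_t ext;  /* NTF_EXT_* from NDA_FLAGS_EXT */
};

static int demo_neigh_is_managed(const struct demo_neigh_flags *f)
{
	return (f->ext & NTF_EXT_MANAGED) != 0;
}
```

Testing `NTF_EXT_MANAGED` against the base flags would silently match `NTF_USE`, which is why the two sets need separate storage.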

@ -43,11 +43,13 @@ enum nf_inet_hooks {
NF_INET_FORWARD, NF_INET_FORWARD,
NF_INET_LOCAL_OUT, NF_INET_LOCAL_OUT,
NF_INET_POST_ROUTING, NF_INET_POST_ROUTING,
-NF_INET_NUMHOOKS
+NF_INET_NUMHOOKS,
+NF_INET_INGRESS = NF_INET_NUMHOOKS,
}; };
enum nf_dev_hooks { enum nf_dev_hooks {
NF_NETDEV_INGRESS, NF_NETDEV_INGRESS,
NF_NETDEV_EGRESS,
NF_NETDEV_NUMHOOKS NF_NETDEV_NUMHOOKS
}; };

@ -92,11 +92,11 @@ enum {
/* Reserve empty slots */ /* Reserve empty slots */
IPSET_ATTR_CADT_MAX = 16, IPSET_ATTR_CADT_MAX = 16,
/* Create-only specific attributes */ /* Create-only specific attributes */
-IPSET_ATTR_GC,
+IPSET_ATTR_INITVAL, /* was unused IPSET_ATTR_GC */
IPSET_ATTR_HASHSIZE, IPSET_ATTR_HASHSIZE,
IPSET_ATTR_MAXELEM, IPSET_ATTR_MAXELEM,
IPSET_ATTR_NETMASK, IPSET_ATTR_NETMASK,
-IPSET_ATTR_PROBES,
+IPSET_ATTR_BUCKETSIZE, /* was unused IPSET_ATTR_PROBES */
IPSET_ATTR_RESIZE, IPSET_ATTR_RESIZE,
IPSET_ATTR_SIZE, IPSET_ATTR_SIZE,
/* Kernel-only */ /* Kernel-only */
@ -214,6 +214,8 @@ enum ipset_cadt_flags {
enum ipset_create_flags { enum ipset_create_flags {
IPSET_CREATE_FLAG_BIT_FORCEADD = 0, IPSET_CREATE_FLAG_BIT_FORCEADD = 0,
IPSET_CREATE_FLAG_FORCEADD = (1 << IPSET_CREATE_FLAG_BIT_FORCEADD), IPSET_CREATE_FLAG_FORCEADD = (1 << IPSET_CREATE_FLAG_BIT_FORCEADD),
IPSET_CREATE_FLAG_BIT_BUCKETSIZE = 1,
IPSET_CREATE_FLAG_BUCKETSIZE = (1 << IPSET_CREATE_FLAG_BIT_BUCKETSIZE),
IPSET_CREATE_FLAG_BIT_MAX = 7, IPSET_CREATE_FLAG_BIT_MAX = 7,
}; };

@ -1,7 +1,7 @@
/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
#ifndef _X_TABLES_H #ifndef _X_TABLES_H
#define _X_TABLES_H #define _X_TABLES_H
-#include <linux/kernel.h>
+#include <linux/const.h>
#include <linux/types.h> #include <linux/types.h>
#define XT_FUNCTION_MAXNAMELEN 30 #define XT_FUNCTION_MAXNAMELEN 30

@ -2,7 +2,7 @@
#ifndef __LINUX_NETLINK_H #ifndef __LINUX_NETLINK_H
#define __LINUX_NETLINK_H #define __LINUX_NETLINK_H
-#include <linux/kernel.h>
+#include <linux/const.h>
#include <linux/socket.h> /* for __kernel_sa_family_t */ #include <linux/socket.h> /* for __kernel_sa_family_t */
#include <linux/types.h> #include <linux/types.h>
@ -91,9 +91,10 @@ struct nlmsghdr {
#define NLMSG_HDRLEN ((int) NLMSG_ALIGN(sizeof(struct nlmsghdr))) #define NLMSG_HDRLEN ((int) NLMSG_ALIGN(sizeof(struct nlmsghdr)))
#define NLMSG_LENGTH(len) ((len) + NLMSG_HDRLEN) #define NLMSG_LENGTH(len) ((len) + NLMSG_HDRLEN)
#define NLMSG_SPACE(len) NLMSG_ALIGN(NLMSG_LENGTH(len)) #define NLMSG_SPACE(len) NLMSG_ALIGN(NLMSG_LENGTH(len))
-#define NLMSG_DATA(nlh) ((void*)(((char*)nlh) + NLMSG_LENGTH(0)))
+#define NLMSG_DATA(nlh) ((void *)(((char *)nlh) + NLMSG_HDRLEN))
 #define NLMSG_NEXT(nlh,len) ((len) -= NLMSG_ALIGN((nlh)->nlmsg_len), \
-(struct nlmsghdr*)(((char*)(nlh)) + NLMSG_ALIGN((nlh)->nlmsg_len)))
+(struct nlmsghdr *)(((char *)(nlh)) + \
+NLMSG_ALIGN((nlh)->nlmsg_len)))
#define NLMSG_OK(nlh,len) ((len) >= (int)sizeof(struct nlmsghdr) && \ #define NLMSG_OK(nlh,len) ((len) >= (int)sizeof(struct nlmsghdr) && \
(nlh)->nlmsg_len >= sizeof(struct nlmsghdr) && \ (nlh)->nlmsg_len >= sizeof(struct nlmsghdr) && \
(nlh)->nlmsg_len <= (len)) (nlh)->nlmsg_len <= (len))
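
Review note: the `NLMSG_DATA` change in this hunk looks behavior-neutral, because `NLMSG_LENGTH(0)` expands to `0 + NLMSG_HDRLEN`. The sketch below checks that with local `DEMO_` copies of the macros and a stand-in struct that mirrors the `struct nlmsghdr` field layout, so it builds without `<linux/netlink.h>`. All `DEMO_`/`demo_` names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in with the same field layout as struct nlmsghdr. */
struct demo_nlmsghdr {
	uint32_t nlmsg_len;
	uint16_t nlmsg_type;
	uint16_t nlmsg_flags;
	uint32_t nlmsg_seq;
	uint32_t nlmsg_pid;
};

#define DEMO_ALIGNTO     4U
#define DEMO_ALIGN(len)  (((len) + DEMO_ALIGNTO - 1) & ~(DEMO_ALIGNTO - 1))
#define DEMO_HDRLEN      ((int) DEMO_ALIGN(sizeof(struct demo_nlmsghdr)))
#define DEMO_LENGTH(len) ((len) + DEMO_HDRLEN)

/* Old and new forms of the payload pointer macro. */
#define DEMO_DATA_OLD(nlh) ((void *)(((char *)nlh) + DEMO_LENGTH(0)))
#define DEMO_DATA_NEW(nlh) ((void *)(((char *)nlh) + DEMO_HDRLEN))

/* Returns 1 if both forms point at the same payload address. */
static int demo_data_forms_match(void)
{
	struct demo_nlmsghdr h = { 0 };

	return DEMO_DATA_OLD(&h) == DEMO_DATA_NEW(&h);
}

/* Byte offset of the payload from the start of the header. */
static ptrdiff_t demo_payload_offset(void)
{
	struct demo_nlmsghdr h = { 0 };

	return (char *)DEMO_DATA_NEW(&h) - (char *)&h;
}
```

With this layout the header is 16 bytes and already 4-byte aligned, so the payload starts at offset 16 under either form.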
@ -129,6 +130,7 @@ struct nlmsgerr {
* @NLMSGERR_ATTR_COOKIE: arbitrary subsystem specific cookie to * @NLMSGERR_ATTR_COOKIE: arbitrary subsystem specific cookie to
* be used - in the success case - to identify a created * be used - in the success case - to identify a created
* object or operation or similar (binary) * object or operation or similar (binary)
* @NLMSGERR_ATTR_POLICY: policy for a rejected attribute
* @__NLMSGERR_ATTR_MAX: number of attributes * @__NLMSGERR_ATTR_MAX: number of attributes
* @NLMSGERR_ATTR_MAX: highest attribute number * @NLMSGERR_ATTR_MAX: highest attribute number
*/ */
@ -137,6 +139,7 @@ enum nlmsgerr_attrs {
NLMSGERR_ATTR_MSG, NLMSGERR_ATTR_MSG,
NLMSGERR_ATTR_OFFS, NLMSGERR_ATTR_OFFS,
NLMSGERR_ATTR_COOKIE, NLMSGERR_ATTR_COOKIE,
NLMSGERR_ATTR_POLICY,
__NLMSGERR_ATTR_MAX, __NLMSGERR_ATTR_MAX,
NLMSGERR_ATTR_MAX = __NLMSGERR_ATTR_MAX - 1 NLMSGERR_ATTR_MAX = __NLMSGERR_ATTR_MAX - 1
@ -245,4 +248,109 @@ struct nla_bitfield32 {
__u32 selector; __u32 selector;
}; };
/*
* policy descriptions - it's specific to each family how this is used
* Normally, it should be retrieved via a dump inside another attribute
* specifying where it applies.
*/
/**
* enum netlink_attribute_type - type of an attribute
* @NL_ATTR_TYPE_INVALID: unused
* @NL_ATTR_TYPE_FLAG: flag attribute (present/not present)
* @NL_ATTR_TYPE_U8: 8-bit unsigned attribute
* @NL_ATTR_TYPE_U16: 16-bit unsigned attribute
* @NL_ATTR_TYPE_U32: 32-bit unsigned attribute
* @NL_ATTR_TYPE_U64: 64-bit unsigned attribute
* @NL_ATTR_TYPE_S8: 8-bit signed attribute
* @NL_ATTR_TYPE_S16: 16-bit signed attribute
* @NL_ATTR_TYPE_S32: 32-bit signed attribute
* @NL_ATTR_TYPE_S64: 64-bit signed attribute
* @NL_ATTR_TYPE_BINARY: binary data, min/max length may be specified
* @NL_ATTR_TYPE_STRING: string, min/max length may be specified
* @NL_ATTR_TYPE_NUL_STRING: NUL-terminated string,
* min/max length may be specified
* @NL_ATTR_TYPE_NESTED: nested, i.e. the content of this attribute
* consists of sub-attributes. The nested policy and maxtype
* inside may be specified.
* @NL_ATTR_TYPE_NESTED_ARRAY: nested array, i.e. the content of this
* attribute contains sub-attributes whose type is irrelevant
* (just used to separate the array entries) and each such array
* entry has attributes again, the policy for those inner ones
* and the corresponding maxtype may be specified.
* @NL_ATTR_TYPE_BITFIELD32: &struct nla_bitfield32 attribute
*/
enum netlink_attribute_type {
NL_ATTR_TYPE_INVALID,
NL_ATTR_TYPE_FLAG,
NL_ATTR_TYPE_U8,
NL_ATTR_TYPE_U16,
NL_ATTR_TYPE_U32,
NL_ATTR_TYPE_U64,
NL_ATTR_TYPE_S8,
NL_ATTR_TYPE_S16,
NL_ATTR_TYPE_S32,
NL_ATTR_TYPE_S64,
NL_ATTR_TYPE_BINARY,
NL_ATTR_TYPE_STRING,
NL_ATTR_TYPE_NUL_STRING,
NL_ATTR_TYPE_NESTED,
NL_ATTR_TYPE_NESTED_ARRAY,
NL_ATTR_TYPE_BITFIELD32,
};
/**
* enum netlink_policy_type_attr - policy type attributes
* @NL_POLICY_TYPE_ATTR_UNSPEC: unused
* @NL_POLICY_TYPE_ATTR_TYPE: type of the attribute,
* &enum netlink_attribute_type (U32)
* @NL_POLICY_TYPE_ATTR_MIN_VALUE_S: minimum value for signed
* integers (S64)
* @NL_POLICY_TYPE_ATTR_MAX_VALUE_S: maximum value for signed
* integers (S64)
* @NL_POLICY_TYPE_ATTR_MIN_VALUE_U: minimum value for unsigned
* integers (U64)
* @NL_POLICY_TYPE_ATTR_MAX_VALUE_U: maximum value for unsigned
* integers (U64)
* @NL_POLICY_TYPE_ATTR_MIN_LENGTH: minimum length for binary
* attributes, no minimum if not given (U32)
* @NL_POLICY_TYPE_ATTR_MAX_LENGTH: maximum length for binary
* attributes, no maximum if not given (U32)
* @NL_POLICY_TYPE_ATTR_POLICY_IDX: sub policy for nested and
* nested array types (U32)
* @NL_POLICY_TYPE_ATTR_POLICY_MAXTYPE: maximum sub policy
* attribute for nested and nested array types, this can
* in theory be < the size of the policy pointed to by
* the index, if limited inside the nesting (U32)
* @NL_POLICY_TYPE_ATTR_BITFIELD32_MASK: valid mask for the
* bitfield32 type (U32)
* @NL_POLICY_TYPE_ATTR_MASK: mask of valid bits for unsigned integers (U64)
* @NL_POLICY_TYPE_ATTR_PAD: pad attribute for 64-bit alignment
*/
enum netlink_policy_type_attr {
NL_POLICY_TYPE_ATTR_UNSPEC,
NL_POLICY_TYPE_ATTR_TYPE,
NL_POLICY_TYPE_ATTR_MIN_VALUE_S,
NL_POLICY_TYPE_ATTR_MAX_VALUE_S,
NL_POLICY_TYPE_ATTR_MIN_VALUE_U,
NL_POLICY_TYPE_ATTR_MAX_VALUE_U,
NL_POLICY_TYPE_ATTR_MIN_LENGTH,
NL_POLICY_TYPE_ATTR_MAX_LENGTH,
NL_POLICY_TYPE_ATTR_POLICY_IDX,
NL_POLICY_TYPE_ATTR_POLICY_MAXTYPE,
NL_POLICY_TYPE_ATTR_BITFIELD32_MASK,
NL_POLICY_TYPE_ATTR_PAD,
NL_POLICY_TYPE_ATTR_MASK,
/* keep last */
__NL_POLICY_TYPE_ATTR_MAX,
NL_POLICY_TYPE_ATTR_MAX = __NL_POLICY_TYPE_ATTR_MAX - 1
};
#endif /* __LINUX_NETLINK_H */ #endif /* __LINUX_NETLINK_H */

@ -21,7 +21,10 @@ struct nexthop_grp {
}; };
enum { enum {
-NEXTHOP_GRP_TYPE_MPATH, /* default type if not specified */
+NEXTHOP_GRP_TYPE_MPATH, /* hash-threshold nexthop group
+* default type if not specified
+*/
+NEXTHOP_GRP_TYPE_RES, /* resilient nexthop group */
__NEXTHOP_GRP_TYPE_MAX, __NEXTHOP_GRP_TYPE_MAX,
}; };
@ -49,8 +52,53 @@ enum {
NHA_GROUPS, /* flag; only return nexthop groups in dump */ NHA_GROUPS, /* flag; only return nexthop groups in dump */
NHA_MASTER, /* u32; only return nexthops with given master dev */ NHA_MASTER, /* u32; only return nexthops with given master dev */
NHA_FDB, /* flag; nexthop belongs to a bridge fdb */
/* if NHA_FDB is added, OIF, BLACKHOLE, ENCAP cannot be set */
/* nested; resilient nexthop group attributes */
NHA_RES_GROUP,
/* nested; nexthop bucket attributes */
NHA_RES_BUCKET,
__NHA_MAX, __NHA_MAX,
}; };
#define NHA_MAX (__NHA_MAX - 1) #define NHA_MAX (__NHA_MAX - 1)
enum {
NHA_RES_GROUP_UNSPEC,
/* Pad attribute for 64-bit alignment. */
NHA_RES_GROUP_PAD = NHA_RES_GROUP_UNSPEC,
/* u16; number of nexthop buckets in a resilient nexthop group */
NHA_RES_GROUP_BUCKETS,
/* clock_t as u32; nexthop bucket idle timer (per-group) */
NHA_RES_GROUP_IDLE_TIMER,
/* clock_t as u32; nexthop unbalanced timer */
NHA_RES_GROUP_UNBALANCED_TIMER,
/* clock_t as u64; nexthop unbalanced time */
NHA_RES_GROUP_UNBALANCED_TIME,
__NHA_RES_GROUP_MAX,
};
#define NHA_RES_GROUP_MAX (__NHA_RES_GROUP_MAX - 1)
enum {
NHA_RES_BUCKET_UNSPEC,
/* Pad attribute for 64-bit alignment. */
NHA_RES_BUCKET_PAD = NHA_RES_BUCKET_UNSPEC,
/* u16; nexthop bucket index */
NHA_RES_BUCKET_INDEX,
/* clock_t as u64; nexthop bucket idle time */
NHA_RES_BUCKET_IDLE_TIME,
/* u32; nexthop id assigned to the nexthop bucket */
NHA_RES_BUCKET_NH_ID,
__NHA_RES_BUCKET_MAX,
};
#define NHA_RES_BUCKET_MAX (__NHA_RES_BUCKET_MAX - 1)
#endif #endif
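
Review note: `NHA_RES_GROUP_IDLE_TIMER` and `NHA_RES_GROUP_UNBALANCED_TIMER` are documented as "clock_t as u32", and the `parse_nh_group_type_res()` fix listed in the commit log above is about overflow when a timer is multiplied by 100 before being stored. A minimal sketch of a guarded conversion follows. It assumes 100 ticks per second, and `secs_to_clock_u32` is a hypothetical helper, not the iproute2 code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_TICKS_PER_SEC 100u /* assumption: USER_HZ is 100 */

/* Hypothetical helper: convert seconds to u32 clock_t ticks.
 * Returns false (and leaves *ticks untouched) if the result would not
 * fit in 32 bits. */
static bool secs_to_clock_u32(uint32_t secs, uint32_t *ticks)
{
	if (secs > UINT32_MAX / DEMO_TICKS_PER_SEC)
		return false;
	*ticks = secs * DEMO_TICKS_PER_SEC;
	return true;
}
```

The bound is computed from `UINT32_MAX`, not from `~0UL`, so it stays correct where `unsigned long` is 64 bits wide.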

@ -22,6 +22,7 @@ enum {
__TCA_ACT_MAX __TCA_ACT_MAX
}; };
/* See other TCA_ACT_FLAGS_ * flags in include/net/act_api.h. */
#define TCA_ACT_FLAGS_NO_PERCPU_STATS 1 /* Don't use percpu allocator for #define TCA_ACT_FLAGS_NO_PERCPU_STATS 1 /* Don't use percpu allocator for
* actions stats. * actions stats.
*/ */
@ -134,6 +135,7 @@ enum tca_id {
TCA_ID_CTINFO, TCA_ID_CTINFO,
TCA_ID_MPLS, TCA_ID_MPLS,
TCA_ID_CT, TCA_ID_CT,
TCA_ID_GATE,
/* other actions go here */ /* other actions go here */
__TCA_ID_MAX = 255 __TCA_ID_MAX = 255
}; };
@ -189,6 +191,8 @@ enum {
TCA_POLICE_PAD, TCA_POLICE_PAD,
TCA_POLICE_RATE64, TCA_POLICE_RATE64,
TCA_POLICE_PEAKRATE64, TCA_POLICE_PEAKRATE64,
TCA_POLICE_PKTRATE64,
TCA_POLICE_PKTBURST64,
__TCA_POLICE_MAX __TCA_POLICE_MAX
#define TCA_POLICE_RESULT TCA_POLICE_RESULT #define TCA_POLICE_RESULT TCA_POLICE_RESULT
}; };
@ -575,6 +579,11 @@ enum {
TCA_FLOWER_KEY_CT_LABELS, /* u128 */ TCA_FLOWER_KEY_CT_LABELS, /* u128 */
TCA_FLOWER_KEY_CT_LABELS_MASK, /* u128 */ TCA_FLOWER_KEY_CT_LABELS_MASK, /* u128 */
TCA_FLOWER_KEY_MPLS_OPTS,
TCA_FLOWER_KEY_HASH, /* u32 */
TCA_FLOWER_KEY_HASH_MASK, /* u32 */
__TCA_FLOWER_MAX, __TCA_FLOWER_MAX,
}; };
@ -585,6 +594,9 @@ enum {
TCA_FLOWER_KEY_CT_FLAGS_ESTABLISHED = 1 << 1, /* Part of an existing connection. */ TCA_FLOWER_KEY_CT_FLAGS_ESTABLISHED = 1 << 1, /* Part of an existing connection. */
TCA_FLOWER_KEY_CT_FLAGS_RELATED = 1 << 2, /* Related to an established connection. */ TCA_FLOWER_KEY_CT_FLAGS_RELATED = 1 << 2, /* Related to an established connection. */
TCA_FLOWER_KEY_CT_FLAGS_TRACKED = 1 << 3, /* Conntrack has occurred. */ TCA_FLOWER_KEY_CT_FLAGS_TRACKED = 1 << 3, /* Conntrack has occurred. */
TCA_FLOWER_KEY_CT_FLAGS_INVALID = 1 << 4, /* Conntrack is invalid. */
TCA_FLOWER_KEY_CT_FLAGS_REPLY = 1 << 5, /* Packet is in the reply direction. */
__TCA_FLOWER_KEY_CT_FLAGS_MAX,
}; };
enum { enum {
@ -639,6 +651,27 @@ enum {
#define TCA_FLOWER_KEY_ENC_OPT_ERSPAN_MAX \ #define TCA_FLOWER_KEY_ENC_OPT_ERSPAN_MAX \
(__TCA_FLOWER_KEY_ENC_OPT_ERSPAN_MAX - 1) (__TCA_FLOWER_KEY_ENC_OPT_ERSPAN_MAX - 1)
enum {
TCA_FLOWER_KEY_MPLS_OPTS_UNSPEC,
TCA_FLOWER_KEY_MPLS_OPTS_LSE,
__TCA_FLOWER_KEY_MPLS_OPTS_MAX,
};
#define TCA_FLOWER_KEY_MPLS_OPTS_MAX (__TCA_FLOWER_KEY_MPLS_OPTS_MAX - 1)
enum {
TCA_FLOWER_KEY_MPLS_OPT_LSE_UNSPEC,
TCA_FLOWER_KEY_MPLS_OPT_LSE_DEPTH,
TCA_FLOWER_KEY_MPLS_OPT_LSE_TTL,
TCA_FLOWER_KEY_MPLS_OPT_LSE_BOS,
TCA_FLOWER_KEY_MPLS_OPT_LSE_TC,
TCA_FLOWER_KEY_MPLS_OPT_LSE_LABEL,
__TCA_FLOWER_KEY_MPLS_OPT_LSE_MAX,
};
#define TCA_FLOWER_KEY_MPLS_OPT_LSE_MAX \
(__TCA_FLOWER_KEY_MPLS_OPT_LSE_MAX - 1)
enum { enum {
TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT = (1 << 0), TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT = (1 << 0),
TCA_FLOWER_KEY_FLAGS_FRAG_IS_FIRST = (1 << 1), TCA_FLOWER_KEY_FLAGS_FRAG_IS_FIRST = (1 << 1),

@ -257,6 +257,8 @@ enum {
TCA_RED_STAB, TCA_RED_STAB,
TCA_RED_MAX_P, TCA_RED_MAX_P,
TCA_RED_FLAGS, /* bitfield32 */ TCA_RED_FLAGS, /* bitfield32 */
TCA_RED_EARLY_DROP_BLOCK, /* u32 */
TCA_RED_MARK_BLOCK, /* u32 */
__TCA_RED_MAX, __TCA_RED_MAX,
}; };
@ -432,6 +434,7 @@ enum {
TCA_HTB_RATE64, TCA_HTB_RATE64,
TCA_HTB_CEIL64, TCA_HTB_CEIL64,
TCA_HTB_PAD, TCA_HTB_PAD,
TCA_HTB_OFFLOAD,
__TCA_HTB_MAX, __TCA_HTB_MAX,
}; };
@ -824,6 +827,8 @@ struct tc_codel_xstats {
/* FQ_CODEL */ /* FQ_CODEL */
#define FQ_CODEL_QUANTUM_MAX (1 << 20)
enum { enum {
TCA_FQ_CODEL_UNSPEC, TCA_FQ_CODEL_UNSPEC,
TCA_FQ_CODEL_TARGET, TCA_FQ_CODEL_TARGET,
@ -835,6 +840,8 @@ enum {
TCA_FQ_CODEL_CE_THRESHOLD, TCA_FQ_CODEL_CE_THRESHOLD,
TCA_FQ_CODEL_DROP_BATCH_SIZE, TCA_FQ_CODEL_DROP_BATCH_SIZE,
TCA_FQ_CODEL_MEMORY_LIMIT, TCA_FQ_CODEL_MEMORY_LIMIT,
TCA_FQ_CODEL_CE_THRESHOLD_SELECTOR,
TCA_FQ_CODEL_CE_THRESHOLD_MASK,
__TCA_FQ_CODEL_MAX __TCA_FQ_CODEL_MAX
}; };
@ -913,6 +920,10 @@ enum {
TCA_FQ_TIMER_SLACK, /* timer slack */ TCA_FQ_TIMER_SLACK, /* timer slack */
TCA_FQ_HORIZON, /* time horizon in us */
TCA_FQ_HORIZON_DROP, /* drop packets beyond horizon, or cap their EDT */
__TCA_FQ_MAX __TCA_FQ_MAX
}; };
@ -932,6 +943,8 @@ struct tc_fq_qd_stats {
__u32 throttled_flows; __u32 throttled_flows;
__u32 unthrottle_latency_ns; __u32 unthrottle_latency_ns;
__u64 ce_mark; /* packets above ce_threshold */ __u64 ce_mark; /* packets above ce_threshold */
__u64 horizon_drops;
__u64 horizon_caps;
}; };
/* Heavy-Hitter Filter */ /* Heavy-Hitter Filter */

include/uapi/linux/rpl.h Normal file

@ -0,0 +1,48 @@
/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
/*
* IPv6 RPL-SR implementation
*
* Author:
* (C) 2020 Alexander Aring <alex.aring@gmail.com>
*/
#ifndef _LINUX_RPL_H
#define _LINUX_RPL_H
#include <asm/byteorder.h>
#include <linux/types.h>
#include <linux/in6.h>
/*
* RPL SR Header
*/
struct ipv6_rpl_sr_hdr {
__u8 nexthdr;
__u8 hdrlen;
__u8 type;
__u8 segments_left;
#if defined(__LITTLE_ENDIAN_BITFIELD)
__u32 cmpre:4,
cmpri:4,
reserved:4,
pad:4,
reserved1:16;
#elif defined(__BIG_ENDIAN_BITFIELD)
__u32 cmpri:4,
cmpre:4,
pad:4,
reserved:20;
#else
#error "Please fix <asm/byteorder.h>"
#endif
union {
struct in6_addr addr[0];
__u8 data[0];
} segments;
} __attribute__((packed));
#define rpl_segaddr segments.addr
#define rpl_segdata segments.data
#endif

@ -0,0 +1,21 @@
/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
/*
* IPv6 RPL-SR implementation
*
* Author:
* (C) 2020 Alexander Aring <alex.aring@gmail.com>
*/
#ifndef _LINUX_RPL_IPTUNNEL_H
#define _LINUX_RPL_IPTUNNEL_H
enum {
RPL_IPTUNNEL_UNSPEC,
RPL_IPTUNNEL_SRH,
__RPL_IPTUNNEL_MAX,
};
#define RPL_IPTUNNEL_MAX (__RPL_IPTUNNEL_MAX - 1)
#define RPL_IPTUNNEL_SRH_SIZE(srh) (((srh)->hdrlen + 1) << 3)
#endif
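
Review note: `RPL_IPTUNNEL_SRH_SIZE` follows the usual IPv6 extension header convention in which `hdrlen` counts 8-octet units and excludes the first 8 octets. A short sketch of the arithmetic follows, using a reduced stand-in struct that carries only the fixed leading bytes. `struct demo_srh` and `demo_srh_size` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Macro copied from the header above. */
#define RPL_IPTUNNEL_SRH_SIZE(srh) (((srh)->hdrlen + 1) << 3)

/* Reduced stand-in: only the leading fixed fields of the SR header. */
struct demo_srh {
	uint8_t nexthdr;
	uint8_t hdrlen;
	uint8_t type;
	uint8_t segments_left;
};

/* Hypothetical helper: total header size in bytes for a given hdrlen. */
static size_t demo_srh_size(uint8_t hdrlen)
{
	struct demo_srh srh = { .hdrlen = hdrlen };

	return (size_t)RPL_IPTUNNEL_SRH_SIZE(&srh);
}
```

So `hdrlen` 0 means an 8-byte header, and `hdrlen` 2 means 24 bytes (8 fixed bytes plus one uncompressed 16-byte address).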

@ -178,6 +178,13 @@ enum {
RTM_GETVLAN, RTM_GETVLAN,
#define RTM_GETVLAN RTM_GETVLAN #define RTM_GETVLAN RTM_GETVLAN
RTM_NEWNEXTHOPBUCKET = 116,
#define RTM_NEWNEXTHOPBUCKET RTM_NEWNEXTHOPBUCKET
RTM_DELNEXTHOPBUCKET,
#define RTM_DELNEXTHOPBUCKET RTM_DELNEXTHOPBUCKET
RTM_GETNEXTHOPBUCKET,
#define RTM_GETNEXTHOPBUCKET RTM_GETNEXTHOPBUCKET
__RTM_MAX, __RTM_MAX,
#define RTM_MAX (((__RTM_MAX + 3) & ~3) - 1) #define RTM_MAX (((__RTM_MAX + 3) & ~3) - 1)
}; };
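
Review note: the `RTM_MAX` expression rounds `__RTM_MAX` up to a multiple of four before subtracting one, which fits the rtnetlink habit of allocating message types in groups of four (NEW, DEL, GET and one spare). The sketch below works through the numbers for the new nexthop bucket messages, with the relevant part of the enum copied locally. `demo_rtm_kind` is a hypothetical helper, and reading the low two bits as the operation is an assumption based on that grouping.

```c
#include <assert.h>

/* Tail of the enum from the hunk above, copied so this compiles alone. */
enum {
	RTM_NEWNEXTHOPBUCKET = 116,
	RTM_DELNEXTHOPBUCKET,
	RTM_GETNEXTHOPBUCKET,
	__RTM_MAX,
};
#define RTM_MAX (((__RTM_MAX + 3) & ~3) - 1)

/* Hypothetical helper: position within a group of four message types
 * (0 = NEW, 1 = DEL, 2 = GET). */
static int demo_rtm_kind(int msg_type)
{
	return msg_type & 3;
}
```

Here `__RTM_MAX` is 119, so `RTM_MAX` evaluates to `((119 + 3) & ~3) - 1`, which is 119, the last slot of the group that starts at 116.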
@ -257,12 +264,12 @@ enum {
/* rtm_protocol */ /* rtm_protocol */
#define RTPROT_UNSPEC 0 #define RTPROT_UNSPEC 0
#define RTPROT_REDIRECT 1 /* Route installed by ICMP redirects; #define RTPROT_REDIRECT 1 /* Route installed by ICMP redirects;
not used by current IPv4 */ not used by current IPv4 */
#define RTPROT_KERNEL 2 /* Route installed by kernel */ #define RTPROT_KERNEL 2 /* Route installed by kernel */
#define RTPROT_BOOT 3 /* Route installed during boot */ #define RTPROT_BOOT 3 /* Route installed during boot */
#define RTPROT_STATIC 4 /* Route installed by administrator */ #define RTPROT_STATIC 4 /* Route installed by administrator */
/* Values of protocol >= RTPROT_STATIC are not interpreted by kernel; /* Values of protocol >= RTPROT_STATIC are not interpreted by kernel;
they are just passed from user and back as is. they are just passed from user and back as is.
@ -271,22 +278,24 @@ enum {
avoid conflicts. avoid conflicts.
*/ */
#define RTPROT_GATED 8 /* Apparently, GateD */ #define RTPROT_GATED 8 /* Apparently, GateD */
#define RTPROT_RA 9 /* RDISC/ND router advertisements */ #define RTPROT_RA 9 /* RDISC/ND router advertisements */
#define RTPROT_MRT 10 /* Merit MRT */ #define RTPROT_MRT 10 /* Merit MRT */
#define RTPROT_ZEBRA 11 /* Zebra */ #define RTPROT_ZEBRA 11 /* Zebra */
#define RTPROT_BIRD 12 /* BIRD */ #define RTPROT_BIRD 12 /* BIRD */
#define RTPROT_DNROUTED 13 /* DECnet routing daemon */ #define RTPROT_DNROUTED 13 /* DECnet routing daemon */
#define RTPROT_XORP 14 /* XORP */ #define RTPROT_XORP 14 /* XORP */
#define RTPROT_NTK 15 /* Netsukuku */ #define RTPROT_NTK 15 /* Netsukuku */
#define RTPROT_DHCP 16 /* DHCP client */ #define RTPROT_DHCP 16 /* DHCP client */
#define RTPROT_MROUTED 17 /* Multicast daemon */ #define RTPROT_MROUTED 17 /* Multicast daemon */
+#define RTPROT_KEEPALIVED 18 /* Keepalived daemon */
 #define RTPROT_BABEL 42 /* Babel daemon */
+#define RTPROT_OPENR 99 /* Open Routing (Open/R) Routes */
 #define RTPROT_BGP 186 /* BGP Routes */
 #define RTPROT_ISIS 187 /* ISIS Routes */
 #define RTPROT_OSPF 188 /* OSPF Routes */
 #define RTPROT_RIP 189 /* RIP Routes */
 #define RTPROT_EIGRP 192 /* EIGRP Routes */
/* rtm_scope /* rtm_scope
@ -318,6 +327,11 @@ enum rt_scope_t {
#define RTM_F_FIB_MATCH 0x2000 /* return full fib lookup match */ #define RTM_F_FIB_MATCH 0x2000 /* return full fib lookup match */
#define RTM_F_OFFLOAD 0x4000 /* route is offloaded */ #define RTM_F_OFFLOAD 0x4000 /* route is offloaded */
#define RTM_F_TRAP 0x8000 /* route is trapping packets */ #define RTM_F_TRAP 0x8000 /* route is trapping packets */
#define RTM_F_OFFLOAD_FAILED 0x20000000 /* route offload failed, this value
* is chosen to avoid conflicts with
* other flags defined in
* include/uapi/linux/ipv6_route.h
*/
/* Reserved table identifiers */ /* Reserved table identifiers */
@ -395,11 +409,13 @@ struct rtnexthop {
#define RTNH_F_DEAD 1 /* Nexthop is dead (used by multipath) */ #define RTNH_F_DEAD 1 /* Nexthop is dead (used by multipath) */
#define RTNH_F_PERVASIVE 2 /* Do recursive gateway lookup */ #define RTNH_F_PERVASIVE 2 /* Do recursive gateway lookup */
#define RTNH_F_ONLINK 4 /* Gateway is forced on link */ #define RTNH_F_ONLINK 4 /* Gateway is forced on link */
-#define RTNH_F_OFFLOAD 8 /* offloaded route */
+#define RTNH_F_OFFLOAD 8 /* Nexthop is offloaded */
 #define RTNH_F_LINKDOWN 16 /* carrier-down on nexthop */
 #define RTNH_F_UNRESOLVED 32 /* The entry is unresolved (ipmr) */
+#define RTNH_F_TRAP 64 /* Nexthop is trapping packets */
-#define RTNH_COMPARE_MASK (RTNH_F_DEAD | RTNH_F_LINKDOWN | RTNH_F_OFFLOAD)
+#define RTNH_COMPARE_MASK (RTNH_F_DEAD | RTNH_F_LINKDOWN | \
+RTNH_F_OFFLOAD | RTNH_F_TRAP)
/* Macros to handle nexthops */ /* Macros to handle nexthops */
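
Review note: adding `RTNH_F_TRAP` to `RTNH_COMPARE_MASK` matters to any code that compares nexthop flags while ignoring runtime state. The sketch below assumes the mask is used the way its name suggests, to exclude those bits from a comparison. `demo_nh_flags_differ` is a hypothetical helper, and the flag values are copied from the hunk above.

```c
#include <assert.h>

/* Values copied from the hunk above. */
#define RTNH_F_DEAD       1
#define RTNH_F_PERVASIVE  2
#define RTNH_F_ONLINK     4
#define RTNH_F_OFFLOAD    8
#define RTNH_F_LINKDOWN   16
#define RTNH_F_UNRESOLVED 32
#define RTNH_F_TRAP       64

#define RTNH_COMPARE_MASK (RTNH_F_DEAD | RTNH_F_LINKDOWN | \
			   RTNH_F_OFFLOAD | RTNH_F_TRAP)

/* Hypothetical helper: nonzero if two flag words differ in any bit
 * that is NOT part of RTNH_COMPARE_MASK. */
static int demo_nh_flags_differ(unsigned int a, unsigned int b)
{
	return ((a ^ b) & ~(unsigned int)RTNH_COMPARE_MASK) != 0;
}
```

Under that assumption, two nexthops that differ only in `RTNH_F_TRAP` now compare as equal, while a difference in `RTNH_F_ONLINK` still counts.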
@ -609,11 +625,17 @@ enum {
TCA_HW_OFFLOAD, TCA_HW_OFFLOAD,
TCA_INGRESS_BLOCK, TCA_INGRESS_BLOCK,
TCA_EGRESS_BLOCK, TCA_EGRESS_BLOCK,
TCA_DUMP_FLAGS,
__TCA_MAX __TCA_MAX
}; };
#define TCA_MAX (__TCA_MAX - 1) #define TCA_MAX (__TCA_MAX - 1)
#define TCA_DUMP_FLAGS_TERSE (1 << 0) /* Means that in dump user gets only basic
* data necessary to identify the objects
* (handle, cookie, etc.) and stats.
*/
#define TCA_RTA(r) ((struct rtattr*)(((char*)(r)) + NLMSG_ALIGN(sizeof(struct tcmsg)))) #define TCA_RTA(r) ((struct rtattr*)(((char*)(r)) + NLMSG_ALIGN(sizeof(struct tcmsg))))
#define TCA_PAYLOAD(n) NLMSG_PAYLOAD(n,sizeof(struct tcmsg)) #define TCA_PAYLOAD(n) NLMSG_PAYLOAD(n,sizeof(struct tcmsg))
@ -757,18 +779,27 @@ enum {
#define TA_PAYLOAD(n) NLMSG_PAYLOAD(n,sizeof(struct tcamsg)) #define TA_PAYLOAD(n) NLMSG_PAYLOAD(n,sizeof(struct tcamsg))
/* tcamsg flags stored in attribute TCA_ROOT_FLAGS /* tcamsg flags stored in attribute TCA_ROOT_FLAGS
* *
- * TCA_FLAG_LARGE_DUMP_ON user->kernel to request for larger than TCA_ACT_MAX_PRIO
- * actions in a dump. All dump responses will contain the number of actions
- * being dumped stored in for user app's consumption in TCA_ROOT_COUNT
+ * TCA_ACT_FLAG_LARGE_DUMP_ON user->kernel to request for larger than
+ * TCA_ACT_MAX_PRIO actions in a dump. All dump responses will contain the
+ * number of actions being dumped stored in for user app's consumption in
+ * TCA_ROOT_COUNT
*
* TCA_ACT_FLAG_TERSE_DUMP user->kernel to request terse (brief) dump that only
* includes essential action info (kind, index, etc.)
* *
*/ */
#define TCA_FLAG_LARGE_DUMP_ON (1 << 0) #define TCA_FLAG_LARGE_DUMP_ON (1 << 0)
#define TCA_ACT_FLAG_LARGE_DUMP_ON TCA_FLAG_LARGE_DUMP_ON
#define TCA_ACT_FLAG_TERSE_DUMP (1 << 1)
/* New extended info filters for IFLA_EXT_MASK */ /* New extended info filters for IFLA_EXT_MASK */
#define RTEXT_FILTER_VF (1 << 0) #define RTEXT_FILTER_VF (1 << 0)
#define RTEXT_FILTER_BRVLAN (1 << 1) #define RTEXT_FILTER_BRVLAN (1 << 1)
#define RTEXT_FILTER_BRVLAN_COMPRESSED (1 << 2) #define RTEXT_FILTER_BRVLAN_COMPRESSED (1 << 2)
#define RTEXT_FILTER_SKIP_STATS (1 << 3) #define RTEXT_FILTER_SKIP_STATS (1 << 3)
#define RTEXT_FILTER_MRP (1 << 4)
#define RTEXT_FILTER_CFM_CONFIG (1 << 5)
#define RTEXT_FILTER_CFM_STATUS (1 << 6)
/* End of information exported to user level */ /* End of information exported to user level */


@ -140,6 +140,8 @@ typedef __s32 sctp_assoc_t;
#define SCTP_ECN_SUPPORTED 130 #define SCTP_ECN_SUPPORTED 130
#define SCTP_EXPOSE_POTENTIALLY_FAILED_STATE 131 #define SCTP_EXPOSE_POTENTIALLY_FAILED_STATE 131
#define SCTP_EXPOSE_PF_STATE SCTP_EXPOSE_POTENTIALLY_FAILED_STATE #define SCTP_EXPOSE_PF_STATE SCTP_EXPOSE_POTENTIALLY_FAILED_STATE
#define SCTP_REMOTE_UDP_ENCAPS_PORT 132
#define SCTP_PLPMTUD_PROBE_INTERVAL 133
/* PR-SCTP policies */ /* PR-SCTP policies */
#define SCTP_PR_SCTP_NONE 0x0000 #define SCTP_PR_SCTP_NONE 0x0000
@ -1191,6 +1193,12 @@ struct sctp_event {
uint8_t se_on; uint8_t se_on;
}; };
struct sctp_udpencaps {
sctp_assoc_t sue_assoc_id;
struct sockaddr_storage sue_address;
uint16_t sue_port;
};
/* SCTP Stream schedulers */ /* SCTP Stream schedulers */
enum sctp_sched_type { enum sctp_sched_type {
SCTP_SS_FCFS, SCTP_SS_FCFS,
@ -1200,4 +1208,11 @@ enum sctp_sched_type {
SCTP_SS_MAX = SCTP_SS_RR SCTP_SS_MAX = SCTP_SS_RR
}; };
/* Probe Interval socket option */
struct sctp_probeinterval {
sctp_assoc_t spi_assoc_id;
struct sockaddr_storage spi_address;
__u32 spi_interval;
};
#endif /* _SCTP_H */ #endif /* _SCTP_H */
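
A minimal sketch of how a caller might populate the new probe-interval request for one IPv4 peer. The struct `demo_probeinterval` is a local mirror of `struct sctp_probeinterval` from the hunk above, and `demo_fill_probeinterval` is a hypothetical helper; neither is part of the header. The hunk does not state the units of `spi_interval`, so the value is passed through unchanged.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Local mirror of struct sctp_probeinterval, so the sketch builds even
 * where the installed SCTP header predates this change.
 */
struct demo_probeinterval {
	int32_t spi_assoc_id;			/* sctp_assoc_t is __s32 */
	struct sockaddr_storage spi_address;
	uint32_t spi_interval;
};

/*
 * Hypothetical helper: zero the request, then fill in the association id,
 * the IPv4 peer address and the interval. Returns 0 on success, -1 if
 * addr is not a valid dotted-quad string.
 */
static int demo_fill_probeinterval(struct demo_probeinterval *req,
				   int32_t assoc_id, const char *addr,
				   uint32_t interval)
{
	struct sockaddr_in *sin = (struct sockaddr_in *)&req->spi_address;

	memset(req, 0, sizeof(*req));
	req->spi_assoc_id = assoc_id;
	req->spi_interval = interval;
	sin->sin_family = AF_INET;
	if (inet_pton(AF_INET, addr, &sin->sin_addr) != 1)
		return -1;
	return 0;
}
```

On a system whose headers carry this change, the filled request would presumably be handed to `setsockopt()` on an SCTP socket with `SCTP_PLPMTUD_PROBE_INTERVAL` as the option name; that call is omitted here because it needs kernel SCTP support.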


@ -37,5 +37,4 @@ enum {
SEG6_IPTUN_MODE_L2ENCAP, SEG6_IPTUN_MODE_L2ENCAP,
}; };
#endif #endif


@ -26,6 +26,8 @@ enum {
SEG6_LOCAL_IIF, SEG6_LOCAL_IIF,
SEG6_LOCAL_OIF, SEG6_LOCAL_OIF,
SEG6_LOCAL_BPF, SEG6_LOCAL_BPF,
SEG6_LOCAL_VRFTABLE,
SEG6_LOCAL_COUNTERS,
__SEG6_LOCAL_MAX, __SEG6_LOCAL_MAX,
}; };
#define SEG6_LOCAL_MAX (__SEG6_LOCAL_MAX - 1) #define SEG6_LOCAL_MAX (__SEG6_LOCAL_MAX - 1)
@ -62,6 +64,8 @@ enum {
SEG6_LOCAL_ACTION_END_AM = 14, SEG6_LOCAL_ACTION_END_AM = 14,
/* custom BPF action */ /* custom BPF action */
SEG6_LOCAL_ACTION_END_BPF = 15, SEG6_LOCAL_ACTION_END_BPF = 15,
/* decap and lookup of DA in v4 or v6 table */
SEG6_LOCAL_ACTION_END_DT46 = 16,
__SEG6_LOCAL_ACTION_MAX, __SEG6_LOCAL_ACTION_MAX,
}; };
@ -77,4 +81,33 @@ enum {
#define SEG6_LOCAL_BPF_PROG_MAX (__SEG6_LOCAL_BPF_PROG_MAX - 1) #define SEG6_LOCAL_BPF_PROG_MAX (__SEG6_LOCAL_BPF_PROG_MAX - 1)
/* SRv6 Behavior counters are encoded as netlink attributes guaranteeing the
* correct alignment.
* Each counter is identified by a different attribute type (i.e.
* SEG6_LOCAL_CNT_PACKETS).
*
* - SEG6_LOCAL_CNT_PACKETS: identifies a counter that counts the number of
* packets that have been CORRECTLY processed by an SRv6 Behavior instance
* (i.e., packets that generate errors or are dropped are NOT counted).
*
* - SEG6_LOCAL_CNT_BYTES: identifies a counter that counts the total amount
* of traffic in bytes of all packets that have been CORRECTLY processed by
* an SRv6 Behavior instance (i.e., packets that generate errors or are
* dropped are NOT counted).
*
* - SEG6_LOCAL_CNT_ERRORS: identifies a counter that counts the number of
* packets that have NOT been properly processed by an SRv6 Behavior instance
* (i.e., packets that generate errors or are dropped).
*/
enum {
SEG6_LOCAL_CNT_UNSPEC,
SEG6_LOCAL_CNT_PAD, /* pad for 64 bits values */
SEG6_LOCAL_CNT_PACKETS,
SEG6_LOCAL_CNT_BYTES,
SEG6_LOCAL_CNT_ERRORS,
__SEG6_LOCAL_CNT_MAX,
};
#define SEG6_LOCAL_CNT_MAX (__SEG6_LOCAL_CNT_MAX - 1)
#endif #endif
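
The comment above says each counter travels as its own netlink attribute, with a PAD attribute keeping 64-bit values aligned. The sketch below walks such an attribute stream and collects the three counters. It uses a stand-in header `struct demo_attr` with the usual length/type layout instead of the real netlink headers, and `demo_put_u64` and `demo_parse_counters` are hypothetical helpers, not iproute2 or kernel functions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Attribute types copied from the enum in the hunk above. */
enum {
	SEG6_LOCAL_CNT_UNSPEC,
	SEG6_LOCAL_CNT_PAD,
	SEG6_LOCAL_CNT_PACKETS,
	SEG6_LOCAL_CNT_BYTES,
	SEG6_LOCAL_CNT_ERRORS,
	__SEG6_LOCAL_CNT_MAX,
};
#define SEG6_LOCAL_CNT_MAX (__SEG6_LOCAL_CNT_MAX - 1)

/* Stand-in for the netlink attribute header: 16-bit length, 16-bit type. */
struct demo_attr {
	uint16_t len;	/* header plus payload, excluding padding */
	uint16_t type;
};
#define DEMO_ALIGN(n) (((size_t)(n) + 3) & ~(size_t)3)
#define DEMO_HDRLEN   DEMO_ALIGN(sizeof(struct demo_attr))

struct demo_counters {
	uint64_t packets, bytes, errors;
};

/* Append one u64 attribute at offset off; returns the new offset. */
static size_t demo_put_u64(unsigned char *buf, size_t off, uint16_t type,
			   uint64_t val)
{
	struct demo_attr a = { (uint16_t)(DEMO_HDRLEN + sizeof(val)), type };

	memcpy(buf + off, &a, sizeof(a));
	memcpy(buf + off + DEMO_HDRLEN, &val, sizeof(val));
	return off + DEMO_ALIGN(a.len);
}

/* Walk the attributes in buf; returns 0 on success, -1 if malformed. */
static int demo_parse_counters(const unsigned char *buf, size_t len,
			       struct demo_counters *out)
{
	uint64_t *slot[SEG6_LOCAL_CNT_MAX + 1] = { 0 };
	size_t off = 0;

	memset(out, 0, sizeof(*out));
	slot[SEG6_LOCAL_CNT_PACKETS] = &out->packets;
	slot[SEG6_LOCAL_CNT_BYTES] = &out->bytes;
	slot[SEG6_LOCAL_CNT_ERRORS] = &out->errors;

	while (off < len) {
		size_t left = len - off;
		size_t step;
		struct demo_attr a;

		if (left < DEMO_HDRLEN)
			return -1;
		memcpy(&a, buf + off, sizeof(a));
		if (a.len < DEMO_HDRLEN || a.len > left)
			return -1;
		/* Unknown types and the PAD attribute are skipped. */
		if (a.type <= SEG6_LOCAL_CNT_MAX && slot[a.type]) {
			if (a.len - DEMO_HDRLEN != sizeof(uint64_t))
				return -1;
			memcpy(slot[a.type], buf + off + DEMO_HDRLEN,
			       sizeof(uint64_t));
		}
		step = DEMO_ALIGN(a.len);
		off += step < left ? step : left;
	}
	return 0;
}
```

Payloads are copied with `memcpy` rather than dereferenced in place, so the sketch does not depend on the buffer being 8-byte aligned.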


@ -159,6 +159,7 @@ enum
UDP_MIB_SNDBUFERRORS, /* SndbufErrors */ UDP_MIB_SNDBUFERRORS, /* SndbufErrors */
UDP_MIB_CSUMERRORS, /* InCsumErrors */ UDP_MIB_CSUMERRORS, /* InCsumErrors */
UDP_MIB_IGNOREDMULTI, /* IgnoredMulti */ UDP_MIB_IGNOREDMULTI, /* IgnoredMulti */
UDP_MIB_MEMERRORS, /* MemErrors */
__UDP_MIB_MAX __UDP_MIB_MAX
}; };
@ -287,6 +288,10 @@ enum
LINUX_MIB_TCPFASTOPENPASSIVEALTKEY, /* TCPFastOpenPassiveAltKey */ LINUX_MIB_TCPFASTOPENPASSIVEALTKEY, /* TCPFastOpenPassiveAltKey */
LINUX_MIB_TCPTIMEOUTREHASH, /* TCPTimeoutRehash */ LINUX_MIB_TCPTIMEOUTREHASH, /* TCPTimeoutRehash */
LINUX_MIB_TCPDUPLICATEDATAREHASH, /* TCPDuplicateDataRehash */ LINUX_MIB_TCPDUPLICATEDATAREHASH, /* TCPDuplicateDataRehash */
LINUX_MIB_TCPDSACKRECVSEGS, /* TCPDSACKRecvSegs */
LINUX_MIB_TCPDSACKIGNOREDDUBIOUS, /* TCPDSACKIgnoredDubious */
LINUX_MIB_TCPMIGRATEREQSUCCESS, /* TCPMigrateReqSuccess */
LINUX_MIB_TCPMIGRATEREQFAILURE, /* TCPMigrateReqFailure */
__LINUX_MIB_MAX __LINUX_MIB_MAX
}; };


@ -26,4 +26,9 @@ struct __kernel_sockaddr_storage {
}; };
}; };
#define SOCK_SNDBUF_LOCK 1
#define SOCK_RCVBUF_LOCK 2
#define SOCK_BUF_LOCK_MASK (SOCK_SNDBUF_LOCK | SOCK_RCVBUF_LOCK)
#endif /* _LINUX_SOCKET_H */ #endif /* _LINUX_SOCKET_H */
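
The three new definitions form a small bitmask, so a caller can reject unknown bits before handing a value to the kernel. A minimal sketch follows; `demo_buf_lock_valid` is a hypothetical helper, and the constants are repeated locally (guarded) so that the example does not depend on the installed header. These bits appear intended for the `SO_BUF_LOCK` socket option, which is not part of this hunk.

```c
#include <assert.h>

/* Values copied from the hunk above; guarded in case a header defines them. */
#ifndef SOCK_BUF_LOCK_MASK
#define SOCK_SNDBUF_LOCK	1
#define SOCK_RCVBUF_LOCK	2
#define SOCK_BUF_LOCK_MASK	(SOCK_SNDBUF_LOCK | SOCK_RCVBUF_LOCK)
#endif

/* Hypothetical validator: nonzero when val uses only the defined lock bits. */
static int demo_buf_lock_valid(unsigned int val)
{
	return (val & ~(unsigned int)SOCK_BUF_LOCK_MASK) == 0;
}
```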


@ -4,3 +4,40 @@
#ifndef __always_inline #ifndef __always_inline
#define __always_inline __inline__ #define __always_inline __inline__
#endif #endif
/**
* __struct_group() - Create a mirrored named and anonymous struct
*
* @TAG: The tag name for the named sub-struct (usually empty)
* @NAME: The identifier name of the mirrored sub-struct
* @ATTRS: Any struct attributes (usually empty)
* @MEMBERS: The member declarations for the mirrored structs
*
* Used to create an anonymous union of two structs with identical layout
* and size: one anonymous and one named. The former's members can be used
* normally without sub-struct naming, and the latter can be used to
* reason about the start, end, and size of the group of struct members.
* The named struct can also be explicitly tagged for later reuse, as well
* as both having struct attributes appended.
*/
#define __struct_group(TAG, NAME, ATTRS, MEMBERS...) \
union { \
struct { MEMBERS } ATTRS; \
struct TAG { MEMBERS } ATTRS NAME; \
}
/**
* __DECLARE_FLEX_ARRAY() - Declare a flexible array usable in a union
*
* @TYPE: The type of each flexible array element
* @NAME: The name of the flexible array member
*
* In order to have a flexible array member in a union or alone in a
* struct, it needs to be wrapped in an anonymous struct with at least 1
* named member, but that member can be empty.
*/
#define __DECLARE_FLEX_ARRAY(TYPE, NAME) \
struct { \
struct { } __empty_ ## NAME; \
TYPE NAME[]; \
}
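
To make the `__struct_group()` description concrete, the sketch below groups two members of a made-up header so they can be cleared as one unit while still being addressable by their plain names. `struct demo_hdr` and `demo_clear_addrs` are hypothetical; the macro body is copied from the hunk above, and it relies on GNU C extensions (named variadic macro parameters), so GCC or Clang is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Macro copied from the hunk above; guarded in case a header defines it. */
#ifndef __struct_group
#define __struct_group(TAG, NAME, ATTRS, MEMBERS...) \
	union { \
		struct { MEMBERS } ATTRS; \
		struct TAG { MEMBERS } ATTRS NAME; \
	}
#endif

/* Hypothetical header layout, used only for illustration. */
struct demo_hdr {
	uint8_t version;
	__struct_group(/* no tag */, addrs, /* no attrs */,
		uint32_t src;
		uint32_t dst;
	);
	uint16_t flags;
};

/*
 * Clear both addresses in one call. The named view (h->addrs) gives the
 * start and size of the group; h->src and h->dst stay directly usable.
 */
static void demo_clear_addrs(struct demo_hdr *h)
{
	memset(&h->addrs, 0, sizeof(h->addrs));
}
```

The point of the named view is that `sizeof(h->addrs)` covers exactly the grouped members, so the `memset()` cannot silently run into `flags` if fields are later added or reordered.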


@ -0,0 +1,47 @@
/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
/* Copyright 2020 NXP */
#ifndef __LINUX_TC_GATE_H
#define __LINUX_TC_GATE_H
#include <linux/pkt_cls.h>
struct tc_gate {
tc_gen;
};
enum {
TCA_GATE_ENTRY_UNSPEC,
TCA_GATE_ENTRY_INDEX,
TCA_GATE_ENTRY_GATE,
TCA_GATE_ENTRY_INTERVAL,
TCA_GATE_ENTRY_IPV,
TCA_GATE_ENTRY_MAX_OCTETS,
__TCA_GATE_ENTRY_MAX,
};
#define TCA_GATE_ENTRY_MAX (__TCA_GATE_ENTRY_MAX - 1)
enum {
TCA_GATE_ONE_ENTRY_UNSPEC,
TCA_GATE_ONE_ENTRY,
__TCA_GATE_ONE_ENTRY_MAX,
};
#define TCA_GATE_ONE_ENTRY_MAX (__TCA_GATE_ONE_ENTRY_MAX - 1)
enum {
TCA_GATE_UNSPEC,
TCA_GATE_TM,
TCA_GATE_PARMS,
TCA_GATE_PAD,
TCA_GATE_PRIORITY,
TCA_GATE_ENTRY_LIST,
TCA_GATE_BASE_TIME,
TCA_GATE_CYCLE_TIME,
TCA_GATE_CYCLE_TIME_EXT,
TCA_GATE_FLAGS,
TCA_GATE_CLOCKID,
__TCA_GATE_MAX,
};
#define TCA_GATE_MAX (__TCA_GATE_MAX - 1)
#endif
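
Each enum in this new header ends with a `__..._MAX` sentinel and a derived `..._MAX` macro. A common use of that pattern is sizing a table indexed by attribute type, sketched below for the gate entry attributes. The enum is copied from the header above; `struct demo_attr_ref` and `demo_index_attrs` are hypothetical stand-ins for a real netlink attribute parser.

```c
#include <assert.h>
#include <stddef.h>

/* Copied from the header above. */
enum {
	TCA_GATE_ENTRY_UNSPEC,
	TCA_GATE_ENTRY_INDEX,
	TCA_GATE_ENTRY_GATE,
	TCA_GATE_ENTRY_INTERVAL,
	TCA_GATE_ENTRY_IPV,
	TCA_GATE_ENTRY_MAX_OCTETS,
	__TCA_GATE_ENTRY_MAX,
};
#define TCA_GATE_ENTRY_MAX (__TCA_GATE_ENTRY_MAX - 1)

/* Hypothetical, already-decoded attribute: a type plus a payload pointer. */
struct demo_attr_ref {
	int type;
	const void *data;
};

/*
 * Index attributes by type. The table has TCA_GATE_ENTRY_MAX + 1 slots so
 * that every known type is a valid index; types outside that range are
 * skipped. Returns the number of attributes stored.
 */
static size_t demo_index_attrs(const struct demo_attr_ref *attrs, size_t n,
			       const void *tb[TCA_GATE_ENTRY_MAX + 1])
{
	size_t i, stored = 0;

	for (i = 0; i <= (size_t)TCA_GATE_ENTRY_MAX; i++)
		tb[i] = NULL;

	for (i = 0; i < n; i++) {
		if (attrs[i].type < 0 || attrs[i].type > TCA_GATE_ENTRY_MAX)
			continue;	/* unknown type: ignore */
		tb[attrs[i].type] = attrs[i].data;
		stored++;
	}
	return stored;
}
```

Because the sentinel always sits last, adding a new attribute before it grows the table automatically without touching the parser.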


@ -10,6 +10,7 @@
#define TCA_MPLS_ACT_PUSH 2 #define TCA_MPLS_ACT_PUSH 2
#define TCA_MPLS_ACT_MODIFY 3 #define TCA_MPLS_ACT_MODIFY 3
#define TCA_MPLS_ACT_DEC_TTL 4 #define TCA_MPLS_ACT_DEC_TTL 4
#define TCA_MPLS_ACT_MAC_PUSH 5
struct tc_mpls { struct tc_mpls {
tc_gen; /* generic TC action fields. */ tc_gen; /* generic TC action fields. */


@ -17,6 +17,7 @@
#define SKBMOD_F_SMAC 0x2 #define SKBMOD_F_SMAC 0x2
#define SKBMOD_F_ETYPE 0x4 #define SKBMOD_F_ETYPE 0x4
#define SKBMOD_F_SWAPMAC 0x8 #define SKBMOD_F_SWAPMAC 0x8
#define SKBMOD_F_ECN 0x10
struct tc_skbmod { struct tc_skbmod {
tc_gen; tc_gen;


@ -16,6 +16,8 @@
#define TCA_VLAN_ACT_POP 1 #define TCA_VLAN_ACT_POP 1
#define TCA_VLAN_ACT_PUSH 2 #define TCA_VLAN_ACT_PUSH 2
#define TCA_VLAN_ACT_MODIFY 3 #define TCA_VLAN_ACT_MODIFY 3
#define TCA_VLAN_ACT_POP_ETH 4
#define TCA_VLAN_ACT_PUSH_ETH 5
struct tc_vlan { struct tc_vlan {
tc_gen; tc_gen;
@ -30,6 +32,8 @@ enum {
TCA_VLAN_PUSH_VLAN_PROTOCOL, TCA_VLAN_PUSH_VLAN_PROTOCOL,
TCA_VLAN_PAD, TCA_VLAN_PAD,
TCA_VLAN_PUSH_VLAN_PRIORITY, TCA_VLAN_PUSH_VLAN_PRIORITY,
TCA_VLAN_PUSH_ETH_DST,
TCA_VLAN_PUSH_ETH_SRC,
__TCA_VLAN_MAX, __TCA_VLAN_MAX,
}; };
#define TCA_VLAN_MAX (__TCA_VLAN_MAX - 1) #define TCA_VLAN_MAX (__TCA_VLAN_MAX - 1)

Some files were not shown because too many files have changed in this diff.